add papers

swkim101 · Aug 11, 2024 · dff2820 · dff2820
1 parent 2f98789
commit dff2820
Showing 3,548 changed files with 7,136 additions and 14,889 deletions.
diff --git a/README.md b/README.md
@@ -110,6 +110,19 @@ The crawler sometimes misses paper from the first source if semantic scholar ret
 
 For the second source, the crawler sometimes confuses paper talks with keynote talks (and others). So, search results sometimes contain items that are *not* papers (see [3b6c738](https://github.com/swkim101/cspapers.org/commit/3b6c7386b685b72a18cb4074aa69a71570d50134)). The Google Scholar button can help to verify this.
 
+Also, Semantic Scholar sometimes returns different results for web and API calls, as shown below.
+
+```
+$ curl https://api.semanticscholar.org/graph/v1/paper/b0db907d372e2776a0c9e963a291e100033534a7?fields=title,abstract
+{'paperId': 'b0db907d372e2776a0c9e963a291e100033534a7', 'title': 'A correlation study between automated program repair and test-suite metrics', 'abstract': None}
+```
+
+However, https://www.semanticscholar.org/paper/A-correlation-study-between-automated-program-and-Yi-Tan/b0db907d372e2776a0c9e963a291e100033534a7 has an abstract ("Automated program repair is increas...").
+
+Most ICSE 2018 papers have this issue.
+
+Further, the crawler sometimes confuses posters and full papers, so search results can contain posters.
+
 Reporting the wrong index is always welcome.
 
 ## Why not Google Scholar

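The API/web discrepancy described in the README hunk above can be checked with a small script. The following is a minimal sketch, not the crawler's actual code: the function names `fetch_paper` and `has_missing_abstract` are hypothetical, and it assumes only the endpoint and `fields` parameter shown in the `curl` call in the README.

```python
import json
import urllib.request

# Endpoint and fields taken from the curl example in the README hunk.
API_URL = (
    "https://api.semanticscholar.org/graph/v1/paper/"
    "{paper_id}?fields=title,abstract"
)


def fetch_paper(paper_id: str, timeout: float = 10.0) -> dict:
    """Fetch title and abstract for one paper (hypothetical helper, needs network)."""
    url = API_URL.format(paper_id=paper_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def has_missing_abstract(record: dict) -> bool:
    """Return True when the API record carries no usable abstract."""
    abstract = record.get("abstract")
    return abstract is None or not abstract.strip()


# The record reported in the README, written as a Python dict.
record = {
    "paperId": "b0db907d372e2776a0c9e963a291e100033534a7",
    "title": "A correlation study between automated program repair and test-suite metrics",
    "abstract": None,
}
print(has_missing_abstract(record))  # prints True
```

Records flagged this way could be listed for manual verification against the web page, since the API alone cannot tell whether an abstract exists there.
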
diff --git a/data/2018/aaai/"Did I Say Something Wrong?": Towards a Safe Collaborative Chatbot b/data/2018/aaai/"Did I Say Something Wrong?": Towards a Safe Collaborative Chatbot
@@ -0,0 +1,4 @@
+
+
+ Chatbots have been a core measure of AI since Turing presented his test for intelligence, and are also widely used for entertainment purposes. In this paper we present a platform that enables users to collaboratively teach a chatbot responses, using natural language. We present a method of collectively detecting malicious users and using the commands taught by these users to further mitigate activity of future malicious users.
+
diff --git a/...one: Inducing Multilingual Taxonomies From Wikipedia Using Character-Level Classification b/...one: Inducing Multilingual Taxonomies From Wikipedia Using Character-Level Classification
@@ -0,0 +1,4 @@
+
+
+ We propose a novel fully-automated approach towards inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach first leverages the interlanguage links of Wikipedia to automatically construct training datasets for the isa relation in the target language. Character-level classifiers are trained on the constructed datasets, and used in an optimal path discovery framework to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches for six languages. As a consequence of our work, we release presumably the largest and the most accurate multilingual taxonomic resource spanning over 280 languages.
+
diff --git a/data/2018/aaai/3D Box Proposals From a Single Monocular Image of an Indoor Scene b/data/2018/aaai/3D Box Proposals From a Single Monocular Image of an Indoor Scene
@@ -0,0 +1,4 @@
+
+
+ Modern object detection methods typically rely on bounding box proposals as input. While initially popularized in the 2D case, this idea has received increasing attention for 3D bounding boxes. Nevertheless, existing 3D box proposal techniques all assume having access to depth as input, which is unfortunately not always available in practice. In this paper, we therefore introduce an approach to generating 3D box proposals from a single monocular RGB image. To this end, we develop an integrated, fully differentiable framework that inherently predicts a depth map, extracts a 3D volumetric scene representation and generates 3D object proposals. At the core of our approach lies a novel residual, differentiable truncated signed distance function module, which, accounting for the relatively low accuracy of the predicted depth map, extracts a 3D volumetric representation of the scene. Our experiments on the standard NYUv2 dataset demonstrate that our framework lets us generate high-quality 3D box proposals and that it outperforms the two-stage technique consisting of successively performing state-of-the-art depth prediction and depth-based 3D proposal generation.
+
diff --git a/data/2018/aaai/A Batch Learning Framework for Scalable Personalized Ranking b/data/2018/aaai/A Batch Learning Framework for Scalable Personalized Ranking
@@ -0,0 +1,4 @@
+
+
+ In designing personalized ranking algorithms, it is desirable to encourage a high precision at the top of the ranked list. Existing methods either seek a smooth convex surrogate for a non-smooth ranking metric or directly modify updating procedures to encourage top accuracy. In this work we point out that these methods do not scale well in a large-scale setting, and this is partly due to the inaccurate pointwise or pairwise rank estimation. We propose a new framework for personalized ranking. It uses batch-based rank estimators and smooth rank-sensitive loss functions. This new batch learning framework leads to more stable and accurate rank approximations compared to previous work. Moreover, it enables explicit use of parallel computation to speed up training. We conduct empirical evaluations on three item recommendation tasks, and our method shows a consistent accuracy improvement over current state-of-the-art methods. Additionally, we observe time efficiency advantages when data scale increases.
+
diff --git a/data/2018/aaai/A Bayesian Clearing Mechanism for Combinatorial Auctions b/data/2018/aaai/A Bayesian Clearing Mechanism for Combinatorial Auctions
@@ -0,0 +1,4 @@
+
+
+ We cast the problem of combinatorial auction design in a Bayesian framework in order to incorporate prior information into the auction process and minimize the number of rounds to convergence. We first develop a generative model of agent valuations and market prices such that clearing prices become maximum a posteriori estimates given observed agent valuations. This generative model then forms the basis of an auction process which alternates between refining estimates of agent valuations and computing candidate clearing prices. We provide an implementation of the auction using assumed density filtering to estimate valuations and expectation maximization to compute prices. An empirical evaluation over a range of valuation domains demonstrates that our Bayesian auction mechanism is highly competitive against the combinatorial clock auction in terms of rounds to convergence, even under the most favorable choices of price increment for this baseline.
+
diff --git a/data/2018/aaai/A Brief History and Recent Achievements in Bidirectional Search b/data/2018/aaai/A Brief History and Recent Achievements in Bidirectional Search
@@ -0,0 +1,4 @@
+
+
+ The state of the art in bidirectional search has changed significantly in a very short time period; we now can answer questions about unidirectional and bidirectional search that until very recently we were unable to answer. This paper is designed to provide an accessible overview of the recent research in bidirectional search in the context of the broader efforts over the last 50 years. We give particular attention to new theoretical results and the algorithms they inspire for optimal and near-optimal node expansions when finding a shortest path.
+
diff --git a/...on of Inception Network With Attention Modulated Feature Fusion for Human Pose Estimation b/...on of Inception Network With Attention Modulated Feature Fusion for Human Pose Estimation
@@ -0,0 +1,4 @@
+
+
+ Accurate keypoint localization of human pose needs diversified features: the high level for contextual dependencies and the low level for detailed refinement of joints. However, the importance of the two factors varies from case to case, but how to efficiently use the features is still an open problem. Existing methods have limitations in preserving low level features, adaptively adjusting the importance of different levels of features, and modeling the human perception process. This paper presents three novel techniques step by step to efficiently utilize different levels of features for human pose estimation. Firstly, an inception of inception (IOI) block is designed to emphasize the low level features. Secondly, an attention mechanism is proposed to adjust the importance of individual levels according to the context. Thirdly, a cascaded network is proposed to sequentially localize the joints to enforce message passing from joints of stand-alone parts like head and torso to remote joints like wrist or ankle. Experimental results demonstrate that the proposed method achieves the state-of-the-art performance on both MPII and LSP benchmarks.
+
diff --git a/data/2018/aaai/A Cognitive Assistant for Visualizing and Analyzing Exoplanets b/data/2018/aaai/A Cognitive Assistant for Visualizing and Analyzing Exoplanets
@@ -0,0 +1,4 @@
+
+
+ We demonstrate an embodied cognitive agent that helps scientists visualize and analyze exo-planets and their host stars. The prototype is situated in a room equipped with a large display, microphones, cameras, speakers, and pointing devices. Users communicate with the agent via speech, gestures, and combinations thereof, and it responds by displaying content and generating synthesized speech. Extensive use of context facilitates natural interaction with the agent.
+
diff --git a/...rithm for the Online Joint Bid Budget Optimization of Pay-per-Click Advertising Campaigns b/...rithm for the Online Joint Bid Budget Optimization of Pay-per-Click Advertising Campaigns
@@ -0,0 +1,4 @@
+
+
+ Pay-per-click advertising includes various formats (e.g., search, contextual, and social) with a total investment of more than 140 billion USD per year. An advertising campaign is composed of some subcampaigns, each with a different ad, and a cumulative daily budget. The allocation of the ads is ruled exploiting auction mechanisms. In this paper, we propose, for the first time to the best of our knowledge, an algorithm for the online joint bid/budget optimization of pay-per-click multi-channel advertising campaigns. We formulate the optimization problem as a combinatorial bandit problem, in which we use Gaussian Processes to estimate stochastic functions, Bayesian bandit techniques to address the exploration/exploitation problem, and a dynamic programming technique to solve a variation of the Multiple-Choice Knapsack problem. We experimentally evaluate our algorithm both in simulation (using a synthetic setting generated from real data from Yahoo!) and in a real-world application over an advertising period of two months.
+
diff --git a/.../A Continuous Relaxation of Beam Search for End-to-End Training of Neural Sequence Models b/.../A Continuous Relaxation of Beam Search for End-to-End Training of Neural Sequence Models
@@ -0,0 +1,4 @@
+
+
+ Beam search is a desirable choice of test-time decoding algorithm for neural sequence models because it potentially avoids search errors made by simpler greedy methods. However, typical cross entropy training procedures for these models do not directly consider the behaviour of the final decoding method. As a result, for cross-entropy trained models, beam decoding can sometimes yield reduced test performance when compared with greedy decoding. In order to train models that can more effectively make use of beam search, we propose a new training procedure that focuses on the final loss metric (e.g. Hamming loss) evaluated on the output of beam search. While well-defined, this "direct loss" objective is itself discontinuous and thus difficult to optimize. Hence, in our approach, we form a sub-differentiable surrogate objective by introducing a novel continuous approximation of the beam search decoding procedure. In experiments, we show that optimizing this new training objective yields substantially better results on two sequence tasks (Named Entity Recognition and CCG Supertagging) when compared with both cross entropy trained greedy decoding and cross entropy trained beam decoding baselines.
+
diff --git a/data/2018/aaai/A Coverage-Based Utility Model for Identifying Unknown Unknowns b/data/2018/aaai/A Coverage-Based Utility Model for Identifying Unknown Unknowns
@@ -0,0 +1,4 @@
+
+
+ A classifier’s low confidence in prediction is often indicative of whether its prediction will be wrong; in this case, inputs are called known unknowns. In contrast, unknown unknowns (UUs) are inputs on which a classifier makes a high confidence mistake. Identifying UUs is especially important in safety-critical domains like medicine (diagnosis) and law (recidivism prediction). Previous work by Lakkaraju et al. (2017) on identifying unknown unknowns assumes that the utility of each revealed UU is independent of the others, rather than considering the set holistically. While this assumption yields an efficient discovery algorithm, we argue that it produces an incomplete understanding of the classifier’s limitations. In response, this paper proposes a new class of utility models that rewards how well the discovered UUs cover (or "explain") a sample distribution of expected queries. Although choosing an optimal cover is intractable, even if the UUs were known, our utility model is monotone submodular, affording a greedy discovery strategy. Experimental results on four datasets show that our method outperforms bandit-based approaches and achieves within 60.9% utility of an omniscient, tractable upper bound.
+
diff --git a/data/2018/aaai/A Deep Cascade Network for Unaligned Face Attribute Classification b/data/2018/aaai/A Deep Cascade Network for Unaligned Face Attribute Classification
@@ -0,0 +1,4 @@
+
+
+ Humans focus attention on different face regions when recognizing face attributes. Most existing face attribute classification methods use the whole image as input. Moreover, some of these methods rely on fiducial landmarks to provide defined face parts. In this paper, we propose a cascade network that simultaneously learns to localize face regions specific to attributes and performs attribute classification without alignment. First, a weakly-supervised face region localization network is designed to automatically detect regions (or parts) specific to attributes. Then multiple part-based networks and a whole-image-based network are separately constructed and combined together by the region switch layer and attribute relation layer for final attribute classification. A multi-net learning method and hint-based model compression are further proposed to get an effective localization model and a compact classification model, respectively. Our approach achieves significantly better performance than state-of-the-art methods on unaligned CelebA dataset, reducing the classification error by 30.9%.
+
diff --git a/data/2018/aaai/A Deep Generative Framework for Paraphrase Generation b/data/2018/aaai/A Deep Generative Framework for Paraphrase Generation
@@ -0,0 +1,4 @@
+
+
+ Paraphrase generation is an important problem in NLP, especially in question answering, information retrieval, information extraction, and conversation systems, to name a few. In this paper, we address the problem of generating paraphrases automatically. Our proposed method is based on a combination of deep generative models (VAE) with sequence-to-sequence models (LSTM) to generate paraphrases, given an input sentence. Traditional VAEs when combined with recurrent neural networks can generate free text but they are not suitable for paraphrase generation for a given sentence. We address this problem by conditioning both the encoder and decoder sides of the VAE on the original sentence, so that it can generate the given sentence's paraphrases. Unlike most existing models, our model is simple, modular and can generate multiple paraphrases for a given sentence. Quantitative evaluation of the proposed method on a benchmark paraphrase dataset demonstrates its efficacy, and its performance improvement over the state-of-the-art methods by a significant margin, whereas qualitative human evaluation indicates that the generated paraphrases are well-formed, grammatically correct, and relevant to the input sentence. Furthermore, we evaluate our method on a newly released question paraphrase dataset, and establish a new baseline for future research.
+
diff --git a/...ai/A Deep Model With Local Surrogate Loss for General Cost-Sensitive Multi-Label Learning b/...ai/A Deep Model With Local Surrogate Loss for General Cost-Sensitive Multi-Label Learning
@@ -0,0 +1,4 @@
+
+
+ Multi-label learning is an important machine learning problem with a wide range of applications. The variety of criteria for satisfying different application needs calls for cost-sensitive algorithms, which can adapt to different criteria easily. Nevertheless, because of the sophisticated nature of the criteria for multi-label learning, cost-sensitive algorithms for general criteria are hard to design, and current cost-sensitive algorithms can at most deal with some special types of criteria. In this work, we propose a novel cost-sensitive multi-label learning model for any general criteria. Our key idea within the model is to iteratively estimate a surrogate loss that approximates the sophisticated criterion of interest near some local neighborhood, and use the estimate to decide a descent direction for optimization. The key idea is then coupled with deep learning to form our proposed model. Experimental results validate that our proposed model is superior to existing cost-sensitive algorithms and existing deep learning models across different criteria.
+
diff --git a/.../2018/aaai/A Deep Ranking Model for Spatio-Temporal Highlight Detection From a 360◦ Video b/.../2018/aaai/A Deep Ranking Model for Spatio-Temporal Highlight Detection From a 360◦ Video
@@ -0,0 +1,4 @@
+
+
+ We address the problem of highlight detection from a 360° video by summarizing it both spatially and temporally. Given a long 360° video, we spatially select pleasantly-looking normal field-of-view (NFOV) segments from unlimited field of views (FOV) of the 360° video, and temporally summarize it into a concise and informative highlight as a selected subset of subshots. We propose a novel deep ranking model named the Composition View Score (CVS) model, which produces a spherical score map of composition per video segment, and determines which view is suitable for highlight via a sliding window kernel at inference. To evaluate the proposed framework, we perform experiments on the Pano2Vid benchmark dataset (Su, Jayaraman, and Grauman 2016) and our newly collected 360° video highlight dataset from YouTube and Vimeo. Through evaluation using both quantitative summarization metrics and user studies via Amazon Mechanical Turk, we demonstrate that our approach outperforms several state-of-the-art highlight detection methods. We also show that our model is 16 times faster at inference than AutoCam (Su, Jayaraman, and Grauman 2016), which is one of the first summarization algorithms of 360° videos.
+
diff --git a/data/2018/aaai/A Driving License for Intelligent Systems b/data/2018/aaai/A Driving License for Intelligent Systems
@@ -0,0 +1,4 @@
+
+
+ Artificial Intelligence (AI) is becoming increasingly important. Thus, sound knowledge about the principles of AI will be a crucial factor for future careers of young people as well as for the development of novel, innovative products. Addressing this challenge, we present an ambitious 3-year project focusing on developing and implementing a professional, internationally accepted, standardized training and certification system for AI which will also be recognized by the industry and educational institutions. The approach is based on already implemented and evaluated pilot projects in the area of AI education. The project’s main goal is to train and certify teachers and mentors as well as students and young people in basic and advanced AI topics, fostering AI literacy among this target audience.
+
diff --git a/data/2018/aaai/A Framework and Positive Results for IAR-answering b/data/2018/aaai/A Framework and Positive Results for IAR-answering
@@ -0,0 +1,4 @@
+
+
+ Inconsistency-tolerant semantics, like the IAR semantics, have been proposed as means to compute meaningful query answers over inconsistent Description Logic (DL) ontologies. So far query answering under the IAR semantics (IAR-answering) is known to be tractable only for arguably weak DLs like DL-Lite and the quite restricted EL⊥nr fragment of EL⊥. Towards providing a systematic study of IAR-answering, in the current paper we first present a general framework/algorithm for IAR-answering which applies to arbitrary DLs but need not terminate. Nevertheless, this framework allows us to develop a sufficient condition for tractability of IAR-answering and hence of termination of our algorithm. We then show that this condition is always satisfied by the arguably expressive DL DL-Litebool, providing the first positive result for IAR-answering over a non-Horn-DL. In addition, recent results show that this condition usually holds for real-world ontologies and techniques and algorithms for checking it in practice have also been studied recently; thus, overall our results are highly relevant in practice. Finally, we have provided a prototype implementation and a preliminary evaluation obtaining encouraging results.
+
diff --git a/...aai/A Framework for Evaluating Barriers to the Democratization of Artificial Intelligence b/...aai/A Framework for Evaluating Barriers to the Democratization of Artificial Intelligence
@@ -0,0 +1,4 @@
+
+
+ The "democratization" of AI has been taken up as a primary goal by several major tech companies. However, these efforts resemble earlier "freeware" and "open access" initiatives, and it is unclear how or whether they are informed by political conceptions of democratic governance. A political formulation of the democratization of AI is thus necessary. This paper presents a framework for the democratic governance of technology through intelligent trial and error (ITE) that can be utilized to evaluate barriers to the democratization of AI and suggest strategies for overcoming them.
+