Generating Search Rankings with Deep Reinforcement Learning
Abhishek Sharma
California
Learning to Rank is the problem of ordering a sequence of documents by their relevance to a given query. In this work, we propose DeepQRank, a deep Q-learning approach to this problem.
Project status: Published/In Market
Groups
Student Developers for AI
Overview / Usage
In the Learning to Rank (LTR) problem, we are given a set of queries. Each query is typically accompanied by many (often hundreds of) "documents": items with varying degrees of relevance to the query. These document-query pairs are common in search engine settings such as Google Search, as well as in recommendation engines. The goal of LTR is to return a list of these pairs in which the documents are intelligently ranked by relevance.

Most models approximate a relevance function f(x) -> y, where y is the "relevance score" for document x. This is known as pointwise learning to rank, and it requires the dataset to include a predetermined score for each document-query pair. In listwise learning to rank (the focus of this project), the document-query pairs carry no target value to predict; instead, each query comes with a correct order in which its documents are pre-ranked. The model's job is to reconstruct these rankings for future document-query pairs.
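To make the distinction concrete, the sketch below shows the two data formats side by side. The query IDs, feature vectors, and the linear stand-in for f are hypothetical placeholders, not values from any real LTR dataset.

```python
import numpy as np

# Pointwise format: each document-query pair carries a relevance score to predict.
pointwise_data = [
    # (query_id, document feature vector, relevance score y)
    (1, np.array([0.2, 0.7, 0.1]), 3.0),
    (1, np.array([0.9, 0.1, 0.4]), 1.0),
]

# Listwise format: each query maps to its documents given only in the correct order;
# the model must reproduce this ordering for unseen queries.
listwise_data = {
    1: [np.array([0.2, 0.7, 0.1]),   # most relevant first
        np.array([0.9, 0.1, 0.4]),
        np.array([0.3, 0.3, 0.3])],  # least relevant last
}

def pointwise_score(x, w):
    """A simple linear stand-in for the relevance function f(x) -> y."""
    return float(np.dot(w, x))
```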
Deep Q-learning has been shown to be a useful method for training an agent in sequential decision making. In this work, we show that DeepQRank, our deep Q-learning agent, achieves performance on learning-to-rank tasks that can be considered state-of-the-art. Although it is computationally slower than a supervised learning approach, our agent has fewer limitations: it can perform both pointwise ranking (predicting the target relevance of a single item) and listwise ranking (returning an entire ranked list), whereas traditional supervised learning methods are capable only of pointwise ranking.
Algorithms that perform well in learning to rank settings can be used to develop search engines and recommendation systems. In fact, such algorithms are in use by Alibaba, Bing, and Yandex, a Russian search engine.
Methodology / Approach
In this work, we propose DeepQRank, a deep Q-learning approach to this problem. Related work has shown the effectiveness of representing LTR as a Markov Decision Process (MDP); we build on this research by applying a new learning algorithm to the task of ranking. To apply deep Q-learning, we represent Learning to Rank as an MDP: the agent begins with a randomly ordered list of documents and selects documents from this list one by one, at each step choosing the document with the maximum estimated reward. Equipped with this formulation, the agent can rank documents (a sketch of this loop follows below).

While both MDP representations and deep learning approaches have performed well on LTR, each has its limitations. The MDP approach cannot learn a complex function to represent a document's rank because it is not paired with a neural network. Meanwhile, the deep learning approach requires each document to have a "relevance" label in a discrete range; for example, some datasets assign a relevance strength in the range [0,4] to each document-query pair. This makes such datasets suitable for supervised learning approaches like the "FastAP" neural network. We evaluate our algorithm on Microsoft's LETOR dataset.
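The sketch below illustrates this greedy ranking loop. The QNetwork architecture, the use of the current list position as part of the state, and the 46-dimensional feature size are illustrative assumptions, not the exact DeepQRank implementation.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Estimates the reward (Q-value) of placing a document next in the ranking."""
    def __init__(self, feature_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim + 1, hidden_dim),  # document features + current position
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, doc_features, position):
        x = torch.cat([doc_features, position], dim=-1)
        return self.net(x).squeeze(-1)

def rank_documents(q_net, documents):
    """Greedy episode: repeatedly pick the remaining document with the
    highest estimated Q-value until the list is empty."""
    remaining = list(range(len(documents)))
    ranking = []
    with torch.no_grad():
        while remaining:
            pos = torch.full((len(remaining), 1), float(len(ranking)))
            feats = torch.stack([documents[i] for i in remaining])
            q_values = q_net(feats, pos)
            best = remaining[int(torch.argmax(q_values))]
            ranking.append(best)
            remaining.remove(best)
    return ranking

# Example: rank five documents with random features for a single query.
docs = [torch.rand(46) for _ in range(5)]   # 46-dim feature vectors (assumption)
q_net = QNetwork(feature_dim=46)
print(rank_documents(q_net, docs))          # e.g. [3, 0, 4, 1, 2]
```

During training, each selection would also produce a reward and a transition for the replay buffer; the loop above only shows how a trained Q-network is used to produce a ranked list.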
Technologies Used
Intel Technologies
- Intel Python
- Intel DevCloud
Other Technologies
- SciPy Stack (NumPy, Pandas, Scikit-learn, SciPy)
- PyTorch
Repository
https://github.com/abhi1345/deep-q-rank