====== Information Retrieval ======
With the advent of deep learning models, there is a lot of new work to be done in information retrieval (IR). See the papers below for examples.

===== Overviews =====
  * [[https://arxiv.org/pdf/1708.00247.pdf|Azad & Deepak 2017 - Query Expansion Techniques for Information Retrieval: A Survey]]

===== Papers =====
  * [[https://arxiv.org/pdf/2101.07918.pdf|Yu et al 2021 - PGT: Pseudo Relevance Feedback Using a Graph-Based Transformer]]
  * [[https://arxiv.org/pdf/2103.05256.pdf|Naseri et al 2021 - CEQE: Contextualized Embeddings for Query Expansion]]
  * [[https://arxiv.org/pdf/2104.07186.pdf|Gao et al 2021 - COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List]]

Much remains to be done in this area; the papers above are only a starting point.

===== Conversational Search =====
See [[nlp:Dialog#Conversational Search]].

===== Retrieval Methods used in NLP =====
These are retrieval methods used in NLP tasks such as open-domain QA, retrieval-based dialog, retrieval-based prompting, etc.

  * BM25
    * [[https://dl.acm.org/doi/pdf/10.1145/2682862.2682863|Trotman et al 2014 - Improvements to BM25 and Language Models Examined]] [[https://github.com/dorianbrown/rank_bm25|GitHub implementation]] of several BM25 variants
  * Generating Titles
    * [[https://arxiv.org/pdf/2204.05511.pdf|Chen et al 2023 - GERE: Generative Evidence Retrieval for Fact Verification]]
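
As a rough illustration of the BM25 entry above, here is a minimal sketch of Okapi-style BM25 scoring in plain Python. The function name ''bm25_scores'', the defaults ''k1=1.5'' and ''b=0.75'', the ''log(1 + ...)'' IDF variant, and the toy corpus are all assumptions chosen for this example. It is not the code of the linked implementation, and the variants discussed by Trotman et al differ mainly in the IDF and length-normalization terms.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k1 controls term-frequency saturation, b controls length normalization.
    Both defaults are common choices, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption; other variants exist).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency, saturated by k1 and normalized by document length.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Toy corpus, tokenized by whitespace (hypothetical data).
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "information retrieval with bm25 ranking",
]
docs = [text.split() for text in corpus]
scores = bm25_scores("bm25 retrieval".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(corpus[best])
```

Documents that share no term with the query receive a score of zero, which is the lexical-mismatch limitation that dense and hybrid retrievers try to address.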

===== Dense Document Retrieval with LLMs =====
  * **Overviews**
    * [[https://arxiv.org/pdf/2211.14876.pdf|Zhao et al 2023 - Dense Text Retrieval based on Pretrained Language Models: A Survey]]
  * [[https://dl.acm.org/doi/pdf/10.1145/3511808.3557527|Ma et al 2023 - A Contrastive Pre-training Approach to Learn Discriminative Autoencoder for Dense Retrieval]]
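
The general shape of dense retrieval (encode documents offline, encode the query at search time, rank by vector similarity) can be sketched as follows. This is only a sketch under stated assumptions: ''toy_encode'' is a hypothetical stand-in that hashes words into a fixed-size vector, whereas a real system would use a pretrained language model as the encoder. The dimension, function names, and corpus are likewise made up for illustration.

```python
import math
import zlib

DIM = 1024  # embedding size, arbitrary for this sketch


def toy_encode(text, dim=DIM):
    """Hypothetical stand-in for a neural encoder.

    Hashes each token into one of `dim` buckets and L2-normalizes the result.
    A real dense retriever would call a pretrained language model here.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(u, v):
    """Inner product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def search(query, doc_vectors, top_k=2):
    """Return the top_k (score, doc_index) pairs, highest score first."""
    q = toy_encode(query)
    scored = [(dot(q, d), i) for i, d in enumerate(doc_vectors)]
    return sorted(scored, reverse=True)[:top_k]


# Toy corpus (hypothetical data).
corpus = [
    "the cat sat on the mat",
    "dense retrieval with pretrained language models",
    "yahoo learning to rank challenge",
]
# The document index is built once, ahead of query time.
doc_vectors = [toy_encode(text) for text in corpus]
results = search("dense retrieval", doc_vectors)
for score, idx in results:
    print(round(score, 3), corpus[idx])
```

The exhaustive scan in ''search'' is fine for a handful of documents; at scale it is usually replaced by an approximate nearest-neighbor index.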

===== Generative IR =====
  * **Overviews**
    * [[https://arxiv.org/pdf/2404.14851|Li et al 2024 - From Matching to Generation: A Survey on Generative Information Retrieval]]

===== Datasets =====
See also the Papers with Code [[https://paperswithcode.com/datasets?task=information-retrieval&mod=texts&page=1|information retrieval datasets]] and [[https://paperswithcode.com/task/information-retrieval/latest?page=14|information retrieval task page]].

  * [[https://paperswithcode.com/dataset/learning-to-rank-challenge|Yahoo! Learning to Rank Challenge]] [[https://webscope.sandbox.yahoo.com/catalog.php?datatype=c|homepage]]
  * [[https://arxiv.org/pdf/2104.08663.pdf|Thakur et al 2021 - BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models]]

===== Software =====
  * [[https://www.lemurproject.org/|Lemur]] Open-source research search engine
  * [[https://aurelieherbelot.net/pears/|PeARS]] Local browser-based search engine (seems to be a research system)
  * [[https://github.com/dorianbrown/rank_bm25|Rank-BM25]] A two-line search engine

===== Conferences =====
  * [[https://sigir.org/|SIGIR]], [[https://sigir.org/sigir2019/|2019]], [[https://sigir.org/sigir2020/|2020]], [[https://sigir.org/sigir2021/|2021]]

===== People =====
  * [[https://scholar.google.com/citations?user=-bLGeg0AAAAJ&hl=en|James Allan]]
  * [[https://scholar.google.com/citations?user=Un5KXJ4AAAAJ&hl=en|Jamie Callan]] (not up-to-date) or [[https://arxiv.org/search/cs?searchtype=author&query=Callan%2C+J|arXiv]]
  * [[https://scholar.google.com/citations?user=ArV74ZMAAAAJ&hl=en|Bruce Croft]]
  * [[https://scholar.google.com/citations?user=IIXpJ8oAAAAJ&hl=en&oi=ao|Laura Dietz]]
  * [[https://scholar.google.com/citations?user=tbxCHJgAAAAJ&hl=en|Ji-Rong Wen]]

===== Related Pages =====
  * [[nlp:Dialog#Conversational Search]]
  * [[ml:Learning to Rank]]