====== Information Retrieval ======
With the advent of deep learning models, there is a lot of new work to be done in information retrieval (IR). See below for examples.

===== Overviews =====
  * [[https://arxiv.org/pdf/1708.00247.pdf|Azad & Deepak 2017 - Query Expansion Techniques for Information Retrieval: A Survey]]

===== Papers =====
  * [[https://arxiv.org/pdf/2101.07918.pdf|Yu et al 2021 - PGT: Pseudo Relevance Feedback Using a Graph-Based Transformer]]
  * [[https://arxiv.org/pdf/2103.05256.pdf|Naseri et al 2021 - CEQE: Contextualized Embeddings for Query Expansion]]
  * [[https://arxiv.org/pdf/2104.07186.pdf|Gao et al 2021 - COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List]]

There is still a lot that can be done in this area. This is just the beginning.

===== Conversational Search =====
See [[nlp:Dialog#Conversational Search]].

===== Retrieval Methods used in NLP =====
These are retrieval methods used in NLP tasks such as open-domain QA, retrieval-based dialog, and retrieval-based prompting.
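BM25, the first method listed below, is a common lexical baseline for these tasks. The following is a minimal sketch of Okapi BM25 scoring in plain Python, assuming whitespace-tokenized text, the commonly used defaults k1 = 1.5 and b = 0.75, and the non-negative IDF form log(1 + (N - n + 0.5) / (n + 0.5)) used by Lucene. The function name `bm25_scores` and the toy corpus are hypothetical illustrations, not the API of Rank-BM25 or any other library.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return an Okapi BM25 score for each tokenized document in `docs`.

    `query` and each document are lists of tokens. IDF uses the
    non-negative form log(1 + (N - n + 0.5) / (n + 0.5)), as in Lucene.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0  # guard: all-empty corpus
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: documents longer than average are penalized.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # terms absent from this document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


corpus = [
    "dense retrieval with pretrained language models",
    "bm25 is a strong lexical baseline for retrieval",
    "query expansion with pseudo relevance feedback",
]
docs = [text.split() for text in corpus]  # naive whitespace tokenization
scores = bm25_scores("lexical retrieval baseline".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best)  # → 1
```

Each term's contribution saturates as its frequency grows (controlled by k1), and b sets how strongly document length is normalized; with b = 0 the length of the document is ignored. Variants such as BM25L and BM25+ (examined by Trotman et al below) mainly change these IDF and normalization terms.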
  * BM25
    * [[https://dl.acm.org/doi/pdf/10.1145/2682862.2682863|Trotman et al 2014 - Improvements to BM25 and Language Models Examined]], with a [[https://github.com/dorianbrown/rank_bm25|GitHub implementation]] of several of the BM25 variants
  * Generating Titles
    * [[https://arxiv.org/pdf/2204.05511.pdf|Chen et al 2023 - GERE: Generative Evidence Retrieval for Fact Verification]]

===== Dense Document Retrieval with LLMs =====
  * **Overviews**
    * [[https://arxiv.org/pdf/2211.14876.pdf|Zhao et al 2023 - Dense Text Retrieval based on Pretrained Language Models: A Survey]]
  * [[https://dl.acm.org/doi/pdf/10.1145/3511808.3557527|Ma et al 2023 - A Contrastive Pre-training Approach to Learn Discriminative Autoencoder for Dense Retrieval]]

===== Generative IR =====
  * **Overviews**
    * [[https://arxiv.org/pdf/2404.14851|Li et al 2024 - From Matching to Generation: A Survey on Generative Information Retrieval]]

===== Datasets =====
See also Papers with Code's [[https://paperswithcode.com/datasets?task=information-retrieval&mod=texts&page=1|list of text IR datasets]] and its [[https://paperswithcode.com/task/information-retrieval/latest?page=14|information retrieval task page]].
  * [[https://paperswithcode.com/dataset/learning-to-rank-challenge|Yahoo! Learning to Rank Challenge]] ([[https://webscope.sandbox.yahoo.com/catalog.php?datatype=c|homepage]])
  * [[https://arxiv.org/pdf/2104.08663.pdf|Thakur et al 2021 - BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models]]

===== Software =====
  * [[https://www.lemurproject.org/|Lemur]]: open-source search engine for research
  * [[https://aurelieherbelot.net/pears/|PeARS]]: local, browser-based search engine (seems to be a research system)
  * [[https://github.com/dorianbrown/rank_bm25|Rank-BM25]]: "a two line search engine" implementing several BM25 variants in Python

===== Conferences =====
  * [[https://sigir.org/|SIGIR]]: [[https://sigir.org/sigir2019/|2019]], [[https://sigir.org/sigir2020/|2020]], [[https://sigir.org/sigir2021/|2021]]

===== People =====
  * [[https://scholar.google.com/citations?user=-bLGeg0AAAAJ&hl=en|James Allan]]
  * [[https://scholar.google.com/citations?user=Un5KXJ4AAAAJ&hl=en|Jamie Callan]] (Scholar profile not up to date; see also [[https://arxiv.org/search/cs?searchtype=author&query=Callan%2C+J|arXiv]])
  * [[https://scholar.google.com/citations?user=ArV74ZMAAAAJ&hl=en|Bruce Croft]]
  * [[https://scholar.google.com/citations?user=IIXpJ8oAAAAJ&hl=en&oi=ao|Laura Dietz]]
  * [[https://scholar.google.com/citations?user=tbxCHJgAAAAJ&hl=en|Ji-Rong Wen]]

===== Related Pages =====
  * [[nlp:Dialog#Conversational Search]]
  * [[ml:Learning to Rank]]