Schedule

Date   Description   Bibliography   Slides
11/09/2020   Introduction and Boolean Retrieval

Chapter 1, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • A. Moffat, J. Zobel, D. Hawking, Recommended reading for IR research students, ACM SIGIR Forum, vol. 39, no. 2, pp. 3-14, 2005.
  • See Sergey Brin, speaking on Search, Google and Life, UC Berkeley, Oct. 2005.

 

18/09/2020   Text encoding: tokenization, stemming, lemmatization, stop words, phrases

Chapter 2, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Bahle, D., Williams, H. E., and Zobel, J. 2002. Efficient phrase querying with an auxiliary index. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Tampere, Finland, August 11 - 15, 2002).
25/09/2020   Dictionaries & Tolerant retrieval

Chapter 3, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • J. Zobel and P. Dart. Finding approximate matches in large lexicons. Software - practice and experience 25(3), March 1995.
  • K. Kukich. Techniques for automatically correcting words in text. ACM Computing Surveys 24(4), Dec 1992.
2/10/2020   Index construction

Chapter 4, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Shanks, V. R. and Williams, H. E. 2003. Index construction for linear categorisation. In Proceedings of the Twelfth International Conference on Information and Knowledge Management (New Orleans, LA, USA, November 03 - 08, 2003).
  • Dean, J. and Ghemawat, S. 2004. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design & Implementation (San Francisco, CA, December 06 - 08, 2004).
  • See the video of Jeff Dean's (Google Inc) colloquium Google: A Behind-the-Scenes Look at the University of Washington, October 2004; covers aspects of MapReduce and the systems behind the search engine.
9/10/2020   Index construction; Index compression

Chapters 4, 5, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Büttcher, S. and Clarke, C. L. 2007. Index compression is good, especially for random access. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (Lisbon, Portugal, November 06 - 10, 2007).

16/10/2020   Vector Space Retrieval & Computing Scores in a complete search system

Chapters 6, 7, 8, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Zobel, J. and Moffat, A. 1998. Exploring the similarity space. SIGIR Forum 32, 1 (Apr. 1998).
23/10/2020   Relevance Feedback & Query Expansion; XML retrieval

Chapters 8, 9. Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Chapter 10, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze.

Optional reading:

  • Anh, V. N., de Kretser, O., and Moffat, A. 2001. Vector-space ranking with effective early termination. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New Orleans, Louisiana, United States).
30/10/2020   Midterm
  • Topics: Chapters 1-9, Manning, Raghavan, Schütze.
  • The midterm exam will last 120 minutes.

6/11/2020   Data classification / Data clustering

Chapters 13, 14. Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Chapters 16, 17. Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Optional reading:

  • Jain, A. K., Murty, M. N., and Flynn, P. J. 1999. Data clustering: a review. ACM Comput. Surv. 31, 3 (Sep. 1999), 264-323
  • See the video of Ulrike von Luxburg's (Max Planck Institute for Biological Cybernetics) colloquium Lectures on Clustering at the PASCAL Bootcamp in Machine Learning.
  • See the video of Yee Whye Teh's (University College London) colloquium Hierarchical Clustering at the EPSRC Winter School in Mathematics for Data Modelling.

13/11/2020   Web Search Basics / Crawling and Indexing

Chapter 20. Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze


Chapter 3, Mining Massive Datasets, by Jure Leskovec, Anand Rajaraman and Jeff Ullman, Cambridge University Press, 2014

Slides for the minhash

Crawling Techniques (Chapter 6, Modeling the Internet and the Web: Probabilistic Methods and Algorithms, by Pierre Baldi, Paolo Frasconi, Padhraic Smyth, Wiley, 2003.)

Georgios John Fakas, Zhi Cai, Nikos Mamoulis: Diverse and proportional size-l object summaries using pairwise relevance. VLDB J. 25(6): 791-816 (2016)

Georgios John Fakas, Zhi Cai, Nikos Mamoulis: Diverse and Proportional Size-l Object Summaries for Keyword Search. SIGMOD Conference 2015: 363-375

Optional reading:

  • How search works
  • Search Engine Users: Internet searchers are confident, satisfied and trusting -- but they are also unaware and naive, by Deborah Fallows, Pew Internet Research report, January 23, 2005.
  • Kobayashi, M. and Takeda, K. 2000. Information retrieval on the web. ACM Comput. Surv. 32, 2 (Jun. 2000), 144-173.
  • An Investigation of Web Crawler Behavior: Characterization and Metrics. M. D. Dikaiakos, A. Stassopoulou, L. Papageorgiou. Computer Communications, May 2005. Vol. 28, Issue 8, pp. 880-897, Elsevier (available online through Elsevier's portal; locally in pdf).
  • Crawling the Infinite Web. R. Baeza-Yates, C. Castillo. Journal of Web Engineering, February 2007. Vol. 6, No. 1, pp. 49-72.

 

 

20/11/2020   Link Analysis

Chapter 21, Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze

Chapter 5, Mining Massive Datasets, by Anand Rajaraman and Jeff Ullman, Cambridge University Press, 2011

Link Analysis (Chapter 5, Modeling the Internet and the Web: Probabilistic Methods and Algorithms, by Pierre Baldi, Paolo Frasconi, Padhraic Smyth, Wiley, 2003.)

Optional reading:  

27/11/2020   Projects Presentation