Course profile

Information Retrieval and Web Search (INFS7410)

Study period
Sem 2 2024
Location
St Lucia
Attendance mode
In Person

Course overview

Study period
Semester 2, 2024 (22/07/2024 - 18/11/2024)
Study level
Postgraduate Coursework
Location
St Lucia
Attendance mode
In Person
Units
2
Administrative campus
St Lucia
Coordinating unit
Elec Engineering & Comp Science School

The course discusses the theory, design, and implementation of Information Retrieval (IR) techniques in text-based information systems. The theoretical component of the course focuses on IR methods for the processing, indexing, querying, ranking, organisation, and classification of textual documents, including Web documents. A variety of current research topics are also covered, including professional search and recommendation systems. The practical component of the course addresses the design and implementation of high-capacity text retrieval and filtering systems such as web search engines.

The ubiquity of text-based information systems (e.g., Web search engines, online advertising, and recommender systems) is fuelling rapid growth of interest in studying various topics in Information Retrieval (IR). The remarkable success of search engines such as Google and Bing is a striking example of how important it is for academia and industry to foster innovation in IR and text-based information systems. This course will cover the various IR techniques employed in building a complete text-based information system. Changes to assessment have been made, with weekly quizzes replacing the mid-semester exam to encourage students to engage throughout the course. A new module on generative large language models and generative IR has also been introduced, reflecting their importance in driving the latest developments in the field.

Course requirements

Assumed background

Background in data management (e.g., SQL and indexing), probability and statistics, and algorithms and data structures; solid programming skills (Python) and algorithmic thinking.

Prerequisites

You'll need to complete the following courses before enrolling in this one:

INFS2200 or INFS7903

Course contact

Course staff

Lecturer

Professor Guido Zuccon

Timetable

The timetable for this course is available on the UQ Public Timetable.

Aims and outcomes

The aim of this course is to provide a comprehensive introduction to Information Retrieval (IR). The areas covered include data acquisition and pre-processing (crawling, stemming), indexing, querying, ranking, organisation and representation of textual documents, and evaluation. The goal is to present fundamental concepts and algorithms for each topic, thus providing students with the background and practical skills needed to apply information retrieval in web search engines. In addition, the course provides a starting point for students interested in pursuing research in information retrieval or related fields.

Learning outcomes

After successfully completing this course you should be able to:

LO1.

Gain in-depth knowledge of the core principles in the information retrieval field, such as text representation and similarity computation, indexing textual documents, query correction and expansion, retrieval models, and performance evaluation and metrics.

LO2.

Acquire hands-on practical experience with the development of components of Web search engines and with experimentation on end-to-end systems using relevant open-source libraries.

LO3.

Understand advanced topics in information retrieval research such as learning to rank, neural retrieval and ranking.

LO4.

Create, analyse and evaluate novel information retrieval solutions to search-related problems.

LO5.

Compare and contrast information retrieval methods and communicate their differences, advantages and disadvantages, based on quantitative evaluation.

LO6.

Analyse search tasks and problems, and identify and communicate relevant information retrieval solutions.

Assessment

Assessment summary

Category | Assessment task | Weight | Due date
Quiz | Weekly quizzes (Online) | 10% | 26/07/2024, 2/08/2024, 9/08/2024, 16/08/2024, 23/08/2024, 30/08/2024, 6/09/2024, 13/09/2024, 20/09/2024, 4/10/2024, 11/10/2024, 18/10/2024 and 25/10/2024, each at 3:00 pm
Computer Code, Project | Project Part 1 | 20% | 30/08/2024 4:00 pm
Computer Code, Project | Project Part 2 | 20% | 18/10/2024 4:00 pm
Examination | Final Oral Exam (Hurdle, Identity Verified, In-person) | 50% | 4/11/2024 - 15/11/2024

A hurdle is an assessment requirement that must be satisfied in order to receive a specific grade for the course. Check the assessment details for more information about hurdle requirements.

Assessment details

Weekly quizzes

  • Online
Mode
Written
Category
Quiz
Weight
10%
Due date

26/07/2024, 2/08/2024, 9/08/2024, 16/08/2024, 23/08/2024, 30/08/2024, 6/09/2024, 13/09/2024, 20/09/2024, 4/10/2024, 11/10/2024, 18/10/2024 and 25/10/2024, each at 3:00 pm

Learning outcomes
LO1, LO3

Task description

Online quizzes on Blackboard, to be completed individually.

Submission guidelines

Deferral or extension

You cannot defer or apply for an extension for this assessment.

To accommodate unforeseen circumstances such as illness, your quiz score will be based on the best 10 out of 13 submissions.
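The best-10-of-13 rule can be illustrated with a short Python sketch. It assumes (this is not stated in the profile) that each quiz is scored as a fraction between 0 and 1, that a missed quiz counts as 0, and that the best 10 results are weighted equally towards the 10% quiz component; `quiz_component` is a hypothetical helper, not a UQ tool.

```python
def quiz_component(scores, best_n=10, weight=10.0):
    """Hypothetical helper: contribution (in percentage points) of the weekly quizzes.

    Assumptions: each score is a fraction in [0, 1], missed quizzes are 0.0,
    and the best `best_n` scores are weighted equally.
    """
    # Pad with zeros if fewer than best_n quizzes were attempted.
    padded = list(scores) + [0.0] * max(0, best_n - len(scores))
    best = sorted(padded, reverse=True)[:best_n]
    return weight * sum(best) / best_n


# Three quizzes missed (scored 0.0): they are exactly the ones dropped.
print(quiz_component([1.0] * 10 + [0.0] * 3))  # 10.0
```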

Late submission

A 100% late penalty applies after a one-hour grace period, which is measured from the time the submission is due.

Project Part 1

Mode
Written
Category
Computer Code, Project
Weight
20%
Due date

30/08/2024 4:00 pm

Learning outcomes
LO2, LO4, LO5, LO6

Task description

This project requires students to implement a set of identified indexing and retrieval methods. The students also need to evaluate and analyse the results of the techniques.

This is an individually assessed project.

Submission guidelines

UQ Blackboard Online Submission

Deferral or extension

You may be able to apply for an extension.

The maximum extension allowed is 7 days. Extensions are given in multiples of 24 hours.

Marked assignments with feedback and/or detailed solutions with feedback will be released to students within 14-21 days, where the earlier time frame applies if there are no extensions.

Late submission

A penalty of 10% of the maximum possible mark will be deducted per 24 hours from the time the submission is due, for up to 7 days. After 7 days, you will receive a mark of 0.
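As a worked illustration of this rule, the Python sketch below computes a late-adjusted mark. It assumes (not stated in the profile) that any started 24-hour period counts as a full period and that the adjusted mark cannot fall below 0; `late_adjusted_mark` is a hypothetical helper.

```python
import math
from datetime import datetime, timedelta


def late_adjusted_mark(raw_mark, max_mark, due, submitted, max_days=7):
    """Hypothetical helper applying a 10%-of-maximum penalty per 24 hours late.

    Assumption: a partially elapsed 24-hour period counts as a full one.
    """
    late = submitted - due
    if late <= timedelta(0):
        return float(raw_mark)          # on time: no penalty
    periods = math.ceil(late / timedelta(hours=24))
    if periods > max_days:
        return 0.0                      # more than 7 days late: mark of 0
    penalty = max_mark * periods / 10   # 10% of the maximum mark per period
    return max(0.0, raw_mark - penalty)


due = datetime(2024, 8, 30, 16, 0)      # Project Part 1 due date
# 30 hours late counts as two 24-hour periods under the stated assumption.
print(late_adjusted_mark(80, 100, due, due + timedelta(hours=30)))  # 60.0
```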

Project Part 2

Mode
Written
Category
Computer Code, Project
Weight
20%
Due date

18/10/2024 4:00 pm

Learning outcomes
LO2, LO4, LO5, LO6

Task description

This project requires students to implement a set of identified indexing and retrieval methods. The students also need to evaluate and analyse the results of the techniques.

This is an individually assessed project.

Submission guidelines

UQ Blackboard Online Submission

Deferral or extension

You may be able to apply for an extension.

The maximum extension allowed is 7 days. Extensions are given in multiples of 24 hours.

Marked assignments with feedback and/or detailed solutions with feedback will be released to students within 14-21 days, where the earlier time frame applies if there are no extensions.

Late submission

A penalty of 10% of the maximum possible mark will be deducted per 24 hours from the time the submission is due, for up to 7 days. After 7 days, you will receive a mark of 0.

Final Oral Exam

  • Hurdle
  • Identity Verified
  • In-person
Mode
Oral
Category
Examination
Weight
50%
Due date

4/11/2024 - 15/11/2024

Learning outcomes
LO1, LO3, LO5, LO6

Task description

All topics covered in the course are examinable.

Hurdle requirements

To pass this course, a minimum of 50% must be obtained in the final exam.

Exam details

Planning time None
Duration 60 minutes
Calculator options

No calculators permitted

Open/closed book Closed Book examination - no written materials permitted
Exam platform Other
Invigilation

Invigilated in person

Submission guidelines

Deferral or extension

You may be able to defer this exam.

Course grading

Full criteria for each grade are available in the Assessment Procedure.

Grade | Cut-off percent | Description
1 (Low Fail) | 0 - 19 | Absence of evidence of achievement of course learning outcomes.
2 (Fail) | 20 - 44 | Minimal evidence of achievement of course learning outcomes.
3 (Marginal Fail) | 45 - 49 | Demonstrated evidence of developing achievement of course learning outcomes.
4 (Pass) | 50 - 64 | Demonstrated evidence of functional achievement of course learning outcomes.
5 (Credit) | 65 - 74 | Demonstrated evidence of proficient achievement of course learning outcomes.
6 (Distinction) | 75 - 84 | Demonstrated evidence of advanced achievement of course learning outcomes.
7 (High Distinction) | 85 - 100 | Demonstrated evidence of exceptional achievement of course learning outcomes.

Additional course grading information

Note that you must score at least 25 out of 50 marks (50%) in the final exam to pass the course (otherwise your grade will be capped at 3). Percentages will not be rounded before grade cut-offs are applied.
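One way to read the cut-off table together with the exam hurdle is sketched below in Python. It assumes that the lower bound of each band in the grading table is its cut-off (so, with no rounding, 84.9% falls in the 75 - 84 band) and that the final exam is marked out of 50; `course_grade` is a hypothetical helper, not an official calculator.

```python
# (cut-off percent, grade) pairs from the course grading table, highest first.
CUTOFFS = [(85, 7), (75, 6), (65, 5), (50, 4), (45, 3), (20, 2), (0, 1)]


def course_grade(total_percent, exam_mark, exam_max=50):
    """Hypothetical helper: grade from the unrounded total and the exam hurdle."""
    # First band whose cut-off the unrounded total reaches.
    grade = next(g for cut, g in CUTOFFS if total_percent >= cut)
    if exam_mark < exam_max / 2:        # hurdle: at least 25/50 in the final exam
        grade = min(grade, 3)           # otherwise the grade is capped at 3
    return grade


print(course_grade(84.9, 40))  # 6: no rounding up to 85
print(course_grade(90.0, 24))  # 3: exam hurdle not met
```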

Supplementary assessment

Supplementary assessment is available for this course.

Additional assessment information

Having trouble?

If you are having difficulties with any aspect of the course material, you should seek help. Speak to the course teaching staff.

If external circumstances are affecting your ability to work on the course, you should seek help as soon as possible. The University and UQ Union have organisations and staff who can help. For example, UQ Student Services can help with study and exam skills, tertiary learning skills, writing skills, financial assistance, personal issues, and disability services (among other things).

Complaints and criticisms should be directed in the first instance to the course coordinator. If you are not satisfied with the outcome, you may bring the matter to the attention of the School of EECS Director of Teaching and Learning.

Generative AI and Machine Translation in Assessment

  • Weekly Quizzes: This assessment task evaluates students' abilities, skills and knowledge without the aid of generative Artificial Intelligence (AI) or Machine Translation (MT). Students are advised that the use of AI technologies to develop responses is strictly prohibited and may constitute student misconduct under the Student Code of Conduct.
  • Project (Part 1 and Part 2): Artificial Intelligence (AI) and Machine Translation (MT) are emerging tools that may support students in completing this assessment task. Students may appropriately use AI and/or MT in completing this assessment task. Students must clearly reference any use of AI or MT in each instance. A failure to reference generative AI or MT use may constitute student misconduct under the Student Code of Conduct.
  • Oral Examination: This assessment task is to be completed in-person. The use of generative Artificial Intelligence (AI) and Machine Translation (MT) tools will not be permitted. Any attempted use of Generative AI may constitute student misconduct under the Student Code of Conduct.

Learning resources

You'll need the following resources to successfully complete the course. We've indicated below if you need a personal copy of the reading materials or your own item.

Library resources

Find the required and recommended resources for this course on the UQ Library website.

Additional learning resources information

Course material will be published on the course website.

Learning activities

The learning activities for this course are outlined below. Learn more about the learning outcomes that apply to this course.

Learning period Activity type Topic
Multiple weeks

From Week 1 To Week 13
(22 Jul - 27 Oct)

Practical

Practicals

Practicals provide an opportunity to further understand the concepts and techniques introduced in the lectures through examples, exercises, projects and problem-solving. Further concepts will be developed during these activities, with a mix of theory and hands-on work. Note: these sessions start in Week 1 and run up to and including Week 13.

Learning outcomes: LO2, LO4, LO5, LO6

Week 1

(22 Jul - 28 Jul)

Lecture

Intro, Search Engine Architecture, Text Analysis

Introduction, motivation and logistics. Architecture of a search engine and the key components studied across this course. Zipf's law, stemming, stopword removal. Indexing techniques and processing, including data structures.

Learning outcomes: LO1, LO4, LO5

Week 2

(29 Jul - 04 Aug)

Lecture

Offline Evaluation

Offline evaluation methodology and evaluation frameworks, set-based and rank-based evaluation measures, statistical significance testing, parameter tuning.

Learning outcomes: LO1, LO5
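For illustration, the Python sketch below implements three widely used measures of the kinds named in the Week 2 topic: precision at rank k and recall (set-based) and average precision (rank-based). These are common examples chosen here, not necessarily the exact set covered in the lecture; relevance is assumed to be binary, and the run and relevance judgements are made-up toy data.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def recall(ranking, relevant):
    """Fraction of all relevant documents that were retrieved."""
    return len(set(ranking) & relevant) / len(relevant)


def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document;
    relevant documents that are never retrieved contribute 0."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


run = ["d3", "d1", "d7", "d2"]   # toy ranked list
qrels = {"d1", "d2", "d9"}       # toy set of relevant documents
print(precision_at_k(run, qrels, 2), recall(run, qrels), average_precision(run, qrels))
```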

Week 3

(05 Aug - 11 Aug)

Lecture

Retrieval Models 1

Retrieval Models: term-based and term dependency methods for matching and ranking documents.

Learning outcomes: LO1, LO5
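As one concrete instance of a term-based ranking method, the Python sketch below scores documents with BM25. BM25 is a standard choice picked here for illustration; the parameter values k1 = 1.2 and b = 0.75 and the smoothed IDF variant log(1 + (N - df + 0.5) / (df + 0.5)), which stays positive, are assumptions, and documents are assumed to be already tokenised.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against the tokenised `query`."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))   # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue                                  # term absent: no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)     # document-length normalisation
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [["a", "b"], ["b", "c"], ["c", "c", "d"]]
# The third document, with two occurrences of "c", receives the highest score.
print(bm25_scores(["c"], docs))
```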

Week 4

(12 Aug - 18 Aug)

Lecture

Retrieval Models 2

Fusion methods and semantic matching methods

Learning outcomes: LO1, LO5

Week 5

(19 Aug - 25 Aug)

Lecture

Retrieval Models 3

Methods based on query analysis, query expansion, word embeddings, relevance feedback

Learning outcomes: LO1, LO5

Week 6

(26 Aug - 01 Sep)

Lecture

Retrieval Models 4

Learning to rank

Learning outcomes: LO3, LO5

Week 7

(02 Sep - 08 Sep)

Lecture

Retrieval Models 5

Exploiting implicit signals: online learning to rank, federated online learning to rank, counterfactual learning, online evaluation, click models

Learning outcomes: LO1, LO3, LO5

Week 8

(09 Sep - 15 Sep)

Lecture

Retrieval Models 6

Pre-trained language models, Transformers and BERT-based rankers

Learning outcomes: LO3, LO5

Week 9

(16 Sep - 22 Sep)

Lecture

Retrieval Models 7

Dense Retrievers

Learning outcomes: LO3, LO5

Week 11

(07 Oct - 13 Oct)

Lecture

Index Compression

Entropy and ambiguity; compression methods: delta encoding, bit-aligned codes, byte-aligned codes; looking ahead and skipping

Learning outcomes: LO1, LO5
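The following Python sketch illustrates two of the Week 11 techniques: delta encoding of a sorted postings list, followed by a byte-aligned variable-byte (v-byte) code. It uses one common v-byte convention (7 data bits per byte, with the high bit set on the final byte of each number); other conventions exist, and the function names are illustrative.

```python
def delta_encode(postings):
    """Replace sorted document ids by the gaps between consecutive ids."""
    gaps, prev = [], 0
    for doc_id in postings:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def delta_decode(gaps):
    """Rebuild document ids as running sums of the gaps."""
    postings, total = [], 0
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings


def vbyte_encode(numbers):
    """Encode non-negative integers, 7 bits per byte; high bit marks the last byte."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.insert(0, n % 128)   # prepend the lowest 7 bits
            if n < 128:
                break
            n //= 128
        chunk[-1] += 128               # flag the final byte of this number
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte         # continuation byte
        else:
            n = 128 * n + (byte - 128) # final byte: emit the number
            numbers.append(n)
            n = 0
    return numbers


postings = [3, 7, 150, 151]
gaps = delta_encode(postings)          # small gaps need fewer bytes
encoded = vbyte_encode(gaps)
assert delta_decode(vbyte_decode(encoded)) == postings
```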

Week 12

(14 Oct - 20 Oct)

Lecture

Crawling and Link Analysis

Methods for crawling Web pages; making use of link information for retrieval: PageRank and HITS.

Learning outcomes: LO1, LO5
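A minimal power-iteration sketch of PageRank in Python follows. The damping factor of 0.85, the fixed iteration count and the choice to spread the rank of pages without out-links (dangling pages) uniformly over all pages are common conventions assumed here, not details taken from the course.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over `links`, a dict mapping each page to the pages it links to."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # start from a uniform distribution
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}   # random-jump share
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share               # follow each out-link equally
            else:
                for q in pages:                   # dangling page: spread uniformly
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["B"]})
print(max(ranks, key=ranks.get))  # B: it receives links from both A and C
```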

Week 13

(21 Oct - 27 Oct)

Lecture

Interactive IR, Diversity

Evaluation practices based on experiments with users. Ranking methods for diversity, ambiguity and redundancy

Learning outcomes: LO4, LO5, LO6

Policies and procedures

University policies and procedures apply to all aspects of student life. As a UQ student, you must comply with University-wide and program-specific requirements, including the:

Learn more about UQ policies on my.UQ and the Policy and Procedure Library.

School guidelines

Your school has additional guidelines you'll need to follow for this course: