Syllabus - Information Extraction and Retrieval (CD604 (A))


CSE-Data Science/Data Science

Information Extraction and Retrieval (CD604 (A))

VI

UNIT-I

Introduction

History of IR - Components of IR - Issues - Open source search engine frameworks - The impact of the web on IR - The role of artificial intelligence (AI) in IR - IR versus web search - Components of a search engine - Characterizing the web.

UNIT-II

Boolean and Vector space retrieval models

Term weighting - TF-IDF weighting - Cosine similarity - Preprocessing - Inverted indices - Efficient processing with sparse vectors - Language model based IR - Probabilistic IR - Latent semantic indexing - Relevance feedback and query expansion.

UNIT-III

Web search overview

Web structure - The user - Paid placement - Search engine optimization - Web search architectures - Crawling - Meta-crawlers - Focused crawling - Web indexes - Near-duplicate detection - Index compression - XML retrieval.

UNIT-IV

Link Analysis

Hubs and authorities - PageRank and HITS algorithms - Searching and ranking - Relevance scoring and ranking for the web - Similarity - Hadoop and MapReduce - Evaluation - Personalized search - Collaborative filtering and content-based recommendation of documents and products - Handling the invisible web - Snippet generation - Summarization - Question answering - Cross-lingual retrieval.

UNIT-V

Information filtering: organization and relevance feedback

Text mining - Text classification and clustering - Categorization algorithms: naive Bayes, decision trees, and nearest neighbor - Clustering algorithms: agglomerative clustering, k-means, expectation maximization (EM).

Practicals

Reference Books

  • C. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

  • Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, 2nd Edition, ACM Press Books, 2011.

  • Bruce Croft, Donald Metzler and Trevor Strohman, Search Engines: Information Retrieval in Practice, 1st Edition, Addison Wesley, 2009.

  • Mark Levene, An Introduction to Search Engines and Web Navigation, 2nd Edition, Wiley, 2010.