Syllabus - Information Extraction and Retrieval (CD604 (A))
CSE-Data Science/Data Science
Information Extraction and Retrieval (CD604 (A))
VI
UNIT-I
Introduction
History of IR - Components of IR - Issues - Open source search engine frameworks - The impact of the web on IR - The role of artificial intelligence (AI) in IR - IR versus web search - Components of a search engine - Characterizing the web.
UNIT-II
Boolean and Vector space retrieval models
Term weighting - TF-IDF weighting - Cosine similarity - Preprocessing - Inverted indices - Efficient processing with sparse vectors - Language model based IR - Probabilistic IR - Latent semantic indexing - Relevance feedback and query expansion.
UNIT-III
Web search overview
Web structure - The user - Paid placement - Search engine optimization - Web search architectures - Crawling - Meta-crawlers - Focused crawling - Web indexes - Near-duplicate detection - Index compression - XML retrieval.
UNIT-IV
Link Analysis
Hubs and authorities - PageRank and HITS algorithms - Searching and ranking - Relevance scoring and ranking for the web - Similarity - Hadoop and MapReduce - Evaluation - Personalized search - Collaborative filtering and content-based recommendation of documents and products - Handling the invisible web - Snippet generation - Summarization - Question answering - Cross-lingual retrieval.
UNIT-V
Information filtering: organization and relevance feedback
Text mining - Text classification and clustering - Categorization algorithms: naive Bayes, decision trees, and nearest neighbor - Clustering algorithms: agglomerative clustering, k-means, expectation maximization (EM).
Practicals
Reference Books
- C. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
- Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, 2nd Edition, ACM Press Books, 2011.
- Bruce Croft, Donald Metzler and Trevor Strohman, Search Engines: Information Retrieval in Practice, 1st Edition, Addison Wesley, 2009.
- Mark Levene, An Introduction to Search Engines and Web Navigation, 2nd Edition, Wiley, 2010.