Syllabus - Data Science (IT 702(A))


Information Technology

Data Science (IT 702(A))

VII-Semester

Unit I

Data Science and Big Data Overview

Types of data, Sources of data, Data collection, Data storage and management, Big Data Overview, Characterization of Big data, Drivers of Big Data, Challenges, Big Data Use Cases, Defining Big Data Analytics and examples of its use cases, Data Analytics Lifecycle: Discovery, Data Preparation, Model Planning, Model Building, Communicate Results, Operationalize.

Unit II

Advanced Analytical Theory and Methods

Clustering, K-means, Additional Clustering Algorithms, Association Rules, Apriori Algorithm, Applications of Association Rules, Regression, Linear Regression, Logistic Regression, Classification, Decision Trees, Naive Bayes, Additional Classification Methods, Text Analysis, Text Analysis Steps, Determining Sentiments.

Unit III

Advanced Analytics-Technology and Tools

Analytics for Unstructured Data Use Cases, MapReduce, Apache Hadoop, Traditional database vs Hadoop, Hadoop Core Components, HDFS, Design of HDFS, HDFS Components, HDFS Architecture, Hadoop 2.0 Architecture, Hadoop 2.0 Resource Management, YARN.

Unit IV

The Hadoop Ecosystem

Introduction to Hive, HBase, Hive Use Cases: Facebook, Healthcare; Hive Architecture, Hive Components. Integrating Data Sources, Dealing with Real-Time Data Streams and Complex Event Processing, Overview of Pig, Difference between Hive and Pig, Use Cases of Pig, Pig program structure, Pig Components, Pig Execution, Pig data models, Overview of Mahout, Mahout working.

Unit V

Introduction to R, Basic Data Analytics Methods Using R, Communicating and Operationalizing an Analytics Project, Creating the Final Deliverables, Data Visualization Basics.

Course Objective

The objective of this course is to familiarize students with the roles of a data scientist and enable them to analyze data to derive meaningful information from it.

Course Outcome

  • Demonstrate proficiency with statistical analysis of data.

  • Build and assess data-based models.

  • Execute statistical analyses with professional statistical software.

  • Demonstrate skill in data management.

  • Apply data science concepts and methods to solve problems in real-world contexts and communicate these solutions effectively.

Practicals

Reference Books

  • EMC Education Services, “Data Science and Big Data Analytics”, Wiley, 2015.

  • Judith Hurwitz, Alan Nugent, Fern Halper, and Marcia Kaufman, “Big Data for Dummies”, Wiley & Sons, 2013.

  • Vignesh Prajapati, “Big Data Analytics with R and Hadoop”, Packt Publishing, 2013.

  • David Dietrich, Barry Heller, and Beibei Yang, “Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data”, John Wiley & Sons, Inc.