Syllabus - Data Analytics & Visualization (CD702 (A))
CSE
Data Analytics & Visualization (CD702 (A))
VII-Semester
Unit 1
Data Definitions and Analysis Techniques
Elements, Variables, and Data Categorization, Levels of Measurement, Data Management and Indexing. Introduction to Statistical Concepts: Sampling Distributions, Resampling, Statistical Inference and Descriptive Statistics, Measures of Central Tendency, Measures of Location and Dispersion
Unit 2
Advanced Data Analysis Techniques
Statistical hypothesis generation and testing, Chi-Square test, t-Test, Analysis of variance, Correlation analysis, Maximum likelihood test, Regression Modelling, Multivariate Analysis, Bayesian Modelling, Inference and Bayesian Network, Regression analysis
Unit 3
Data Wrangling
Intro to Data Wrangling, Gathering Data, Assessing Data, Cleaning Data. Data Visualization in Data Analysis: Design of Visualizations, Univariate Exploration of Data, Bivariate Exploration of Data, Multivariate Exploration of Data, Explanatory Visualizations.
Unit 4
Data Ecosystem
Overview of the Data Analyst Ecosystem, Types of Data, Understanding Different Types of File Formats, Sources of Data, Overview of Data Repositories, NoSQL, Data Marts, Data Lakes, ETL, and Data Pipelines, Foundations of Big Data, Big Data processing tools such as Hadoop, Hadoop Distributed File System (HDFS), Hive, and Spark
Unit 5
Data Visualization Tools
Python visualization libraries (matplotlib, pandas, seaborn, ggplot, plotly), Introduction to Power BI tools, Examples of inspiring (industry) projects. Exercise: create your own visualization of a complex dataset.
Practicals
Reference Books
- Joel Grus, Data Science from Scratch, Shroff Publishers / O’Reilly Media.
- Annalyn Ng, Kenneth Soo, Numsense! Data Science for the Layman, Shroff Publishers.
- Cathy O’Neil and Rachel Schutt, Doing Data Science: Straight Talk from the Frontline, O’Reilly Media.
- Jure Leskovec, Anand Rajaraman and Jeffrey Ullman, Mining of Massive Datasets, v2.1, Cambridge University Press.
- Jake VanderPlas, Python Data Science Handbook, Shroff Publishers / O’Reilly Media.
- Philipp Janert, Data Analysis with Open Source Tools, Shroff Publishers / O’Reilly Media.