Syllabus - DATA MINING AND ANALYTICS (CB-606 (C))


Computer Science and Business System (CSBS)

DATA MINING AND ANALYTICS (CB-606 (C))

Semester VI

UNIT – I

Introduction to Data Mining

What is data mining? Related technologies: Machine Learning, DBMS, OLAP, Statistics. Stages of the data mining process, Data mining techniques, Knowledge representation methods, Applications

UNIT – II

Data preprocessing

Data cleaning, Data transformation, Data reduction, Discretization and generating concept hierarchies, Installing Weka 3 Data Mining System, Experiments with Weka - filters, discretization

Data mining knowledge representation

Task relevant data, Background knowledge, Representing input data and output knowledge, Visualization techniques

Attribute-oriented analysis

Attribute generalization, Attribute relevance, Class comparison, Statistical measures

UNIT – III

Data mining algorithms - Association rules

Motivation and terminology, Example: mining weather data, Basic idea: item sets, Generating item sets and rules efficiently, Correlation analysis

Data mining algorithms - Classification

Basic learning/mining tasks, Inferring rudimentary rules: 1R algorithm, Decision trees, Covering rules

Data mining algorithms - Prediction

The prediction task, Statistical (Bayesian) classification, Bayesian networks, Instance-based methods (nearest neighbor), Linear models

UNIT – IV

Descriptive analytics

Data Modeling, Trend Analysis, Simple Linear Regression Analysis

Forecasting models

Heuristic methods, predictive modeling and pattern discovery. Logistic regression: logit transform, maximum likelihood (ML) estimation, tests of hypotheses (Wald test, likelihood ratio (LR) test, score test), test for overall regression, multiple logistic regression, forward and backward selection methods, interpretation of parameters, relation with categorical data analysis. Interpreting regression models, implementing predictive models.

Generalized Linear Models

Link functions for response distributions such as Poisson, binomial, inverse binomial, inverse Gaussian, and Gamma.

Non-linear Regression (NLS)

Linearization transforms, their uses and limitations, examination of non-linearity, initial estimates, iterative procedures for NLS: grid search, Newton-Raphson, steepest descent, Marquardt's method. Introduction to semiparametric regression models and additive regression models. Introduction to nonparametric regression methods.

UNIT – V

Time Series Analysis

Autocovariance, autocorrelation and their properties. Exploratory time series analysis, tests for trend and seasonality, exponential and moving average smoothing, Holt-Winters smoothing, forecasting based on smoothing.

Linear time series models

Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA) models; estimation of ARMA models: Yule-Walker estimation for AR processes, maximum likelihood and least squares estimation for ARMA processes; forecasting using ARIMA models.

Prescriptive Analytics

Mathematical optimization, Network modeling, Multi-objective optimization, Stochastic modeling, Decision and risk analysis, Decision trees.

Practicals

  • Installing Weka and exploring a dataset

  • Create a Weather table using the Weka tool

  • Apply preprocessing techniques to the training data set of the Weather table

  • Normalize the Weather table data using the Weka Knowledge Flow interface

  • Implement the Apriori algorithm

  • Implement the FP-Growth algorithm

  • Implement Decision Tree learning

  • Implement linear regression technique for statistical model building

  • Implement Non-linear regression technique for statistical model building

  • Implement Logistic Regression

  • Implement classification using Multilayer perceptron

  • Implement Bagging using Random Forests

  • Implement Bayesian networks

  • Implement the K-Means algorithm

Reference Books

  • Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, 3rd ed, 2010.

  • Lior Rokach and Oded Maimon, “Data Mining and Knowledge Discovery Handbook”, Springer, 2nd edition, 2010.

  • Daniel T. Larose, O.P. Wali, “Data Mining and Predictive Analytics (An Indian Adaptation)”, Wiley India, 2nd ed.

  • G.E.P. Box and G.M. Jenkins, “Time Series Analysis: Forecasting and Control”, Holden-Day, 1970.