Syllabus - DATA MINING AND ANALYTICS (CB-606 (C))
Computer Science and Business System (CSBS)
Semester VI
UNIT – I
Introduction to Data Mining
What is data mining?; Related technologies: Machine Learning, DBMS, OLAP, Statistics; Stages of the Data Mining Process; Data Mining Techniques; Knowledge Representation Methods; Applications
UNIT – II
Data preprocessing
Data cleaning, Data transformation, Data reduction, Discretization and generating concept hierarchies, Installing Weka 3 Data Mining System, Experiments with Weka - filters, discretization
Data mining knowledge representation
Task relevant data, Background knowledge, Representing input data and output knowledge, Visualization techniques
Attribute-oriented analysis
Attribute generalization, Attribute relevance, Class comparison, Statistical measures
UNIT – III
Data mining algorithms - Association rules
Motivation and terminology, Example: mining weather data, Basic idea: item sets, Generating item sets and rules efficiently, Correlation analysis
Data mining algorithms - Classification
Basic learning/mining tasks, Inferring rudimentary rules: 1R algorithm, Decision trees, Covering rules
Data mining algorithms - Prediction
The prediction task, Statistical (Bayesian) classification, Bayesian networks, Instance-based methods (nearest neighbor), linear models
UNIT – IV
Descriptive analytics
Data Modeling, Trend Analysis, Simple Linear Regression Analysis
Forecasting models
Heuristic methods, predictive modeling and pattern discovery, Logistic Regression: Logit transform, ML estimation, Tests of hypotheses, Wald test, LR test, score test, test for overall regression, multiple logistic regression, forward, backward method, interpretation of parameters, relation with categorical data analysis. Interpreting Regression Models, Implementing Predictive Models.
Generalized Linear model
Link functions and response distributions such as Poisson, binomial, inverse binomial, inverse Gaussian, Gamma.
Nonlinear Regression (NLS)
Linearization transforms, their uses & limitations, examination of non-linearity, initial estimates, iterative procedures for NLS, grid search, Newton-Raphson, steepest descent, Marquardt’s methods. Introduction to Semiparametric regression models, additive regression models. Introduction to nonparametric regression methods.
UNIT – V
Time Series Analysis
Auto-covariance, auto-correlation and their properties. Exploratory time series analysis, Tests for trend and seasonality, Exponential and moving average smoothing, Holt-Winters smoothing, Forecasting based on smoothing.
Linear time series models
Autoregressive, Moving Average, Autoregressive Moving Average and Autoregressive Integrated Moving Average models; Estimation of ARMA models such as Yule-Walker estimation for AR Processes, Maximum likelihood and least squares estimation for ARMA Processes, Forecasting using ARIMA models.
Prescriptive Analytics
Mathematical optimization, Network modeling, Multi-objective optimization, Stochastic modeling, Decision and Risk analysis, Decision trees.
Practicals
- Installing Weka and exploring a dataset
- Create a Weather table using the Weka tool
- Apply Pre-Processing techniques to the training data set of Weather Table
- Normalize Weather Table data using Knowledge Flow
- Implement the Apriori algorithm
- Implement FP Growth algorithm
- Implement Decision Tree learning
- Implement linear regression technique for statistical model building
- Implement Non-linear regression technique for statistical model building
- Implement Logistic Regression
- Implement classification using Multilayer perceptron
- Implement Bagging using Random Forests
- Implement Bayesian networks
- Implement the K-Means algorithm
Reference Books
- Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, 3rd ed., 2010.
- Lior Rokach and Oded Maimon, “Data Mining and Knowledge Discovery Handbook”, Springer, 2nd ed., 2010.
- Daniel T. Larose and O.P. Wali, “Data Mining and Predictive Analytics, 2ed (An Indian Adaptation)”, Wiley India.
- G.E.P. Box and G.M. Jenkins, “Time Series Analysis: Forecasting and Control”, Holden-Day, 1970.