Syllabus - Introduction to Toolkits for Data Science (CD503 (C))


CSE-Data Science/Data Science

V semester

Unit 1

Python for Data Science

Review of NumPy, Pandas, and scikit-learn. Supervised learning packages and toolkits for regression and classification: Decision Trees, Naive Bayes Classification, Support Vector Machines, Random Forest, Neural Networks, Ensemble Methods, Ordinary Least Squares Regression, Logistic Regression, etc. Unsupervised learning and clustering: k-means, adaptive hierarchical clustering, Gaussian mixture models, optimization using evolutionary techniques, etc.

Unit 2

R for Data Science

Basics of R and RStudio. R data structures: vectors, factors, lists, arrays, matrices, and data frames. Working with data: importing data into R and visualizing it. Data analytics software: Weka, Orange, RapidMiner, Minitab, Power BI, GitHub, Google Colab.

Unit 3

Introduction to Deep Learning

Basics of TensorFlow and Keras. Basics of PyTorch: performing style transfer from one image to another, text generation, and sentiment analysis with PyTorch. Neural networks that recognize objects; improving the accuracy of object recognition using CNNs; using pre-trained models to build state-of-the-art classifiers; saving and loading models; time series forecasting with RNNs and LSTMs.

Unit 4

Introduction to Time Series Analysis

Time series regression and exploratory data analysis toolkits: ARMA/ARIMA models, model identification/estimation/linear operators, Fourier analysis, spectral estimation, and state-space models.

Unit 5

Cloud Computing for Data Science

Implementation of Machine Learning and Deep Learning on the AWS/Azure platforms. Version control tools for data science projects. Case studies of data science projects.

Practicals

Reference Books

  • Brockwell & Davis (2016) Introduction to Time Series and Forecasting, 3rd Edition, Springer

  • Cryer & Chan (2008) Time-Series Analysis with Applications in R, Springer

  • Prado & West (2010) Time Series: Modeling, Computation, and Inference, Chapman & Hall

  • Petris, Petrone, Campagnoli (2009) Dynamic Linear Models with R, Springer

  • Ruppert & Matteson (2016) Statistics and Data Analysis for Financial Engineering with R Examples, 2nd Edition, Springer

  • R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, 1st Edition, O’Reilly