Syllabus - Parallel Computing (IT 803 (D))


Branch: Information Technology

Course: Parallel Computing (IT 803 (D))

Semester: VIII

Unit I

Introduction

The need for parallelism, Forms of parallelism (SISD, SIMD, MISD, MIMD), Moore's Law and Multi-cores, Fundamentals of Parallel Computers, Communication architecture, Message passing architecture, Data parallel architecture, Dataflow architecture, Systolic architecture, Performance Issues
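
To make the SIMD category concrete, here is a minimal sketch using x86 SSE intrinsics (an assumption chosen purely for illustration; any 4-wide vector ISA behaves the same way). One vector instruction applies the same operation to four data items, while the scalar cleanup loop processes one item per instruction, SISD-style:

    /* SIMD illustration: single instruction, multiple data (x86 SSE) */
    #include <xmmintrin.h>

    void vec_add_simd(const float *a, const float *b, float *c, int n)
    {
        int i;
        /* SIMD: one _mm_add_ps adds four floats at once */
        for (i = 0; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);
            __m128 vb = _mm_loadu_ps(&b[i]);
            _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));
        }
        /* SISD-style scalar cleanup: one element per instruction */
        for (; i < n; i++)
            c[i] = a[i] + b[i];
    }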

Unit II

Large Cache Design

Shared vs. Private Caches, Centralized vs. Distributed Shared Caches, Snooping-based Cache Coherence Protocol, Directory-based Cache Coherence Protocol, Uniform Cache Access (UCA), Non-Uniform Cache Access (NUCA), D-NUCA, S-NUCA, Inclusion, Exclusion, Difference between Transactions and Transactional Memory, STM, HTM
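
The optimistic read-compute-commit cycle that underlies both STM and HTM can be sketched as a compare-and-swap retry loop (a minimal illustration in C11 atomics, not a real STM; the names balance and deposit are invented for the example):

    #include <stdatomic.h>

    atomic_int balance;

    void deposit(int amount)
    {
        int expected, desired;
        do {
            expected = atomic_load(&balance); /* speculative read   */
            desired  = expected + amount;     /* private computation */
            /* commit succeeds only if no other thread changed balance
               in the meantime; otherwise the "transaction" aborts and
               the loop retries with the fresh value */
        } while (!atomic_compare_exchange_weak(&balance, &expected, desired));
    }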

Unit III

Graphics Processing Unit

GPUs as Parallel Computers, Architecture of a Modern GPU, Evolution of Graphics Pipelines, GPGPUs, Scalable GPUs, Architectural Characteristics of Future Systems, Implications of Technology and Architecture for Users, Vector Addition, Applications of GPUs
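
The vector-addition topic above is the canonical first GPU example; a minimal CUDA C sketch (kernel only, launch shown in a comment, error handling omitted): each thread adds exactly one element pair.

    /* Data-parallel vector addition on the GPU (CUDA C) */
    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)              /* guard: the grid may be larger than n */
            c[i] = a[i] + b[i];
    }

    /* Host side (after cudaMalloc/cudaMemcpy of d_a, d_b, d_c):
     *   vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
     */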

Unit IV

Introduction to Parallel Programming

Strategies, Mechanisms, Performance Theory, Parallel Programming Patterns: Nesting Pattern, Parallel Control Patterns, Parallel Data Management, Map: Scaled Vector, Mandelbrot, Collectives: Reduce, Fusing Map and Reduce, Scan, Fusing Map and Scan, Data Reorganization: Gather, Scatter, Pack, Stencil and Recurrence, Fork-Join, Pipeline
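
Fusing map and reduce means performing both in a single pass so no temporary array is materialized. A minimal sketch in C, assuming OpenMP (covered in Unit V) for the parallel reduction; the map here is squaring, the reduce is summation:

    /* Fused map + reduce: map (x -> x*x) and reduce (+) in one pass */
    double sum_of_squares(const double *x, long n)
    {
        double sum = 0.0;
        /* each thread keeps a private partial sum; the partial sums
           are combined with + when the loop ends */
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < n; i++)
            sum += x[i] * x[i];  /* map and reduce fused per element */
        return sum;
    }

Compiled without OpenMP the pragma is ignored and the same code runs serially, which is one reason this fused form is a common pattern.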

Unit V

Parallel Programming Languages

Distributed Memory Programming with MPI: the Trapezoidal Rule in MPI, I/O Handling, MPI Derived Datatypes, Collective Communication; Shared Memory Programming with Pthreads: Condition Variables, Read-Write Locks, Cache Handling; Shared Memory Programming with OpenMP: Parallel for Directives, Scheduling Loops, Thread Safety; CUDA: Parallel Programming in CUDA C, Thread Management, Constant Memory and Events, Graphics Interoperability, Atomics, Streams
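
For the trapezoidal-rule topic, a minimal MPI sketch in the style of Pacheco's text: each process integrates its own subinterval, and MPI_Reduce sums the partial results on rank 0. The choices f(x) = x*x, [a, b] = [0, 1], n = 1024, and "process count divides n" are assumptions for the example; compile with mpicc.

    #include <mpi.h>
    #include <stdio.h>

    static double f(double x) { return x * x; }

    /* composite trapezoidal rule on [a, b] with n trapezoids of width h */
    static double trap(double a, double b, int n, double h)
    {
        double sum = (f(a) + f(b)) / 2.0;
        for (int i = 1; i < n; i++)
            sum += f(a + i * h);
        return sum * h;
    }

    int main(int argc, char *argv[])
    {
        int rank, size, n = 1024;
        double a = 0.0, b = 1.0, local_int, total_int;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double h = (b - a) / n;        /* same step on every process */
        int local_n = n / size;        /* assumes size divides n     */
        double local_a = a + rank * local_n * h;
        double local_b = local_a + local_n * h;
        local_int = trap(local_a, local_b, local_n, h);

        /* collective communication: sum partial integrals onto rank 0 */
        MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);
        if (rank == 0)
            printf("Integral estimate: %.12f\n", total_int);
        MPI_Finalize();
        return 0;
    }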

Course Objective

To develop an understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computers, and to develop the programming skills needed to exploit parallel architectures effectively

Course Outcome

["To develop an understanding of various basic concepts associated with parallel computing environments", "Understand, appreciate and apply parallel and distributed algorithms in problem solving", "Acquire skills to measure the performance of parallel and distributed programs", "Design parallel programs to enhance machine performance in parallel hardware environment", "Design and implement parallel programs in modern environments such as CUDA, OpenMP, etc"]

Practicals

Reference Books

  • D. E. Culler, J. P. Singh, and A. Gupta, “Parallel Computer Architecture”, Morgan Kaufmann, 2004

  • Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar, “Multi-Core Cache Hierarchies”, Morgan & Claypool Publishers, 2011

  • Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan Kaufmann (Elsevier), 2011

  • James R. Larus and Ravi Rajwar, “Transactional Memory”, Morgan & Claypool Publishers, 2007

  • David B. Kirk and Wen-mei W. Hwu, “Programming Massively Parallel Processors: A Hands-on Approach”, Morgan Kaufmann, 2010

  • Barbara Chapman, F. Desprez, Gerhard R. Joubert, Alain Lichnewsky, and Frans Peters (Eds.), “Parallel Computing: From Multicores and GPU's to Petascale”, IOS Press, 2010

  • Michael McCool, James Reinders, and Arch Robison, “Structured Parallel Programming: Patterns for Efficient Computation”, Morgan Kaufmann, 2012

  • Jason Sanders and Edward Kandrot, “CUDA by Example: An Introduction to General-Purpose GPU Programming”, Addison-Wesley, 2011