(a) What is hashing function? Explain any four hashing functions with examples. (b) Write notes on the following: (i) Overflow handling in hashing (ii) Static tree tables


Q.) (a) What is hashing function? Explain any four hashing functions with examples. (b) Write notes on the following: (i) Overflow handling in hashing (ii) Static tree tables

Subject: Data Structure and Algorithm

(a) What is a Hashing Function?

A hashing function is an algorithm that takes an input (or 'message') and returns a fixed-size string of bytes. The output is typically a 'hash code' or 'hash value'. Hashing functions are designed to be a one-way function, which means it should be computationally infeasible to reverse the operation or find two different inputs that produce the same hash value (a condition known as a collision). Hashing functions are widely used in computer science for various applications such as data retrieval, cryptographic security, and data integrity verification.

Four Hashing Functions with Examples:

  1. MD5 (Message Digest Algorithm 5):

    • Purpose: Originally designed for cryptographic purposes but now considered broken and unsuitable for further use because it's vulnerable to hash collisions.
    • Output Size: 128-bit hash value.
    • Example: plaintext Input: "Hello, World!" MD5 Hash: "65a8e27d8879283831b664bd8b7f0ad4"
  2. SHA-1 (Secure Hash Algorithm 1):

    • Purpose: Designed by the National Security Agency (NSA) to be part of the Digital Signature Algorithm. Like MD5, SHA-1 is also no longer considered secure against well-funded attackers.
    • Output Size: 160-bit hash value.
    • Example: plaintext Input: "Hello, World!" SHA-1 Hash: "2ef7bde608ce5404e97d5f042f95f89f1c232871"
  3. SHA-256 (Secure Hash Algorithm 256-bit):

    • Purpose: Part of the SHA-2 family, designed by the NSA, and widely used for secure applications including in blockchain and SSL certificates.
    • Output Size: 256-bit hash value.
    • Example: plaintext Input: "Hello, World!" SHA-256 Hash: "a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e"
  4. CRC32 (Cyclic Redundancy Check):

    • Purpose: A checksum hashing function used to detect accidental changes to raw data in digital networks and storage devices.
    • Output Size: 32-bit hash value.
    • Example: plaintext Input: "Hello, World!" CRC32 Hash: "1c291ca3"

(b) Notes on the following:

(i) Overflow Handling in Hashing

Overflow occurs in hashing when a hash function generates the same index for more than one key, which is known as a collision. Handling collisions is critical to maintain the efficiency of a hash table. There are several methods to handle overflow:

  1. Chaining:

    • Each slot of the hash table contains a linked list of entries that hash to the same index.
    • When a collision occurs, the item is added to the list at that index.
  2. Open Addressing:

    • All elements are stored in the hash table itself.
    • When a collision occurs, the table is probed sequentially or by another hashing function to find an empty slot.
  3. Double Hashing:

    • A form of open addressing with a secondary hash function.
    • If a collision occurs, the secondary hash function is used to calculate the interval between probes.
  4. Rehashing:

    • The hash table is resized and all existing items are rehashed and inserted into the new table.
    • This is often done when the load factor of the table exceeds a certain threshold.

(ii) Static Tree Tables

Static tree tables refer to a data structure that uses a tree-like model to organize static data in a hierarchical manner. The term 'static' implies that the structure of the tree and the data it contains do not change over time. Binary search trees, AVL trees, and B-trees are examples of static tree tables.

Characteristics of Static Tree Tables:

  • Ordered: The tree maintains data in a sorted order, allowing for efficient search operations.
  • Balanced: Some static trees (like AVL trees) maintain balance to ensure that operations such as insertions, deletions, and searches can be performed in logarithmic time.
  • Immutable: Once constructed, the structure does not change, which simplifies the design and can improve performance for read-only access patterns.

Applications of Static Tree Tables:

  • Database Indexing: Used to index records for efficient retrieval.
  • File Systems: The directory structure can be represented as a static tree.
  • Lookup Tables: Used in applications where fast lookup of static data is required.

Static tree tables are particularly useful when the dataset does not change, or changes infrequently, and when there is a need for fast search and retrieval operations.