What do you mean by Hashing and Indexing?


Q.) What do you mean by Hashing and Indexing?

Subject: Data Structures

Introduction

Definition of Hashing

Hashing is a technique used in data structures to quickly locate and retrieve data from a database. It involves the use of a special function, known as a hash function, to map data of arbitrary size to fixed-size values. The output of the hash function, known as a hash code or simply hash, serves as an index in an array where the corresponding data is stored.

Definition of Indexing

Indexing, on the other hand, is a data structure technique used to quickly locate and access the data in a database. Indexes are used in databases to find data rows with specific column values quickly. Without an index, the database server must begin with the first data block and then read through the entire table to find the relevant rows.

Hashing

Explanation of Hashing

In hashing, a hash function is used to convert a given key into an array index. The data associated with the key is then stored at that index in the array, allowing for quick access in the future. The purpose of hashing in data structures is to allow for the fast retrieval of data. Instead of searching through an array element by element, the hash function provides a 'shortcut' to the location of the data.

Hash Function

A hash function is a function that takes a key as input and returns an array index as output. A good hash function has the following properties:

  1. It is deterministic, meaning the same input will always produce the same output.
  2. It returns values that are evenly distributed across the array, to minimize the likelihood of collisions (where two keys hash to the same index).

Collision Resolution

A collision in hashing occurs when two keys produce the same hash, i.e., they map to the same index in the array. There are several methods to resolve collisions, including:

  1. Open Addressing: In this method, when a collision occurs, we look for another open slot in the array to store the value.
  2. Chaining: In this method, each array index points to a linked list of keys that hash to the same index.

Example of Hashing

Let's consider a simple example of hashing. Suppose we have a hash function that maps a key to an index by taking the remainder when the key is divided by 10 (i.e., the hash function is h(k) = k mod 10). If our keys are the numbers 12, 44, and 33, they would hash to the indices 2, 4, and 3, respectively.

Advantages and Disadvantages of Hashing

Advantages of hashing include:

  1. Fast access time: With a good hash function, data can be retrieved in constant time, i.e., O(1).
  2. Efficient use of space: Data is stored compactly, with minimal wasted space.

Disadvantages of hashing include:

  1. Collisions: If many keys hash to the same index, the time to retrieve data can degrade to O(n), where n is the number of keys hashing to the same index.
  2. Dependency on the hash function: The performance of a hash table is heavily dependent on the quality of the hash function. A poor hash function can result in many collisions and poor performance.

Indexing

Explanation of Indexing

In indexing, an index is created on a column of a database table. The index provides a quick way to look up rows in the table based on the values in that column. The purpose of indexing in data structures is to speed up data retrieval.

Types of Indexing

There are several types of indexing, including:

  1. Primary Indexing: This is the most basic type of indexing, where the index is created on the primary key of the table.
  2. Secondary Indexing: This type of indexing is used when we need to retrieve data without using the primary key.
  3. Clustering Indexing: This type of indexing is used when data needs to be stored physically on the disk.

Example of Indexing

Consider a database table with columns for 'Name', 'Age', and 'Address'. If we frequently need to retrieve rows based on the 'Age' column, we could create an index on this column. The index would be a separate data structure (like a B-tree) that would have entries for each unique age, and each entry would point to the rows in the table where that age appears.

Advantages and Disadvantages of Indexing

Advantages of indexing include:

  1. Faster data retrieval: Indexing can significantly speed up data retrieval, especially for large tables.
  2. Efficient queries: Queries that use the indexed columns can be more efficient.

Disadvantages of indexing include:

  1. Increased storage requirements: Indexes take up space, which can be a concern for large databases.
  2. Slower writes: When data is inserted or updated, the index also needs to be updated, which can slow down write operations.

Comparison of Hashing and Indexing

Hashing Indexing
Purpose Fast retrieval of data by mapping keys to array indices Fast retrieval of data by creating an index on a database column
Method Use of a hash function Use of an index data structure
Best Case Time Complexity O(1) Depends on the type of index, but can be O(log n) for balanced tree indexes
Worst Case Time Complexity O(n) (when all keys hash to the same index) O(n) (when the index does not help to narrow down the search)
Space Complexity O(n) O(n)

In general, hashing is used when the keys are simple and can be easily hashed to an array index. Indexing is used when the data is stored in a database and queries need to be performed on specific columns.

Conclusion

Both hashing and indexing are important concepts in data structures that help in the efficient retrieval of data. Hashing uses a hash function to map keys to array indices, while indexing creates an index on a database column to speed up queries. Both techniques have their advantages and disadvantages, and the choice between them depends on the specific requirements of the data structure or application.

Summary

Hashing is a technique used in data structures to quickly locate and retrieve data from a database. It involves the use of a special function, known as a hash function, to map data of arbitrary size to fixed-size values. Indexing, on the other hand, is a data structure technique used to quickly locate and access the data in a database. Indexes are used in databases to find data rows with specific column values quickly.

Analogy

Hashing is like using a map to find a specific location. The map provides a shortcut to the location without having to search through every street. Indexing is like using an index in a book to quickly find a specific topic. The index lists the page numbers where the topic can be found, saving time and effort.

Quizzes
Flashcards
Viva Question and Answers

Quizzes

What is hashing?
  • A technique used to quickly locate and retrieve data from a database
  • A data structure technique used to quickly locate and access data in a database
  • A function that maps data to fixed-size values
  • A method to resolve collisions in hashing