Database Indexing and Hashing Techniques
Indexing and Hashing in Databases
Indexing
Indexing is a data structure technique used to efficiently retrieve records from database files based on certain attributes. Indexing has been implemented on these attributes.
Types of Indexing
- Primary Index: Defined on an ordered data file, where the data file is ordered on a key field (generally the primary key of the relation).
- Secondary Index: May be generated from a field that is a candidate key and has a unique value in every record, or a non-key with duplicate values.
- Clustering Index: Defined on an ordered data file, where the data file is ordered on a non-key field.
Ordering Index Types
Dense Index
In a dense index, there is an index record for every search key value in the database. This makes searching faster but requires more space to store index records. Index records contain the search key value and a pointer to the actual record on the disk.
Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search a record, we first proceed by the index record and reach the actual location of the data. If the data we are looking for is not where we directly reach by following the index, then the system starts a sequential search until the desired data is found.
B+ Tree
A B+ tree is a balanced binary search tree that follows a multi-level index format. The leaf nodes of a B+ tree denote actual data pointers. The B+ tree ensures that all leaf nodes remain at the same height, thus balanced. The leaf nodes are linked using a link list; therefore, a B+ tree can support random access as well as sequential access.
Hashing
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if a mod-4 hash function is used, then it shall generate only 5 values. The output address shall always be the same for that function. The number of buckets provided remains unchanged at all times.
- Insertion: When a record is required to be entered using static hashing, the hash function h computes the bucket address for the search key K, where the record will be stored. Bucket address = h(K)
- Search: When a record needs to be retrieved, the same hash function can be used to retrieve the address of the bucket where the data is stored.
- Delete: This is simply a search followed by a deletion operation.
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on-demand. Dynamic hashing is also known as extended hashing.
- Querying: Look at the depth value of the hash index and use those bits to compute the bucket address.
- Update: Perform a query as above and update the data.
- Deletion: Perform a query to locate the desired data and delete it.
- Insertion: Compute the address of the bucket. If the bucket is already full:
- Add more buckets.
- Add additional bits to the hash value.
- Re-compute the hash function.
Difference Between Indexing and Hashing
Hashing is an effective technique for calculating the direct location of a data record on the disk without using an index structure. Indexing uses data references that hold the address of the disk block with the value corresponding to the key, while hashing uses mathematical functions, called hash functions, to calculate the direct locations of data records on the disk.