What Are Indexes in a Database?

An index speeds up access to database records by storing pointers to their actual locations on disk.

  • An index offers an efficient way to access records in the database files stored on disk. It speeds up database queries by serving as an organized lookup table, with pointers to the locations of the requested data.

  • Advancements in database technology allow companies to store terabytes of information in large databases, but quick access to that data remains crucial for business success. Database indexing lets organizations rapidly retrieve the desired information from large data volumes.

    Most databases store data in the form of records (also known as tuples) within different tables. Each record should have a specific key or attribute that identifies it uniquely, known as the primary key. For instance, the primary key in an employee table can be the employee ID, which is unique in every record. An index on the employee ID column stores these keys in an organized way, each with a pointer to the disk location where the actual record is stored. Whenever you execute a query with a specific key value, such as an employee ID, the database looks up the index to locate the record(s) associated with that key instead of checking every record in the table.
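    For illustration, here is a minimal sketch in SQL (the employee table and its column names are assumptions for this example, and exact syntax varies by database system). The primary key uniquely identifies each record, and a query filtering on it can be answered through the index rather than a full table scan:

      -- Hypothetical employee table; the primary key uniquely identifies each record.
      CREATE TABLE employee (
          employee_id INT PRIMARY KEY,  -- unique per record; most databases index it implicitly
          name        VARCHAR(100),
          department  VARCHAR(50),
          age         INT
      );

      -- The database can look up this key in the index and follow the stored pointer
      -- to the record's location on disk instead of checking every row in the table.
      SELECT name, department
      FROM employee
      WHERE employee_id = 1001;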

    Since indexing optimizes query performance, most database management systems support both built-in and user-defined indexes. In most databases, table columns marked with unique or primary key constraints typically have implicit indexes. User-defined indexes, in contrast, are helpful when queries frequently filter on non-primary key columns.
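    As a rough sketch (index options and syntax details vary by database system), a user-defined index on a non-primary key column that appears frequently in queries could be created like this:

      -- Hypothetical user-defined index on a non-primary key column.
      CREATE INDEX idx_employee_department
          ON employee (department);

      -- Queries filtering on this column can now use the index instead of scanning the table.
      SELECT employee_id, name
      FROM employee
      WHERE department = 'Engineering';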

    Indexing also has certain drawbacks. If not done correctly, it can slow down insert, update, and delete operations in large databases, because every write must maintain both the table and its indexes. Additionally, index structures require extra space on the underlying physical storage of the database, as well as regularly scheduled maintenance.

  • Indexes typically use a balanced-tree (B-tree) structure to store data in sorted order. (Other index structures are possible, such as Oracle's hash index, but they are less common.) The B-tree structure speeds up data search and access operations in a database. Each index entry has two parts: the first stores the value of the database column the index sorts on, and the second stores a pointer (or group of pointers) identifying the disk locations of the rows containing that value.

    Outlined below are some of the key aspects of database indexes:

    Cardinality: cardinality is one of the essential aspects you should consider when creating database indexes. A table column can contain both unique and non-unique values; the cardinality of an index is the total count of distinct values it holds. Cardinality is usually described as high or low. With high cardinality, most values in the indexed column are distinct. Conversely, with low cardinality, most values in the indexed column are repetitive.

    For example, assume an employee table with three columns: employee ID, age, and department. The cardinality of the employee ID column, which carries the primary key constraint, will be high because every record has a distinct value for this field. In contrast, the cardinality of the department and age columns will be low because they contain many repeated values. Creating an index on a low-cardinality column is generally not preferable because each lookup returns many records, which increases overall query execution time and degrades database performance.
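    Cardinality can be checked directly with a query; this sketch assumes the hypothetical employee table from the earlier example:

      -- Count of distinct values per column, i.e., the cardinality an index on it would have.
      SELECT
          COUNT(DISTINCT employee_id) AS employee_id_cardinality,  -- high: one value per row
          COUNT(DISTINCT department)  AS department_cardinality,   -- low: many repeated values
          COUNT(DISTINCT age)         AS age_cardinality           -- low: many repeated values
      FROM employee;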

    Selectivity: the cardinality of an index divided by the total number of tuples (rows) in the index gives its selectivity. For example, imagine an employee table with 100 rows in which one of the indexed columns has 50 unique values, which is also the cardinality of that column. The selectivity of the indexed column would then be 50/100 = 0.5. A selectivity of 1 is considered ideal, as it indicates all values in the index are unique. The selectivity of a column with the primary key constraint is always 1 because it contains no duplicate values.

    In contrast, a column having multiple non-unique values has low selectivity. For instance, the gender column in an employee table with 10,000 records will have low selectivity because it contains repetitive values, such as male and female. The selectivity of the gender column would be 2/10000 = 0.0002.
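    Selectivity follows directly from the definition above. This sketch assumes the employee table also has the name and gender columns used in the examples; multiplying by 1.0 simply forces decimal rather than integer division:

      -- Selectivity = distinct values / total rows.
      SELECT
          COUNT(DISTINCT employee_id) * 1.0 / COUNT(*) AS employee_id_selectivity, -- 1.0
          COUNT(DISTINCT name)        * 1.0 / COUNT(*) AS name_selectivity,        -- close to 1
          COUNT(DISTINCT gender)      * 1.0 / COUNT(*) AS gender_selectivity       -- e.g., 2/10000 = 0.0002
      FROM employee;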

    Since most query optimizers rely on selectivity figures to create an efficient query execution plan, it is preferable to create indexes on columns with high selectivity. For example, creating an index on the employee name column would be much better than on the gender column, because most values in the name column are distinct while the gender column holds only a couple of values. A query filtering on the employee name column in the WHERE clause returns far fewer records than one filtering on the gender column, which also improves response time because the database scans fewer records to find the desired data.
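    To make the comparison concrete, this hedged sketch contrasts a filter on the high-selectivity name column with one on the low-selectivity gender column (both columns are carried over from the examples above):

      -- Index on a high-selectivity column: each lookup matches only a handful of rows,
      -- so the index pays off.
      CREATE INDEX idx_employee_name ON employee (name);

      SELECT employee_id, department
      FROM employee
      WHERE name = 'Jane Smith';

      -- A filter on a low-selectivity column matches a large share of the table,
      -- so the optimizer may ignore an index on it and scan the table instead.
      SELECT employee_id, department
      FROM employee
      WHERE gender = 'Female';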

    Density: density measures how many repeated values a table column contains. It is calculated as 1 divided by the count of unique values in the column. An index on a high-density column returns more records for any particular query because the column contains more duplicate values, so high-density indexes adversely affect query execution time. Density is inversely proportional to selectivity: a higher selectivity value means fewer duplicates, and queries against such an index return fewer rows.

    For example, assume an employee table with three columns: employee ID, department, and name. Many employees can belong to the same department, but every employee has only a single ID. Therefore, an index on the employee ID column would be more “selective” than one on the department column, since the ID column has the lowest possible density (no duplicate values). Query optimizers in databases such as SQL Server use density data to estimate the number of records a column will return when queried.
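    Density can likewise be computed with a simple query against the hypothetical table. SQL Server exposes comparable figures through its index statistics (for example, DBCC SHOW_STATISTICS reports density values), though the exact tooling varies by platform:

      -- Density = 1 / count of distinct values in the column.
      SELECT
          1.0 / COUNT(DISTINCT employee_id) AS employee_id_density, -- lowest possible density
          1.0 / COUNT(DISTINCT department)  AS department_density   -- higher: more duplicates
      FROM employee;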

    The density, cardinality, and selectivity metrics associated with an index are vital inputs for database query optimizers when they create an efficient query execution plan. These metrics allow the database to determine whether using an index is the better way to retrieve records. For instance, if an index has a low selectivity value, it is often faster to retrieve the matching records by scanning the entire table than by going through the index, because accessing a large share of a table through an index costs more time and database resources, such as server memory and disk I/O. Therefore, it is better to avoid an index if it doesn't offer a considerable performance benefit.
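    To see which access path the optimizer actually chooses, you can inspect the query execution plan. The command differs by platform; for example, MySQL and PostgreSQL use EXPLAIN, while SQL Server offers SET SHOWPLAN_ALL ON or a graphical plan. A rough sketch:

      -- MySQL/PostgreSQL style: shows whether the planner picks an index scan
      -- or a full table scan for this predicate.
      EXPLAIN
      SELECT employee_id, name
      FROM employee
      WHERE department = 'Engineering';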
