The appearance of indexes is actually to improve the efficiency of data query, just like the index of a book.
Common Models of Indexes
Hash Tables
A hash table is a structure for storing data in a key-value form. We only need to input the key to be searched, and we can find its corresponding value.
Hash tables are suitable for scenarios where only equal value queries are required.
Ordered Arrays
Ordered arrays perform very well in equal value query and range query scenarios. Just looking at the query efficiency, the ordered array is the best data structure. However, it becomes troublesome when data needs to be updated. If you need to insert a record in the middle, all the records behind it must be moved, which is too costly. Therefore, ordered array indexes are only suitable for static storage engines, such as saving all the population information of a city in 2017, which is data that will not be modified.
Search Trees
In InnoDB, tables are stored in the form of indexes according to the primary key order. This type of table is known as an index-organized table. Also, because we mentioned earlier, InnoDB uses the B+ tree index model, so all data is stored in the B+ tree.
The types of indexes are divided into primary key indexes and non-primary key indexes. The leaf nodes of the primary key index store the entire row of data. In InnoDB, the primary key index is also known as the clustered index.
The content of the leaf nodes of the non-primary key index is the value of the primary key. In InnoDB, non-primary key indexes are also known as secondary indexes.
Differences between queries based on primary key indexes and ordinary indexes
- If the statement is
select * from T where ID=500
, i.e., the primary key query method, then you only need to search the ID B+ tree.
- If the statement is
select * from T where k=5
, i.e., the normal index query method, you need to first search the k index tree, get the ID value of 500, and then search the ID index tree once. This process is called back to the table (or back to the root).
A query based on a non-primary key index needs to scan an additional index tree. Therefore, in applications, primary key queries should be used as much as possible.
Index Maintenance
B+ tree, in order to maintain the order of the index, must carry out necessary maintenance when inserting new values. When a data page is already full, according to the B+ tree algorithm, a new data page needs to be applied for, and part of the data is moved over. This process is called page splitting. In this case, performance is naturally affected.
In addition to performance, the page split operation also affects the utilization of data pages. The data that was originally in one page is now divided into two pages, and the overall space utilization rate is reduced by about 50%.
After deleting data from two adjacent pages, when the utilization rate is very low, the data pages will be merged. The merge process can be considered as the reverse process of splitting.
In the data insertion mode of auto-increment primary keys, each insertion of a new record is an append operation, which does not involve moving other records and will not trigger leaf node splitting.
The shorter the primary key, the smaller the leaf nodes of the normal index, and the less space the normal index occupies.
From the perspective of performance and storage space, the auto-increment primary key is often a more reasonable choice.
In queries, the index k has already "covered" our query needs, and we call it a covering index. Because the covering index can reduce the number of tree searches and significantly improve query performance, using the covering index is a commonly used performance optimization technique.
The B+ tree index structure can use the "leftmost prefix" of the index to locate records. It finds the first record that meets the conditions, then traverses backwards until the conditions are not met.
When creating a composite index, how to arrange the order of fields within the index
The evaluation criterion is the reusability of the index. Because the leftmost prefix is supported, when the composite index (a,b) is already there, we generally don't need to build an index on a separately. Therefore, the first principle is that if by adjusting the order, the maintenance of an index can be reduced, then this order is often the first to be considered for adoption.
Index Pushdown
MySQL 5.6 introduced index condition pushdown optimization. It allows the system to make judgments on the fields included in the index during the index traversal process, directly filter out records that do not meet the conditions, and reduce the number of back-to-root operations.