Prefix Indexes

全栈程序员站长

Published 2022-08-31 21:37:53

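When the indexed column holds long strings, the index takes up a lot of memory and lookups get slow. This is where a prefix index comes in. A prefix index uses only the first few characters of each value as the index key. A shorter prefix makes more rows share the same key, so we have to measure how much duplication a given prefix length produces before settling on it. Consider this table: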

mysql> select * from test;
+----------+-------+
| name     | score |
+----------+-------+
| zhangsan | 123   |
| wangwu   | 345   |
| zhaoliu  | 234   |
| lisisi   | 687   |
+----------+-------+
4 rows in set (0.08 sec)

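If name is used as the index and its strings are long, the size and efficiency of the index become a concern. That is when a prefix index is worth introducing. Before using one, first compare the duplication rate, measured here as the ratio of distinct values to total rows (the closer to 1, the fewer duplicates).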

mysql> select 1.0*count(distinct name)/count(*) from test;
+-----------------------------------+
| 1.0*count(distinct name)/count(*) |
+-----------------------------------+
|                           1.00000 |
+-----------------------------------+
1 row in set (0.00 sec)

mysql> select 1.0*count(distinct left(name,2))/count(*) from test;
+-------------------------------------------+
| 1.0*count(distinct left(name,2))/count(*) |
+-------------------------------------------+
|                                   0.75000 |
+-------------------------------------------+
1 row in set (0.00 sec)

mysql> select 1.0*count(distinct left(name,1))/count(*) from test;
+-------------------------------------------+
| 1.0*count(distinct left(name,1))/count(*) |
+-------------------------------------------+
|                                   0.75000 |
+-------------------------------------------+
1 row in set (0.00 sec)

mysql> select 1.0*count(distinct left(name,3))/count(*) from test;
+-------------------------------------------+
| 1.0*count(distinct left(name,3))/count(*) |
+-------------------------------------------+
|                                   0.75000 |
+-------------------------------------------+
1 row in set (0.00 sec)

mysql> select 1.0*count(distinct left(name,4))/count(*) from test;
+-------------------------------------------+
| 1.0*count(distinct left(name,4))/count(*) |
+-------------------------------------------+
|                                   1.00000 |
+-------------------------------------------+
1 row in set (0.00 sec)

mysql> select 1.0*count(distinct left(name,5))/count(*) from test;
+-------------------------------------------+
| 1.0*count(distinct left(name,5))/count(*) |
+-------------------------------------------+
|                                   1.00000 |
+-------------------------------------------+
1 row in set (0.00 sec)

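Here left is the string function that returns the leftmost characters of a string: left(name,4) returns the first 4 characters of name.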

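The query select 1.0*count(distinct name)/count(*) from test measures the ratio of distinct values to total rows over the whole name column, which is of course the best case. The other queries do the same for the first few characters of name. Pick a prefix length whose ratio is close to the full-column value, and among those choose the one that takes the least space, that is, the shortest. From the results above, the first 4 characters of name are the best choice for the index.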
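The selection rule above can be sketched outside the database. The following Python snippet is only an illustration, assuming the four names from the test table; the helper names selectivity and shortest_full_prefix are made up for this example and are not part of MySQL.

```python
# Sample data copied from the `test` table above.
names = ["zhangsan", "wangwu", "zhaoliu", "lisisi"]


def selectivity(values, prefix_len=None):
    """Ratio of distinct keys to total rows.

    Mirrors COUNT(DISTINCT LEFT(col, n)) / COUNT(*); with prefix_len=None
    the whole value is used, as in COUNT(DISTINCT col) / COUNT(*).
    """
    keys = values if prefix_len is None else [v[:prefix_len] for v in values]
    return len(set(keys)) / len(values)


def shortest_full_prefix(values, max_len):
    """Hypothetical helper: smallest prefix length whose ratio equals
    the ratio of the full column, or None if none is found up to max_len."""
    target = selectivity(values)
    for n in range(1, max_len + 1):
        if selectivity(values, n) == target:
            return n
    return None


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, selectivity(names, n))
    print("chosen prefix length:", shortest_full_prefix(names, 8))
```

For this data the ratios are 0.75 for lengths 1 to 3 and 1.0 from length 4 onward, which matches the query results above. On a real table an exact match with the full-column ratio may never be reached at a reasonable length, so a threshold such as "within a few percent of the full-column ratio" is a common substitute; that threshold is a judgment call, not something this article specifies.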

Create the index:

mysql> alter table test add key(name(4));
Query OK, 4 rows affected (0.15 sec)
Records: 4  Duplicates: 0  Warnings: 0

After that, rows can be looked up by name as usual, and such lookups can use the prefix index.
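To see why the duplication rate matters for lookups, here is a small conceptual model in Python. It is a sketch under assumptions: a dictionary stands in for the index (MySQL uses a B-tree, not a dictionary), the score values are stored as integers for simplicity, and the function names are hypothetical. The point it illustrates is that the index only narrows the search to rows sharing the prefix; the full name still has to be compared against each candidate row.

```python
from collections import defaultdict

# Rows copied from the `test` table above: (name, score).
rows = [("zhangsan", 123), ("wangwu", 345), ("zhaoliu", 234), ("lisisi", 687)]


def build_prefix_index(rows, prefix_len):
    """Map each name prefix to the positions of the rows that share it."""
    index = defaultdict(list)
    for pos, (name, _score) in enumerate(rows):
        index[name[:prefix_len]].append(pos)
    return index


def candidates(index, prefix_len, name):
    """Row positions the index alone cannot tell apart for this name."""
    return index.get(name[:prefix_len], [])


def find_by_name(rows, index, prefix_len, name):
    """Look up by prefix, then confirm the full name on each candidate row."""
    return [rows[pos] for pos in candidates(index, prefix_len, name)
            if rows[pos][0] == name]


if __name__ == "__main__":
    for n in (2, 4):
        idx = build_prefix_index(rows, n)
        print(n, len(candidates(idx, n, "zhaoliu")),
              find_by_name(rows, idx, n, "zhaoliu"))
```

With a prefix length of 2, a lookup for zhaoliu has to examine two candidate rows (zhangsan and zhaoliu share the prefix zh); with a prefix length of 4 it examines only one. Both return the same single row, but the shorter prefix does more work per lookup.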

Publisher: 全栈程序员栈长. Please credit the source when reposting: https://javaforall.cn/142153.html Original link: https://javaforall.cn

Originally published: 2022年5月2. To report infringement, contact cloudcommunity@tencent.com for removal.