主键和加过 index 的 column 查询效率差别为什么这么大

messages 表里有百万级的数据，这样一个语句：

select * from messages
where site_id = 7
order by created_at desc
limit 1

查询一下需要120s
但是改成 order by id desc ，查询只需要 24ms 。

id是主键，site_id, created_at 都是加过index的。

想知道为什么查询时间差距会这么大？

谢谢！

desc

查询

Messages

22 条回复 • 2015-03-20 10:16:07 +08:00

dingyaguang117

2015-03-19 12:50:34 +08:00 via iPhone

因为creat at没加索引，需要内存排序

dingyaguang117

2015-03-19 12:51:23 +08:00 via iPhone

说错，没加id 和creat at的联合索引

xinyewdz

2015-03-19 13:14:40 +08:00

id应该是数字，数字排序是很快的。created_at是时间类型，数据类型比较复杂，导致排序慢。

est

2015-03-19 13:20:53 +08:00

@xinyewdz 这个。。。。。索引都是二进制的。。。。。

laoyur

2015-03-19 13:22:13 +08:00

坐等楼主实践2楼的做法后的反馈结果

moliliang

2015-03-19 13:31:36 +08:00

@xinyewdz 数据库中存储的时间是时间戳，也是数字类型吧。

jacob

2015-03-19 13:35:09 +08:00

@dingyaguang117 lz不说了加了索引吗

xinyewdz

2015-03-19 13:42:07 +08:00

@est 非常感谢指出问题。刚google了下索引的原理。问题应该是created_at这个字段，不是唯一索引，导致基数太小。“询优化器会在基数性小于记录数的30%时放弃索引”，基数被认为是索引中惟一值的数量。
贴两个地址：
索引原理： http://www.ituring.com.cn/article/986
低基数索引： http://www.ibm.com/developerworks/cn/data/library/techarticle/dm-1309cardinal/

jhdxr

2015-03-19 13:42:50 +08:00

@jacob @Yo_oY 加了索引还是联合索引是不一样的。一次查询没法同时使用多个索引的，所以还是要filesort。LZ可以贴下explain的结果看下

Yo_oY

2015-03-19 14:18:02 +08:00

@jhdxr

explain select * from messages
where site_id = 7
order by id desc
limit 1

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE messages ref index_messages_on_site_id index_messages_on_site_id 5 const 102302 Using where

explain select * from messages
where site_id = 7
order by created_at desc
limit 1

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE messages index index_messages_on_site_id index_messages_on_created_at 9 NULL 21 Using where

mgc

2015-03-19 14:20:57 +08:00

@jhdxr 我去，山大毕业了么

zenliver

2015-03-19 14:26:44 +08:00

(site_id, created_at), 加上这个就起作用了，楼主似乎对索引工作方式理解有误，不是加上就起作用的，用的时候，想想你的索引的btree结构，希望对你有帮助

Yo_oY

2015-03-19 14:41:35 +08:00

感谢楼上诸位。
加了个(site_id, created_at)的联合索引，查询速度也只要几十毫秒了。

不过我还是有点疑问，id 和 site_id 并没有建立联合索引，速度依然很快。
难道 order by id 和 order by created_at 会有本质区别么，id 作为主键，已经不是单纯的索引了？

zenliver

2015-03-19 14:46:31 +08:00

@Yo_oY 因为你建了site_id索引，主索引会自动加到该索引里，其实是（create_at, id)

zenliver

2015-03-19 14:47:27 +08:00

@Yo_oY 所以主索引尽量小，因为会自动加到你建立的索引里

Yo_oY

2015-03-19 14:50:50 +08:00

@zenliver 懂了，多谢！

popo233

2015-03-19 15:19:14 +08:00

@jhdxr 十年多前玩过你头像这个游戏 -。-

lincanbin

2015-03-19 19:12:57 +08:00

一条查询只能用一个索引
你可以explain看看

jhdxr

2015-03-19 22:28:12 +08:00

@popo233 最近要出新/复刻版了。。。我在期待。。。

jhdxr

2015-03-19 22:29:08 +08:00

@mgc 你是？。。。（/你怎么看出我是山大的？）

lqs

2015-03-20 01:17:14 +08:00

前者要全表扫描一遍才能知道 site_id = 7 且 created_at 最大的那一行，
后者只需按照 id 从大到小扫描到第一个 site_id = 7 的行就结束了。

你试试把后者的查询条件里的 7 改成一个不存在的值，就和前者一样慢了。

Yo_oY

2015-03-20 10:16:07 +08:00

@lqs 试了一下，改成不存在的值查询也很快，应该不是你说的这个原因。