MySQL random ordering
Ordering results randomly in MySQL seems trivial at first…
SELECT * FROM table ORDER BY RAND()
Seemlike a simple concept. It works well. Except one thing. It uses filesorts and temporary tables:
mysql> explain SELECT * FROM table ORDER BY RAND(); +----+-------------+------------+------+---------------+------+---------+------+-------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+------+---------------+------+---------+------+-------+---------------------------------+ | 1 | SIMPLE | table | ALL | NULL | NULL | NULL | NULL | 28288 | Using temporary; Using filesort | +----+-------------+------------+------+---------------+------+---------+------+-------+---------------------------------+
Ouch! Things get much much worse when you need to do joins or multi-column sorting against a joined table, and filesorts and temporary tables do not scale well. I have one query that has a runtime of >200 seconds despite the data not being huge.
This led me to much fruitless playing around and searching the net to find a mechanism I could use to do this faster. There was one hint I found that looked promising, but in the end, It didn’t help me do what I wanted.
I even tried my hand at generating a source of random numbers in a derived table against which I could select a subset appropriate for the desired page. The problems I ran into with that were that I couldn’t generate rows in the absence of a row-source to join against, and that rand(n) (where n is a seed) doesn’t guarantee unique results, so I had to use Distinct, which always produced a result set smaller than the desired number of entries. This is further compounded by the fact that using limit n,m operations always returns the same sequence of random numbers (unlike limiting on a sort by rand()).
I’ll likely have to implement a partitioning based scheme, where objects are assiigned to a particular bucket as they’re entered into the database, and then use the buckets to select a random subset for each different user. I haven’t worked out the details of the mechanics of this yet, but I’ll post here later.
Oh, yeah, I need to paginate.
Trackbacks
Use the following link to trackback from your own site:
http://blog.royhooper.ca/trackbacks?article_id=35
