Question 48: Why is query B faster than query A?

23. May 2010

See the scenario below.
It is quite typical in data warehouses, to access data via dimension tables.
This is an example for a date dimension.


-- row generator
create table rgen
  select 1 as n union select 2 union select 3 union select 4 union
  select 5 union select 6 union select 7 union select 8 union
  select 9 union select 10;

-- fact table: 1 million rows
select @a:=0;
create table r1tst
  select
  @a:=@a + 1 as rkey,
  floor(@a/10000) as rbatchid,
  floor(rand()*10) + 1 as rtype,
  adddate('2010-01-01', interval (@a/4000) day) as rdate,
  (rand()*1000000/100) as rsum
  from rgen a,rgen b,rgen c,rgen d,rgen e,rgen f;
alter table r1tst add index(rbatchid),
  add index(rdate);

-- date dimension
select @d:=-1;
create table r_date_dim select
   @d:=@d+1 as date_key,
   adddate('2010-01-01', interval @d day) as date_dt,
   DAY(adddate('2010-01-01', interval @d day)) as date_day,
   DAYNAME(adddate('2010-01-01', interval @d day)) as date_day_name,
   DAYOFMONTH(adddate('2010-01-01', interval @d day)) as date_day_of_month,
   DAYOFWEEK(adddate('2010-01-01', interval @d day)) as date_day_of_week,
   DAYOFYEAR(adddate('2010-01-01', interval @d day)) as date_day_of_year,
   LAST_DAY(adddate('2010-01-01', interval @d day)) as date_last_day,
   MONTH(adddate('2010-01-01', interval @d day)) as date_month,
   MONTHNAME(adddate('2010-01-01', interval @d day)) as date_month_name,
   QUARTER(adddate('2010-01-01', interval @d day)) as date_quarter,
   TO_DAYS(adddate('2010-01-01', interval @d day)) as date_days,
   WEEK(adddate('2010-01-01', interval @d day)) as date_week,
   WEEKDAY(adddate('2010-01-01', interval @d day)) as date_week_day,
   WEEKOFYEAR(adddate('2010-01-01', interval @d day)) as date_week_of_year,
   YEAR(adddate('2010-01-01', interval @d day)) as date_year,
   YEARWEEK(adddate('2010-01-01', interval @d day)) as year_week
from rgen a,rgen b, rgen c;
alter table r_date_dim add index (date_year),
   add index (date_month), add index(date_dt),
   add index (date_key);

-- query A:
mysql> select sum(rsum) from r1tst v1,r_date_dim v2
    -> where v1.rdate=v2.date_dt and v2.date_year=2010 and v2.date_quarter=2;
+-----------------+
| sum(rsum)       |
+-----------------+
| 1819518419.8031 |
+-----------------+
1 row in set (37.64 sec)

mysql> explain select sum(rsum) from r1tst v1,r_date_dim v2
    -> where v1.rdate=v2.date_dt and v2.date_year=2010 and v2.date_quarter=2;
+--+------+---+----+-----------------+-------+----+--------+-------+-----------+
|id|select|tab|type|possible_keys    |key    | len| ref    |rows   |Extra      |
+--+------+---+----+-----------------+-------+----+--------+-------+-----------+
| 1|SIMPLE|v1 |ALL |rdate            |NULL   |NULL|NULL    |1000000|           |
| 1|SIMPLE|v2 |ref |date_year,date_dt|date_dt|9   |v1.rdate|      1|Using where|
+--+------+---+----+-----------------+-------+----+--------+-------+-----------+
2 rows in set (0.03 sec)

-- query B:
mysql> select sum(rsum) from r1tst v1,r_date_dim v2
    -> where v1.rdate=date(v2.date_dt) and v2.date_year=2010 and v2.date_quarter=2;
+-----------------+
| sum(rsum)       |
+-----------------+
| 1819518419.8031 |
+-----------------+
1 row in set (5.41 sec)

mysql> explain select sum(rsum) from r1tst v1,r_date_dim v2
    -> where v1.rdate=date(v2.date_dt) and v2.date_year=2010 and v2.date_quarter=2;
+--+------+---+----+---------+---------+----+-----+----+-----------+
|id|select|tab|type|possible |key      | len| ref |rows|Extra      |
+--+------+---+----+---------+---------+----+-----+----+-----------+
| 1|SIMPLE| v2|ref |date_year|date_year|  5 |const| 361|Using where|
| 1|SIMPLE| v1|ref |rdate    |rdate    |  9 | func|3984|Using where|
+--+------+---+----+---------+---------+----+-----+----+-----------+
2 rows in set (0.00 sec)

In theory it should be a really bad idea having a function around the the join criteria, as it prevents index usage.

Why is query B faster than query A?
(Find all correct answers)

a) There is no good index on the date-dimension, so the optimizer thinks A is the best plan.
b) In B, the date() in the join forces a certain join order.
c) Restricting on the fact table directly is more efficient than going via date dimension.

[ Basic Optimizations (10%) - Overview of Optimization Principles ]

Tags: learning, mysql, question

Question 47: Which indexes will help with the performance of the following query?

22. May 2010

By plogi in mysql questions

1 Comment

Have a look at the following tables:


create table ggen
  select 1 as n union select 2 union select 3 union select 4 union
  select 5 union select 6 union select 7 union select 8 union
  select 9 union select 10;

create table g1tst
  select floor(rand()*10) + 1 as gtype,
  adddate('2010-01-01', interval rand()*360 day) as gdate,
  (rand()*1000000/100) as gsum
  from ggen a,ggen b,ggen c,ggen d,ggen e;

Which indexes will help with the performance of the following query?
(Find all correct answers)


select count(*)
  from g1tst v1,g1tst v2
  where v1.gdate=v2.gdate and v1.gtype=1 and v2.gtype=2;

+----------+
| count(*) |
+----------+
|   278397 |
+----------+
1 row in set (34.37 sec)

a) alter table g1tst add index (gdate,gtype);
b) alter table g1tst add index (gtype,gdate);
c) alter table g1tst add index (gtype);
d) alter table g1tst add index (gsum);
e) alter table g1tst add index (gdate);

[ Basic Optimizations (10%) - Using Indexes for Optimization ]

Tags: learning, mysql, question

Question 46: What does “SELECT 1;” actually do?

21. May 2010

By plogi in mysql questions

1 Comment

In many of the questions selects without tables are used.
But what does “SELECT 1;” for example actually do?

(Pick 1 correct answer; or if you like a challenge try finding all)

a) It results in a syntax error of course!
b) With Oracle you write it like: “select 1 from dual;”
c) In DB2 you get the same result with “values 1;”.
d) The result will be 1 row with 1 column containing “1″.

[ Just for fun ]

Tags: learning, mysql, question

Question 45: Which of the following plans belongs to the query?

20. May 2010

By plogi in mysql questions

1 Comment

See the following situation:

create table fgen
  select 1 as n union select 2 union select 3 union
  select 4 union select 5 union select 6 union
  select 7 union select 8 union select 9 union select 10;

create table f1tst
  select floor(rand()*10) + 1 as ftype,
  adddate('2010-01-01',
  interval rand()*360 day) as fdate,
  (rand()*1000000/100) as fsum
  from fgen a,fgen b,fgen c,fgen d,fgen e;

alter table f1tst add index (fdate);

explain select min(fdate),max(fdate),sum(fsum)
  from f1tst v1 where subdate(fdate,interval 10 day) >= now();

Which of the following plans belongs to the query?

a)
+--+-----------+---+-----+--------+-----+-------+----+----+-----------+
|id|select_type|tab| type|possible| key |key_len| ref|rows|   Extra   |
+--+-----------+---+-----+--------+-----+-------+----+----+-----------+
| 1| SIMPLE    | v1|range| fdate  |fdate|     9 |NULL|2663|Using where|
+--+-----------+---+-----+--------+-----+-------+----+----+-----------+

b)
+--+-----------+---+----+--------+----+-------+----+------+-----------+
|id|select_type|tab|type|possible| key|key_len| ref| rows |   Extra   |
+--+-----------+---+----+--------+----+-------+----+------+-----------+
| 1| SIMPLE    | v1| ALL| fdate  |NULL|  NULL |NULL|100000|Using where|
+--+-----------+---+----+--------+----+-------+----+------+-----------+

c)
+--+-----------+---+----+--------+----+-------+----+------+-----------+
|id|select_type|tab|type|possible| key|key_len| ref| rows |   Extra   |
+--+-----------+---+----+--------+----+-------+----+------+-----------+
| 1| SIMPLE    | v1| ALL| NULL   |NULL|  NULL |NULL|100000|Using where|
+--+-----------+---+----+--------+----+-------+----+------+-----------+

d)
+--+-----------+----+----+--------+----+-------+----+----+----------------+
|id|select_type| tab|type|possible| key|key_len| ref|rows|      Extra     |
+--+-----------+----+----+--------+----+-------+----+----+----------------+
| 1| SIMPLE    |NULL|NULL| NULL   |NULL|  NULL |NULL|NULL|Impossible WHERE|
+--+-----------+----+----+--------+----+-------+----+----+----------------+

[ Basic Optimizations (10%) - Overview of Optimization Principles ]

Tags: learning, mysql, question

Question 44: Which query belongs to the execution plan?

19. May 2010

By plogi in mysql questions

1 Comment

Check the scenario below:

create table egen
  select 1 as n union select 2 union select 3 union
  select 4 union select 5 union select 6 union
  select 7 union select 8 union select 9 union select 10;

create table e1tst
  select floor(rand()*10) + 1 as etype,
  adddate('2010-01-01', interval rand()*360 day) as edate,
  (rand()*1000000/100) as esum
  from egen a,egen b,egen c,egen d,egen e;

create table e2tst select n etype,uuid() as ename from egen a;

alter table e1tst add index (edate,etype);
alter table e2tst add index (etype);

+--+------------------+---+-----+--------+-----+-------+----+----+-----------+
|id|select_type       |tab|type |possible|key  |key_len|ref |rows|Extra      |
+--+------------------+---+-----+--------+-----+-------+----+----+-----------+
| 1|PRIMARY           | v1|range| edate  |edate|  9    |NULL|2426|Using where|
| 2|DEPENDENT SUBQUERY| v2|ref  | etype  |etype|  8    |func|   2|Using where|
+--+------------------+---+-----+--------+-----+-------+----+----+-----------+
2 rows in set (0.00 sec)

Which query belongs to the execution plan?

a) select edate,ename,sum(esum) from e1tst v1
    inner join e2tst v2 on (v1.etype=v2.etype)
    where edate between '2010-01-01' and '2010-01-10'
    group by edate,ename;

b) select edate,
    (select v2.ename from e2tst v2 where v1.etype=v2.etype)
    as ename,sum(esum)
    from e1tst v1
    where edate between '2010-01-01' and '2010-01-10'
    group by edate,etype;

c) select s1.edate,s2.ename,s1.sumesum
    from (select edate,etype,sum(esum) as sumesum
            from e1tst v1
            where edate between '2010-01-01' and '2010-01-10'
            group by edate,etype) s1
    inner join e2tst s2 on (s1.etype=s2.etype);

[ Basic Optimizations (10%) - Using Indexes for Optimization ]

Tags: learning, mysql, question

Question 43: Which indexes would you create to speed up the query below?

18. May 2010

By plogi in mysql questions

1 Comment

See the following scenario:

create table jgen
  select 1 as n union select 2 union select 3 union select 4 union
  select 5 union select 6 union select 7 union select 8 union select 9
  union select 10;
create table j1tst
  select floor(rand()*10) + 1 as jttype,
    adddate('2010-01-01', interval rand()*360 day) as jdate,
    (rand()*1000000/100) as jtsum
  from jgen a,jgen b,jgen c,jgen d,jgen e,jgen f;
create table j2tst
  select n jttype,uuid() as jtname from jgen a;

Which indexes would you create to speed up the query below?
(Find all)

select count(*) from j1tst v1
  inner join j2tst v2 on (v1.jttype=v2.jttype)
  where jdate between '2010-01-01' and '2010-01-10';

a) alter table j1tst add index (jdate);
b) alter table j1tst add index (jttype);
c) alter table j2tst add index (jttype);
d) alter table j2tst add index (jtname);
e) alter table j1tst add index (jtsum);

[ Basic Optimizations (10%) - Using Indexes for Optimization ]

Tags: learning, mysql, question

Question 42: How do you find the largest table?

17. May 2010

By plogi in mysql questions

1 Comment

Find as many ways as possible to determine the name and schema of the table with the most rows inside your mysql-instance.

Hint: Use the information schema.

[ Obtaining Database Metadata (10%) - Using INFORMATION_SCHEMA to Obtain Metadata ]

Tags: learning, mysql, question

Question 41: Which of the queries is the most efficient?

9. March 2010

By urs in mysql questions

1 Comment

-- test data creation
create table rgen
  select 1 as n
    union select 2 union select 3 union select 4 union select 5;
create table rtst
  select floor(rand()*3) as rt,(rand()*1000000/100) as rs
  from rgen a,rgen b,rgen c,rgen d,rgen e,rgen f,rgen g;

With the given tables, you would like to create a result similar to the one below:

+-------+-------+-------+
| 1nr   | 2nr   | 3nr   |
+-------+-------+-------+
| 26146 | 25874 | 26105 |
+-------+-------+-------+
1 row in set (0.13 sec)

Which of the queries is the most efficient?

a) select
     (select count(*) from rtst where rt=0) as 1nr,
     (select count(*) from rtst where rt=1) as 2nr,
     (select count(*) from rtst where rt=2) as 3nr;

b) select
     sum(if(rt=0,1,0)) as 1nr,
     sum(if(rt=1,1,0)) as 2nr,
     sum(if(rt=2,1,0)) as 3nr
   from rtst;

c) select
     v1.c as 1nr,
     v2.c as 2nr,
     v3.c as 3nr
   from
     (select count(*) as c from rtst where rt=0) v1,
     (select count(*) as c from rtst where rt=1) v2,
     (select count(*) as c from rtst where rt=2) v3;

[ Querying for Data (15%) - Using SELECT to Retrieve Data ]

Tags: learning, mysql, question

Question 40: Which of the following queries return `0`?

8. March 2010

By urs in mysql questions

1 Comment

(Find all correct answers)

a) select count(i)
     from (select 1 as i union select 2 union select null) v1
     where i is null;
 
b) select count(*)
     from (select 1 as i union select 2 union select null) v1
     where i is null;
 
c) select count(*)
     from (select 1 as i union select 2 union select null) v1
     where i <=> null;
 
d) select count(*)
     from (select 1 as i union select 2 union select null) v1
     where i = null;

[ SQL Expressions (15%) - NULL Values ]

Tags: learning, mysql, question

Question 39: Which of the alternatives below actually work?

7. March 2010

By urs in mysql questions

1 Comment

Imagine you have to create a running number for a query result and select only the line 1, 2 and 18.
You choose user variables as a means to do that.

(Find all correct answers)

a) set @a=0;
      select *
      from (select (@a:=@a+1) as ln,table_name
              from information_schema.tables
              where table_schema='mysql' order by table_name) v1
      where ln in (1,2,18);
 
b) set @a=0;
    select (@a:=@a+1) as ln,table_name
      from information_schema.tables
      where table_schema='mysql' and @a in (1,2,18)
      order by table_name;
 
c) set @a=0;
    select @a as ln,table_name
      from information_schema.tables
      where table_schema='mysql' having (@a:=@a+1) in (1,2,18)
      order by table_name;

[ User Variables (5%) - User Variable Properties ]

Tags: learning, mysql, question

Next » « Previous

Question 48: Why is query B faster than query A?

Question 47: Which indexes will help with the performance of the following query?

Question 46: What does “SELECT 1;” actually do?

Question 45: Which of the following plans belongs to the query?

Question 44: Which query belongs to the execution plan?

Question 43: Which indexes would you create to speed up the query below?

Question 42: How do you find the largest table?

Question 41: Which of the queries is the most efficient?

Question 40: Which of the following queries return `0`?

Question 39: Which of the alternatives below actually work?

About MySQL Question of the Day

Search

Archives

cool mysql resources

Users

Pages