create table work1 ( site_id int, sale_dt date, amount decimal(10,2), primary key (site_id,sale_dt)); create table work2 ( site_id int, sale_dt date, amount decimal(10,2), primary key (site_id,sale_dt)); -- test data create table wgen(a1 bit(1)); insert into wgen values (1),(1),(1),(0),(1),(1),(1),(1),(0),(1); insert ignore into work1 select rand()*100, adddate('2010-01-01', interval rand()*360 day),rand()*100000 from wgen v1,wgen v2,wgen v3,wgen v4; insert ignore into work2 select rand()*100,adddate('2010-01-01', interval rand()*360 day),rand()*100000 from wgen v1,wgen v2,wgen v3,wgen v4;
You would like to rewrite the query below in different ways.
select count(*) from work1 v1 where not exists ( select * from work2 where site_id=v1.site_id and sale_dt=v1.sale_dt);
(Find all correct answers)
a) select count(*) from work1 v1 left join work2 v2 on (v1.site_id=v2.site_id and v1.sale_dt=v2.sale_dt) where v2.site_id is null and v2.sale_dt is null; b) select count(*) from work1 v1, work2 v2 where (v1.site_id<>v2.site_id and v1.sale_dt<>v2.sale_dt); c) select count(*) from work1 v1 where (v1.site_id,v1.sale_dt) not in (select site_id,sale_dt from work2); d) select count(*) from (select * from work1 minus select * from work2) v1;
Extra question: How does the test-data generation work, and how many rows will it create?
[ Subqueries (10%) - Converting Subqueries to Joins ]
Answers:
a c
Answer a is probably the most efficient.
b does not return the correct result.
d uses “MINUS”, which is not supported in mysql.
The wgen table is filled with 10 rows. The insert selects from a cartesian join from 4 tables,
(well, 4 aliases of the same table) resulting in 10 * 10 * 10 * 10 = 10000 rows.