PostgreSQL 8.2.3 婺桺桺懼
劯锔媆锔䆹54. 蓇彐単套嘘嘪䫘䂘螇媇敇媆誕嬉誕

54.1. 臯鵇橘䔇冋床

嘪䫘婔婻傯啂嘐敋臘昄扞康括庺準䔇冋床螷潏傸傯婔婻麂婩䞔剘䔇昖臵嚔哋

EXPLAIN SELECT * FROM tenk1;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..445.00 rows=10000 width=244)

蓇彐単套嘘彴桺 tenk1 麯麵臯䔇昄䕞婘誗13.1麯麵傋䂉婺庖垯昘婘認麯麉崉婔婋㔗臯昄滇傯 pg_class 麯麵昖庺準䔇㔗

SELECT reltuples, relpages FROM pg_class WHERE relname = 'tenk1';

 relpages | reltuples
----------+-----------
      345 |     10000

蓇彐単儖演昖 relpages 䔇鵇橘(冽嘯嚔體䔇淉嘩)幽婫套悩認婻婉溼䇞儌埇脘毬懫冋䚷櫆 reltuples 傖诙埡婔婻臯䔇鵇橘㔗婘橸冋婺垄婉䫘䚷櫆啹溴

rows = 10000

扵婔婻婘 WHERE 床埖麯麵婥橬评啘溇傽䔇冋床

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000;

                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..470.00 rows=1031 width=244)
   Filter: (unique1 < 1000)

蓇彐単演昖 WHERE 床埖溇傽

unique1 < 1000

䇽劯婘 pg_operator 麯檪 < 䔇鍊彽庘昄欆庺準㔗認婻庘昄媺庻婘庖枕 oprrest 麯認婻冋床麯䂷悩滇 scalarltselscalarltselpg_statistics 麯麵演䘵庺 unique1 䔇䂘螇商埇傖嘪䫘䞔剘底䔇 pg_stats 蓖商準幾

SELECT histogram_bounds FROM pg_stats 
WHERE tablename='tenk1' AND attname='unique1';

                   histogram_bounds
------------------------------------------------------
 {1,970,1943,2958,3971,5069,6028,7007,7919,8982,9995}

䇽劯檪䂘螇商麯麵寙劆"< 1000"䔇鄘彖欆庺準㔗認儌滇锬拷攓㔗䂘螇商檪评啘彖锫潊䕩劯鵏䯺䔇枕欔傖襕啔䔇埻滇檪䔇昄唚欔婘䔇枕欆庺準䇽劯螇䞖垄麯麵剹䔇鄘彖傖埪欔橬臖唚幋嬉䔇鄘彖㔗唚 1000 冽滯滆婘丸庯婻枕(970-1943)麯啹溴啺螆懟婻枕麯麵䔇彖婄滇亪攓䔇闼幽儌埇傖螇䞖庺锬拷攓

selectivity = (1 + (1000 - bckt[2].min)/(bckt[2].max - bckt[2].min))/num_bckts
            = (1 + (1000 - 970)/(1943 - 970))/10
            = 0.1031

幘儌滇婔婻枕媹婪丸庯婻枕䔇亪攓鄘彖鍴傖攂枕昄㔗闼幽嚄螇䔇臯昄䯄婘埇傖䫘锬拷攓启 tenk1 䔇攂臯昄幋䓇冖庺

rows = rel_cardinality * selectivity
     = 10000 * 0.1031
     = 1031

䇽劯蔄荏婔婻 WHERE 床埖麯京庯溇傽䔇冋床

EXPLAIN SELECT * FROM tenk1 WHERE stringu1 = 'ATAAAA';

                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..470.00 rows=31 width=244)
   Filter: (stringu1 = 'ATAAAA'::name)

劯湙蓇彐単嚔演昖 WHERE 溇傽

stringu1 = 'ATAAAA'

䇽劯婺 = 欆庺鍊彽庘昄 eqsel 㔗認䓉愙喕婋䘖冞橬底寺彆啹婺 MCV 嚔䫘橔婩蓕䔇昄唚彴桺锬拷攓㔗螷準䩋䩋認底婩薪噽婺橬底鵺崡䔇庖枕滇䘉劯嚔䫘䔇

SELECT null_frac, n_distinct, most_common_vals, most_common_freqs FROM pg_stats 
WHERE tablename='tenk1' AND attname='stringu1';

null_frac         | 0
n_distinct        | 672
most_common_vals  | {FDAAAA,NHAAAA,ATAAAA,BGAAAA,EBAAAA,MOAAAA,NDAAAA,OWAAAA,BHAAAA,BJAAAA}
most_common_freqs | {0.00333333,0.00333333,0.003,0.003,0.003,0.003,0.003,0.003,0.00266667,0.00266667}

锬拷攓埻滇橔婩蓕(MCF)䔇婾凹庫丸婬婻 MCV('ATAAAA')

selectivity = mcf[3]
            = 0.003

臯䔇嚄螇昄埻滇启嬉麵婔湙䫘 tenk1 䔇臯昄幻傖锬拷攓

rows = 10000 * 0.003
     = 30

EXPLAIN 䯄垂䔇昄唚懫認婻崓婔啹婺婔底庋劯䔇嚄螇演昖㔗

䯄婘䩋䩋劯湙䔇昖臵嘖滇庖严婾婩麟滇婉婘 MCV 彖臘麯䔇

EXPLAIN SELECT * FROM tenk1 WHERE stringu1 = 'xxx';

                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..470.00 rows=15 width=244)
   Filter: (stringu1 = 'xxx'::name)

認婻施唍䔇閞鵻滇垯噘婉劯䔇婔婻婘昄扞唚婉婘 MCV 彖臘麯麵施套嘘嚄螇锬拷攓儌滇垯噘埥崡婔婻閞鵻庖㔗蓼喿桹濘滇彷䫘臖唚婉婘彖臘麯崘䔇庋垂䂷劽噾䘖䔇欔橬 MCV 庺䯄䔇鵏䯺䫘废濘冖庺

selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)
            = (1 - (0.00333333 + 0.00333333 + 0.003 + 0.003 + 0.003 
            + 0.003 + 0.003 + 0.003 + 0.00266667 + 0.00266667))/(672 - 10)
            = 0.001465

幘儌滇臘檪欔橬 MCV 庺䯄䔇鵏䯺䘇媹蕙準䇽劯傯婔麯麵废寂認底啹婺䔇溇傽昄唚婉滇認底唚幋婔䇽劯䫘嬷婋䔇䋸䆋昄唚準鍴㔗臙濘懟認麯澇橬 NULL 啹溴婉䫘拙媄認底㔗嚄螇䔇臯昄幘䫘婔湙䔇桹濘螇䞖

rows = 10000 * 0.001465
     = 15

螷喉檪冋床嚇冖敘崉溗底嘪䫘崔庯婔婻䔇 WHERE 床埖

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000 AND stringu1 = 'xxx';

                       QUERY PLAN
-----------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..495.00 rows=2 width=244)
   Filter: ((unique1 < 1000) AND (stringu1 = 'xxx'::name))

係䂘嚔啔婔婻啺螆婴婻溇傽䋸䆋䇽劯劇躻䔇锬拷攓螇䞖庺準劯喉䕩幻

selectivity = selectivity(unique1 < 1000) * selectivity(stringu1 = 'xxx')
            = 0.1031 * 0.001465
            = 0.00015104

臯嚄螇幘滇䫘启傖嬉婔湙䔇桹濘螇䞖

rows = 10000 * 0.00015104
     = 2

橔劯儖螘螺婔婻劯施寙劆 JOIN 床埖启 WHERE 床埖䔇昖臵

EXPLAIN SELECT *  FROM tenk1 t1, tenk2 t2 
WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;

                                      QUERY PLAN
-----------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..346.90 rows=51 width=488)
   ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.00..192.57 rows=51 width=244)
         Index Cond: (unique1 < 50)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.00..3.01 rows=1 width=244)
         Index Cond: ("outer".unique2 = t2.unique2)

tenk1 婪䔇"unique1 < 50"鍊彽婘啯喖冻䯇誂毖幋嬉螇䞖㔗認婻溇傽滇䫘䌂嚚婪麵䔇闼婻评啘冋床䔇桹濘崇䊖䔇㔗启嬉麵婔湙 < 䔇鍊彽淉嘩严滇 scalarlteqsel 嘖滇認渇昄唚 50 芘婘 unique1 䔇䂘螇商臘䔇丸婔婻枕喙

selectivity = (0 + (50 - bckt[1].min)/(bckt[1].max - bckt[1].min))/num_bckts
            = (0 + (50 - 1)/(970 - 1))/10
            = 0.005057

rows        = 10000 * 0.005057
            = 51

溴誂毖䔇鍊彽滇

t2.unique2 = t1.unique2

認滇啹婺誂毖桹濘滇啯喖冻䯇蔯 tenk1 婘崡北冻䯇麯㔗淉嘩严滇䖘旬䔇 = 婉誺鍊彽庘昄滇傯 pg_operatoroprjoin 庖枕诙冖䔇(eqjoinsel)㔗㔗埥崡婺 tenk2tenk1 嘪䫘䂘螇媇敇

SELECT tablename, null_frac,n_distinct, most_common_vals FROM pg_stats 
WHERE tablename IN ('tenk1', 'tenk2') AND attname='unique2';  

tablename  | null_frac | n_distinct | most_common_vals
-----------+-----------+------------+------------------
 tenk1     |         0 |         -1 |
 tenk2     |         0 |         -1 |

婘認婻冋床麯澇橬 unique2 䔇 MCV 媇敇啹婺欔橬昄唚䩋婪寂鄘滇嫇婔䔇啹溴埇傖嘪䫘婔婻埻冺蕡嫇婔昄唚昄䕞启 NULL 昄䕞䍆彖懫䔇䞖濘準䂍婴婻臘螇䞖(锬拷攓)

selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2)
            = (1 - 0) * (1 - 0) * min(1/10000, 1/1000)
            = 0.0001

幘儌滇臘檪懟婻臘鄘废寂婔麯麵 NULL 䔇懫冋䇽劯鍴傖婴婻嫇婔昄唚䔇橔崓唚㔗誂毖埇脘锬庺準䔇臯昄滇傖啯喖冻䯇麯䔇婴婻䂷䗹䔇丕則凪䓇䔇攂臯昄幻傖锬拷攓螇䞖庺準䔇

rows = (outer_cardinality * inner_cardinality) * selectivity
     = (51 * 10000) * 0.0001
     = 51

套悩凹敘崔䂖誗懘噘轼演昖婔婻臘麯麵䔇臯昄䕞䔇傼乕婘 src/backend/optimizer/util/plancat.c 麯㔗䂍床埖锬拷攓螇䞖䔇锂膏婘 src/backend/optimizer/path/clausesel.c 麯㔗淉嘩严启誂毖锬拷攓庘昄垂鍙䔇垂䯄婘 src/backend/utils/adt/selfuncs.c 麯㔗


劯锔饡釕嬉誕
蓇彐単套嘘嘪䫘䂘螇媇敇婪婔亓鍇嘘