OceanBase 聯(lián)接順序

2021-06-30 11:42 更新

在多表聯(lián)接的場景中,優(yōu)化器的一個(gè)很重要的任務(wù)是決定各個(gè)表之間的聯(lián)接順序(Join Order),因?yàn)椴煌穆?lián)接順序會影響中間結(jié)果集的大小,進(jìn)而影響到計(jì)劃整體的執(zhí)行代價(jià)。

為了減少執(zhí)行計(jì)劃的搜索空間和計(jì)劃執(zhí)行的內(nèi)存占用,OceanBase 數(shù)據(jù)庫優(yōu)化器在生成聯(lián)接順序時(shí)主要考慮左深樹的聯(lián)接形式。下圖展示了左深樹、右深樹和多支樹的計(jì)劃形狀。


OceanBase 數(shù)據(jù)庫聯(lián)接順序的生成采用了 System-R 的動態(tài)規(guī)劃算法,考慮的因素包括每一個(gè)表可能的訪問路徑、Interesting Order、聯(lián)接算法(NESTED-LOOP、BLOCK-BASED NESTED-LOOP 或者 SORT-MERGE 等)以及不同表之間的聯(lián)接選擇率等。

給定 N 個(gè)表的聯(lián)接,OceanBase 數(shù)據(jù)庫生成聯(lián)接順序的方法如下:

  1. 為每一個(gè)基表生成訪問路徑,保留代價(jià)最小的訪問路徑以及有所有有 Interesting Order 的路徑。一個(gè)路徑 如果具有 Interesting Order,它的序能夠被后續(xù)的算子使用。

  2. 生成所有表集合的大小為 i (1 < i <= N) 的計(jì)劃。 OceanBase 數(shù)據(jù)庫一般只考慮左深樹,表集合大小為 i 的計(jì)劃可以由一個(gè)表集合大小為 i 的計(jì)劃和一個(gè)基表的計(jì)劃組成。OceanBase 數(shù)據(jù)庫按照這種策略,考慮了所有的聯(lián)接算法以及 Interesting Order 的繼承等因素把所有表集合大小為 i 的計(jì)劃生成。這里也只是保留代價(jià)最小的計(jì)劃以及所有具有 Interesting Order 的計(jì)劃。

同時(shí),OceanBase 數(shù)據(jù)庫提供了 HINT 機(jī)制 /*+LEADING(table_name_list)*/去控制多表聯(lián)接的順序。

如下例所示,開始選擇的聯(lián)接順序是先做 t1、t2 的 JOIN 聯(lián)接,然后再和 t3 做 JOIN 聯(lián)接;如果用戶希望先做 t2、t3 的 JOIN 聯(lián)接,然后再和 t1做 JOIN 聯(lián)接,則可以使用 HINT /*+LEADING(t2,t3,t1)*/去控制;如果用戶希望先做 t1、t3 的 JOIN 聯(lián)接,然后再和 t2 做 JOIN 聯(lián)接,則可以使用 HINT /*+LEADING(t1,t3,t2)*/去控制。

obclient>CREATE TABLE t1(c1 INT, c2 INT, PRIMARY KEY(c1));
Query OK, 0 rows affected (0.31 sec)

obclient>CREATE TABLE t2(c1 INT, c2 INT, PRIMARY KEY(c1));
Query OK, 0 rows affected (0.33 sec)

obclient>CREATE TABLE t3(c1 INT, c2 INT, PRIMARY KEY(c1));
Query OK, 0 rows affected (0.44 sec)

obclient>EXPLAIN SELECT * FROM t1,t2,t3 WHERE t1.c1 = t2.c2 AND t2.c1 = t3.c2;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| =======================================
|ID|OPERATOR    |NAME|EST. ROWS|COST  |
---------------------------------------
|0 |HASH JOIN   |    |98010    |926122|
|1 | TABLE SCAN |T3  |100000   |61860 |
|2 | HASH JOIN  |    |99000    |494503|
|3 |  TABLE SCAN|T1  |100000   |61860 |
|4 |  TABLE SCAN|T2  |100000   |61860 |
=======================================

Outputs & filters: 
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2], [T3.C1], [T3.C2]), filter(nil), 
      equal_conds([T2.C1 = T3.C2]), other_conds(nil)
  1 - output([T3.C2], [T3.C1]), filter(nil), 
      access([T3.C2], [T3.C1]), partitions(p0)
  2 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2]), filter(nil), 
      equal_conds([T1.C1 = T2.C2]), other_conds(nil)
  3 - output([T1.C1], [T1.C2]), filter(nil), 
      access([T1.C1], [T1.C2]), partitions(p0)
  4 - output([T2.C2], [T2.C1]), filter(nil), 
      access([T2.C2], [T2.C1]), partitions(p0)

obclient>EXPLAIN SELECT /*+LEADING(t2,t3,t1)*/* FROM t1,t2,t3 WHERE t1.c1 = t2.c2
        AND t2.c1 = t3.c2;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| ========================================
|ID|OPERATOR    |NAME|EST. ROWS|COST   |
----------------------------------------
|0 |HASH JOIN   |    |98010    |1096613|
|1 | HASH JOIN  |    |99000    |494503 |
|2 |  TABLE SCAN|T2  |100000   |61860  |
|3 |  TABLE SCAN|T3  |100000   |61860  |
|4 | TABLE SCAN |T1  |100000   |61860  |
========================================

Outputs & filters: 
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2], [T3.C1], [T3.C2]), filter(nil), 
      equal_conds([T1.C1 = T2.C2]), other_conds(nil)
  1 - output([T2.C1], [T2.C2], [T3.C1], [T3.C2]), filter(nil), 
      equal_conds([T2.C1 = T3.C2]), other_conds(nil)
  2 - output([T2.C2], [T2.C1]), filter(nil), 
      access([T2.C2], [T2.C1]), partitions(p0)
  3 - output([T3.C2], [T3.C1]), filter(nil), 
      access([T3.C2], [T3.C1]), partitions(p0)
  4 - output([T1.C1], [T1.C2]), filter(nil), 
      access([T1.C1], [T1.C2]), partitions(p0)

obclient>EXPLAIN SELECT /*+LEADING(t1,t3,t2)*/* FROM t1,t2,t3 WHERE t1.c1 = t2.c2 
       AND t2.c1 = t3.c2;
+-----------------------------------------------------------------+
| Query Plan                                                                              |
+-----------------------------------------------------------------+
| =============================================================
|ID|OPERATOR                   |NAME|EST. ROWS  |COST       |
-------------------------------------------------------------
|0 |HASH JOIN                  |    |98010      |53098071243|
|1 | NESTED-LOOP JOIN CARTESIAN|    |10000000000|7964490204 |
|2 |  TABLE SCAN               |T1  |100000     |61860      |
|3 |  MATERIAL                 |    |100000     |236426     |
|4 |   TABLE SCAN              |T3  |100000     |61860      |
|5 | TABLE SCAN                |T2  |100000     |61860      |
=============================================================

Outputs & filters: 
-------------------------------------
  0 - output([T1.C1], [T1.C2], [T2.C1], [T2.C2], [T3.C1], [T3.C2]), filter(nil), 
      equal_conds([T1.C1 = T2.C2], [T2.C1 = T3.C2]), other_conds(nil)
  1 - output([T1.C1], [T1.C2], [T3.C1], [T3.C2]), filter(nil), 
      conds(nil), nl_params_(nil)
  2 - output([T1.C1], [T1.C2]), filter(nil), 
      access([T1.C1], [T1.C2]), partitions(p0)
  3 - output([T3.C1], [T3.C2]), filter(nil)
  4 - output([T3.C2], [T3.C1]), filter(nil), 
      access([T3.C2], [T3.C1]), partitions(p0)
  5 - output([T2.C2], [T2.C1]), filter(nil), 
      access([T2.C2], [T2.C1]), partitions(p0)
以上內(nèi)容是否對您有幫助:
在線筆記
App下載
App下載

掃描二維碼

下載編程獅App

公眾號
微信公眾號

編程獅公眾號