Greenplum: invoking a function on every segment via gp_dist_random('gp_id')
Published: 2019-06-21


In Greenplum, when you call a function, it is very likely to be evaluated only on the master rather than dispatched to the segments.

Take random() as an example. Calling it with `select random()` does not require sending the SQL to the segment nodes, so the plan below contains no Gather Motion step:

postgres=# explain analyze select random();
                                        QUERY PLAN
----------------------------------------------------------------------------------------
 Result  (cost=0.01..0.02 rows=1 width=0)
   Rows out:  1 rows with 0.017 ms to end, start offset by 0.056 ms.
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
           Rows out:  1 rows with 0.004 ms to end of 2 scans, start offset by 0.059 ms.
 Slice statistics:
   (slice0)    Executor memory: 29K bytes.
   (slice1)    Executor memory: 29K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 0.074 ms
(11 rows)

So how do we make the function execute on the segments?

Call it through gp_dist_random('gp_id'). The argument to gp_dist_random is the name of a queryable table or view.

postgres=# explain analyze select random() from gp_dist_random('gp_id');
                                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 240:1  (slice1; segments: 240)  (cost=0.00..4.00 rows=240 width=0)
   Rows out:  240 rows at destination with 6.336 ms to first row, 59 ms to end, start offset by 4195 ms.
   ->  Seq Scan on gp_id  (cost=0.00..4.00 rows=1 width=0)
         Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.073 ms to first row, 0.075 ms to end, start offset by 4207 ms.
 Slice statistics:
   (slice0)    Executor memory: 471K bytes.
   (slice1)    Executor memory: 163K bytes avg x 240 workers, 163K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 4279.445 ms
(10 rows)

gp_id holds exactly one row on every segment, so the SQL above calls random() once per segment and returns all the results. My test cluster has 240 segments, so the query returns 240 rows.
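Because gp_id is guaranteed to have one row per segment, a quick sanity check is to count the rows returned; the count should equal the number of segments in your cluster. This is a sketch, not from the original post; `gp_segment_id` is Greenplum's built-in system column identifying which segment produced each row:

```sql
-- One row per segment, so this count equals the number of segments.
select count(*) from gp_dist_random('gp_id');

-- Tag each result with the segment that produced it.
select gp_segment_id, random() from gp_dist_random('gp_id') order by 1;
```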

The header comment in gp_id's definition explains that gp_dist_random over this table is used for administrative work: for instance, querying the size of a database or of a table is implemented exactly this way. From src/backend/catalog/postgres_bki_srcs:

/*-------------------------------------------------------------------------
 *
 * gp_id.h
 *        definition of the system "database identifier" relation (gp_dbid)
 *        along with the relation's initial contents.
 *
 * Copyright (c) 2009-2010, Greenplum inc
 *
 * NOTES
 *    Historically this table was used to supply every segment with its
 * identification information.  However in the 4.0 release when the file
 * replication feature was added it could no longer serve this purpose
 * because it became a requirement for all tables to have the same physical
 * contents on both the primary and mirror segments.  To resolve this the
 * information is now passed to each segment on startup based on the
 * gp_segment_configuration (stored on the master only), and each segment
 * has a file in its datadirectory (gp_dbid) that uniquely identifies the
 * segment.
 *
 *   The contents of the table are now irrelevant, with the exception that
 * several tools began relying on this table for use as a method of remote
 * function invocation via gp_dist_random('gp_id') due to the fact that this
 * table was guaranteed of having exactly one row on every segment.  The
 * contents of the row have no defined meaning, but this property is still
 * relied upon.
 */

#ifndef _GP_ID_H_
#define _GP_ID_H_

#include "catalog/genbki.h"

/*
 * Defines for gp_id table
 */
#define GpIdRelationName                        "gp_id"

/* TIDYCAT_BEGINFAKEDEF

   CREATE TABLE gp_id
   with (shared=true, oid=false, relid=5001, content=SEGMENT_LOCAL)
   (
   gpname       name     ,
   numsegments  smallint ,
   dbid         smallint ,
   content      smallint
   );

   TIDYCAT_ENDFAKEDEF
*/

The Greenplum function for querying database size:

postgres=# \df+ pg_database_size
                                                                                                      List of functions
   Schema   |       Name       | Result data type | Argument data types |  Type  |  Data access   | Volatility |  Owner   | Language |      Source code      |                         Description
------------+------------------+------------------+---------------------+--------+----------------+------------+----------+----------+-----------------------+-------------------------------------------------------------
 pg_catalog | pg_database_size | bigint           | name                | normal | reads sql data | volatile   | dege.zzz | internal | pg_database_size_name | Calculate total disk space usage for the specified database
 pg_catalog | pg_database_size | bigint           | oid                 | normal | reads sql data | volatile   | dege.zzz | internal | pg_database_size_oid  | Calculate total disk space usage for the specified database
(2 rows)

The source of pg_database_size_name is shown below. Clearly, computing the database size also relies on `select sum(pg_database_size('%s'))::int8 from gp_dist_random('gp_id');`:

Datum
pg_database_size_name(PG_FUNCTION_ARGS)
{
        int64           size = 0;
        Name            dbName = PG_GETARG_NAME(0);
        Oid             dbOid = get_database_oid(NameStr(*dbName));

        if (!OidIsValid(dbOid))
                ereport(ERROR,
                                (errcode(ERRCODE_UNDEFINED_DATABASE),
                                 errmsg("database \"%s\" does not exist",
                                                NameStr(*dbName))));

        size = calculate_database_size(dbOid);

        if (Gp_role == GP_ROLE_DISPATCH)
        {
                StringInfoData buffer;

                initStringInfo(&buffer);

                appendStringInfo(&buffer, "select sum(pg_database_size('%s'))::int8 from gp_dist_random('gp_id');", NameStr(*dbName));

                size += get_size_from_segDBs(buffer.data);
        }

        PG_RETURN_INT64(size);
}

We can verify this by running that SQL directly: the result is almost identical to what pg_database_size returns, differing only by the master's own calculate_database_size contribution.

postgres=# select sum(pg_database_size('postgres'))::int8 from gp_dist_random('gp_id');
      sum
----------------
 16006753522624
(1 row)

postgres=# select pg_database_size('postgres');
 pg_database_size
------------------
   16006763924106
(1 row)
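The per-segment contributions behind that sum can also be inspected directly. This is a sketch, not from the original post, assuming the built-in `gp_segment_id` system column is available in the result set:

```sql
-- Hypothetical breakdown: size of the database as reported by each segment.
-- Summing the second column should reproduce the gp_dist_random total above.
select gp_segment_id, pg_database_size('postgres')
from gp_dist_random('gp_id')
order by 1;
```

A skewed distribution would show up here as one segment reporting a much larger size than the others.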

In essence, gp_dist_random('gp_id') queries gp_id on every segment, and gp_dist_random('pg_authid') queries pg_authid on every segment. For example:

postgres=# select * from gp_dist_random('gp_id');
  gpname   | numsegments | dbid | content
-----------+-------------+------+---------
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
......
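The same pattern works for other catalogs. As a sketch (not from the original post), here is a hypothetical consistency check over pg_authid, grouping by the built-in `gp_segment_id` column to see how many role rows each segment holds:

```sql
-- Every segment keeps its own copy of pg_authid; the per-segment counts
-- should all be identical if the catalogs are consistent.
select gp_segment_id, count(*)
from gp_dist_random('pg_authid')
group by 1
order by 1;
```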

If you don't want that many rows back, you can add a LIMIT; note, however, that the scan still executes on every segment:

postgres=# explain analyze select random() from gp_dist_random('gp_id') limit 1;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=0)
   Rows out:  1 rows with 5.865 ms to first row, 5.884 ms to end, start offset by 4212 ms.
   ->  Gather Motion 240:1  (slice1; segments: 240)  (cost=0.00..0.04 rows=1 width=0)
         Rows out:  1 rows at destination with 5.857 ms to end, start offset by 4212 ms.
         ->  Limit  (cost=0.00..0.02 rows=1 width=0)
               Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.062 ms to first row, 0.063 ms to end, start offset by 4228 ms.
               ->  Seq Scan on gp_id  (cost=0.00..4.00 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.060 ms to end, start offset by 4228 ms.
 Slice statistics:
   (slice0)    Executor memory: 463K bytes.
   (slice1)    Executor memory: 163K bytes avg x 240 workers, 163K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 4288.007 ms
(14 rows)

Reprinted from: http://xehca.baihongyu.com/
