按实例查看资源使用

查看资源使用(sd_stat_perf扩展)¶

当一个数据库负载较大时，需要找出慢查询并分析原因。sd_stat_perf扩展（默认自动安装）支持查看数据库当前资源（cpu、磁盘、内存、网络）使用情况。

提供了5个基础视图

视图名	功能
stat_perf.__sd_workfile()	返回当前节点的溢出临时文件信息（函数）
stat_perf.pg_sd_stat_activity	获取当前节点pg_stat_activity视图中的查询的资源使用情况
stat_perf.pg_sd_stat_curr	获取当前节点pg_stat_activity视图中的查询的资源使用情况（最近1秒）
stat_perf.sd_analyze_session_time	按时长统计当前节点pg_stat_activity视图中查询
stat_perf.sd_stat_activity	概要查看节点上每条sql的资源使用（从sql开始执行到现在）
stat_perf.sd_stat_activity_detail	详细查看节点上每条sql的资源使用（从sql开始执行到现在）
stat_perf.sd_stat_curr	概要查看节点上每条sql的资源使用（最近1秒）
stat_perf.sd_stat_curr_detail	详细查看节点上每条sql的资源使用（最近1秒）
stat_perf.sd_stat_host	查看主机的资源使用(最近1秒)
stat_perf.sd_workfile	返回当前节点的溢出临时文件信息（视图）
stat_perf.sd_workfile_with_query	获取当前节点的查询的溢出临时文件信息
stat_perf.sd_workfile_per_query	以查询和分片为单位汇总当前节点的溢出临时文件信息
stat_perf.sd_stat_active_backend_queue	查询后端运行队列统计信息

注： * 以上视图工作在多进程模式下，即必须设置 enable_multi_threading = off

查看每条sql的资源使用¶

stat_perf.pg_sd_stat_activity 视图

获取当前节点pg_stat_activity视图中的查询的资源使用情况。

本质是pg_stat_activity与stat_perf.sd_stat_activity_detail关联结果。

Column	Type	说明
datid	oid	数据库ID
datname	name	数据库名
pid	integer	进程ID
sess_id	bigint	会话ID
usesysid	oid	用户ID
usename	name	用户名
application_name	text	应用名称
client_addr	inet	客户端地址
client_hostname	text	客户端主机名
client_port	integer	客户端端口号
backend_start	timestamp with time zone	后端连接开始时间
xact_start	timestamp with time zone	事务开始时间
query_start	timestamp with time zone	查询开始时间
state_change	timestamp with time zone	状态变更时间
wait_event_type	text	等待时间类型
wait_event	text	等待的事件
state	text	查询状态
backend_xid	xid	后端xid
backend_xmin	xid	后端xmin
query	text	SQL
backend_type	text	后端类型
rsgid	oid	资源组ID
rsgname	text	资源组名
processed_rows	bigint	影响行数
scanned_rows	bigint	扫描行数
mem_used	bigint	内存使用量（字节）
mem_peak	bigint	内存峰值（字节）
slice	integer	查询计划切片
cmd_id	integer	查询的命令ID
cpu_all	integer	CPU使用率（%）
cpu_user	integer	用户态CPU使用量（%）
cpu_sys	integer	内核态CPU使用量（%）
cpu_iowait	integer	IO wait状态的CPU使用量（%）
mem_rss_used	bigint	驻留内存（字节）
mem_rss_peak	bigint	驻留内存峰值（字节）
disk_read	bigint	磁盘读（字节）
disk_write	bigint	磁盘写（字节）
temp_write	bigint	磁盘临时文件写（字节）
ic_recv	bigint	内网收（字节）
ic_send	bigint	内网发（字节）
ic_retry	bigint	内网重试次数

stat_perf.pg_sd_stat_curr 视图

获取当前节点pg_stat_activity视图中的查询的资源使用情况（最近1秒）。

本质是pg_stat_activity与stat_perf.sd_stat_curr_detail关联结果。

字段说明同“stat_perf.pg_sd_stat_activity”。

stat_perf.sd_analyze_session_time 视图

按时长统计当前节点pg_stat_activity视图中查询。

Column	Type	说明
client_addr	inet	客户端地址
usename	name	用户名
backend_type	text	后端类型
connect_cnt	bigint	连接总数
connect_1s	bigint	超过1秒的连接数
connect_1m	bigint	超过1分钟的连接数
connect_1h	bigint	超过1小时的连接数
connect_1d	bigint	超过1天的连接数
run_cnt	bigint	执行中的总数
run_1s	bigint	执行中的超过1秒的连接数
run_1m	bigint	执行中的超过1分钟的连接数
run_1h	bigint	执行中的超过1小时的连接数
run_1d	bigint	执行中的超过1天的连接数
idle_cnt	bigint	空闲连接总数
idle_1s	bigint	空闲超过1秒的连接数
idle_1m	bigint	空闲超过1分钟的连接数
idle_1h	bigint	空闲超过1小时的连接数
idle_1d	bigint	空闲超过1天的连接数

stat_perf.sd_stat_activity 视图

Column	Type	说明
pid	integer	进程pid
sess_id	bigint	集群session_id
cmd_id	integer	一个session内的命令编号
slice	integer	对应计划的slice号
state	text	当前进程活动状态
cpu_all	integer	cpu百分比
mem_used	bigint	实时虚拟内存（字节）
mem_peak	bigint	峰值虚拟内存（字节）
disk_read	bigint	磁盘读（字节）
disk_write	bigint	磁盘写（字节）
ic_recv	bigint	udp收（字节）
ic_send	bigint	udp发（字节）

stat_perf.sd_stat_activity_detail 视图

Column	Type	说明
pid	integer	进程pid
sess_id	bigint	集群session_id
cmd_id	integer	一个session内的命令编号
slice	integer	对应计划的slice号
state	text	当前进程活动状态
usename	name	用户名
rsgid	text	资源组ID
rsgname	name	资源组名
cpu_all	integer	cpu百分比
cpu_user	integer	user cpu百分比
cpu_sys	integer	sys cpu百分比
cpu_iowait	integer	iowait cpu百分比
mem_rss_used	bigint	实时驻留内存（字节）
mem_rss_peak	bigint	峰值驻留内存（字节）
mem_used	bigint	实时虚拟内存（字节）
mem_peak	bigint	峰值虚拟内存（字节）
mem_used_ve	bigint	VE引擎申请的当前内存（字节）
mem_peak_ve	bigint	VE引擎申请的峰值内存（字节）
ve_arena_used	bigint	VE引擎申请的当前内存池（字节）
ve_hash_used	bigint	VE引擎hash表当前申请的内存（字节）
ve_col_used	bigint	VE引擎列存储当前申请的内存（字节）
ve_buf_used	bigint	VE引擎当前申请的缓冲区（字节）
ve_arena_peak	bigint	VE引擎申请的峰值内存池（字节）
ve_hash_peak	bigint	VE引擎hash表峰值申请的内存（字节）
ve_col_peak	bigint	VE引擎hash表峰值申请的内存（字节）
ve_buf_peak	bigint	VE引擎列存储峰值申请的内存（字节）
disk_read	bigint	磁盘读（字节）
disk_write	bigint	磁盘写（字节）
temp_write	bigint	算子产生的的临时文件写（字节）
ic_recv	bigint	网络收（字节）
ic_send	bigint	网络发（字节）
ic_retry	bigint	网络重试次数

cpu百分比：显示的是单个进程单核cpu百分比，范围0-100，如果该进程内有多线程并行，数值会超过100
user cpu百分比：通常代表程序使用的cpu
sys cpu百分比，通常代表一些锁操作、系统占用的cpu
iowait cpu百分比，通常代表cpu在等待磁盘读写
驻留内存：实际使用的内存，包含私有内存和共享内存
峰值驻留内存：内核必须>=4.0否则显示0
虚拟内存：分配出的内存(不一定已使用)，由于程序做了内部修正，可能显示的比驻留内存大
缓冲区内存：通常被用来从磁盘上压缩解压文件
注：使用stat_perf.sd_stat_activity 需要预先设置 sd_stat_activity_enable = on

例：查询指定sess_id的实际内存使用

--<0> 找出当前正在执行的sql对应的sess_id
select sess_id from pg_stat_activity
where query like 'select sum(l_orderkey) from lineitem%';
    sess_id
--------------
     1000000000002
(1 row)


--<1> 在当前coordinator上查询
select mem_rss_used from stat_perf.sd_stat_activity_detail
where sess_id=1000000000002;
 mem_rss_used
--------------
     37462016
(1 row)


--<2> 在所有的executor上查询
select sc_dbindex_id,sc_hostname,sc_port,mem_rss_used
from stat_perf.sd_stat_activity_detail{executor}
where sess_id=1000000000002;
 sc_dbindex_id | sc_hostname | sc_port | mem_rss_used
---------------+-------------+---------+--------------
             0 | test-4      | 5225    |     98570240
             1 | test-4      | 5226    |     98508800
             2 | test-4      | 5227    |     98697216
(3 rows)


--<3> 在executor 1 上查询
select sc_dbindex_id,sc_hostname,sc_port,mem_rss_used
from stat_perf.sd_stat_activity_detail{1}
where sess_id=1000000000002;
 sc_dbindex_id | sc_hostname | sc_port | mem_rss_used
---------------+-------------+---------+--------------
             1 | test-4      | 5226    |     98508800
(1 row)


--<4> 在主机名是 'test-4' 的机器的所有节点查询
select sc_dbindex_id,sc_hostname,sc_port,mem_rss_used 
from stat_perf.sd_stat_activity_detail{'test-4'}
where sess_id=1000000000002;
 sc_dbindex_id | sc_hostname | sc_port | mem_rss_used 
---------------+-------------+---------+--------------
             0 | test-4      | 5225    |     98570240
             1 | test-4      | 5226    |     98508800
             2 | test-4      | 5227    |     98697216
            -1 | test-4      | 5125    |     37462016
(4 rows)


--<5> 在主机名是 'test-4' 的机器上随机选一个节点查询
select sc_dbindex_id,sc_hostname,sc_port,mem_rss_used 
from stat_perf.sd_stat_activity_detail{{'test-4'}}
where sess_id=1000000000002;
 sc_dbindex_id | sc_hostname | sc_port | mem_rss_used 
---------------+-------------+---------+--------------
             0 | test-4      | 5225    |     98570240
(1 row)


--<6> 在所有包含executor的主机上，每个主机随机挑选一个节点查询                                                             ^
select sc_dbindex_id,sc_hostname,sc_port,mem_rss_used 
from stat_perf.sd_stat_activity_detail{{executor}}
where sess_id=1000000000002;
 sc_dbindex_id | sc_hostname | sc_port | mem_rss_used 
---------------+-------------+---------+--------------
             0 | test-4      | 5225    |     98570240
(1 row)

查看最近1秒每条sql的资源使用¶

stat_perf.sd_stat_curr视图

stat_perf.sd_stat_curr的字段与stat_perf.sd_stat_activity相同，除内存外，其他各项含义从sql开始执行到现在，变成了最近1秒的资源使用情况

stat_perf.sd_stat_curr 视图

Column	Type	说明
pid	integer	进程pid
sess_id	bigint	集群session_id
cmd_id	integer	一个session内的命令编号
slice	integer	对应计划的slice号
state	text	当前进程活动状态
cpu_all	integer	最近1秒 cpu百分比
mem_used	bigint	实时虚拟内存（字节）
mem_peak	bigint	峰值虚拟内存（字节）
disk_read	bigint	最近1秒磁盘读（字节）
disk_write	bigint	最近1秒磁盘写（字节）
ic_recv	bigint	最近1秒网络收（字节）
ic_send	bigint	最近1秒网络发（字节）

stat_perf.sd_stat_curr_detail 视图

Column	Type	说明
pid	integer	进程pid
sess_id	bigint	集群session_id
cmd_id	integer	一个session内的命令编号
slice	integer	对应计划的slice号
state	text	当前进程活动状态
usename	name	用户名
rsgid	text	资源组ID
rsgname	name	资源组名
cpu_all	integer	最近1秒 cpu百分比
cpu_user	integer	最近1秒 user cpu百分比
cpu_sys	integer	最近1秒 sys cpu百分比
cpu_iowait	integer	最近1秒 iowait cpu百分比
mem_rss_used	bigint	实时驻留内存（字节）
mem_rss_peak	bigint	峰值驻留内存（字节）
mem_used	bigint	实时虚拟内存（字节）
mem_peak	bigint	峰值虚拟内存（字节）
mem_used_ve	bigint	VE引擎申请的当前内存（字节）
mem_peak_ve	bigint	VE引擎申请的峰值内存（字节）
ve_arena_used	bigint	VE引擎申请的当前内存池（字节）
ve_hash_used	bigint	VE引擎hash表当前申请的内存（字节）
ve_col_used	bigint	VE引擎列存储当前申请的内存（字节）
ve_buf_used	bigint	VE引擎当前申请的缓冲区（字节）
ve_arena_peak	bigint	VE引擎申请的峰值内存池（字节）
ve_hash_peak	bigint	VE引擎hash表峰值申请的内存（字节）
ve_col_peak	bigint	VE引擎hash表峰值申请的内存（字节）
ve_buf_peak	bigint	VE引擎列存储峰值申请的内存（字节）
disk_read	bigint	最近1秒磁盘读（字节）
disk_write	bigint	最近1秒磁盘写（字节）
temp_write	bigint	最近1秒算子产生的的临时文件写（字节）
ic_recv	bigint	最近1秒 udp收（字节）
ic_send	bigint	最近1秒 udp发（字节）
ic_retry	bigint	最近1秒 udp重试次数

例：查询指定sess_id的最近1秒实际内存使用

--<1> 在当前coordinator上查询
select mem_rss_used from stat_perf.sd_stat_curr_detail
where sess_id=1000000000002;
 mem_rss_used
--------------
     37462016
(1 row)


--<2> 在所有的executor上查询
select sc_dbindex_id,sc_hostname,sc_port,mem_rss_used
from stat_perf.sd_stat_curr_detail{executor} where sess_id=1000000000002;
 sc_dbindex_id | sc_hostname | sc_port | mem_rss_used
---------------+-------------+---------+--------------
             0 | test-4      | 5225    |     98570240
             1 | test-4      | 5226    |     98508800
             2 | test-4      | 5227    |     98697216
(3 rows)

如果想查询更长时间的，可以使用函数 stat_perf.sd_stat_get_curr(秒)

查看最近1秒主机的资源使用¶

stat_perf.sd_stat_host 视图

Column	Type	说明
avg_load	integer	最近1秒平均负载
cpu_all	integer	最近1秒 cpu百分比
cpu_user	integer	最近1秒 user cpu百分比
cpu_sys	integer	最近1秒 sys cpu百分比
cpu_iowait	integer	最近1秒 iowait cpu百分比
mem_pct	integer	最近1秒内存使用百分比
mem_used	bigint	最近1秒内存使用字节
disk_read	bigint	最近1秒磁盘读（字节）
disk_write	bigint	最近1秒磁盘写（字节）
disk_busy	integer	最近1秒磁盘负载（%）
net_send	bigint	最近1秒网络发（字节）
net_recv	bigint	最近1秒网络收（字节）
net_err	bigint	最近1秒网络错误次数

例：查询host上的资源使用

--<1> 查询当前主机的资源使用
select avg_load as load,
     cpu_all as cpu,
     pg_size_pretty(mem_used) as mem,
     mem_pct,
     pg_size_pretty(disk_read) as disk_r,
     pg_size_pretty(disk_write) as disk_w,
     disk_busy as busy,
     pg_size_pretty(net_send) as net_s,
     pg_size_pretty(net_recv) as net_r,
     net_err as net_e
from
     stat_perf.sd_stat_host;

load|cpu| mem |mem_pct|disk_r|disk_w|busy|net_s|net_r |net_e
----+---+-----+-------+------+------+----+-----+------+------
  1 |  2|71 GB|   28  |15 kB |65 kB |  0 |24 MB|298 kB|    0
(1 row)



--<2> 查询所有主机的资源使用
select avg_load as load,
     cpu_all as cpu,
     pg_size_pretty(mem_used) as mem,
     pg_size_pretty(disk_read) as disk_r,
     pg_size_pretty(disk_write) as disk_w,
     disk_busy as busy,
     pg_size_pretty(net_send) as net_s,
     pg_size_pretty(net_recv) as net_r,
     net_err as net_e,
     sc_hostname as host
from
     stat_perf.sd_stat_host {{}};

load|cpu|  mem |disk_r |disk_w |busy|net_s |net_r |net_e|host
----+---+------+-------+-------+----+------+------+-----+-----
  1 | 3 |101 GB|0 bytes|116 kB |   1|65 kB |88 kB |  0  |mpp-1
  4 | 8 |70 GB |0 bytes|0 bytes|   0|519 kB|519 kB|  0  |mpp-2
  1 | 2 |71 GB |0 bytes|0 bytes|   0|323 kB|323 kB|  0  |mpp-3
(3 rows)

如果想查询更长时间的，可以使用函数 stat_perf.sd_stat_host(秒)

使用示例¶

查看集群上各个节点的负载

-- <1> 查看当前活动session数，并查看每个session在各节点的进程数
select sc_hostname,sess_id,sc_dbindex_id,count(*)
from stat_perf.sd_stat_activity{executor}
where sess_id > 0
group by sc_hostname,sess_id,sc_dbindex_id
order by 1,2,3;
sc_hostname |     sess_id      | sc_dbindex_id | count 
-------------+------------------+---------------+-------
 seabox-PC   | 2251799813685313 |             0 |     1
 seabox-PC   | 2251799813685313 |             1 |     1
 seabox-PC   | 2251799813685313 |             2 |     1
 seabox-PC   | 2251799813685313 |             3 |     1
 seabox-PC   | 2251799813685313 |             4 |     1
 seabox-PC   | 2251799813685313 |             5 |     1
 seabox-PC   | 2251799813685313 |             6 |     1
 seabox-PC   | 2251799813685313 |             7 |     1
 seabox-PC   | 2251799813685822 |             0 |     9
 seabox-PC   | 2251799813685822 |             1 |     9
 seabox-PC   | 2251799813685822 |             2 |     9
 seabox-PC   | 2251799813685822 |             3 |     9
 seabox-PC   | 2251799813685822 |             4 |     9
 seabox-PC   | 2251799813685822 |             5 |     9
 seabox-PC   | 2251799813685822 |             6 |     9
 seabox-PC   | 2251799813685822 |             7 |     9
 seabox-PC   | 2251799813685832 |             0 |    25
 seabox-PC   | 2251799813685832 |             1 |    25
 seabox-PC   | 2251799813685832 |             2 |    25
 seabox-PC   | 2251799813685832 |             3 |    25
 seabox-PC   | 2251799813685832 |             4 |    25
 seabox-PC   | 2251799813685832 |             5 |    25
 seabox-PC   | 2251799813685832 |             6 |    25
 seabox-PC   | 2251799813685832 |             7 |    25


-- <2> 查看 sess_id 2251799813685599 在各个节点资源使用情况
select
    sc_dbindex_id as id,
    count(*) c,
    pg_size_pretty(sum(disk_read)) disk_r,
    pg_size_pretty(sum(disk_write)) disk_w,
    pg_size_pretty(sum(ic_send)) ic_s,
    pg_size_pretty(sum(ic_recv)) ic_r,
    sum(ic_retry) r,
    max(cpu_all) cpu,
    pg_size_pretty(sum(mem_rss_used)) rss,
    pg_size_pretty(sum(mem_used))vmem
from
    stat_perf.sd_stat_activity_detail{}
where
    sess_id = 2251799813685822
group by
    sc_dbindex_id
order by sc_dbindex_id;
id| c |  disk_r  |disk_w| ic_s  | ic_r  |r|cpu|  rss  |  vmem   
--+---+----------+------+-------+-------+-+---+-------+---------
-2| 1 |0 bytes   |128 kB|0 bytes|0 bytes|0|  0|81 MB  | 12 MB
-1| 1 |0 bytes   |164 kB|0 bytes|0 bytes|0|  0|136 MB | 20 MB
 0| 9 |348 kB    |128 kB|258 MB |286 MB |4| 52|1449 MB| 1752 MB
 1| 9 |0 bytes   |128 kB|259 MB |286 MB |5| 54|1421 MB| 1668 MB
 2| 9 |4096 bytes|128 kB|257 MB |228 MB |5| 53|1405 MB| 1676 MB
 3| 9 |0 bytes   |128 kB|257 MB |215 MB |6| 48|1439 MB| 1576 MB
 4| 9 |0 bytes   |128 kB|258 MB |300 MB |9| 85|1445 MB| 1604 MB
 5| 9 |8192 bytes|128 kB|258 MB |240 MB |8| 44|1365 MB| 1764 MB
 6| 9 |8192 bytes|128 kB|256 MB |257 MB |3| 53|1364 MB| 1556 MB
 7| 9 |0 bytes   |128 kB|257 MB |271 MB |7| 51|1416 MB| 1664 MB
(10 rows)

--<3> 查看节点5的网络资源使用
select slice,
     pid,
     state,
     pg_size_pretty(ic_send) ic_send,
     pg_size_pretty(ic_recv) ic_recv,
     ic_retry
from
     stat_perf.sd_stat_activity_detail {5}
where
     sess_id = 2251799813685405;

 slice |  pid  | state  | ic_send |  ic_recv   | ic_retry
-------+-------+--------+---------+------------+----------
     0 | 66553 | active | 0 bytes | 0 bytes    |        0
     1 | 66684 | active | 0 bytes | 0 bytes    |        0
     2 | 66701 | active | 0 bytes | 402 kB     |        0
     3 | 68175 | active | 0 bytes | 113 MB     |        0
     4 | 68190 | active | 198 kB  | 3332 bytes |        0
     5 | 68203 | active | 125 MB  | 1673 kB    |        0
     6 | 72506 | idle   | 1043 kB | 72 MB      |        0
     7 | 72524 | idle   | 79 MB   | 22 MB      |        3
     8 | 72542 | idle   | 21 MB   | 522 kB     |        0
(9 rows)

在这个例子中，首先查看了当前集群共有3个session，每个session在每个executor上分别有1，9，25个进程，然后查看了9个进程的session在各节点的内存，cpu，磁盘，网络资源使用总计情况最后分析了在节点5上各个进程的网络收发情况，slice可以结合计划，分析出哪步操作消耗的网络资源多; state可以过滤空闲的进程

分析当前最占磁盘io的sql

-- <1> 找出最近1s 写磁盘最多的session
select sess_id,disk_write, temp_write
from stat_perf.sd_stat_curr_detail{} 
order by 2 desc;

     sess_id      |  disk_write | temp_write
------------------+-------------|-----------------
 2251799813685268 | 55574528    | 0
                0 |   180224    | 0
 2251799813685251 |        0    | 0
 2251799813685253 |        0    | 0
 2251799813685255 |        0    | 0
 2251799813685269 |        0    | 0
 2251799813685250 |        0    | 0
 2251799813685254 |        0    | 0
(8 rows)

-- <2> 查看这个session当前执行的sql是什么
select query from pg_stat_activity where sess_id = 2251799813685268;
                                       query
-----------------------------------------------------------------------
 COPY catalog_sales FROM 'fdw://tpcds_load/tpcds/100s/catalog_sales.dat';
(1 row)


-- <3> 查看这个sql在各个executor节点上累积磁盘，cpu使用情况
select
    sum(disk_read) as read,
    sum(disk_write) as write,
    sum(cpu_all) cpu_all,
    sc_dbindex_id
from stat_perf.sd_stat_activity{executor}
where sess_id = 2251799813685268
group by sc_dbindex_id;

  read  |   write   | cpu_all | sc_dbindex_id
--------+-----------+---------+---------------
      0 | 429953024 |      88 |             0
      0 | 434151424 |      87 |             1
 106496 | 874668032 |      97 |             2
(3 rows)


-- <4> 查看第二个节点的详细资源使用情况
select
     pid,
     slice,
     pg_size_pretty(disk_read)disk_r,
     pg_size_pretty(disk_write) disk_w,
     cpu_all,
     cpu_user,
     cpu_iowait,
     pg_size_pretty(mem_rss_used)mem_rss
from
     stat_perf.sd_stat_activity_detail {2}
where
     sess_id=2251799813685268;

pid  |slice| disk_r|disk_w |cpu_all|cpu_user|cpu_iowait|mem_rss
-----+-----+-------+-------+-------+--------+----------+---------
24511|  0  |0 bytes|476 MB |  42   |    34  |    0     | 173 MB
28939|  1  |0 bytes|0 bytes|  57   |    51  |    0     | 39 MB
(2 rows)

在这个例子中，首先定位最近1s磁盘使用最大的sql；定位后发现是在执行copy操作（符合预期）；因为是copy操作，所以在磁盘写时没有产生临时文件（符合预期）再分析详细的资源发现cpu的 io wait 并不大，表明这条sql的瓶颈不在磁盘上，反而是user cpu占比最大。分析可能的因素是加载时的block排序造成的。

分析集群中内存使用最多的节点

-- <1> 找出最占内存的节点
select
     sc_hostname,
     sc_dbindex_id,
     pg_size_pretty(sum(mem_rss_used)) rss_mem
from
     stat_perf.sd_stat_activity_detail{executor}
group by
     sc_dbindex_id order by 2 desc;

sc_dbindex_id |  rss_mem   
--------------+------------
            7 | 1197 MB
            9 | 616 MB
           10 | 600 MB
            4 | 643 MB
            8 | 625 MB
           11 | 611 MB
            1 | 624 MB
            2 | 696 MB
            5 | 661 MB
            3 | 655 MB
            0 | 663 MB
            6 | 645 MB
(12 rows)


-- <2> 分析节点7最占资源的session
select sess_id,cpu_all,pg_size_pretty(mem_rss_used) as mem_rss_used
from stat_perf.sd_stat_activity_detail{7};
     sess_id      | cpu_all | mem_rss_used 
------------------+---------+--------------
                0 |       0 |    29 MB
 2251799813685266 |      99 |    155 MB
 2251799813685274 |       0 |    139 MB
                0 |       0 |    29 MB
                0 |       0 |    679 MB
                0 |       0 |    165 MB
(6 rows)

-- <3> 查看session对应的sql
select query from pg_stat_activity where sess_id = 2251799813685266;
                      query                       
--------------------------------------------------
 select count(*) from t1 x, t1 y where x.a = y.a;


-- <4> 查看表的分布情况--数据都倾斜到了第7个节点
select sc_dbindex_id,count(*) from t1{} group by sc_dbindex_id;
 sc_dbindex_id | count  
---------------+--------
             7 | 100000
(1 row)

在这个例子中，先是找到了最繁忙的节点，在该节点上分析所有查询，发现是由于表倾斜造成的单个节点繁忙

##### 分析VE引擎使用的内存

-- <0> 后台执行查询
select count(*) from lineitem, orders where l_orderkey=o_orderkey ;
-- 他的计划为
             QUERY PLAN
------------------------------------------------------------------------------
 Finalize Aggregate
   ->  Gather Motion 3:1  (slice1; executors: 3)
         ->  Partial Aggregate
               ->  Hash Join
                     Hash Cond: (orders.o_orderkey = lineitem.l_orderkey)
                     ->  Seq Scan on orders
                     ->  Hash
                           ->  Seq Scan on lineitem
 Optimizer: SeaboxSQL query optimizer
(9 rows)


-- <1> ve_mem_per_hash_table=512MB 时执行后台查先，执行完成后查看节点0的内存使用
select
    pg_size_pretty(mem_peak) as total,
    pg_size_pretty(mem_peak_ve) as ve,
    pg_size_pretty(ve_arena_peak) as ve_arena,
    pg_size_pretty(ve_hash_peak) as ve_hash,
    pg_size_pretty(ve_col_peak) as ve_col,
    pg_size_pretty(ve_buf_peak) as ve_buf
from stat_perf.sd_stat_activity_detail{0};

 total  |   ve    | ve_arena | ve_hash | ve_col | ve_buf 
--------+---------+----------+---------+--------+--------
 876 MB | 1004 MB | 44 kB    | 632 MB  | 276 MB | 96 MB
(1 row)

-- <2> ve_mem_per_hash_table=128MB 时执行后台查先，执行完成后查看节点0的内存使用
select
    pg_size_pretty(mem_peak) as total,
    pg_size_pretty(mem_peak_ve) as ve,
    pg_size_pretty(ve_arena_peak) as ve_arena,
    pg_size_pretty(ve_hash_peak) as ve_hash,
    pg_size_pretty(ve_col_peak) as ve_col,
    pg_size_pretty(ve_buf_peak) as ve_buf
from stat_perf.sd_stat_activity_detail{0}
where slice>0;
 total  |   ve   | ve_arena | ve_hash | ve_col | ve_buf 
--------+--------+----------+---------+--------+--------
 400 MB | 498 MB | 48 kB    | 258 MB  | 105 MB | 135 MB
(1 row)

在这个例子中，有时看到ve使用的总内存比全局总内存略多，这是由于统计误差造成的。在<1>和<2>里，由于hash表设置不同，导致ve中的hash表内存变化较大。

分析比较各主机的负载

-- <1> 查看最近1s各主机的负载
select disk_busy,sc_hostname from stat_perf.sd_stat_host {{}};
  disk_busy | sc_hostname 
------------+-------------
          0 | mpp-1
         13 | mpp-2
          3 | mpp-3
(3 rows)

-- <2> 查看最近5s各主机的负载
select disk_busy,sc_hostname from stat_perf.sd_stat_get_host(5) {{}};
disk_busy | sc_hostname 
----------+-------------
        0 | mpp-1
       33 | mpp-2
       29 | mpp-3
(3 rows)

-- <3> 查看每个主机的活动sql数
select sc_hostname, count(*)
from pg_stat_activity {executor}
where state != 'idle' group by sc_hostname;
 sc_hostname | count 
-------------+-------
 mpp-2       |    85
 mpp-1       |    65
 mpp-3       |    59
(3 rows)

这个例子中，从最近1s和最近5s看都是mpp-2磁盘最繁忙，再通过查询获得sql数得知mpp-2上的sql最多。

查看溢出临时文件¶

stat_perf.__sd_workfile() 函数

返回当前节点的溢出临时文件信息（函数）

Column	Type	说明
executorid	integer	executor ID
prefix	text	溢出文件前缀
size	bigint	workfile的大小（单位：字节）
optype	text	创建workfile的查询算子类型
slice	integer	查询计划切片。查询计划中正在执行的部分。
sessionid	bigint	会话ID
commandid	integer	查询的命令ID
numfiles	integer	创建的文件数。

stat_perf.sd_workfile 视图

返回当前节点的溢出临时文件信息（函数）

Column	Type	说明
sessionid	bigint	会话ID
slice	integer	查询计划切片。查询计划中正在执行的部分。
commandid	integer	查询的命令ID
prefix	text	溢出文件前缀
size	bigint	workfile的大小（单位：字节）
optype	text	创建workfile的查询算子类型
numfiles	integer	创建的文件数

stat_perf.sd_workfile_with_query 视图

获取当前节点的查询的溢出临时文件信息

Column	Type	说明
pid	integer	进程ID
sess_id	bigint	会话ID
usename	name	用户名
datname	name	数据库名
state	text	查询状态
cmd_id	integer	查询的命令ID
query	text	SQL
slice	integer	查询计划切片
size	bigint	workfile的大小（单位：字节）
numfiles	integer	创建的文件数

stat_perf.sd_workfile_per_query 视图

以查询和分片为单位汇总当前节点的溢出临时文件信息

Column	Type	说明
datid	oid	数据库ID
datname	name	数据库名
pid	integer	进程ID
sess_id	bigint	会话ID
usesysid	oid	用户ID
usename	name	用户名
application_name	text	应用名称
client_addr	inet	客户端地址
client_hostname	text	客户端主机名
client_port	integer	客户端端口号
backend_start	timestamp with time zone	后端连接开始时间
xact_start	timestamp with time zone	事务开始时间
query_start	timestamp with time zone	查询开始时间
state_change	timestamp with time zone	状态变更时间
wait_event_type	text	等待时间类型
wait_event	text	等待的事件
state	text	查询状态
backend_xid	xid	后端xid
backend_xmin	xid	后端xmin
query	text	SQL
backend_type	text	后端类型
rsgid	oid	资源组ID
rsgname	text	资源组名
processed_rows	bigint	影响行数
scanned_rows	bigint	扫描行数
mem_used	bigint	内存使用量（字节）
mem_peak	bigint	内存峰值（字节）
slice	integer	查询计划切片
cmd_id	integer	查询的命令ID
size	numeric	workfile的大小（单位：字节）
numfiles	bigint	创建的文件数

stat_perf.sd_stat_active_backend_queue

查询后端运行队列统计信息，后端运行队列长度参数，请参考sc_active_backend_resqueue_length。

Column	Type	说明
num_config_local	integer	`sc_active_backend_resqueue_length`当前配置值
num_config_global	integer	`sc_global_active_backend_resqueue_length`当前配置值
num_config_actual	integer	当前实际生效的“后端允许执行事务的队列长度”
num_running	integer	当前获得队列锁进入执行状态的事务个数
num_queueing	integer	当前进入后端排队状态的事务个数
num_queued	bigint	从数据库启动到现在，总共发生过的排队次数。
duration_queued	interval	从数据库启动到现在，所有排队的总排队时长。

说明：

num_running：“running“是指：事务获得队列锁进入允许执行状态。获准执行后，事务实际状态可能是“idle in transation”等空闲状态。注意：当sc_active_backend_resqueue_length参数为-1，或者超级用户bypass了查询，则不增加该计数。
num_queued：一旦事务进入过“排队状态”，本计数就加1，计数从进入“排队状态”时功能。
duration_queued：一个事务如果进入“排队状态”，则排队计时开始，排队结束时才更新本计数，因此，正在“排队状态”的查询不会实时更新本计数。

查看正在运行的计划及执行状态¶

sd_queryplan_running_pid函数

输入参数为pid，返回对应pid上正在执行的SQL的计划文本。需要track_queryplan为on。

Column	Type	说明
ssid	bigint	session id
ccnt	integer	command count
nid	integer	计划节点序号
pid	integer	进程号
row_number	integer	计划文本行号
plan	text	计划文本

示例：

seaboxsql=# select * from sd_queryplan_running_pid(25166);
 ssid | ccnt | nid |  pid  | row_number |                                       plan                                        
------+------+-----+-------+------------+-----------------------------------------------------------------------------------
    6 |   32 |   0 | 25166 |          0 | Merge Join  (cost=1587.29..12103.72 rows=523264 width=12 plan_node_id=0)
    6 |   32 |     | 25166 |          1 |   Merge Cond: (t1.a = t2.a)
    6 |   32 |   1 | 25166 |          2 |   ->  Sort  (cost=793.64..819.22 rows=10230 width=4 plan_node_id=1)
    6 |   32 |     | 25166 |          3 |         Sort Key: t1.a
    6 |   32 |   2 | 25166 |          4 |         ->  Seq Scan on t1  (cost=0.00..112.30 rows=10230 width=4 plan_node_id=2)
    6 |   32 |   3 | 25166 |          5 |   ->  Sort  (cost=793.64..819.22 rows=10230 width=4 plan_node_id=3)
    6 |   32 |     | 25166 |          6 |         Sort Key: t2.a
    6 |   32 |   4 | 25166 |          7 |         ->  Seq Scan on t2  (cost=0.00..112.30 rows=10230 width=4 plan_node_id=4)
(8 rows)

sd_queryplan_running_sessid函数

输入参数为session id，返回对应pid上正在执行的SQL的计划文本。需要track_queryplan为on。其他与sd_queryplan_running_pid函数相同。

sd_queryplan_status_detail函数输入参数为session_id，返回查询的执行状态信息。需要track_queryplan_status为on。

Column	Type	说明
ssid	bigint	session id
ccnt	integer	command count
nid	integer	计划节点序号
pid	integer	进程号
segid	integer	节点号
tmid	bigint	节点启动时间的int格式
tuplecount	bigint	计划节点总返回的行数
nloops	integer	计划节点的第几次循环
ntuples	bigint	计划文本每轮循环返回的行数
is_running	bool	计划当前计划节点是否正在执行
is_finished	bigint	计划当前计划节点是否结束
is_start	bigint	计划当前计划节点是否开始
node_type	bigint	计划节点的类型编号，可通过`sd_nodetypeid2name`查看

示例：

seaboxsql=# select *, sd_nodetypeid2name(node_type) from sd_queryplan_status_detail(6);
 ssid | ccnt | nid |  pid  | segid |      tmid       | tuplecount | nloops | ntuples | is_running | is_finished | is_start | node_type | sd_nodetypeid2name 
------+------+-----+-------+-------+-----------------+------------+--------+---------+------------+-------------+----------+-----------+--------------------
    6 |   32 |   2 | 25166 |     0 | 747815105386183 |       1000 |      0 |       0 | t          | t           | t        |       113 | Seq Scan
    6 |   32 |   1 | 25166 |     0 | 747815105386183 |         55 |      0 |       0 | t          | f           | t        |       134 | Sort
    6 |   32 |   4 | 25166 |     0 | 747815105386183 |       1000 |      0 |       0 | t          | t           | t        |       113 | Seq Scan
    6 |   32 |   3 | 25166 |     0 | 747815105386183 |         55 |      0 |       0 | t          | f           | t        |       134 | Sort
    6 |   32 |   0 | 25166 |     0 | 747815105386183 |         54 |      0 |       0 | t          | f           | t        |       131 | Merge Join
(5 rows)

说明： 1. node_type表示的计划文本为大致的计划文本，只能标明大致的算子类型，例如哈希聚集和分组聚集具有相同的编号。

sd_queryplan_status函数

输入参数为session id, 可以查看计划文本及执行状态。需要track_queryplan及track_queryplan_status均为on的情况下使用。

Column	Type	说明
pid	integer	coordinator上的进程号
nid	integer	计划节点号
plan	text	计划文本
tuplecount	bigint	从当前计划节点返回的行数
is_start	bool	当前计划节点是否开始
is_finished	bool	当前计划节点是否结束

示例：

seaboxsql=# select * from sd_queryplan_status(6);
  pid  | nid |                                       plan                                        | tuplecount | is_start | is_finished 
-------+-----+-----------------------------------------------------------------------------------+------------+----------+-------------
 25166 |   0 | Merge Join  (cost=1587.29..12103.72 rows=523264 width=12 plan_node_id=0)          |         29 | t        | f
 25166 |     |   Merge Cond: (t1.a = t2.a)                                                       |            |          | 
 25166 |   1 |   ->  Sort  (cost=793.64..819.22 rows=10230 width=4 plan_node_id=1)               |         30 | t        | f
 25166 |     |         Sort Key: t1.a                                                            |            |          | 
 25166 |   2 |         ->  Seq Scan on t1  (cost=0.00..112.30 rows=10230 width=4 plan_node_id=2) |       1000 | t        | t
 25166 |   3 |   ->  Sort  (cost=793.64..819.22 rows=10230 width=4 plan_node_id=3)               |         30 | t        | f
 25166 |     |         Sort Key: t2.a                                                            |            |          | 
 25166 |   4 |         ->  Seq Scan on t2  (cost=0.00..112.30 rows=10230 width=4 plan_node_id=4) |       1000 | t        | t
(8 rows)