Pg BRIN索引-白红宇的个人博客

Pg BRIN索引

发布日期：2021-06-29 18:40:54 浏览次数：3 分类：技术文章

本文共 4362 字，大约阅读时间需要 14 分钟。

文章目录

9.5增加brin索引，
- 存储被索引字段在块级别的边界值（最大值、最小）及其他统计信息。

原理简单，扫描BRIN的元数据，
- 根据元数据和用户输入的条件比较，
- 过滤不符合条件的HEAPPAGE，
- 只扫描需要扫描的HEAPPAGE

brin索引仅仅适合存储与值线性相关性很好的列

例子：

创两张表，一张表插入线性相关很好的数据，
- 另一张插入离散的数据来测试brin索引的性能

—建表插入数据：

bill=# create table t_brin1(id int);CREATE TABLEbill=# create table t_brin2(id int); CREATE TABLEbill=# insert into t_brin1 select random()*1000000 from generate_series(1,10000000);INSERT 0 10000000bill=# insert into t_brin2 select generate_series(1,10000000);INSERT 0 10000000

在两张表上都创建brin索引

bill=# create index idx_t_brin1 on t_brin1 using brin(id);CREATE INDEXbill=# create index idx_t_brin2 on t_brin2 using brin(id);  CREATE INDEX

pg_stats表查看两张表索引列的线性相关性
- t_brin1表中数据较离散，
- t_brin2中数据线性相关性很好。

bill=# select correlation from pg_stats where tablename='t_brin1';  correlation  --------------- -0.0061385944(1 row)bill=# select correlation from pg_stats where tablename='t_brin2'; correlation -------------           1(1 row)

测试性能

t_brin1表：

bill=# explain (analyze ,verbose,timing,costs,buffers) select * from t_brin1 where id between 1 and 100000;                                                            QUERY PLAN                                                             ----------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on bill.t_brin1  (cost=262.18..194509.84 rows=988836 width=4) (actual time=1.332..1244.478 rows=1000823 loops=1)   Output: id   Recheck Cond: ((t_brin1.id >= 1) AND (t_brin1.id <= 100000))   Rows Removed by Index Recheck: 8999177   Heap Blocks: lossy=44248   Buffers: shared hit=16324 read=27932 written=31   ->  Bitmap Index Scan on idx_t_brin1  (cost=0.00..14.97 rows=9999977 width=0) (actual time=1.299..1.299 rows=442880 loops=1)         Index Cond: ((t_brin1.id >= 1) AND (t_brin1.id <= 100000))         Buffers: shared hit=8 Planning Time: 0.083 ms Execution Time: 1302.285 ms(11 rows)

t_brin2表：

bill=# explain (analyze ,verbose,timing,costs,buffers) select * from t_brin2 where id between 1 and 100000;                                                         QUERY PLAN                                                          ----------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on bill.t_brin2  (cost=26.92..46009.03 rows=91581 width=4) (actual time=0.166..16.945 rows=100000 loops=1)   Output: id   Recheck Cond: ((t_brin2.id >= 1) AND (t_brin2.id <= 100000))   Rows Removed by Index Recheck: 15712   Heap Blocks: lossy=512   Buffers: shared hit=1 read=513   ->  Bitmap Index Scan on idx_t_brin2  (cost=0.00..4.03 rows=115607 width=0) (actual time=0.142..0.142 rows=5120 loops=1)         Index Cond: ((t_brin2.id >= 1) AND (t_brin2.id <= 100000))         Buffers: shared hit=1 read=1 Planning Time: 0.274 ms Execution Time: 22.391 ms(11 rows)

t_brin2表上时间少了60多倍！

t_brin1数据分散比较离散，所以用brin索引扫描时必然要过滤掉大量的数据(过滤8999177条记录)，自然性能就差很多了。

t_brin1的情况就类似我们收网上来一看渔网中鱼没几条，而大部分都是不需要的垃圾，而t_brin2的情况恰恰相反，一渔网拉上来全部是鱼。

继续优化

参数pages_per_range来近一步提升brin索引的性能。
- 每多少个数据块统计一次边界
- 默认128
  - 每128个数据块统计一次边界。

BRIN索引的精确度。

pages_per_range=1，说明边界精确到1个数据块。越小，精度越高，过滤性就越好（注意过滤性越好取决于列的线性相关性很好的情况下，否则就是白瞎）

BRIN索引本身的大小。pages_per_range越小，BRIN索引本身就越大。BRIN越大，单次走BRIN索引扫描BRIN块的成本就越高。
所以可以设置合适的pages_per_range来优化上面查询。

bill=# drop index idx_t_brin2;DROP INDEXbill=# create index idx_t_brin2 on t_brin2 using brin(id) with (pages_per_range=16);CREATE INDEXbill=# explain (analyze ,verbose,timing,costs,buffers) select * from t_brin2 where id between 1 and 100000;                                                         QUERY PLAN                                                          ----------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on bill.t_brin2  (cost=34.50..44354.49 rows=82814 width=4) (actual time=0.662..14.559 rows=100000 loops=1)   Output: id   Recheck Cond: ((t_brin2.id >= 1) AND (t_brin2.id <= 100000))   Rows Removed by Index Recheck: 1248   Heap Blocks: lossy=448   Buffers: shared hit=462   ->  Bitmap Index Scan on idx_t_brin2  (cost=0.00..13.79 rows=83153 width=0) (actual time=0.648..0.648 rows=4480 loops=1)         Index Cond: ((t_brin2.id >= 1) AND (t_brin2.id <= 100000))         Buffers: shared hit=14 Planning Time: 0.151 ms Execution Time: 20.102 ms(11 rows)

转载地址：https://cyj666.blog.csdn.net/article/details/104923272 如侵犯您的版权，请留言回复原文章的地址，我们会给您删除此文章，给您带来不便请您谅解！

上一篇：3月19日

下一篇：static void gpuscan_add_scan_path(PlannerInfo *root, RelOptInfo *baserel, Index rtinde

发表评论

关于作者

喝酒易醉，品茶养心，人生如梦，品茶悟道，何以解忧？唯有杜康！

-- 愿君每日到此一游！

文章目录

例子：

测试性能

发表评论

最新留言

关于作者

推荐文章