Hive 基本语法操练（二）：视图和索引操作-白红宇的个人博客

发布日期：2021-10-09 07:57:17 浏览次数：7 分类：技术文章

本文共 10406 字，大约阅读时间需要 34 分钟。

1. 视图操作

1) 创建一个测试表。

hive> create table test(id int,name string);OKTime taken: 0.385 secondshive> desc test;OKid                      int                                         name                    string                                      Time taken: 0.261 seconds, Fetched: 2 row(s)

2) 基于表 test 创建一个 test_view 视图。

hive> create view test_view(id,name_length) as select id,length(name) from test;

3) 查看 test_view 视图属性。

hive> desc test_view;OKid                      int                                         name_length             int                                         Time taken: 0.071 seconds, Fetched: 2 row(s)

4) 查看视图结果。

hive> select * from test_view;

2. 索引操作

1) Hive 创建索引。

hive> create table user like group_test;OKTime taken: 0.232 secondshive> create index user_index on table user(uid) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild IN TABLE user_index_table;OKTime taken: 0.183 seconds

2) 更新数据。

hive> alter index user_index on user rebuild;Query ID = hadoop_20180518043232_ebdf97bd-5984-4310-a3c8-6befef328133Total jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. Estimated from input data size: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=
   
    In order to limit the maximum number of reducers:  set hive.exec.reducers.max=
    
     In order to set a constant number of reducers:  set mapreduce.job.reduces=
     
      Starting Job = job_1526553207632_0018, Tracking URL = http://master:8088/proxy/application_1526553207632_0018/Kill Command = /opt/modules/hadoop-2.6.0/bin/hadoop job  -kill job_1526553207632_0018Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 12018-05-18 04:32:55,632 Stage-1 map = 0%,  reduce = 0%2018-05-18 04:33:04,400 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.65 sec2018-05-18 04:33:12,406 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.93 secMapReduce Total cumulative CPU time: 2 seconds 930 msecEnded Job = job_1526553207632_0018Loading data to table default.user_index_tableTable default.user_index_table stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]MapReduce Jobs Launched: Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.93 sec   HDFS Read: 290 HDFS Write: 50 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 930 msecOKTime taken: 25.609 seconds

3) 查看索引

hive> show index on user;OKuser_index              user                    uid                     user_index_table        compact                 Time taken: 0.046 seconds, Fetched: 1 row(s)

4) 删除索引

hive> drop index user_index on user;                                                                                                                         OK   Time taken: 0.094 secondshive> show index on user;           OKTime taken: 0.036 seconds

5) 创建表和索引案例

hive> create table index_test(id INT,name STRING) PARTITIONED BY (dt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

创建一个索引测试表 index_test，dt作为分区属性，“ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’” 表示用逗号分割字符串。

6) 创建一个临时索引表 index_tmp。

hive> create table index_tmp(id INT,name STRING,dt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

7) 加载本地数据到 index_tmp 表中。

[hadoop@master test]$ sudo vim test.txt02,female,192.168.1.301,male,192.168.1.2603,male,192.168.1.508,female,192.168.1.6204,male,192.168.1.9hive> load data local inpath '/home/hadoop/test/test.txt' into table index_tmp;Loading data to table default.index_tmpTable default.index_tmp stats: [numFiles=1, totalSize=106]OKTime taken: 0.224 secondshive> select * from index_tmp;OK2   female  192.168.1.31   male    192.168.1.263   male    192.168.1.58   female  192.168.1.624   male    192.168.1.9

设置 Hive 的索引属性来优化索引查询，命令如下。

hive> set hive.exec.dynamic.partition.mode=nonstrict;----设置所有列为 dynamic partitionhive> set hive.exec.dynamic.partition=true;----使用动态分区

8) 查询index_tmp 表中的数据，插入 table_test 表中。

hive> insert overwrite table index_test partition(dt) select id,name,dt from index_tmp;Query ID = hadoop_20180518044343_97e7fe67-a5a1-408b-be8e-e9dadb2f9e48Total jobs = 3Launching Job 1 out of 3Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1526553207632_0019, Tracking URL = http://master:8088/proxy/application_1526553207632_0019/Kill Command = /opt/modules/hadoop-2.6.0/bin/hadoop job  -kill job_1526553207632_0019Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02018-05-18 04:43:42,621 Stage-1 map = 0%,  reduce = 0%2018-05-18 04:43:48,835 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 secMapReduce Total cumulative CPU time: 870 msecEnded Job = job_1526553207632_0019Stage-4 is selected by condition resolver.Stage-3 is filtered out by condition resolver.Stage-5 is filtered out by condition resolver.Moving data to: hdfs://ns/tmp/hive/hadoop/9f7dd0d3-a14c-4535-9291-557b9cb6259b/hive_2018-05-18_04-43-36_337_559705388802402645-1/-ext-10000Loading data to table default.index_test partition (dt=null)     Time taken for load dynamic partitions : 278    Loading partition {dt=192.168.1.62}    Loading partition {dt=192.168.1.3}    Loading partition {dt=192.168.1.5}    Loading partition {dt=192.168.1.26}    Loading partition {dt=192.168.1.9}     Time taken for adding to write entity : 0Partition default.index_test{dt=192.168.1.26} stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]Partition default.index_test{dt=192.168.1.3} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]Partition default.index_test{dt=192.168.1.5} stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]Partition default.index_test{dt=192.168.1.62} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]Partition default.index_test{dt=192.168.1.9} stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]MapReduce Jobs Launched: Stage-Stage-1: Map: 1   Cumulative CPU: 0.87 sec   HDFS Read: 308 HDFS Write: 338 SUCCESSTotal MapReduce CPU Time Spent: 870 msecOKTime taken: 15.225 seconds

9) 使用 index_test 表，在属性 id 上创建一个索引 index1_index_test 。

hive> create index index1_index_test on table index_test(id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD; OKTime taken: 0.109 seconds

10) 填充索引数据。

hive> alter index index1_index_test on index_test rebuild;Query ID = hadoop_20180518044545_edc98c3b-03eb-42c0-83d1-ba266d8497e0Total jobs = 5（省略MapReduce过程）Ended Job = job_1526553207632_0024Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.26)Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.3)Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.5)Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.62)Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.9)Partition default.default__index_test_index1_index_test__{dt=192.168.1.26} stats: [numFiles=1, numRows=0, totalSize=70, rawDataSize=0]Partition default.default__index_test_index1_index_test__{dt=192.168.1.3} stats: [numFiles=1, numRows=0, totalSize=69, rawDataSize=0]Partition default.default__index_test_index1_index_test__{dt=192.168.1.5} stats: [numFiles=1, numRows=0, totalSize=69, rawDataSize=0]Partition default.default__index_test_index1_index_test__{dt=192.168.1.62} stats: [numFiles=1, numRows=0, totalSize=70, rawDataSize=0]Partition default.default__index_test_index1_index_test__{dt=192.168.1.9} stats: [numFiles=1, numRows=0, totalSize=69, rawDataSize=0]MapReduce Jobs Launched: Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.66 sec   HDFS Read: 226 HDFS Write: 191 SUCCESSStage-Stage-5: Map: 1  Reduce: 1   Cumulative CPU: 1.94 sec   HDFS Read: 227 HDFS Write: 187 SUCCESSStage-Stage-9: Map: 1  Reduce: 1   Cumulative CPU: 2.15 sec   HDFS Read: 225 HDFS Write: 187 SUCCESSStage-Stage-13: Map: 1  Reduce: 1   Cumulative CPU: 1.9 sec   HDFS Read: 228 HDFS Write: 191 SUCCESSStage-Stage-17: Map: 1  Reduce: 1   Cumulative CPU: 2.73 sec   HDFS Read: 225 HDFS Write: 187 SUCCESSTotal MapReduce CPU Time Spent: 11 seconds 380 msecOKTime taken: 196.607 seconds

11) 查看创建的索引。

hive> show index on index_test

12) 查看分区信息。

hive> show partitions index_test;

13) 查看索引数据。

[hadoop@master hadoop-2.6.0]$ hadoop fs -ls /user/hive/warehouse/default__index_test_index1_index_test__18/05/18 04:52:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicableFound 5 itemsdrwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:46 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.26drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:46 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.3drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:46 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.5drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:48 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.62drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:48 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.9

14) 删除索引。

hive> drop index index1_index_test on index_test;OKTime taken: 4.842 secondshive> show index on index_test;OKTime taken: 0.031 seconds

15) 索引数据也被删除。

[hadoop@master hadoop-2.6.0]$ hadoop fs -ls /user/hive/warehouse/default__index_test_index1_index_test__18/05/18 04:53:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicablels: `/user/hive/warehouse/default__index_test_index1_index_test__': No such file or directory

&16) 修改配置文件信息。

[hadoop@master hadoop-2.6.0]$ cd /opt/modules/hive1.0.0/conf/[hadoop@master conf]$ sudo vim hive-site.xml（/+字符串快速查找）
       
    
     hive.optimize.index.filter
        
    
     true
    
   
       
    
     hive.optimize.index.groupby
        
    
     true< /value>
    
   
       
    
     hive.optimize.index.filter.compact.minsize
        
    
     5120

hive.optimize.index.filter 和 hive.optimize.index.groupby 参数默认是 false。使用索引的时候必须把这两个参数开启，才能起到作用。

hive.optimize.index.filter.compact.minsize 参数为输入一个紧凑的索引将被自动采用最小尺寸、默认5368709120（以字节为单位）。

以上就是博主为大家介绍的这一板块的主要内容，这都是博主自己的学习过程，希望能给大家带来一定的指导作用，有用的还望大家点个支持，如果对你没用也望包涵，有错误烦请指出。如有期待可关注博主以第一时间获取更新哦，谢谢！　

转载地址：https://blog.csdn.net/py_123456/article/details/80361367 如侵犯您的版权，请留言回复原文章的地址，我们会给您删除此文章，给您带来不便请您谅解！

上一篇：Hive 基本语法操练（三）：分区操作和桶操作

下一篇：Hive 基本语法操练（一）：表操作

发表评论

关于作者

喝酒易醉，品茶养心，人生如梦，品茶悟道，何以解忧？唯有杜康！

-- 愿君每日到此一游！

1. 视图操作

2. 索引操作

发表评论

最新留言

关于作者

推荐文章