Big Data Training: A Summary of Common Basic Hive Statements

Author: 教培参考

Published: November 25, 2024, 22:32

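This article collects the statements used most often in Hive so that you can get started with Hive quickly. Working through these common SQL statements also builds a better understanding of Hive itself. All of the examples concern table management and cover external tables, internal (managed) tables, partitioned tables, and bucketed tables.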

Simple Hive SQL statements

-- List databases
show databases;
-- List the tables in the current database
show tables;
-- Switch to a database
use database_name;
-- Show a table's structure
desc table_name;
-- Drop a table
drop table table_name;
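
A few closely related statements are handy alongside the ones above. This is a minimal sketch: `student*` is an illustrative pattern and `table_name` is a placeholder, as in the statements above.

```sql
-- List only the tables whose names match a wildcard pattern
show tables like 'student*';
-- Print the CREATE TABLE statement Hive has stored for a table
show create table table_name;
-- Drop a table without raising an error when it does not exist
drop table if exists table_name;
```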

Creating tables

Syntax:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment],...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment],...)]
[CLUSTERED BY (col_name,col_name,...)
[SORTED BY (col_name [ASC|DESC],...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]

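# Clause reference:
[EXTERNAL] # Declares the table as an external table

[IF NOT EXISTS] # Skips creation if a table with this name already exists

[(col_name data_type [COMMENT col_comment],...)] # Column name, column type, and column comment

[COMMENT table_comment] # Table comment; placed after the closing parenthesis of the column list

[PARTITIONED BY (col_name data_type [COMMENT col_comment],...)] # Defines partitions. The parentheses hold the partition column name, type, and comment. A partition column must not also appear in the column list.

[CLUSTERED BY (col_name,col_name,...)
[SORTED BY (col_name [ASC|DESC],...)] INTO num_buckets BUCKETS] # Distributes rows into num_buckets buckets based on the CLUSTERED BY columns; rows within each bucket are sorted by the SORTED BY columns.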

[ROW FORMAT row_format] # Specifies the delimiters. row_format can be:
DELIMITED [FIELDS TERMINATED BY char]
[COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char]

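[STORED AS file_format] # Specifies the storage format of the table's files, for example:
TEXTFILE: plain text files
SEQUENCEFILE: Hadoop binary key-value files, which support compression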

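[LOCATION hdfs_path] # Specifies where the table's data lives. hdfs_path must be a directory on HDFS, not a file; Hive treats every file in that directory as table data. If LOCATION is omitted, Hive uses a directory under its configured default HDFS warehouse path.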

Worked examples

(1) Create a table directly

create table if not exists student2(
id int, name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/student2';

(2) Create a table from a query result (the rows returned by the query are inserted into the new table)

create table if not exists student3
as select id,name from student;

(3) Create a table with the same structure as an existing table

create table if not exists student4 like student;

(4) Check a table's type
hive (default)> desc formatted student2;
Table Type: MANAGED_TABLE

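External tables
1) Theory
Because the table is external, Hive does not consider itself the full owner of the data. Dropping the table does not delete the data; only the metadata that describes the table is removed.
2) When to use managed tables and external tables:
Website logs collected each day are written to HDFS text files on a schedule. Most of the statistical analysis runs on top of an external table (the raw log table), while the intermediate and result tables are stored as internal tables and are populated with SELECT + INSERT.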
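
The scenario above might look like the following sketch. The table names `raw_logs` and `url_hits`, their columns, and the HDFS path `/data/raw_logs` are assumptions made up for illustration; they do not come from the original examples.

```sql
-- External table over raw log files that already sit in HDFS (hypothetical path)
create external table if not exists raw_logs(
ip string,
url string,
visit_time string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/data/raw_logs';

-- Managed (internal) result table whose data Hive owns
create table if not exists url_hits(
url string,
hits bigint
);

-- Populate the result table with SELECT + INSERT
insert overwrite table url_hits
select url, count(*) from raw_logs group by url;

-- Dropping the external table removes only its metadata;
-- the files under /data/raw_logs remain in HDFS
drop table raw_logs;
```

An existing managed table can also be converted with `alter table student2 set tblproperties('EXTERNAL'='TRUE');`, after which `desc formatted student2;` should report the table type as EXTERNAL_TABLE.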

Creating a table with complex types

create external table if not exists T2(
id int,
course array<string>,
score map<string,int>
)
row format
delimited fields terminated by ','
collection items terminated by '|'
map keys terminated by ':'
stored as textfile;
-- Contents of the data file
1001,Chinese|Math|English,Chinese:102|Math:2033|English:30
1002,Chinese|Math|English,Chinese:120|Math:2033|English:30
1003,Chinese|Math|English,Chinese:210|Math:3320|English:30
1004,Chinese|Math|English,Chinese:2210|Math:203|English:30
1005,Chinese|Math|English,Chinese:22210|Math:230|English:30
-- Load the data file
load data local inpath '/home/datanode/hiveTest/test01' overwrite into table t2;
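
Once the data is loaded, the complex-type columns can be read roughly as in the sketch below. It relies only on built-in Hive functions and the columns of T2; the aliases `first_course`, `course_count`, `subjects`, `tmp`, and `c` are illustrative names.

```sql
-- Index into the array, count its elements, and list the keys of the map
select id,
course[0] as first_course,
size(course) as course_count,
map_keys(score) as subjects
from t2;

-- Turn each element of the array into its own row
select id, c
from t2
lateral view explode(course) tmp as c;
```

A single map value can be read with `score['key']`, where `key` is one of the map keys that appear in the data file.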

Creating a partitioned internal table

create table if not exists T3(
id int,
name string
)
partitioned by (classid int)
row format
delimited fields terminated by ','
stored as textfile;
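
Data goes into a partitioned table by naming the partition value at load time. A sketch, assuming a local file `/home/datanode/hiveTest/class1` (a hypothetical path) that holds comma-separated `id,name` rows:

```sql
-- Load a file into the classid=1 partition of T3
load data local inpath '/home/datanode/hiveTest/class1'
into table t3 partition (classid=1);

-- Add an empty partition explicitly
alter table t3 add if not exists partition (classid=2);

-- List the partitions of the table
show partitions t3;

-- Filtering on the partition column lets Hive read only that partition's directory
select id, name from t3 where classid = 1;
```

Each partition is stored as its own subdirectory (for example `classid=1`) under the table's location, which is why a filter on `classid` can skip the other partitions.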

Creating a bucketed internal table

create table T4(
id int ,
name string,
sex string,
age int
)
partitioned by (city string)
clustered by(age) sorted by(name) into 5 buckets
row format
delimited fields terminated by ','
stored as textfile;
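
Bucketed tables are normally filled with INSERT ... SELECT so that Hive can hash each row into the right bucket file. The sketch below rests on assumptions: `people_staging` is a hypothetical unbucketed table with the columns id, name, sex, age, and city, and `'beijing'` is just an example partition value. Older Hive releases (before 2.0, as far as I know) needed `hive.enforce.bucketing` enabled first; check this against your own installation.

```sql
-- Uncomment on older Hive releases that still have this setting
-- set hive.enforce.bucketing=true;

-- Write one city's rows into T4; Hive distributes them into 5 buckets by age
insert overwrite table t4 partition (city='beijing')
select id, name, sex, age
from people_staging
where city = 'beijing';

-- Sample the first of the 5 buckets
select * from t4 tablesample(bucket 1 out of 5 on age)
where city = 'beijing';
```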

That covers the basic Hive operations on tables. Keep notes on each example so that you can find it quickly in day-to-day work.
