Skip to content

Latest commit

 

History

History
executable file
·
128 lines (115 loc) · 5.71 KB

9.子查询和关联表.md

File metadata and controls

executable file
·
128 lines (115 loc) · 5.71 KB

关联表JOIN

  • 下单用户是谁?如果获取更多信息

    • 创建一个用户表:
    CREATE EXTERNAL TABLE IF NOT EXISTS `bigdata.member` (
        `user_id`       string      COMMENT '用户id',
        `nick_name`     string      COMMENT '昵称',
        `name`          string      COMMENT '姓名',
        `gender`        string      COMMENT '性别',
        `register_time`   bigint    COMMENT '注册时间',
        `birthday`      bigint      COMMENT '生日',
        `device_type`   string      COMMENT '设备类型')
    ROW FORMAT SERDE
        'org.openx.data.jsonserde.JsonSerDe'
    STORED AS INPUTFORMAT
        'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
        'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
        '/user/hadoop/hive/member';
    • 执行结果:
    hive> desc member;
    OK
    user_id                 string                  用户id
    nick_name               string                  昵称
    name                    string                  姓名
    gender                  string                  性别
    register_time           bigint                  注册时间
    birthday                bigint                  生日
    device_type             string                  设备类型
    Time taken: 0.05 seconds, Fetched: 7 row(s)
    
    • 关联用户表信息:
    select
        t1.user_id,
        gender,
        register_time
    from
    (select user_id from bigdata.weblog where active_name='order') t1
    join
    (select user_id, gender, register_time from bigdata.member) t2
    on t1.user_id=t2.user_id 
    limit 10;
    hive> select
        >     t1.user_id,
        >     gender,
        >     register_time
        > from
        > (select user_id from bigdata.weblog where active_name='order') t1
        > join
        > (select user_id, gender, register_time from bigdata.member) t2
        > on t1.user_id=t2.user_id limit 10;
    Query ID = 1015146591_20190106140901_6897753b-2c29-4c81-9aaa-c03876e46e1e
    Total jobs = 1
    Execution log at: /tmp/1015146591/1015146591_20190106140901_6897753b-2c29-4c81-9aaa-c03876e46e1e.log
    2019-01-06 14:09:05     Starting to launch local task to process map join;      maximum memory = 508559360
    2019-01-06 14:09:06     Dump the side-table for tag: 1 with group count: 30179 into file: file:/mnt/home/1015146591/apps/hive-1.2.2/tmp/1015146591/56a2b2ad-eab4-4e0a-a49d-f0bfd62e96fc/hive_2019-01-06_14-09-01_666_424795490568777106-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
    2019-01-06 14:09:06     Uploaded 1 File to: file:/mnt/home/1015146591/apps/hive-1.2.2/tmp/1015146591/56a2b2ad-eab4-4e0a-a49d-f0bfd62e96fc/hive_2019-01-06_14-09-01_666_424795490568777106-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (1395272 bytes)
    2019-01-06 14:09:06     End of local task; Time Taken: 1.491 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1535253853575_21356, Tracking URL = http://bigdata0.novalocal:8088/proxy/application_1535253853575_21356/
    Kill Command = /home/hadoop/hadoop-current/bin/hadoop job  -kill job_1535253853575_21356
    Hadoop job information for Stage-3: number of mappers: 6; number of reducers: 0
    2019-01-06 14:09:13,778 Stage-3 map = 0%,  reduce = 0%
    2019-01-06 14:09:21,078 Stage-3 map = 33%,  reduce = 0%, Cumulative CPU 8.53 sec
    2019-01-06 14:09:23,137 Stage-3 map = 50%,  reduce = 0%, Cumulative CPU 16.27 sec
    2019-01-06 14:09:25,224 Stage-3 map = 54%,  reduce = 0%, Cumulative CPU 48.62 sec
    2019-01-06 14:09:28,319 Stage-3 map = 69%,  reduce = 0%, Cumulative CPU 58.85 sec
    2019-01-06 14:09:30,379 Stage-3 map = 77%,  reduce = 0%, Cumulative CPU 62.09 sec
    2019-01-06 14:09:31,412 Stage-3 map = 78%,  reduce = 0%, Cumulative CPU 68.57 sec
    2019-01-06 14:09:32,465 Stage-3 map = 86%,  reduce = 0%, Cumulative CPU 69.58 sec
    2019-01-06 14:09:34,543 Stage-3 map = 93%,  reduce = 0%, Cumulative CPU 75.94 sec
    2019-01-06 14:09:36,615 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 78.36 sec
    MapReduce Total cumulative CPU time: 1 minutes 18 seconds 360 msec
    Ended Job = job_1535253853575_21356
    MapReduce Jobs Launched:
    Stage-Stage-3: Map: 6   Cumulative CPU: 78.36 sec   HDFS Read: 942484283 HDFS Write: 760 SUCCESS
    Total MapReduce CPU Time Spent: 1 minutes 18 seconds 360 msec
    OK
    8201531897436759        男      1526031234916
    2211531897362630        男      1519311748534
    8431531897389241        男      1520079723099
    3211531897369709        男      1523560308175
    6911531897308202        女      1525233502748
    5041531897225960        男      1527678449938
    7391531897436409        女      1527670790734
    8191531897363226        男      1526696249998
    0851531897431904        女      1524085244386
    7031531897150536        女      1527243924519
    Time taken: 36.225 seconds, Fetched: 10 row(s)
    hive>
    

总结

select
    t1.user_id,
    gender,
    register_time
from
(select user_id from bigdata.weblog where active_name='order') t1
join
(select user_id, gender, register_time from bigdata.member) t2
on t1.user_id=t2.user_id limit 10;

这里t1、t2是子查询的别名,两个子查询语句通过公共字段进行关联。

join分四种情况

Join分类