hive报错

作者: Rainysong | 来源:发表于2020-04-07 17:34 被阅读0次

    1、HQL子查询别名问题
    报错:hive query failed cannot recognize input near '(' 'select' '*' in joinSource
    备注:子查询需要加上别名 否则报错。
    HQL的书写,select * from (select * from table) ;
    应改为:select * from (select * from table) a,执行成功。

    2、left join 若后表为前表的基础上进一步查询的结果,不能直接用前表的临时命名,可用with table1 as (select xxxx) 来实现同一查询结果的复用;否则sql语句是执行到哪里,才有一遍结果,故后表查询时,需要把前表查询语句再执行一遍。
    报错:table name "a" must be "db.table" format

    报错.png
    本质:不能写 (select xx from xx where xx) a left join (select yy from a where yy) b
    执行到第二个a时,a表并不实际存在!!
    解决:
    【法一】再写一遍子查询
    (select xx from xx where xx) a left join (select yy from (select xx from xx where xx) a where yy) b
    【法二】with a as _ 实现同一查询结果多次复用
    with a as (select xx from xx where xx)
    a left join (select yy from a where yy) b

    3、hive使用正则报错
    报错:Hive query failed, Error while compiling statement: FAILED: SemanticException [Error 10002]: Line 28:18 Invalid column reference 'median.*'

    xxx.png

    语句:select a.*, b.`median.*` from a
    left join
    (
    select brand, region
    , percentile(rscore, 0.5) as median_r
    ,percentile(fscore, 0.5) as median_f
    ,percentile(mscore, 0.5) as median_m
    from a
    group by brand, region
    ) b
    on a.brand=b.brand and a.region=b.region

    b.`median.*` 代表b表中,以“median”开头的字段

    解决:需要在代码段前加上:
    set hive.support.quoted.identifiers=none;

    4、insert overwrite 报错
    报错:Caused by: MetaException(message:Exception thrown when executing query : SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MPartition' AS NUCLEUS_TYPE,A0.CREATE_TIME,A0.LAST_ACCESS_TIME,A0.PART_NAME,A0.PART_ID FROM PARTITIONS A0 LEFT OUTER JOIN TBLS B0 ON A0.TBL_ID = B0.TBL_ID LEFT OUTER JOIN DBS C0 ON B0.DB_ID = C0.DB_ID WHERE B0.TBL_NAME = ? AND C0.NAME = ? AND A0.PART_NAME = ?)
    原因:由于partition字段异常字符导致,分区字段不支持中文
    根本原因:源表字段值和输出分区值之间的关系是根据位置而不是命名来匹配的!!!
    例子:
    insert overwrite table employees
    partition (country, state)
    select ..., se.cnty, se.st
    from staged_employees se;

    5、cannot recognize input near 报错
    报错:Error: Error while compiling statement: FAILED: ParseException line 10:8 cannot recognize input near '(' 'select' 'statis_id' in joinSource (state=42000,code=40000)
    原因:大多数情况是没有对子查询/join的子表 命别名
    实际原因:所以我一直纠结在别名的检查上,谁能想到是 变量名写错 了呢……T^T

    interval_days变量名写错,为啥报 select statis_id 附近的错啊,这还隔着十万八千里呢T0T.png interval_days变量名写错!!!.png

    6、Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    解决1:set mapred.reduce.tasks=10; -> set mapreduce.job.reduces=10; (hive2)

    7、Query failed (#20200429_072245_00729_nh2wp): Error opening Hive split s3://bi-oor-com/warehouse/analysis/inter/union_inter_dw.db/dim_mkt_app_info/dayno=29991231/000003_0 (offset=0, length=25082323): Malformed ORC file. Can not read SQL type integer from ORC stream .system_id of type STRING [s3://bi-oor-com/warehouse/analysis/inter/union_inter_dw.db/dim_mkt_app_info/dayno=29991231/000003_0]
    解决:select字段过多,不能 select * ,改成指定某几个字段

    8、SemanticException Column app_id Found in more than One Tables/Subqueries (state=42000,code=40000)
    一般是因为join的时候,含同字段两张表,没有命别名
    但我遇到一个诡异的情况:
    A union all B
    B=aa left join bb left join cc (均已指定别名)
    问题出在,aa中,有 not in (select xx_id from A) 的情况,把该条件拿掉后,不再报上述错误

    解决:搞不清为啥,直接把B表逻辑with tmp as 了,不再报此错误。

    Hive对in的支持有点问题,写出来的sql一直报错,分开跑也报错,一会儿 column more than one tables,一会儿又说语法错误,一会儿又能跑。。。

    所以,以后遇到 in / not in 的查询需求,统一用 join 替换,如下:

    • in:
      select A.x_id, A.xxx,……, from (table) A left join (table) B on A.x_id=B.x_id where B.x_id is not null;
    • not in:
      select A.x_id, A.xxx,……, from (table) A left join (table) B on A.x_id=B.x_id where B.x_id is null;

    9、hive insert overwrite / into 都没报错,但是数据没写进去
    不知道原因>.<
    有可能是分区字段不能含NULL,我把分区字段筛选了不为NULL,就好像写进去了?——<验证不是这个问题>
    现在怀疑是set hive 某些参数,导致本应有数据,但select出来就没有,所以也就没有insert进去了。——对!!!去掉参数设置就有数据了!!!是参数出了问题!!!有可能是以下几个参数:

    • set hive.strict.checks.cartesian.product = false;
    • set hive.mapred.mode = nonstrict;
    • set hive.support.quoted.identifiers = none;

    明天来排查具体是哪个参数的问题。真的太坑了,巨坑,一切的坑都源于省事儿用 in, 用 in 就要求开启这几个配置,开了就忘了关了,就一直写不进去数据(实际上是就没查出来数据)!!!巨坑!!!巨TM坑!!!明天一定要来搞搞清楚!!!

    相关文章

      网友评论

        本文标题:hive报错

        本文链接:https://www.haomeiwen.com/subject/jcgnphtx.html