
Hive Data Migration 4 (Internal and External Tables)

By Moon_魔宽 | Published 2019-06-11 18:56

Copyright notice: this is an original post by the author and may not be reposted without permission. https://www.jianshu.com/p/34cc4e2cf181

    #date:2019-05-13

    #author:Wang Kuan

    dbname=$1

    #categorize each table as internal, external, or missing

    cat /home/tablename | while read table;

    do

      # the table exists if 'Table not found' does NOT appear in the desc output
      hive -e "use ${dbname};desc $table;" 2>&1 | grep 'Table not found' >/dev/null

      if [ $? -ne 0 ]; then

        hive -e "use ${dbname};desc formatted $table;" 2>&1 | grep 'EXTERNAL_TABLE' >/dev/null

        if [ $? -ne 0 ]; then

            echo $table " is an internal table"

            echo $table >> internal_table.txt

        else

            echo $table " is an external table"

            echo $table >> external_table.txt

        fi

      else

        echo $table " does not exist"

        echo $table >> table_not_exist.txt

      fi

    done

    #repair each external table by rebuilding its partition metadata

    cat /home/external_table.txt | while read externaltable;

    do

        hive -e "msck repair table $externaltable";

        echo $externaltable " has been repaired!"

    done

    #for tables missing on the target, pull their DDL from the source cluster

    cat /home/table_not_exist.txt | while read table_not_exist;

    do

      beeline -u jdbc:hive2://sourceip:10010 -e "use ${dbname};show tables;" | grep -i $table_not_exist >/dev/null

      if [ $? -ne 0 ];then

        echo $table_not_exist " does not exist in the source cluster either"

      else

        beeline -u jdbc:hive2://ip:10010 --silent=true --outputformat=csv2 -e "use ${dbname};show create table $table_not_exist;" >> TableDDL.txt

      fi

    done

    #repair each internal table by adding its date partitions back

    start_day="20190303"

    end_day="20190509"

    cat /home/internal_table.txt | while read internaltable

    do

      echo $internaltable " begin to add partitions"

      # reset the day counter for every table
      day=$start_day

      while [[ $day < $end_day ]]

      do

        hive -e "use ${dbname};alter table $internaltable add partition(dt='$day')";

        day=`date -d "$day +1 day" +%Y%m%d`

      done

    done
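The date arithmetic in the partition loop above can be checked locally without a Hive cluster. A minimal sketch, assuming GNU `date` and a shortened date range for illustration:

```shell
#!/bin/bash
# Print every yyyymmdd day in [start_day, end_day), using the same
# comparison and increment as the partition-repair loop above.
# Requires GNU date (the -d option).
start_day="20190303"
end_day="20190307"

day=$start_day
while [[ $day < $end_day ]]
do
  echo "$day"
  day=$(date -d "$day +1 day" +%Y%m%d)
done
```

The string comparison `[[ $day < $end_day ]]` is safe here only because yyyymmdd dates are fixed-width, so lexicographic order matches chronological order.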

    For an internal (managed) table, dropping the table also deletes its data on HDFS; for an external table, the data is left in place.

    Therefore, when recreating an internal table, first move its corresponding HDFS directory to a temporary location; otherwise creating the table will wipe the original directory. Once the internal table has been created, move the files back into HDFS and add the partitions.
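The move-aside procedure can be sketched as follows. This is a dry-run sketch (each command is echoed rather than executed; remove the `$RUN` prefix to run it for real), and the warehouse path and table name are hypothetical placeholders:

```shell
#!/bin/bash
# Dry-run sketch of recreating an internal table without losing its data.
# RUN=echo prints each command instead of executing it.
RUN=echo
warehouse=/user/hive/warehouse/mydb.db   # hypothetical warehouse path
table=my_table                           # hypothetical table name

# 1. move the table's directory aside before (re)creating the table
$RUN hdfs dfs -mv "$warehouse/$table" "/tmp/${table}_backup"

# 2. recreate the internal table from the DDL collected in TableDDL.txt
$RUN hive -f TableDDL.txt

# 3. move the original partition directories back under the new table
$RUN hdfs dfs -mv "/tmp/${table}_backup/*" "$warehouse/$table/"

# 4. register the partitions again
$RUN hive -e "msck repair table $table"
```

Step 4 assumes the data is laid out in `dt=...` partition subdirectories, as in the scripts above; otherwise the partitions must be added one by one with `alter table ... add partition`.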


    An excerpt of the DDL captured in TableDDL.txt:

    ROW FORMAT SERDE

    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'

    STORED AS INPUTFORMAT

    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'.....
