美文网首页我爱编程
数据定时迁移

数据定时迁移

作者: haishuiaa | 来源:发表于2018-03-06 10:10 被阅读101次
    • 数据分布
      原始数据来自oracle,期望将oracle数据先迁移到备份oracle,然后再迁移至mongodb

    • 整体思路
      主oracle到备oracle,使用ogg for oracle实时迁移数据。
      oracle迁移数据到mongodb,使用ETL kettle定时迁移数据,最后做调用数据清洗程序

    • 实施
      受限于主oracle数据不能随意操作,且在开发中,表结构会随意改变。会影响ogg的运行。
      暂时使用oracle dblink实现远程的全量备份和数据导入。其他思路不变

    oracle to oracle

    ogg for oracle请参考另外一篇文章
    使用dblink和expdp.exe以及impdp.exe工具实现

    rem ##########################################
    rem # timed task for master oracle to oracle slave
    rem # v1.0.0
    rem ##########################################
    @echo off
    set date_y=%date:~0,4%
    set date_m=%date:~5,2%
    set date_d=%date:~8,2%
    set date_H=%time:~0,2%
    set date_mm=%time:~3,2%
    set date_s=%time:~6,2%
    set date_name=%date_y%/%date_m%/%date_d% %date_H%:%date_mm%:%date_s%
    set log_name=%date_y%%date_m%%date_d%
    set cur_path=%1%
    set dropTableSql=E:\script\drop_253_oracle_tables.sql
    set DB=test/test@orcl
    set tmp_file_name=TMP_DB_BACKUP.BAK.DMP
    set tmp_file_path=E:\backup
    set tmp_file=%tmp_file_path%\%tmp_file_name%
    set srcDBName=test
    set destDBName=test
    set linkDB=remote_scott
    call sqlplus %DB% @%dropTableSql%
    if exist %tmp_file% (
        del %tmp_file%
    )
    Echo [%date_name%]: begin exec oracle to oracle.>> %cur_path%\%log_name%_timer_remote_exec.log
    expdp.exe %DB% DIRECTORY=expdp_dir network_link=%linkDB% SCHEMAS=%srcDBName% DUMPFILE=%tmp_file_name% logfile=db_backup_output.log >> %cur_path%\%log_name%_timer_remote_exec.log
    impdp.exe %DB% DIRECTORY=expdp_dir network_link=%linkDB% logfile=db_backup_input.log remap_schema=%srcDBName%:%destDBName% >> %cur_path%\%log_name%_timer_remote_exec.log
    Echo [%date_name%]: exec oracle to oracle sucessed. you can read datail log db_backup_input.log or db_backup_output.log>> %cur_path%\%log_name%_timer_remote_exec.log
    

    oracle to mongodb

    使用PsExec.exe和kettle工具实现

    • 首先需要完成ETL job的编写


      风骚的job.png风骚的job.png
    • 远程机器绿化安装kettle,配置java即可使用
    • 远程调用kitchen命令,运行job
    rem # v1.0.0
    rem ##########################################
    set remoteIp=192.168.0.66
    set remoteUser=Administrator
    set remotePasswd=test
    set date_y=%date:~0,4%
    set date_m=%date:~5,2%
    set date_d=%date:~8,2%
    set date_H=%time:~0,2%
    set date_mm=%time:~3,2%
    set date_s=%time:~6,2%
    set date_name=%date_y%/%date_m%/%date_d% %date_H%:%date_mm%:%date_s%
    set log_name=%date_y%%date_m%%date_d%
    PsExec.exe \\%remoteIp% -u %remoteUser% -p %remotePasswd% cmd /c "kitchen /file:C:\job\report_oracle_to_mongodb\switch_etl\main_jobs.kjb">>%cur_path%\%log_name%_timer_remote_exec.log
    

    mongodb数据变换

    mongodb数据变换,使用python和shell代码实现,并且使用ant工具实现远程shell调用

    • 使用python编写数据变换代码
    • 使用shell调用python代码
    #!/bin/bash
    # cron init sample datas for report
    rm -rf log/*
    ps -ef | grep action.py | awk -F " " '{print $2}' | while read line
    do
        kill -9 ${line}
    done
    python action.py -db report -d sample_info_bak #python实现的数据备份删除
    python action.py -rename test #python实现的数据备份
    python action.py -db report -c sample_info #python实现的数据变换
    # 根据日志判断是否成功
    # 如果失败,发送邮件
    find log |xargs grep -ri "failed" | while read line
    do
        mail_content='定时变换mongodb数据错误,详细错误请看日志'
        mail_title='[定时任务]:执行mongodb数据变换出错'
        python common/mail.py -t ${mail_title} -c ${mail_content}
        exit 0
    done
    mail_content='定时变换mongodb数据成功,详细错误请看日志'
    mail_title='[定时任务]:执行mongodb数据变换成功'
    python common/mail.py -t ${mail_title} -c ${mail_content}
    
    • 安装ant,编写远程调用配置xml
    <?xml version="1.0" encoding="utf-8"?>
    
    <project name="remote exec etl" basedir="../" default="sshexec">
        <target name="sshexec">
            <sshexec 
            host="192.168.0.67" 
            username="test" 
            password="test" 
            trust="true" 
            command="cd /home/test/test-data-engine/engine/services/dataRefreshSrv; nohup sh init.sh &gt; /dev/null 2&gt; error.log &lt; /dev/null &amp;" />
        </target>
    </project>
    
    • 执行远程调用脚本
    rem ##########################################
    rem # remote exec windows to windows
    rem # v1.0.0
    rem ##########################################
    set date_y=%date:~0,4%
    set date_m=%date:~5,2%
    set date_d=%date:~8,2%
    set date_H=%time:~0,2%
    set date_mm=%time:~3,2%
    set date_s=%time:~6,2%
    set date_name=%date_y%/%date_m%/%date_d% %date_H%:%date_mm%:%date_s%
    set log_name=%date_y%%date_m%%date_d%
    set cur_path=%1%
    Echo [%date_name%]: exec etl sucessed.>>%cur_path%\%log_name%_timer_remote_exec.log
        ant -f %cur_path%\remote_exec_shell.xml>>%cur_path%\%log_name%_timer_remote_exec.log
        If errorlevel 1 (
            Echo [%date_name%]: exec ant shell failed.>>%cur_path%\%log_name%_timer_remote_exec.log
            exit
        ) Else (
            Echo [%date_name%]: exec task is sucessed.>>%cur_path%\%log_name%_timer_remote_exec.log
        )
    

    流程监控

    • 使用日志监控流程
    • 使用邮件发出告警

    整个流程中,etl定时任务处理不稳定,需要优化

    相关文章

      网友评论

        本文标题:数据定时迁移

        本文链接:https://www.haomeiwen.com/subject/afhlfftx.html