-
数据分布
原始数据来自oracle,期望将oracle数据先迁移到备份oracle,然后再迁移至mongodb -
整体思路
主oracle到备oracle,使用ogg for oracle实时迁移数据。
oracle迁移数据到mongodb,使用ETL kettle定时迁移数据,最后做调用数据清洗程序 -
实施
受限于主oracle数据不能随意操作,且在开发中,表结构会随意改变。会影响ogg的运行。
暂时使用oracle dblink实现远程的全量备份和数据导入。其他思路不变
oracle to oracle
ogg for oracle请参考另外一篇文章
使用dblink和expdp.exe以及impdp.exe工具实现
- 首先需要创建backup的目录,参考数据泵导入导出
- 建立dblink,参考DBLINK创建方式
- 实现脚本: 数据清理->数据远程导出->数据导入
rem ##########################################
rem # timed task for master oracle to oracle slave
rem # v1.0.0
rem ##########################################
@echo off
set date_y=%date:~0,4%
set date_m=%date:~5,2%
set date_d=%date:~8,2%
set date_H=%time:~0,2%
set date_mm=%time:~3,2%
set date_s=%time:~6,2%
set date_name=%date_y%/%date_m%/%date_d% %date_H%:%date_mm%:%date_s%
set log_name=%date_y%%date_m%%date_d%
set cur_path=%1%
set dropTableSql=E:\script\drop_253_oracle_tables.sql
set DB=test/test@orcl
set tmp_file_name=TMP_DB_BACKUP.BAK.DMP
set tmp_file_path=E:\backup
set tmp_file=%tmp_file_path%\%tmp_file_name%
set srcDBName=test
set destDBName=test
set linkDB=remote_scott
call sqlplus %DB% @%dropTableSql%
if exist %tmp_file% (
del %tmp_file%
)
Echo [%date_name%]: begin exec oracle to oracle.>> %cur_path%\%log_name%_timer_remote_exec.log
expdp.exe %DB% DIRECTORY=expdp_dir network_link=%linkDB% SCHEMAS=%srcDBName% DUMPFILE=%tmp_file_name% logfile=db_backup_output.log >> %cur_path%\%log_name%_timer_remote_exec.log
impdp.exe %DB% DIRECTORY=expdp_dir network_link=%linkDB% logfile=db_backup_input.log remap_schema=%srcDBName%:%destDBName% >> %cur_path%\%log_name%_timer_remote_exec.log
Echo [%date_name%]: exec oracle to oracle sucessed. you can read datail log db_backup_input.log or db_backup_output.log>> %cur_path%\%log_name%_timer_remote_exec.log
oracle to mongodb
使用PsExec.exe和kettle工具实现
-
首先需要完成ETL job的编写
风骚的job.png - 远程机器绿化安装kettle,配置java即可使用
- 远程调用kitchen命令,运行job
rem # v1.0.0
rem ##########################################
set remoteIp=192.168.0.66
set remoteUser=Administrator
set remotePasswd=test
set date_y=%date:~0,4%
set date_m=%date:~5,2%
set date_d=%date:~8,2%
set date_H=%time:~0,2%
set date_mm=%time:~3,2%
set date_s=%time:~6,2%
set date_name=%date_y%/%date_m%/%date_d% %date_H%:%date_mm%:%date_s%
set log_name=%date_y%%date_m%%date_d%
PsExec.exe \\%remoteIp% -u %remoteUser% -p %remotePasswd% cmd /c "kitchen /file:C:\job\report_oracle_to_mongodb\switch_etl\main_jobs.kjb">>%cur_path%\%log_name%_timer_remote_exec.log
mongodb数据变换
mongodb数据变换,使用python和shell代码实现,并且使用ant工具实现远程shell调用
- 使用python编写数据变换代码
- 使用shell调用python代码
#!/bin/bash
# cron init sample datas for report
rm -rf log/*
ps -ef | grep action.py | awk -F " " '{print $2}' | while read line
do
kill -9 ${line}
done
python action.py -db report -d sample_info_bak #python实现的数据备份删除
python action.py -rename test #python实现的数据备份
python action.py -db report -c sample_info #python实现的数据变换
# 根据日志判断是否成功
# 如果失败,发送邮件
find log |xargs grep -ri "failed" | while read line
do
mail_content='定时变换mongodb数据错误,详细错误请看日志'
mail_title='[定时任务]:执行mongodb数据变换出错'
python common/mail.py -t ${mail_title} -c ${mail_content}
exit 0
done
mail_content='定时变换mongodb数据成功,详细错误请看日志'
mail_title='[定时任务]:执行mongodb数据变换成功'
python common/mail.py -t ${mail_title} -c ${mail_content}
- 安装ant,编写远程调用配置xml
<?xml version="1.0" encoding="utf-8"?>
<project name="remote exec etl" basedir="../" default="sshexec">
<target name="sshexec">
<sshexec
host="192.168.0.67"
username="test"
password="test"
trust="true"
command="cd /home/test/test-data-engine/engine/services/dataRefreshSrv; nohup sh init.sh > /dev/null 2> error.log < /dev/null &" />
</target>
</project>
- 执行远程调用脚本
rem ##########################################
rem # remote exec windows to windows
rem # v1.0.0
rem ##########################################
set date_y=%date:~0,4%
set date_m=%date:~5,2%
set date_d=%date:~8,2%
set date_H=%time:~0,2%
set date_mm=%time:~3,2%
set date_s=%time:~6,2%
set date_name=%date_y%/%date_m%/%date_d% %date_H%:%date_mm%:%date_s%
set log_name=%date_y%%date_m%%date_d%
set cur_path=%1%
Echo [%date_name%]: exec etl sucessed.>>%cur_path%\%log_name%_timer_remote_exec.log
ant -f %cur_path%\remote_exec_shell.xml>>%cur_path%\%log_name%_timer_remote_exec.log
If errorlevel 1 (
Echo [%date_name%]: exec ant shell failed.>>%cur_path%\%log_name%_timer_remote_exec.log
exit
) Else (
Echo [%date_name%]: exec task is sucessed.>>%cur_path%\%log_name%_timer_remote_exec.log
)
流程监控
- 使用日志监控流程
- 使用邮件发出告警
整个流程中,etl定时任务处理不稳定,需要优化
网友评论