美文网首页
DG库gap故障案例

DG库gap故障案例

作者: 苏水的北 | 来源:发表于2021-09-10 20:57 被阅读0次

环境说明:

主备库均为asm方式建库。数据库版本为11.2.0.4。
主库:192.168.99.237 oracle1
备库:192.168.99.238 oracle2
实验内容:主要先关闭备库,在主库做归档丢失备库无法同步归档文件,备库产生GAP,然后增量备份恢复备库。

实验模拟gap故障:

1、先检查主备库的dg状态,dg状态正常:

主库:

SQL> select switchover_status,database_role,open_mode from v$database;

SWITCHOVER_STATUS    DATABASE_ROLE    OPEN_MODE
-------------------- ---------------- --------------------
TO STANDBY           PRIMARY          READ WRITE
SQL>  select thread#,process,status,sequence#,DELAY_MINS from v$managed_standby;

   THREAD# PROCESS   STATUS        SEQUENCE# DELAY_MINS
---------- --------- ------------ ---------- ----------
         0 ARCH      CONNECTED             0          0
         0 ARCH      CONNECTED             0          0
         1 ARCH      CLOSING             120          0
         0 ARCH      CONNECTED             0          0
         0 RFS       IDLE                  0          0
         1 RFS       IDLE                121          0
         0 RFS       IDLE                  0          0
         1 MRP0      APPLYING_LOG        121          0

备库:

SQL> select switchover_status,database_role,open_mode from v$database;

SWITCHOVER_STATUS    DATABASE_ROLE    OPEN_MODE
-------------------- ---------------- --------------------
NOT ALLOWED          PHYSICAL STANDBY READ ONLY WITH APPLY

2、备库操作:下边取消实时应用,也就是关掉了mpr进程,再把备库关闭。

SQL> alter database recover managed standby database cancel;

Database altered.

SQL> select thread#,process,status,sequence#,DELAY_MINS from v$managed_standby;

   THREAD# PROCESS   STATUS        SEQUENCE# DELAY_MINS
---------- --------- ------------ ---------- ----------
         0 ARCH      CONNECTED             0          0
         0 ARCH      CONNECTED             0          0
         1 ARCH      CLOSING             120          0
         0 ARCH      CONNECTED             0          0
         0 RFS       IDLE                  0          0
         1 RFS       IDLE                121          0
         0 RFS       IDLE                  0          0
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

3、主库上操作:模拟归档丢失,这时备库已经关掉,RFS进程无法把主库归档传送过去。

3.1、主库切换产生归档
SQL> alter system switch logfile;

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL> select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)
--------------
           123
3.2、主库上查看刚才产生的归档日志文件:
image.png
3.3、备库上查看之前的归档文件信息:
image.png
3.4、删除主库中的122和123归档:
ASMCMD> rm -rf thread_1_seq_122.303.1082899809
ASMCMD> rm -rf thread_1_seq_123.304.1082899811

备注:这里让主库模拟丢失的本来是121和122和123,但是备库一直出现不了GAP的状态,后来一看备库日志说的是121已经再被运输中(in transit),所以说这里如果备库重新启动121是直接被应用的,如果把121号归档rm掉,是产生不了GAP的,所以要模拟rm掉归档 就rm 122号归档,也就是切换的第二个归档,这里需要注意下。

4、备库上:启动备库,查看GAP

4.1、启动备库,打开实时同步,检查gap信息:
SQL> alter database recover managed standby database using current logfile disconnect from session;

Database altered.

SQL> select thread#,process,status,sequence#,DELAY_MINS from v$managed_standby;

   THREAD# PROCESS   STATUS        SEQUENCE# DELAY_MINS
---------- --------- ------------ ---------- ----------
         0 ARCH      CONNECTED             0          0
         0 ARCH      CONNECTED             0          0
         0 ARCH      CONNECTED             0          0
         1 ARCH      CLOSING             121          0
         0 RFS       IDLE                  0          0
         0 RFS       IDLE                  0          0
         1 RFS       IDLE                125          0
         1 MRP0      WAIT_FOR_GAP        122          0

8 rows selected.
SQL> select * from v$archive_gap;

   THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
         1           122            123
4.2、备库:查看alert日志信息,也能看到报错:
image.png
4.3、备库:在备库查看一下缺失的归档文件,发现和主库丢失的一样,所以就出现了gap错误。
image.png

5、主库:查询到122号归档的scn号,然后做增量备份,这里要做的是在主库上查询到122号归档的scn号,也就是在对122号归档做操作之后的记录,然后在备库增量恢复。

5.1、查询到24号归档之前的操作
SQL> select FIRST_CHANGE#  from v$archived_log where SEQUENCE#=122;

FIRST_CHANGE#
-------------
      1062975
5.2、查看当前的归档号
SQL> select current_scn from v$database;

CURRENT_SCN
-----------
    1066169

6、模拟在test01表空间下创建2个数据文件:

SQL> alter  tablespace test01 add datafile '+datadg' size 100M;

Tablespace altered.

SQL> alter  tablespace test01 add datafile '+datadg' size 200M;

Tablespace altered.
image.png

7、主库上做基于1062975的备份(也就是SEQUENCE#=122号的scn号)。

RMAN> backup as compressed backupset incremental from SCN 1062975 database format '/home/oracle/standby_%d_%T_%U.bak' include current controlfile for standby filesperset=5 tag 'FOR STANDBY';

Starting backup at 10-SEP-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=130 device type=DISK
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATADG/orcl/datafile/system.260.1082804591
input datafile file number=00002 name=+DATADG/orcl/datafile/sysaux.261.1082804595
input datafile file number=00008 name=+DATADG/orcl/datafile/test01.273.1082877021
input datafile file number=00004 name=+DATADG/orcl/datafile/users.264.1082804607
input datafile file number=00009 name=+DATADG/orcl/datafile/test01.274.1082904145
channel ORA_DISK_1: starting piece 1 at 10-SEP-21
channel ORA_DISK_1: finished piece 1 at 10-SEP-21
piece handle=/home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak tag=FOR STANDBY comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00003 name=+DATADG/orcl/datafile/undotbs1.262.1082804599
input datafile file number=00005 name=+DATADG/orcl/datafile/test01.266.1082807589
input datafile file number=00010 name=+DATADG/orcl/datafile/test01.275.1082904199
input datafile file number=00006 name=+DATADG/orcl/datafile/test01.267.1082807665
input datafile file number=00007 name=+DATADG/orcl/datafile/test01.272.1082876997
channel ORA_DISK_1: starting piece 1 at 10-SEP-21
channel ORA_DISK_1: finished piece 1 at 10-SEP-21
piece handle=/home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak tag=FOR STANDBY comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including standby control file in backup set
channel ORA_DISK_1: starting piece 1 at 10-SEP-21
channel ORA_DISK_1: finished piece 1 at 10-SEP-21
piece handle=/home/oracle/standby_ORCL_20210910_0t08njpv_1_1.bak tag=FOR STANDBY comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 10-SEP-21

这里查看一下备份的文件:

[root@oracle1 oracle]# ll
-rw-r----- 1 oracle asmadmin 1409024 Sep 10 14:46 standby_ORCL_20210910_0r08njpo_1_1.bak
-rw-r----- 1 oracle asmadmin 1302528 Sep 10 14:46 standby_ORCL_20210910_0s08njps_1_1.bak
-rw-r----- 1 oracle asmadmin 1114112 Sep 10 14:46 standby_ORCL_20210910_0t08njpv_1_1.bak

8、主库:把主库备份集传输到备库/home/oracle下

[root@oracle1 oracle]# scp standby_ORCL_20210910_0*  oracle@192.168.99.238:/home/oracle
oracle@192.168.99.238's password:
Permission denied, please try again.
oracle@192.168.99.238's password:
standby_ORCL_20210910_0r08njpo_1_1.bak                                                 100% 1376KB   1.3MB/s   00:00
standby_ORCL_20210910_0s08njps_1_1.bak                                                 100% 1272KB   1.2MB/s   00:00
standby_ORCL_20210910_0t08njpv_1_1.bak                                                 100% 1088KB   1.1MB/s   00:00

9、备库:在备库上进行恢复scn号,首先恢复控制文件:

9.1、关闭备库,启动到nomount状态下:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup nomount
ORACLE instance started.

Total System Global Area 2471931904 bytes
Fixed Size                  2255752 bytes
Variable Size             620758136 bytes
Database Buffers         1828716544 bytes
Redo Buffers               20201472 bytes
9.2、先用备份集恢复控制文件,并把数据库启动到mount阶段:
RMAN> restore standby controlfile from  '/home/oracle/standby_ORCL_20210910_0t08njpv_1_1.bak';

Starting restore at 10-SEP-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=131 device type=DISK
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
output file name=+DATADG/orcl2/controlfile/control01.ctl
output file name=+FLASHDG/orcl2/controlfile/control02.ctl
Finished restore at 10-SEP-21

RMAN> alter database mount;
database mounted
released channel: ORA_DISK_1
9.3、注册备库原有数据文件:
RMAN> catalog start with '+datadg';

Starting implicit crosscheck backup at 10-SEP-21
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=135 device type=DISK
Crosschecked 1 objects
Finished implicit crosscheck backup at 10-SEP-21

Starting implicit crosscheck copy at 10-SEP-21
using channel ORA_DISK_1
Crosschecked 2 objects
Finished implicit crosscheck copy at 10-SEP-21

searching for all files in the recovery area
cataloging files...
cataloging done

List of Cataloged Files
9.4、注册增量备文件:
RMAN> catalog start with '/home/oracle';

searching for all files that match the pattern /home/oracle

List of Files Unknown to the Database
=====================================
File Name: /home/oracle/standby_ORCL_20210910_0t08njpv_1_1.bak
File Name: /home/oracle/.bash_history
File Name: /home/oracle/.bashrc
File Name: /home/oracle/.bash_logout
File Name: /home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak
File Name: /home/oracle/.bash_profile
File Name: /home/oracle/.kshrc
File Name: /home/oracle/.ssh/known_hosts
File Name: /home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak
File Name: /home/oracle/.viminfo

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /home/oracle/standby_ORCL_20210910_0t08njpv_1_1.bak
File Name: /home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak
File Name: /home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak
9.5、还原增量数据文件(如果没有新增数据文件 跳过):

主库查询增加的数据文件号:(上面模拟的是添加的9号和10号文件)

SQL> select  file_name,file_id from dba_data_files;

FILE_NAME                                                       FILE_ID
------------------------------------------------------------ ----------
+DATADG/orcl/datafile/system.260.1082804591                           1
+DATADG/orcl/datafile/sysaux.261.1082804595                           2
+DATADG/orcl/datafile/undotbs1.262.1082804599                         3
+DATADG/orcl/datafile/users.264.1082804607                            4
+DATADG/orcl/datafile/test01.266.1082807589                           5
+DATADG/orcl/datafile/test01.267.1082807665                           6
+DATADG/orcl/datafile/test01.272.1082876997                           7
+DATADG/orcl/datafile/test01.273.1082877021                           8
+DATADG/orcl/datafile/test01.274.1082904145                           9
+DATADG/orcl/datafile/test01.275.1082904199                          10

恢复9号和10号数据文件:

RMAN> restore datafile 9;

Starting restore at 10-SEP-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=194 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00009 to +DATADG/orcl2/datafile/test01.274.1082904145
channel ORA_DISK_1: reading from backup piece /home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak
channel ORA_DISK_1: piece handle=/home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak tag=FOR STANDBY
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 10-SEP-21

RMAN> restore datafile 10;

Starting restore at 10-SEP-21
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00010 to +DATADG/orcl2/datafile/test01.275.1082904199
channel ORA_DISK_1: reading from backup piece /home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak
channel ORA_DISK_1: piece handle=/home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak tag=FOR STANDBY
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
Finished restore at 10-SEP-21
9.6、recover问题:recover增量恢复备份文件,发现系统提示,需要把所有数据文件都要restore。
RMAN> recover database noredo;

Starting recover at 10-SEP-21
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 09/10/2021 15:06:58
RMAN-06094: datafile 1 must be restored

RMAN> restore datafile 1;

Starting restore at 10-SEP-21
using channel ORA_DISK_1

channel ORA_DISK_1: restoring datafile 00001
input datafile copy RECID=9 STAMP=1082904975 file name=+DATADG/orcl2/datafile/system.258.1082872823
destination for restore of datafile 00001: +DATADG/orcl2/datafile/system.260.1082804591
channel ORA_DISK_1: copied datafile copy of datafile 00001
output file name=+DATADG/orcl2/datafile/system.306.1082905653 RECID=0 STAMP=0
Finished restore at 10-SEP-21

RMAN> recover database noredo;

Starting recover at 10-SEP-21
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 09/10/2021 15:07:49
RMAN-06094: datafile 2 must be restored
9.7、把剩下所有数据文件全部restore:
RMAN> restore datafile 1;
RMAN> restore datafile 2;
RMAN> restore datafile 3;
RMAN> restore datafile 4;
RMAN> restore datafile 5;
RMAN> restore datafile 6;
RMAN> restore datafile 7;
RMAN> restore datafile 8;
9.8、再次recover增量恢复备份文件:
RMAN>  recover database noredo;

Starting recover at 10-SEP-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=70 device type=DISK
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001: +DATADG/orcl2/datafile/system.306.1082905653
destination for restore of datafile 00002: +DATADG/orcl2/datafile/sysaux.307.1082905779
destination for restore of datafile 00004: +DATADG/orcl2/datafile/users.309.1082905809
destination for restore of datafile 00008: +DATADG/orcl2/datafile/test01.313.1082905851
channel ORA_DISK_1: reading from backup piece /home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak
channel ORA_DISK_1: piece handle=/home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak tag=FOR STANDBY
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00003: +DATADG/orcl2/datafile/undotbs1.308.1082905795
destination for restore of datafile 00005: +DATADG/orcl2/datafile/test01.310.1082905821
destination for restore of datafile 00006: +DATADG/orcl2/datafile/test01.311.1082905833
destination for restore of datafile 00007: +DATADG/orcl2/datafile/test01.312.1082905841
channel ORA_DISK_1: reading from backup piece /home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak
channel ORA_DISK_1: piece handle=/home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak tag=FOR STANDBY
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01

Finished recover at 10-SEP-21
9.9、备库:打开数据库,并启动数据库至mount状态下,打开闪回区:
SQL> alter database open;

Database altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Database mounted.
SQL> alter database flashback on;

Database altered.

SQL> alter database open;

Database altered.

10、备库:打开MRP进程,并查看进程状态:

10.1、备库下打开MPR进程(MPR进程状态为APPLYING_LOG),查看到所有进程正常:
SQL> alter database recover managed standby database using current logfile disconnect from session;

Database altered.
SQL> select thread#,process,status,sequence#,DELAY_MINS from v$managed_standby;

   THREAD# PROCESS   STATUS        SEQUENCE# DELAY_MINS
---------- --------- ------------ ---------- ----------
         0 ARCH      CONNECTED             0          0
         0 ARCH      CONNECTED             0          0
         0 ARCH      CONNECTED             0          0
         1 ARCH      CLOSING             128          0
         0 RFS       IDLE                  0          0
         0 RFS       IDLE                  0          0
         1 RFS       IDLE                129          0
         1 MRP0      APPLYING_LOG        129          0

8 rows selected.
10.2、备库:查看gap状态为空,证明gap故障解决:
SQL> select * from v$archive_gap;

no rows selected

11、查询dg两个库的状态:

11.1、主库:
[oracle@oracle1 tmp]$ sqlplus / as sysdba
SQL> select switchover_status,database_role,open_mode from v$database;

SWITCHOVER_STATUS    DATABASE_ROLE    OPEN_MODE
-------------------- ---------------- --------------------
TO STANDBY           PRIMARY          READ WRITE
11.2、备库:
SQL> select switchover_status,database_role,open_mode from v$database;

SWITCHOVER_STATUS    DATABASE_ROLE    OPEN_MODE
-------------------- ---------------- --------------------
NOT ALLOWED          PHYSICAL STANDBY READ ONLY WITH APPLY

12、验证:查看之前dg损坏后,在主库创建的2和文件是否存在:

image.png

13、截止到目前我们把gap故障已经解决,数据库dg状态正常。

相关文章

  • DG库gap故障案例

    环境说明: 主备库均为asm方式建库。数据库版本为11.2.0.4。主库:192.168.99.237 orac...

  • 职场里,对数据库要有敬畏之心!

    前言: 时常有听到各公司数据库故障的案例,比如数据库宕机了、误删数据了、恶意删库了等等。可能还有更多的故障没有披露...

  • 【Oracle】DG重做操作记录

    前言:之前由于数据库没有监控,DG早就出问题了,然而主库上归档早就被清理,没办法,只能手动重做DG。 1. 删库 ...

  • 彻底理解Redis持久化

    本文转自杨奇龙老师的公众号(yangyidba),他长期关注于数据库技术以及性能优化,故障案例分析,数据库运维技术...

  • 一文了解Docker

    本文转自杨奇龙老师的公众号(yangyidba),他长期关注于数据库技术以及性能优化,故障案例分析,数据库运维技术...

  • 故障案例

    故障案例一、NFS故障,造成系统cpu使用率低而负载极高。 https://www.jianshu.com/p/3...

  • 2022-05-18

    遇到没修过的故障 一定要多看故障案例。

  • 11g DG中的参数

    11g DG中的SEC_CASE_SENSITIVE_LOGON参数 在一次配置dg完成后,备库会时不时的无法自动...

  • MySQL:一个innodb_thread_concurrenc

    源码版本:5.7.22 一、问题来源 这是一个朋友问我的典型案例。整个故障现象表现为,MySQL数据库频繁的出现大...

  • 九,主从复制高级进阶

    1,延时从库 介绍 配置 延时从库处理逻辑故障 故障演练 主库: 从库 恢复到从库 2,过滤复制 例子 3,GTI...

网友评论

      本文标题:DG库gap故障案例

      本文链接:https://www.haomeiwen.com/subject/qrrnwltx.html