Oracle 12c DataGuard: Roll Forward a Physical Standby with RECOVER FROM SERVICE

Executive Summary

When a physical standby database in Oracle DataGuard falls behind the primary – because archived logs were lost or redo transport was interrupted – it can be rolled forward without a full resynchronisation using the RECOVER…FROM SERVICE command. This is faster than rebuilding the standby from scratch and avoids the overhead of shipping all diverged archived logs. The command uses an incremental backup approach to close the gap between the standby’s current SCN and the primary’s current SCN. This article demonstrates the complete procedure: verifying the configuration with the primary (boston), far sync (bostonFS), and standby (london); simulating archived log loss; identifying out-of-sync datafiles on the standby; and executing and verifying the RECOVER FROM SERVICE recovery.

Rolling Forward a Physical Standby Database Using the RECOVER Command

A standby database is a transactionally-consistent copy of the production database. It enables production Oracle database to survive disasters and data corruption. If the production database becomes unavailable because of a planned or an unplanned outage, Data Guard can switch a standby database to the production role, minimizing the downtime associated with the outage. Moreover, performance of production database can be improved by offloading resource-intensive backup and reporting operations to standby systems. As you can see, it’s always desirable to have standby database synchronized with the primary database.

A standby database might lag behind the primary for various reasons like:

Unavailability of or insufficient network bandwidth between primary and standby database
Unavailability of Standby database
Corruption / Accidental deletion of Archive Redo Data on primary

If standby database lags behind the primary database:

Switchover will take more time.
Failover will result in data loss.
An attempt to issue Real Time Query against the standby database will result in error
ORA-03172 : STANDBY_MAX_DATA_DELAY of n seconds exceeded.

Synchronizing the standby and primary databases can be done by copying and applying the archived logs from the primary database but this process is quite time consuming as it will first apply both the COMMITED and the NON COMMITED transactions followed by rolling back uncommitted transactions. Employing incremental backups of the primary database containing changes since the standby database was last refreshed is a faster alternative which will recover the standby database much faster as it will apply only the COMMITED transactions on the standby database. Moreover, incremental backups are also useful in cases when there are missing archived logs on Primary which have not been applied to the standby database

Prior to 12c, in order to roll forward the standby database using incremental backups you would need to:

Create a control file for the standby database on the primary database.
Take an incremental backup on the primary starting from the SCN# of the standby database.
Copy the incremental backup to the standby host and catalog it with RMAN.
Mount the standby database with newly created standby control file.
Cancel managed recovery of the standby database and apply incremental backup to the standby database.
Start managed recovery of standby database.

In 12c, this procedure has been dramatically simplified. Now you can use the RECOVER … FROM SERVICE command to synchronize the physical standby database with the primary database. This command does the following:

Creates an incremental backup containing the changes to the primary database. All changes to data files on the primary database, beginning with the SCN in the standby data file header, are included in the incremental backup.
Transfers the incremental backup over the network to the physical standby database.
Applies the incremental backup to the physical standby database.

This results in rolling forward the standby data files to the same point-in-time as the primary. However, the standby control file still contains old SCN values which are lower than the SCN values in the standby data files. Therefore, to complete the synchronization of the physical standby database, the standby control file needs to be refreshed to update the SCN#.

Now I will demonstrate the steps to refresh the physical standby database with changes made to the primary database using the RECOVER…FROM SERVICE command.

Overview:

View current configuration and verify that primary database, far sync for primary and standby database are in sync
Simulate loss of archived logs on the primary database
Refresh the physical standby using the RECOVER…FROM SERVICE command

Setting up the example

View Current configuration

DGMGRL> show configuration;

Configuration - drsolution

Protection Mode: MaxPerformance

Databases:

boston - Primary database

bostonFS - Far Sync

london - Physical standby database

london2 - Logical standby database (disabled)

londonfs - Far Sync (inactive)

Fast-Start Failover: DISABLED

Configuration Status:

SUCCESS

It can be verified that primary database (boston), far sync for primary (bostonFS) and physical standby (london) are in sync.

-- Primary (boston) --

BOSTON>select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)

--------------

190

-- Far Sync for primary (bostonFS) --

BOSTONFS>select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)

--------------

190

-- Physical standby (london) --

LONDON>select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)

--------------

190

Simulate loss of archived logs on primary database (boston)

Let’s stop redo transport from primary (boston) and switch logs on primary so that far sync (bostonFS ) and physical standby (london) lag behind primary (boston).

DGMGRL> show database boston

Database - boston

Role: PRIMARY

Intended State: TRANSPORT-ON

Instance(s):

boston

Database Status:

SUCCESS

DGMGRL> edit database boston set state='TRANSPORT-OFF';

Succeeded.

BOSTON>alter system switch logfile;

alter system switch logfile;

System altered.

We can verify that logs are not being transported to far sync bostonFS and standby london. Although sequence# of the latest archived log is 194 on primary, archived logs up to sequence# 191 only have reached far sync and physical standby.

BOSTON>select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)

--------------

194

BOSTONFS>select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)

--------------

191

LONDON>select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)

--------------

191

Next we’ll find out the names of archived logs generated on primary which have not been transported to standby:

BOSTON>select sequence#, name from v$archived_log where sequence# >191;

SEQUENCE# NAME

---------- ----------------------------------------------------------------

192 /u01/app/oracle/fast_recovery_area/BOSTON/archivelog/2015_02_09/o1_mf_1_192_bfjgx50j_.arc

193 /u01/app/oracle/fast_recovery_area/BOSTON/archivelog/2015_02_09/o1_mf_1_193_bfjgx6tb_.arc

194 /u01/app/oracle/fast_recovery_area/BOSTON/archivelog/2015_02_09/o1_mf_1_194_bfjgx6yh_.arc

In a real-time environment, archived logs on primary could be lost due to:

their deletion if archivelog deletion policy has not configured properly
their corruption

In our experimental setup, I’ll simulate loss of archived logs on primary by renaming them:

BOSTON>ho mv /u01/app/oracle/fast_recovery_area/BOSTON/archivelog/2015_02_09/o1_mf_1_192_bfjgx50j_.arc

/home/oracle/arch_192.log

BOSTON>ho mv /u01/app/oracle/fast_recovery_area/BOSTON/archivelog/2015_02_09/o1_mf_1_193_bfjgx6tb_.arc

/home/oracle/arch_193.log

BOSTON>ho mv /u01/app/oracle/fast_recovery_area/BOSTON/archivelog/2015_02_09/o1_mf_1_194_bfjgx6yh_.arc

/home/oracle/arch_194.log

Now even if we restart redo transport from primary, gap in redo logs on far_sync / standby cannot be resolved as some logs are missing on primary.

DGMGRL> edit database boston set state='Transport-on';

DGMGRL> show configuration;

Configuration - drsolution

Protection Mode: MaxPerformance

Databases:

boston - Primary database

Error: ORA-16724: cannot resolve gap for one or more standby databases

bostonfs - Far Sync

london - Physical standby database

london2 - Logical standby database (disabled)

londonfs - Far Sync (inactive

Fast-Start Failover: DISABLED

Configuration Status:

ERROR

DGMGRL> show far_sync bostonFS

Far Sync - bostonfs

Transport Lag: 3 minutes 43 seconds (computed 1 second ago)

Instance(s):

bostonFS

Far Sync Status:

SUCCESS

DGMGRL> show database london

Database - london

Role: PHYSICAL STANDBY

Intended State: APPLY-ON

Transport Lag: 3 minutes 53 seconds (computed 0 seconds ago)

Apply Lag: 3 minutes 53 seconds (computed 0 seconds ago)

Apply Rate: 759.00 KByte/s

Real Time Query: OFF

Instance(s):

london

Database Status:

SUCCESS

It can be verified that SCN# (3717618) of standby (london) is lagging behind that (3718999) of the primary (boston)

BOSTON>select current_scn from v$database;

CURRENT_SCN

-----------

3718999

LONDON>select current_scn from v$database;

CURRENT_SCN

----------

3717618

Refresh the physical standby using the RECOVER…FROM SERVICE command

First of all let us identify the datafiles on standby database which are out of sync with respect to primary.

On checking checkpoint_change# in datafile headers on primary (boston) and standby (london), we note that whereas checkpoint_change# of datafiles 5,7,8,9,10,11 match on primary and standby, for rest of the datafiles (1,3,4,6) standby is lagging behind primary.

BOSTON>select file#, checkpoint_change# from v$datafile;

FILE# CHECKPOINT_CHANGE#

---------- ------------------

1 3717658

3 3717658

4 3717658

5 1910129

6 3717658

7 1910129

8 3690127

9 3690127

10 3690127

11 3690127

10 rows selected.

LONDON>select file#, checkpoint_change# from v$datafile;

FILE# CHECKPOINT_CHANGE#

---------- ------------------

1 3717619

3 3717619

4 3717619

5 1910129

6 3717619

7 1910129

8 3690127

9 3690127

10 3690127

11 3690127

10 rows selected.

In order to synchronize the standby we will stop the managed recovery processes on the physical standby database and place the physical standby database in MOUNT mode.

LONDON>recover managed standby database cancel;

shutdown immediate;

startup mount;

Start RMAN and connect as target to the physical standby database. Refresh the data files on the physical standby database by using an incremental backup of the data files on the primary database.

The following command creates a compressed multi-section incremental backup on the primary database that is then used to refresh the standby data files. boston is the net service name of the primary database that is used to refresh the standby database. The NOREDO clause specifies that the archived redo log files must not be applied during recovery.

[oracle@host03 ~]$ . oraenv

ORACLE_SID = [london] ?

[oracle@host03 ~]$ rman target /

RMAN> recover database from service boston noredo using compressed backupset section size 100m;

Starting recover at 09-FEB-15

Starting implicit crosscheck backup at 09-FEB-15

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=41 device type=DISK

Crosschecked 9 objects

Finished implicit crosscheck backup at 09-FEB-15

Starting implicit crosscheck copy at 09-FEB-15

using channel ORA_DISK_1

Crosschecked 2 objects

Finished implicit crosscheck copy at 09-FEB-15

— Catalogues all the existing backups and archivelogs available on standby —

searching for all files in the recovery area

cataloging files...

cataloging done

List of Cataloged Files

=======================

— Backups —

File Name: /u01/app/oracle/fast_recovery_area/LONDON/autobackup/2015_01_23/o1_mf_s_869760430_bd492tm3_.bkp

File Name: /u01/app/oracle/fast_recovery_area/LONDON/autobackup/2015_01_23/o1_mf_s_869760892_bd4b06ws_.bkp

...

File Name: /u01/app/oracle/fast_recovery_area/LONDON/autobackup/2015_01_22/o1_mf_s_869676552_bd1qongl_.bkp

— Archived logs —

File Name: /u01/app/oracle/fast_recovery_area/LONDON/archivelog/2015_02_06/o1_mf_1_175_bf919zvx_.arc

File Name: /u01/app/oracle/fast_recovery_area/LONDON/archivelog/2015_02_06/o1_mf_1_172_bf8pvrhn_.arc

...

File Name: /u01/app/oracle/fast_recovery_area/LONDON/archivelog/2015_01_21/o1_mf_1_54_bcyn59o8_.arc

File Name: /u01/app/oracle/fast_recovery_area/LONDON/archivelog/2015_01_21/o1_mf_1_60_bczt9p7o_.arc

— Skips all the datafiles (5, 7, 8, 9, 10, 11) whose checkpoint_change# on standby match that of primary are skipped for restore operation —

using channel ORA_DISK_1

skipping datafile 5; already restored to SCN 1910129

skipping datafile 7; already restored to SCN 1910129

skipping datafile 8; already restored to SCN 3690127

skipping datafile 9; already restored to SCN 3690127

skipping datafile 10; already restored to SCN 3690127

skipping datafile 11; already restored to SCN 3690127

— Restores multi-section compressed backupsets with section size of 100m of all the data files (1, 3, 4, 6) whose checkpoint_change# on standby is behind primary over network. —