Thursday, November 1, 2007

Switchover from Primary to Standby in RAC

Recently we were asked to perform a switchover in a RAC environment and although it seemed difficult initially, it turned out to be rather quick and simple (provided, of course you did things in the right order!).

Following the documentation at http://download.oracle.com/docs/cd/B19306_01/server.102/b14239/role_management.htm#i1030646
in preparation for a switchover, I shut instance 2 of standby down (the one that was not involved in the media recovery).

Per the documentation:
Verify that all but one RAC instance on the standby databases in a Real Application Clusters configuration are shut down.
For a Real Application Clusters database, only one RAC instance on the standby database can be online during the role transition. Shut down all other instances before starting the role transition. Then, after the role transition completes, bring these instances back online.
I ensured that the primary and standby databases were in sync and then went ahead with the ALTER DATABASE COMMIT TO SWITCHOVER.. command on one of the two primary instances --

GCPROD1_SQL > ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY with session shutdown;
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY with session shutdown
*
ERROR at line 1:
ORA-01105: mount is incompatible with mounts by other instances


After doing some research, I realized that you need to shutdown ALL but one instance on the primary side also (other words, before running the switchover command in a RAC env, keep only one instance running at primary site and one instance at the DR site).

On primary node 1
After shutting down instance 2 at the primary site, I was able to run the SWITCHOVER successfully on instance 1 of the primary.
[Note: After the switchover, the instance is brought up in NOMOUNT stage. So I did a shutdown and mounted it again]

GCPROD1_SQL > ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;


Database altered.

GCPROD1_SQL > shutdown immediate
ORA-01507: database not mounted

ORACLE instance shut down.

GCPROD1_SQL > startup NOMOUNT
ORACLE instance started.

Total System Global Area 1543503872 bytes
Fixed Size 2072960 bytes
Variable Size 369102464 bytes
Database Buffers 1157627904 bytes
Redo Buffers 14700544 bytes
Database mounted.

GCPROD1_SQL > SELECT SWITCHOVER_STATUS FROM V$DATABASE;


SWITCHOVER_STATUS
--------------------
SESSIONS ACTIVE

Notice that I used the "..WITH SESSION SHUTDOWN" option as I was unable to bring a few active internal processes down.

Now that our old primary has become the new standby, we need to fix a few INIT parameters related to log shipping.

a) LOG_ARCHIVE_DEST_n setting

Here's our old settings --
GCPROD1_SQL > show parameter log_archive_dest

NAME TYPE VALUE
------------------------------------ -------- ------------------------------
log_archive_dest_1 string location="+FLASHDG", valid_for
=(ONLINE_LOGFILES, PRIMARY_ROLE)
log_archive_dest_10 string
log_archive_dest_2 string service="GCPROD_STBY", LGWR AS
YNC NOAFFIRM delay=0 OPTIONAL
max_failure=10 max_connection
s=4 reopen=15 db_unique_name="
GCPROD" register net_timeout
=180 valid_for=(online_logfile
,primary_role)
...
...
[removed for brevity]

We can remove log_archive_dest_2 as its not needed on a standby database. Also log_archive_dest_1 needs to be changed.

GCPROD1_SQL > ALTER SYSTEM SET log_archive_dest_1='LOCATION=+FLASHDG' SCOPE=BOTH;
GCPROD1_SQL > ALTER SYSTEM SET log_archive_dest_2='';

[Note: For, log_archive_dest_1 we may also set as LOCATION=+FLASHDG valid_for=(STANDBY_LOGFILES, STANDBY_ROLE)'. That should also work ok]


b) FAL settings
FAL_SERVER should point to the primary database (tns string) and FAL_CLIENT should point to the Standby database (self)

GCPROD1_SQL > ALTER SYSTEM SET fal_client='GCPROD_STBY' SCOPE=BOTH;
GCPROD1_SQL > ALTER SYSTEM SET fal_server='GCPROD_PRIM' SCOPE=BOTH;


Make sure we update the TNSNAMES.ora file for GCPROD_STBY and GCPROD_PRIM. GCPROD_STBY should resolve to the standby database and GCPROD_PRIM should resolve to the primary (ie. the erstwhile standby; that we haven't switched to the primary role yet -- we'll do that shortly)

After making these change, we put the new standby database in recovery mode --

GCPROD1_SQL > alter database recover managed standby database disconnect from session;

Database altered.




On standby node 1
Next, we perform a switchover of the old standby database to make it the new PRIMARY.
[Note: I had to perform media recovery as shown below as I had accidentally canceled the recovery earlier -- this might not be needed if the (old) standby was already in recovery mode]

GCPROD1_SQL > ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY
*
ERROR at line 1:
ORA-16139: media recovery required

GCPROD1_SQL > alter database recover managed standby database disconnect from session;

Database altered.

GCPROD1_SQL > ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
Database altered.

GCPROD1_SQL > alter database open;
Database altered.


At this point our new primary database is open and available

Now we've to setup the log shipping related INIT parameters on our new primary.

a) LOG_ARCHIVE_DEST_n setting

Here's our old settings --
GCPROD1_SQL > show parameter log_archive_dest

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest string
log_archive_dest_1 string LOCATION=+FLASHDG
log_archive_dest_10 string
log_archive_dest_2 string
log_archive_dest_3 string

...
...
[removed for brevity]

GCPROD1_SQL > ALTER SYSTEM SET log_archive_dest_1='location="+FLASHDG", valid_for=(ALL_LOGFILES, ALL_ROLES)' SCOPE=BOTH;

GCPROD1_SQL > ALTER SYSTEM SET log_archive_dest_2='service="GCPROD_STBY", LGWR ASYNC NOAFFIRM delay=0 OPTIONAL max_failure=10 max_connections=4 reopen=15 db_unique_name="GCPROD" register net_timeout=180 valid_for=(online_logfile,pr
imary_role)' SCOPE=BOTH;

Also don't forget to enable the destination (else the logs don't ship automatically).

GCPROD1_SQL > ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH;


b) FAL settings
Primary database doesn't need any FAL settings, so we may remove those.

GCPROD1_SQL > ALTER SYSTEM SET fal_client='' SCOPE=BOTH;
GCPROD1_SQL > ALTER SYSTEM SET fal_server='' SCOPE=BOTH;

This completes the SWITCHOVER activity pretty much.
If the DB_UNIQUE names are different for some reason on the primary and DR sides, we also need to update LOG_ARCHIVE_CONFIG parameter for dg_config setting otherwise we might see the following errors in the alert log of the primary database --

Errors in file /u01/app/oracle/admin/GCPROD/bdump/gcprod1_arc1_1316.trc:
ORA-16047: DGID mismatch between destination setting and standby
...
PING[ARC1]: Heartbeat failed to connect to standby 'GCPROD_STBY'. Error is 16047.


The solution is to set the LOG_ARCHIVE_CONFIG parameter to show the db_unique_name (if that is different than db_name, for any reason)

GCPROD1_SQL > ALTER SYSTEM SET log_archive_config='dg_config=(GCPROD, GCPRODDR)' SCOPE=BOTH;

[where GCPROD is db_unique_name on primary side and GCPRODDR is the db_name on the DR side]

All is set now. We can verify it by performing a few log switches on the primary side
The only remaining item would be to make sure the TNSNAMES file is updated on both sides.

3 comments:

Lior said...

This is a great post.
Thank you.

John said...

Thanks, great post. Had the same issue with the mount error in 11g!

Cheers

Ravi Gaur said...

Thanks John and Lior!
Appreciate the feedback.