Monday, March 23, 2009

SRVCTL fails to start a RAC database

SRVCTL has problems while starting the database (SRVCTL STOP DATABASE works ok).
Also, the database can be started and stopped fine using SQL*Plus. No problems there.

$ srvctl start database -d HLPR
PRKP-1001 : Error starting instance HLPR1 on node db01-lisqa
CRS-0215: Could not start resource 'ora.HLPR.HLPR1.inst'.
PRKP-1001 : Error starting instance HLPR2 on node db02-lisqa
CRS-0215: Could not start resource 'ora.HLPR.HLPR2.inst'.
PRKP-1001 : Error starting instance HLPR3 on node db03-lisqa
CRS-0215: Could not start resource 'ora.HLPR.HLPR3.inst'.
PRKP-1001 : Error starting instance HLPR4 on node db04-lisqa
CRS-0215: Could not start resource 'ora.HLPR.HLPR4.inst'.
$

As seen above, it is a 4-node RAC database.

2 issues here

1) REQUIRED_RESOURCE missing for one of the nodes
Doing a CRS_STAT on the above resources, shows that the REQUIRED_RESOURCES for node 4 instance (HLPR4) is missing. For the other 3 nodes the ASM dependency is defined. It is already verified that ASM is up and running on the 4th node (also there are other RAC databases on the same cluster that do not have any issues).
$ crs_stat -p ora.HLPR.HLPR1.inst|grep REQUIRED
REQUIRED_RESOURCES=ora.db01-lisqa.ASM1.asm
$ crs_stat -p ora.HLPR.HLPR2.inst|grep REQUIRED
REQUIRED_RESOURCES=ora.db02-lisqa.ASM2.asm
$ crs_stat -p ora.HLPR.HLPR3.inst|grep REQUIRED
REQUIRED_RESOURCES=ora.db03-lisqa.ASM3.asm
$ crs_stat -p ora.HLPR.HLPR4.inst|grep REQUIRED
REQUIRED_RESOURCES=
$


Fix

Add the REQUIRED_RESOURCE using "srvctl modify". First lets look at the syntax (-h for help) -
$ srvctl modify instance -h
Usage: srvctl modify database -d [-n ] [-m ] [-p ] [-r {PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY}] [-s ] [-y {AUTOMATIC | MANUAL}]
-d Unique name for the database
-n Database name (DB_NAME), if different from the unique name given by the -d option
-o ORACLE_HOME path
-m Domain for cluster database
-p Server parameter file path
-r Role of the database (primary, physical_standby, logical_standby)
-s Startup options for the database
-y Management policy for the database (automatic, manual)
-h Print usage
$
$ srvctl modify instance -d HLPR -i HLPR4 -s +ASM4
$
$ crs_stat -p ora.HLPR.HLPR4.inst|grep REQUIRE
REQUIRED_RESOURCES=ora.db04-lisqa.ASM4.asm
$



2) SPFILE was incorrectly defined in the OCR
The imon log for the database under /log//racg shows this --
$ cd $ORACLE_HOME/log//racg
$ vi imon_HLPR.log
...
...
2009-03-19 20:36:02.729: [ RACG][1] [8400][1][ora.HLPR.HLPR1.inst]: racgimon exiting

2009-03-19 20:43:34.680: [ RACG][1] [12432][1][ora.HLPR.HLPR1.inst]: racgimon started

2009-03-19 20:43:37.137: [ RACG][6] [12432][6][ora.HLPR.HLPR1.inst]:
SQL*Plus: Release 10.2.0.4.0 - Production on Thu Mar 19 20:43:36 2009

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Enter user-name: Connected to an idle instance.

SQL> ORA-01565: error in identifying file '+RAC_REDO02/HLPR/spfileHLPR.ora'

2009-03-19 20:43:37.137: [ RACG][6] [12432][6][ora.HLPR.HLPR1.inst]: ORA-17503: ksfdopn:2 Failed to open file
+RAC_REDO02/HLPR/spfileHLPR.ora
ORA-15056: additional error message
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +RAC_REDO02/hlpr/spfile
ORA-01078: failure in processing system parameters
SQL> Disconnected
This shows that the SPFILE was incorrectly defined in the OCR

Fix
Modified OCR for the correct location of the SPFILE -
$ cat $ORACLE_HOME/dbs/initHLPR1.ora
spfile='+HLPR_DATA/HLPR/PARAMETERFILE/spfile.286.640265465'
$
$ srvctl modify database -d HLPR -p +HLPR_DATA/HLPR/PARAMETERFILE/spfile.286.640265465
$


This fixed the OCR and database could then be started using SRVCTL.

There is also one more issue that I noticed today in the listener configuration w/ CRS. On node4, the listener was up under db_home (typically it is seen under ASM home). Doing a "ps -ef|grep tns" will tell what OH is being used for the listener home. Also, the crs_stat -p output shows that.
[pwqa@db04-lisqa /home/pwqa] $ ps -ef|grep tns
pwqa 1982 1 0 May 13 ? 2:06 /d01/app/pwqa/product/10.2/pwqadb/bin/tnslsnr LISTENER_DB04-LISQA -inherit
pwqa 1965 1 0 May 13 ? 5:49 /d01/app/pwqa/product/10.2/pwqadb/bin/tnslsnr LISTENER_CGQA_DB04-LISQA -inherit
pwqa 2007 1 0 May 13 ? 4:33 /d01/app/pwqa/product/10.2/pwqadb/bin/tnslsnr LISTENER_HLPR_DB04-LISQA -inherit
pwqa 2041 1 0 May 13 ? 5:43 /d01/app/pwqa/product/10.2/pwqadb/bin/tnslsnr LISTENER_PWQA_DB04-LISQA -inherit
pwqa 2024 1 0 May 13 ? 6:33 /d01/app/pwqa/product/10.2/pwqadb/bin/tnslsnr LISTENER_PLQA_DB04-LISQA -inherit
pwqa 22092 23479 0 11:18:42 pts/1 0:00 grep tns



As we can see from above, listener "
LISTENER_DB04-LISQA" is running under the database home (/d01/app/pwqa/product/10.2/pwqadb). The crs_stat -p output for this resource shows this --
[pwqa@db04-lisqa /home/pwqa] $ crs_stat -p ora.db04-lisqa.LISTENER_DB04-LISQA.lsnr
NAME=ora.db04-lisqa.LISTENER_DB04-LISQA.lsnr
TYPE=application
ACTION_SCRIPT=/d01/app/pwqa/product/10.2/pwqadb/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for listener on node
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=db04-lisqa
OPTIONAL_RESOURCES=
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.db04-lisqa.vip
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=600
START_TIMEOUT=0
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=
USR_ORA_IF=
USR_ORA_INST_NOT_SHUTDOWN=
USR_ORA_LANG=
USR_ORA_NETMASK=
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=

[pwqa@db04-lisqa /home/pwqa] $


The action script clearly points to the database home.
To change this, we again use the "srvctl modify" command.

[pwqa@db04-lisqa /home/pwqa] $ srvctl modify listener -h
Usage: srvctl modify listener -n [-l ] -o
-n Node name
-l "" Comma separated listener names
-o ORACLE_HOME path
-h Print usage
[pwqa@db04-lisqa /home/pwqa] $

[pwqa@db04-lisqa /home/pwqa] $ srvctl modify listener -n db04-lisqa -l LISTENER_DB04-LISQA -o /d01/app/pwqa/product/10.2/ASM

[pwqa@db04-lisqa /home/pwqa] $ crs_stat -p ora.db04-lisqa.LISTENER_DB04-LISQA.lsnr
NAME=ora.db04-lisqa.LISTENER_DB04-LISQA.lsnr
TYPE=application
ACTION_SCRIPT=/d01/app/pwqa/product/10.2/ASM/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for listener on node
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=db04-lisqa
OPTIONAL_RESOURCES=
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.db04-lisqa.vip
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=600
START_TIMEOUT=0
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=
USR_ORA_IF=
USR_ORA_INST_NOT_SHUTDOWN=
USR_ORA_LANG=
USR_ORA_NETMASK=
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=

[pwqa@db04-lisqa /home/pwqa] $


The CRS registration has been changed.
Now we only need to stop the listener and restart it under the ASM home.