Oracle RAC Clusterware Startup Sequence in Detail

This post describes the Oracle RAC Clusterware startup sequence in detail, with a step-by-step explanation of how Oracle brings up the clusterware stack.

What is Oracle Clusterware – RAC?

Oracle Real Application Clusters (Oracle RAC) uses Oracle Clusterware as the infrastructure that binds multiple nodes so that they operate as a single server. In an Oracle RAC environment, Oracle Clusterware monitors all Oracle components, such as instances and listeners. If a failure occurs, Oracle Clusterware automatically attempts to restart the failed component and also redirects operations to a surviving component.
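
A quick way to confirm that the stack is healthy on a node is crsctl check crs, run from the Grid Infrastructure home (shown here as $GRID_HOME, an assumed environment variable). On a healthy node the output looks similar to this:

# Check the local clusterware stack
$ $GRID_HOME/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online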


Oracle Cluster Registry (OCR) :

The OCR is a file that maintains cluster configuration information for any cluster database within the cluster, such as the cluster node list and instance-to-node mapping. The OCR also contains Clusterware resource profiles for resources that you have customized. The OCR resides on shared storage that is accessible by all the nodes in your cluster. Oracle Clusterware can multiplex, or maintain multiple copies of, the OCR, and Oracle recommends that you use this feature to ensure high availability.
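
To verify the OCR's integrity and location you can use the ocrcheck utility (run it as root for the full logical corruption check); ocrconfig -showbackup lists the automatic backups. A minimal sketch:

# Check OCR integrity, version, total/used space, and device location
$ ocrcheck

# On Linux, the OCR location is also recorded in a small pointer file
$ cat /etc/oracle/ocr.loc

# List the automatic OCR backups taken by the cluster
$ ocrconfig -showbackup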

Oracle Local Registry – OLR

Each node in a cluster also has a local registry, called the Oracle Local Registry (OLR), which is created when Oracle Clusterware is installed. Multiple processes on each node have simultaneous read and write access to the OLR on the node where they reside, whether or not Oracle Clusterware is fully functional. By default, the OLR is located at Grid_home/cdata/$HOSTNAME.olr.
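
The same utility inspects the OLR with the -local flag; on Linux, the OLR location is recorded in /etc/oracle/olr.loc, which OHASD reads at startup. For example:

# Check OLR integrity and location (run as root)
$ ocrcheck -local

# The pointer file that tells OHASD where the OLR lives
$ cat /etc/oracle/olr.loc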

Voting Disk

The voting disk manages node membership information. It is a shared disk accessed by all member nodes of the cluster; it serves as a central reference for all the nodes and keeps the heartbeat information between them. If any node is unable to access the voting disk, the cluster immediately recognizes the communication failure and evicts that node from the cluster.
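
You can list the voting disks with crsctl. The output will look roughly like the sketch below (the file universal ID, device path, and diskgroup name are illustrative):

$ crsctl query css votedisk
##  STATE    File Universal Id                File Name   Disk group
--  -----    -----------------                ---------   ----------
 1. ONLINE   a3751063aec14f8ebfe8fb89fccf45ff (/dev/sdb1) [DATA]
Located 1 voting disk(s).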

Brief explanation of the startup sequence.

[Diagram: Oracle Clusterware startup sequence. Image credit: Oracle]

Once the operating system starts and finishes the bootstrap process, the initialization daemon, init, reads /etc/inittab. The init.ohasd entry in that file is what triggers the Oracle High Availability Services daemon (OHASD).
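
On 11gR2 Linux systems the relevant /etc/inittab entry looks roughly like the line below (on newer platforms the same job is handled by an Upstart job or a systemd unit such as oracle-ohasd.service):

# /etc/inittab entry that (re)spawns the OHASD startup wrapper
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null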

  1. When a node of an Oracle Clusterware cluster starts, OHASD is started by platform-specific means, such as init on Linux. OHASD is the root for bringing up Oracle Clusterware. OHASD has access to the OLR (Oracle Local Registry) stored on the local file system. The OLR provides the data needed to complete OHASD initialization.
  2. OHASD brings up GPNPD and CSSD (Cluster Synchronization Services Daemon). CSSD has access to the GPnP profile stored on the local file system; you can dump this profile with gpnptool, as shown after this list. This profile contains the following vital bootstrap data:
    a. ASM Diskgroup Discovery String
    b. ASM SPFILE location (Diskgroup name)
    c. Name of the ASM Diskgroup containing the Voting Files
  3. CSSD accesses the voting file locations on the ASM disks through well-known pointers in the ASM disk headers, so it can complete initialization and start or join an existing cluster.
  4. OHASD starts an ASM instance, which can now operate because CSSD is initialized and running. The ASM instance uses special code to locate the contents of the ASM SPFILE, assuming it is stored in a diskgroup.
  5. With an ASM instance operating and its Diskgroups mounted, access to Clusterware’s OCR is available to CRSD.
  6. OHASD starts CRSD with access to the OCR in an ASM Diskgroup.
  7. Clusterware completes initialization and brings up other services under its control.
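
The lower-stack resources that OHASD manages during this sequence, and the GPnP profile mentioned in step 2, can both be inspected from the command line. A sketch (resource names and states will vary per cluster):

# Show OHASD-managed lower-stack resources such as ora.cssd, ora.asm, ora.crsd, ora.ctssd
$ crsctl stat res -t -init

# Dump the local GPnP profile, including the ASM discovery string and SPFILE location
$ gpnptool get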

As per the Oracle documentation, below are the high-level steps for clusterware initialization.

INIT spawns init.ohasd (with respawn), which in turn starts the OHASD process (Oracle High Availability Services Daemon). This daemon spawns four agent processes.

Level 1: OHASD spawns:
• cssdagent – Agent responsible for spawning CSSD.
• orarootagent – Agent responsible for managing all root-owned ohasd resources.
• oraagent – Agent responsible for managing all oracle-owned ohasd resources.
• cssdmonitor – Monitors CSSD and node health (along with the cssdagent).

Level 2: OHASD rootagent spawns:
• CRSD – Primary daemon responsible for managing cluster resources.
• CTSSD – Cluster Time Synchronization Services Daemon
• Diskmon – Disk Monitor daemon
• ACFS (ASM Cluster File System) Drivers
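
CTSSD's state can be checked directly; it runs in observer mode when NTP is active on the nodes and in active mode otherwise. Output similar to the following indicates observer mode:

$ crsctl check ctss
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.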

Level 3: OHASD oraagent spawns:
• MDNSD – Multicast DNS daemon, used for DNS lookups within the cluster
• GIPCD – Used for inter-process and inter-node communication
• GPNPD – Grid Plug & Play Profile Daemon
• EVMD – Event Monitor Daemon
• ASM – Resource for monitoring ASM instances
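
A simple way to confirm these daemons are up is to look for their processes (the binaries are named mdnsd.bin, gpnpd.bin, gipcd.bin, and evmd.bin on Linux):

# Confirm the Level 3 daemons are running
$ ps -ef | egrep 'mdnsd|gpnpd|gipcd|evmd' | grep -v grep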

Level 4: CRSD spawns:
• orarootagent – Agent responsible for managing all root-owned crsd resources.
• oraagent – Agent responsible for managing all oracle-owned crsd resources.
Level 4: CRSD rootagent spawns:
• Network resource – To monitor the public network
• SCAN VIP(s) – Single Client Access Name Virtual IPs
• Node VIPs – One per node
• ACFS Registry – For mounting ASM Cluster File Systems
• GNS VIP (optional) – VIP for GNS
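
The network, VIP, and SCAN resources in this level map directly onto srvctl checks, for example:

# Node VIPs, network, and ONS status for all nodes
$ srvctl status nodeapps

# SCAN name and SCAN VIP configuration, plus the listeners serving them
$ srvctl config scan
$ srvctl status scan_listener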

Level 5: CRSD oraagent spawns:
• ASM Resource – ASM Instance(s) resource
• Diskgroup – Used for managing/monitoring ASM diskgroups.
• DB Resource – Used for monitoring and managing the DB and instances
• SCAN Listener – Listener for single client access name, listening on SCAN VIP
• Listener – Node listener listening on the Node VIP
• Services – Used for monitoring and managing services
• ONS – Oracle Notification Service
• eONS – Enhanced Oracle Notification Service
• GSD – For 9i backward compatibility
• GNS (optional) – Grid Naming Service – Performs name resolution
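
The database-tier resources in this level can likewise be checked with srvctl. In the sketch below, orcl is a hypothetical database name; substitute your own:

# ASM, database, service, and node listener status (orcl is a placeholder)
$ srvctl status asm
$ srvctl status database -d orcl
$ srvctl status service -d orcl
$ srvctl status listener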

The following command will display the status of all cluster resources:

$ ./crsctl status resource -t
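
crsctl can also summarize the stack across every node, and olsnodes lists cluster membership:

# Check CSS/CRS/EVM on all nodes at once
$ crsctl check cluster -all

# List cluster nodes with their node numbers
$ olsnodes -n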

Clusterware Important Log File Locations
Clusterware daemon logs all live under grid_home/log/node_name. At the top of that directory is the cluster alert log, alertnode_name.log, which is the first place to look for most clusterware issues. (Note that from Grid Infrastructure 12c onward, most of these logs moved into the ADR, under $ORACLE_BASE/diag/crs/node_name/crs/trace.) The structure under grid_home/log/node_name looks like this:


./admin:
./agent:
./agent/crsd:
./agent/crsd/oraagent_oracle:
./agent/crsd/ora_oc4j_type_oracle:
./agent/crsd/orarootagent_root:
./agent/ohasd:
./agent/ohasd/oraagent_oracle:
./agent/ohasd/oracssdagent_root:
./agent/ohasd/oracssdmonitor_root:
./agent/ohasd/orarootagent_root:
./client:
./crsd:
./cssd:
./ctssd:
./diskmon:
./evmd:
./gipcd:
./gnsd:
./gpnpd:
./mdnsd:
./ohasd:
./racg:
./racg/racgeut:
./racg/racgevtf:
./racg/racgmain:
./srvm:
ASM logs live under $ORACLE_BASE/diag/asm/+asm/<ASM_instance_name>/trace
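
When troubleshooting a startup problem, tailing the clusterware alert log on the affected node is usually the fastest way to spot the failing daemon. A minimal sketch, assuming the pre-12c layout above and that $GRID_HOME points to your Grid Infrastructure home:

# Follow the clusterware alert log for the local node
$ tail -f $GRID_HOME/log/$(hostname -s)/alert$(hostname -s).log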