Monday, July 15, 2013

DBA Guard Broker



Data Guard Broker
The broker is part of Oracle's Database Enterprise Manager and an integral part of Data Guard, it is the management framework. You can of course use SQL*Plus to manage Data Guard but the broker when used with the Enterprise Manager becomes a powerful tool offering the following
  • Configuration
  • Monitoring
  • Alerting
  • Performance Analysis
  • Manual switchover's
  • Automatic switchover's
Before I discuss the integration of the broker into the Enterprise Manager lets first discuss the broker itself. The broker can make changes to a Data Guard setup (physical or logical database), the type of changes are configuration, transport modes, apply setup and role change services and the protection mode. You can also monitor the health of any configuration, it is also responsible for implementing and managing the automatic failover capability also know as Fast-Start Failover (FSFO).
The broker has three parts, a set of background processes on each database, a set of configuration files and a commandline interface (CLI) called DGMGRL.
The broker has a number of processes running, the broker manages these processes automatically you have no control over them
  • Data Guard Monitor (DMON) - this is the main process and is responsible for coordinating all broker actions as well as maintaining the configuration files, this process is enabled or disabled with the DG_BROKER_START parameter
  • Broker Resource Manager (RSM) - is responsible for handling any SQL commands used by the broker that need to be executed on one of the databases in the configuration. These SQL commands are made as a result of a change to the configuration made through DGMGRL or are the configuration commands executed by the broker during the database startup
  • Data Guard Net Server (NSVn) - from 1 to n can exist, they are responsible for making contact with the remote databases and sending across any work items to the remote database.
  • DRCn - these network receiver processes establish the connection from the source database NSVn process, this is a similar connection to the LogWriter Network Service (LNS) to a Remote File Server (RFS) connection for redo transport. When the broker needs to send something (data or SQL) between databases it uses this NSV to DRC connection, these connection are started as needed.
  • Configuration files - the configuration is stored in two binary command files. The files contain the complete configuration setup, the database states, how to connect to each one and what parameters to setup when each database starts up.
In a broker configuration it is the Data Guard Monitor (DMON) process on the primary database that is the owner of the configuration, all orders will come from the primary even if use a DGMGRL CLI from another server, the standby database will receive all configuration changes via the primary.
When the DMON process communicates with the standby database it uses one of the NSV processes to send the work to the standby, this protects the DMON from a hang if the network should go down. Whenever the DMON needs to execute some SQL it will enlist the aid of the RSM process on the primary database, if the SQL is for the primary database it will execute it directly, however if the SQL is for a standby database the RSM process asks the NSV process to send it to the standby database again this protects the RSM from a network hang.
Each NSV process has a DRC partner process on the standby database, which will perform the work of the NSV process, the DRC process on the standby will return the results or status back to the NSV on the primary.
On startup the DMON will communicate with each standby via the NSV-DRC connection process pair to establish communication and configuration information so that the standby can start the apply services, if there are any problems check the alert.log after the NVS section to see if the are any TNS errors.
When using a RAC environment there are another set of process called the internode servers (INSV) which maintain a connection between the nodes in the cluster to ensure that the broker on each node knows the state of the cluster. A INSV process will always be started on the primary even if the primary is not in a RAC.
Lastly you may see one more process called the Fast-Start-Failover Process (FSFP), which is used only when the primary database is under the control of Data Guard automatic failover feature Fast-Start Failover. We will be discussing FSFP in another section switchover and failover.
Configuration Files
Each database has two copies of the configuration, stored in the below parameter locations
  • DG_BROKER_CONFIG_FILE1
  • DG_BROKER_CONFIG_FILE2
By default they are stored in $ORACLE_HOME/dbs directory with the filename of dr1<DB_UNIQUE_NAME>.dat and dr2<DB_UNIQUE_NAME>.dat, if using a RAC these should be keep on a shared location as only one set of copies can be exist for the entire RAC.
The primary has the master copies of the configuration files, all changes to these files are done by the primary, even if you are on the standby server the configuration changes will be passed to the primary via NSV-DRC processes and the DMON process will make the change which is then propagated back to the standby databases. The reason for the multiple copies of the configuration file is if the primary has a problem then each standby database has a copy, when a failover occurs the standby that becomes the primary will then become the master of those configuration files.
Broker
You have two choices to interact with Data Guard either Enterprise Manager (EM) or the broker CLI DGMGRL you can swap between the two with a few simple configuration changes. To gain full full functionality of Data Guard through Grid Control you must use the broker.
Broker CLI DGMGRL comes with Enterprise Manager, you can run the broker on any platform you wish it does not need to be the same platform as the primary or standby databases.
You can access the DGMGRL command line using the below
DGMGRL CLI
[primargydg01] > dgmgrl sys/oracle
[primargydg01] > dgmgrl sys/oracle@PROD1
When you start the DGMGRL commandline it does not connect you to a database, only the Data Guard configuration. Before you start to use the broker there are a number of things to perform first
  • Configure the Broker parameters
  • The broker and the listener
  • RAC and the broker
  • Connecting to the broker
Even if you use Enterprise Manager you should really have a understanding on how the CLI works, as mentioned above there are two configuration files, both the systems parameters must be set on the primary and all standby databases, you put set this files anywhere you like
data Guard configuration files location
# Directory
alter system set DG_BROKER_CONFIG_FILE1 = '<directory>/<file>';
alter system set DG_BROKER_CONFIG_FILE2 = '<directory>/<file>';
# ASM
alter system set DG_BROKER_CONFIG_FILE1 = '+DATA/PROD1/Broker/dr1PROD1.dat';
alter system set DG_BROKER_CONFIG_FILE1 = '+FLASH/PROD1/Broker/dr2PROD1.dat';
Note: if you are using ASM then the directories must be created already, use the asmcmd command to do this, the broker will create symlinks to the actual config files but the directories must be there
In a RAC environment they should be in a shared area and all nodes should point to the same files, only one set of files must exist in a RAC.
Once the directory has been created you are ready to startup the broker, do this on all databases (primary and standby)
Start the Broker
alter system set DG_BROKER_START=TRUE SCOPE=BOTH;
When you start the broker it does not configure any files yet because you have to use the DGMGRL command, all it does is start the necessary processes we mentioned earlier. If you need to change the location of the files the steps are below
  1. stop the broker using the DG_BROKER_START parameter by setting this to false (on all nodes in the cluster)
  2. change the DG file destination parameters
  3. copy the files to the new location
  4. then re-start the broker, again set the DG_BROKER_START parameter to true (on all nodes in the cluster)
The broker uses the Oracle Net Services to make connections to the databases, setup both redo transport and archive gap resolution and perform role transitions. We need to create a special static entry in the listener.ora file for each database in the broker configuration, this entry makes it possible for the broker to connect to an idle instance using a remote SYSDBA connection and perform the necessary startup. Here is an example
Broker listener.ora static entry
## Primary and all standby databases
SID_LIST_LISTENER =
  (SID_LIST =
     (SID_DESC =
        (GLOBAL_DBNAME = PROD1DR_DGMGRL)
        (ORACLE_HOME = /scratch/OracleHomes/OraHome111)
        (SID_NAME = PROD1DR)
     )
  )

SID_LIST_LISTENER =
  (SID_LIST =
     (SID_DESC =
        (GLOBAL_DBNAME = PROD1_DGMGRL)
        (ORACLE_HOME = /scratch/OracleHomes/OraHome111)
        (SID_NAME = PROD1)
     )
  )
Once the above has been done we are ready to create a broker configuration, there is no difference in a non-RAC or RAC environment, to connect to the configuration you use the dgmgrl command but one word of advise make sure that you use a specific user and password and not the / as the user, this has know to cause problems at a later date
DGMGRL CLI
[primargydg01] > dgmgrl sys/oracle
[primargydg01] > dgmgrl sys/oracle@PROD1
There are four main area's to DGMGRL CLI
  • Connection and Help - connect, help and exit
  • Creation and Editing - create, add, enable, edit and convert
  • Monitoring - show
  • Role Transition - switchover, failover and reinstate
If you used the Enterprise Manager/Grid Control to setup your standby database then the next steps would have already been performed for you, if however you used the Power User Method then you need to follow the next steps to configure the broker.
The first step is to create the base configuration
Create base configuration
# Primary Database server

DGMGRL> connect sys/password
DGMGRL> create configuration prod1 as
> primary database is prod1
> connect identifier is prod1;

Configuration "prod1" created with primary database "prod1"
The above command would have created a base configuration, you can display the configuration using the below command
Display configuration
# Primary Database server

DGMGRL> connect sys/password
DGMGRL> show configuration
Configuration - prod1
Protection Mode: MaxPerformance
Databases:
prod1 - Primary database
prod1dr - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
## Here is the same screen shot after adding a logical server
DGMGRL> show configuration;
Configuration - prod1
Protection Mode: MaxPerformance
Databases:
prod1 - Primary database
prod1dr - Physical standby database
prod1lr - Logical standby database
Fast-Start Failover: DISABLE
Configuration Status:
SUCCESS
Next we add the standby database to the configuration, if you receive an error stating that the properties could not be imported from the database, you need to check DB_UNIQUE_NAME also the TNSNAME and the transport parameter (LOG_ARCHIVE_DEST_n), so what is the proper setup, you must have your redo transport parameter defined using the DB_UNIQUE_NAME method, meaning that each redo transport parameter must contain the DB_UNIQUE_NAME=<name> attribute. The broker will search all of your LOG_ARCHIVE_DEST_n parameters looking for a database with a unique name that matches the database name you entered for the command. If you have not done this then you need to use the full set of arguments to add the database
Add the standby database
# Primary Database server - if you have setup db_unique_name, tnsname and log_archive_dest_n

DGMGRL> connect sys/password
DGMGRL> add database prod1dr;

# Primary Database server - the full command set

DGMGRL> connect sys/password
DGMGRL> add database prod1dr

> as connect identifier is prod1dr
> maintained as physical;
Database "prod1dr" added
# if you want to add a logical standby you can use the following
DGMGRL> add database prod1lr
> as connect identifier is prod1lr
> maintained as logical;

Database "prod1lr" added
When you display the configuration you should see that the standby database has been added
Display configuration
# Primary Database server

DGMGRL> connect sys/password
DGMGRL> show configuration
Configuration - prod1
Protection Mode: MaxPerformance
Databases:
prod1 - Primary database
prod1dr - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
DISABLED
## Because we have not setup any parameters for Broker all parameters defaults will be presumed,
## some parameters are set by examining the database others are Broker default values.

DGMGRL> show database verbose prod1
Database - prod1
Role: PRIMARY
Intended State: OFFLINE
Instance(s):
PROD1
Properties:
DGConnectIdentifier = 'prod1'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'AUTO'
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = ''
FastStartFailoverTarget = ''
StatusReport = '(monitor)'
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
HostName = 'primarydg01'
SidName = 'PROD1'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=primarydg01)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=PROD1_DGMGRL)(INSTANCE_NAME=PROD1)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = '%t_%s_%r.dbf'
TopWaitEvents = '(monitor)'
Database Status:
DISABLED
DGMGRL> show database verbose prod1dr
Database - prod1dr
Role: PHYSICAL STANDBY
Intended State: OFFLINE
Transport Lag: (unknown)
Apply Lag: (unknown)
Real Time Query: OFF
Instance(s):
PROD1DR
Properties:
DGConnectIdentifier = 'prod1dr'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'OPTIONAL'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'AUTO'
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = ''
FastStartFailoverTarget = ''
StatusReport = '(monitor)'
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
HostName = 'physicaldg01'
SidName = 'PROD1DR'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=physicaldg01)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=PROD1DR_DGMGRL)(INSTANCE_NAME=PROD1DR)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = '%t_%s_%r.dbf'
TopWaitEvents = '(monitor)'
Database Status:
DISABLED
One parameter to watch for is the local archiving parameter on the primary and standby databases, the broker will modify this parameter and if necessary add the VALID_FOR attribute in preparation for the archival of your standby redo log files.
One area to watch is if you add an additional standby database, the broker could presume the wrong parameter values, for instance you may want one standby to use SYNC and another to use ASYNC, but if you configure the second standby database using the short method it may default to SYNC, so when you do add any additional standby databases always double check the parameters are set correctly for the type of configuration that you want, any changes should be done before enabling the additional standby database.
Once we are happy that all standby databases have been added and their parameters have been check we are ready to enable the configuration, the command will issue alter system commands on both the primary and standby databases, start the redo transport to the standby databases and the apply services, before examining the configuration give it sometime to complete as it does take a while, otherwise you see some ORA-16610 errors.
Enabling the configuration
# Primary Database server

DGMGRL> connect sys/password
DGMGRL> enable configuration
Enabled.
You can tail the alert log file and see what action is taking place, hopefully after a few minutes you should see a successful configuration running, you can also check the redo transport and apply services by viewing the v$managed_standby view.
Display configuration
# Primary Database server

DGMGRL> connect sys/password
DGMGRL> show configuration;
Configuration - prod1
Protection Mode: MaxPerformance
Databases:
prod1 - Primary database
prod1dr - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
# Standby database - check the LGWR - RFS connection process and the MRP0 is in the applying state

sql> select client_process, process, thread#, sequence#, status from v$managed_standby;
Broker Properties
As I mentioned above the broker has a number of properties which can be changed, to change the property we use the EDIT command in the DGMGRL CLI
Edit configuration
EDIT CONFIGURATION SET PROPERTY <name>=<value>
EDIT DATABASE <db_name> SET PROPERTY <name>=<value>
EDIT INSTANCE <in_name> SET PROPERTY <name>=value>
Configuration Properties
BystandersFollowRoleChange

FastStartFailoverAutoReinstate

FastStartFailoverLagLimit

FastStartFailoverPmyShutdown

FastStartFailoverThreshold

CommunicationTimeout
the amount of time the broker will wait for a response before giving up (default 180 seconds), you can remove any timeout by setting this to 0
Database Properties
FastStartFailoverTarget

ObserverConnectIdentifier

ApplyInstanceTimeout
defines how long the broker should wait until moving the apply process to another instance in a standby RAC, default is 0 when means immediately
PreferredApplyInstance
allows you to tell the broker where you would like the apply to run when you have a RAC, by default it is empty which means it can choose any instance
ArchiveLagTarget
alters database parameter: ARCHIVE_LAG_TARGET
DbFileNameConvert
alters database parameter: DB_FILE_NAME_CONVERT
LogArchiveMaxProcesses
alters database parameter: LOG_ARCHIVE_MAX_PROCESSES
LogArchiveMinSuccessDest
alters database parameter: LOG_ARCHIVE_IN_SUCCEED_DEST
LogFileNameConvert
alters database parameter: LOG_FILE_NAME_CONVERT
LogShipping
(standby role only)
enables or defers redo transport to that standby database.
alters database parameter: LOG_ARCHIVE_DEST_STATE_n
StandbyFileManagement
(standby role only)
alters database parameter: STANDBY_FILE_MANAGEMENT
Instance Properties
HostName
No explaination needed here
SidName
No explaination needed here
LogArchiveTrace
alters database parameter: LOG_ARCHIVE_TRACE
LogArchiveFormat
alters database parameter: LOG_ARCHIVE_FORMAT
StandbyArchiveLocation
alters database parameter: LOG_ARCHIVE_DEST_n
AlternateLocation
alters database parameter: LOG_ARCHIVE_DEST_n
LsbyMaxSga
alters database parameter: MAX_SGA
LsbyMaxServers
alters database parameter: MAX_SERVERS
Broker State and Monitoring
There are a number of commands that you can use to change the state of the database
turn off/on the redo transport service for all standby databases
Primary
DGMGRL> edit database prod1 set state=transport-off;

DGMGRL> edit database prod1 set state=transport-on;
turn off/on the apply state
Standby
DGMGRL> edit database prod1dr set state=apply-off;

DGMGRL> edit database prod1dr set state=apply-on;
put a database into a real-time query mode
Standby
DGMGRL> edit database prod1dr set state=apply-off;
sql> alter database open read only;
DGMGRL> edit database prod1dr set state=apply-on;
change the protection mode
Primary
# Choose what level of protection you require
sql> alter database set standby to maximize performance;
sql> alter database set standby to maximize availability;
sql> alter database set standby to maximize protection;
# display the configuration
DGMGRL> show configuration
There are a number of useful monitoring commands and log files that can help with diagnosing problems
configuration
DGMGRL> show configuration;
database
DGMGRL> show database prod1;
DGMGRL> show database prod1dr;

# There are a number of specific information commands, here are the most used
DGMGRL> show database prod1 statusreport;
DGMGRL> show database prod1 inconsistentProperties;
DGMGRL> show database prod1 inconsistentlogxptProps;
DGMGRL> show database prod1 logxptstatus;
DGMGRL> show database prod1 latestlog;
Logfiles
# change the instance name to reflect the one you have choosen

prod1 (alert log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/alert_PROD1.log
prod1 (DG log): /u01/app/oracle/diag/rdbms/prod1/PROD1/trace/drcPROD1.log
prod1dr (alert log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/alert_PROD1DR.log
prod1dr (DG log): /u01/app/oracle/diag/rdbms/prod1dr/PROD1DR/trace/drcPROD1DR.log
Lastly there is a utility called the "DG Menu Utility", this can be downloaded from the Oracle Data Guard 11g handbook web site http://www.dataguardbook.com, I will be covering this in my monitoring section but below is a screen shot of the menu
DG Menu Utility
DG Menu
========================================
# ------------------------------------------------------------------------- #
# #
# Data Guard Check List - primarydg01
# #
# 0. Review database information and status #
# 1. Check for password file #
# 2. Check for forced logging and unrecoverable activities #
# ------------------------------------------------------------------------- #
# 3. Check for archive log mode #
# 4. Check for standby redo logs #
# 5. Check current SCN on primary and standby databases #
# 6. Check archive log destinations #
# ------------------------------------------------------------------------- #
# 7. Check Data Guard Status View for errors and fatal messages #
# 8. Check Managed Recovery Process Status #
# 9. Check for missing archive logs #
# 10. Check archive log gaps on the standby database #
# 11. Check average apply rate / active apply rate #
# 12. Check transport / apply lag #
# 13. How far behind is my Data Guard in terms of time? #
# #
# ------------------------------------------------------------------------- #
# 20. Launch the Logical Standby Data Guard Submenu #
# ------------------------------------------------------------------------- #
# 21. Generate init.ora entries for primary database #
# 22. Generate init.ora entries for standby database #
# 23. Generate tnsnames.ora entries for primary and standby databases #
# 24. Generate SQL syntax to create standby redo logs #
# #
# ------------------------------------------------------------------------- #
# 30. Generate syntax to duplicate standby database from active database #
# #
# x. Exit #
# ------------------------------------------------------------------------- #
# Enter Task Number:

# ------------------------------------------------------------------------- #
# #
# Logical Standby Data Guard Check List - primarydg01
# #
# 1. Check Logical Progress - View Overall Progress Of SQL Apply #
# 2. Check Logical Events - History on Logical Standby Apply Activity #
# 3. Check Logical Events - Detailed View #
# 4. Check Logical Stats - Logical Standby Stats #
# 5. Check Logical Parameters - Logical Standby Parameters #
# 6. Look At What The Logical Standby Processes Are Doing #
# Coordinator, Reader, Builder, Preparer, Analyzer, Applier ... #
# 7. Look At The Status Codes For The Logical Standby Processes #
# 8. Look At Events The Applier Process Is Stuck On #
# ------------------------------------------------------------------------- #
# 10. Check the LCR - Look At Bytes Paged Out #
# 11. Generate Syntax To Skip Transactions #
# Based On MAX(EVENT_TIME) FROM DBA_LOGSTDBY_EVENTS #
# DO NOT SKIP DML STATEMENTS #
# 12. Diagnostic Script Per Metalink Note ID: 241512.1 #
# Look for output in logical_diag_[ORACLE_SID_MONDD_HHMM.out] format #
# ------------------------------------------------------------------------- #
# 20. Review What Is NOT Supported In Your Logical Standby Database #
# 21. Review Tables That Do NOT have Unique Identifiers #
# 22. Check Primary Database For Supplemental Logging #
# #
# ------------------------------------------------------------------------- #
# 30. Start Logical Standby Database #
# 40. Stop Logical Standby Database - PLEASE BE CAREFUL !!!!! #
# THIS WILL STOP THE LOGICAL STANDBY APPLY PROCESS #
# ------------------------------------------------------------------------- #
# #
# x. Exit #
# ------------------------------------------------------------------------- #
# Enter Task Number:
Removing the Broker

Finally if you wish remove the broker follow the steps below
Remove the broker
  1. Connect to the primary database using DGMGRL
  2. run in dgmgrl "remove configuration perserve destinations"
  3. Connect to the primary using SQLPLUS
  4. using SQL set the parameter "dg_broker_start=false"
  5. then run "alter system set dg_broker_start=false"
  6. repeat steps 4 and 5 for all the standby databases
  7. repeat step 2 for all the standby databases
  8. remove the two broker configuration files for all databases

No comments:

Post a Comment