One Can Succeed at Almost Anything For Which He Has Enthusiasm...: Data Guard Architecture Oracle 11g Part-II

Monday, October 17, 2011

Data Guard Architecture Oracle 11g Part-II

The LNS (log-write network-server) and ARCH (archiver) processes running on the primary database ship redo to the standby database: the LNS sends the current redo, while ARCH sends archived redo logs. On the standby database, the RFS (remote file server) background process within the Oracle instance receives the redo originating from the primary database.

The LNS process supports two modes:
1.) Synchronous and
2.) Asynchronous.

1.) Synchronous Mode : Synchronous transport (SYNC) is also referred to as the "zero data loss" method, because the LGWR is not allowed to acknowledge that a commit has succeeded until the LNS can confirm that the redo needed to recover the transaction has been written at the standby site. In the diagram below, the phases of a transaction are :

The user commits a transaction, creating a redo record in the SGA. The LGWR reads the redo record from the log buffer, writes it to the online redo log file and waits for confirmation from the LNS. The LNS reads the same redo record from the buffer and transmits it to the standby database using Oracle Net Services. The RFS receives the redo at the standby database and writes it to the SRL (standby redo log). When the RFS receives a write-complete from the disk, it transmits an acknowledgment back to the LNS process on the primary database, which in turn notifies the LGWR that the transmission is complete. The LGWR then sends a commit acknowledgment to the user.

This setup depends heavily on network performance and can have a dramatic impact on the primary database: because every commit waits for the round trip to the standby, high network latency translates directly into slower response times. The impact can be seen in the wait event "LNS wait on SENDREQ" found in the v$system_event dynamic performance view.
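To get a feel for why latency matters so much here, the commit path described above can be reduced to a toy model. This is only a sketch, not how Oracle measures anything: the function name and the millisecond figures are made up for illustration, and it assumes that the local online redo log write and the remote transport overlap, so the commit waits for whichever finishes last.

```python
def sync_commit_latency_ms(local_write_ms, network_rtt_ms, standby_write_ms):
    """Toy estimate of commit latency under SYNC transport.

    Assumption (not from Oracle documentation): the local redo write and
    the remote path run in parallel, so the commit waits for the slower one.
    """
    # Remote path: one network round trip plus the SRL write on the standby.
    remote_path_ms = network_rtt_ms + standby_write_ms
    return max(local_write_ms, remote_path_ms)


if __name__ == "__main__":
    # Hypothetical numbers: 1 ms local write, 1 ms standby write,
    # and three different network round-trip times.
    for rtt_ms in (0.5, 5.0, 20.0):
        latency = sync_commit_latency_ms(1.0, rtt_ms, 1.0)
        print(f"round trip {rtt_ms} ms -> commit waits about {latency} ms")
```

Once the round trip exceeds the local write time, every extra millisecond on the network is added to each commit in this model.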

2.) Asynchronous Mode : Asynchronous transport (ASYNC) differs from SYNC in that it removes the requirement for the LGWR to wait for an acknowledgment from the LNS, resulting in a "near zero" performance impact on the primary database regardless of the distance between the primary and the standby locations. The LGWR will continue to acknowledge commit success even if limited bandwidth prevents the redo of previous transactions from being sent to the standby database immediately. If the LNS is unable to keep pace and the log buffer is recycled before the redo is sent to the standby, the LNS automatically transitions to reading and sending from the online redo log file instead of the log buffer in the SGA. Once the LNS has caught up, it switches back to reading directly from the log buffer in the SGA.
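The switch between the two read sources can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not Oracle's internal logic: redo is counted in abstract "records", time advances in ticks, and the names `lns_read_sources` and `buffer_capacity` are hypothetical. The LNS is assumed to fall back to the log file whenever the unsent backlog no longer fits in the log buffer.

```python
def lns_read_sources(generated_per_tick, shipped_per_tick, buffer_capacity):
    """Return, for each tick, where a simulated LNS reads its redo from.

    Assumption: if the unsent backlog is larger than the log buffer,
    the oldest unsent redo has been overwritten, so the LNS must read
    from the online redo log file until it has caught up.
    """
    backlog = 0  # redo records generated but not yet shipped
    sources = []
    for generated, can_ship in zip(generated_per_tick, shipped_per_tick):
        backlog += generated
        if backlog > buffer_capacity:
            sources.append("log_file")
        else:
            sources.append("log_buffer")
        # Ship as much as the network allows in this tick.
        backlog -= min(backlog, can_ship)
    return sources


if __name__ == "__main__":
    # Hypothetical workload: a burst of redo over a slow link, then a quiet period.
    generated = [5, 5, 5, 1, 1, 1]
    shipped = [2, 2, 2, 4, 4, 4]
    for tick, source in enumerate(lns_read_sources(generated, shipped, 8), start=1):
        print(f"tick {tick}: LNS reads from {source}")
```

With these made-up numbers the simulated LNS falls back to the log file on ticks 3 and 4, then returns to the log buffer once the backlog drains.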

The log buffer hit ratio is tracked via the view X$LOGBUF_READHIST. A low hit ratio indicates that the LNS is reading from the log file instead of the log buffer; if this happens, try increasing the log buffer size.
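As a reminder of what a hit ratio means here, the arithmetic is sketched below. The read counts are hypothetical inputs; this sketch does not assume any particular column names in X$LOGBUF_READHIST, and the function name is made up for illustration.

```python
def log_buffer_hit_ratio(buffer_reads, log_file_reads):
    """Fraction of LNS reads served from the log buffer.

    Both arguments are hypothetical counts supplied by the caller.
    Returns None when there were no reads at all.
    """
    total_reads = buffer_reads + log_file_reads
    if total_reads == 0:
        return None
    return buffer_reads / total_reads


if __name__ == "__main__":
    # Made-up sample: 900 reads from the buffer, 100 from the log file.
    ratio = log_buffer_hit_ratio(900, 100)
    print(f"log buffer hit ratio: {ratio:.0%}")
```

A ratio that keeps dropping under load suggests the LNS is regularly being pushed out to the log file.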

The drawback of ASYNC is the increased potential for data loss: if a failure destroys the primary database before the transport lag is reduced to zero, any committed transactions that are part of the transport lag are lost. So again, make sure that the network bandwidth is adequate and that you get the lowest latency possible.
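A rough back-of-the-envelope calculation helps when judging how much is at stake. The sketch below is only an estimate under simple assumptions: a constant redo rate, decimal units (1 MB = 8 megabits), and a hypothetical `link_efficiency` factor standing in for protocol overhead. None of the names or numbers come from Oracle.

```python
def redo_at_risk_mb(redo_rate_mb_per_s, transport_lag_s):
    """Redo that would be lost if the primary failed right now (constant rate assumed)."""
    return redo_rate_mb_per_s * transport_lag_s


def required_bandwidth_mbps(redo_rate_mb_per_s, link_efficiency=0.8):
    """Network bandwidth needed to keep up with the redo rate.

    link_efficiency is a hypothetical allowance for protocol overhead:
    0.8 means only 80% of the raw link carries redo payload.
    """
    if not 0 < link_efficiency <= 1:
        raise ValueError("link_efficiency must be in (0, 1]")
    return redo_rate_mb_per_s * 8 / link_efficiency


if __name__ == "__main__":
    # Made-up figures: 2 MB/s of redo and a 30 second transport lag.
    print("redo at risk (MB):", redo_at_risk_mb(2.0, 30))
    print("bandwidth needed (Mbit/s):", required_bandwidth_mbps(2.0))
```

If the available link is below the second figure at peak redo rate, the transport lag (and with it the first figure) will keep growing.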

A log file gap occurs whenever the primary database continues to commit transactions while the LNS process has ceased transmitting redo to the standby database, for example because of network issues. The primary database continues writing to the current log file, fills it, and then switches to a new log file. Archiving then kicks in and archives the filled file. Before we know it, there are a number of archived and online log files that still need to be sent to the standby, creating a large log file gap.
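To make the idea of a gap concrete, it can be expressed in terms of log sequence numbers. The sketch below is illustrative only: the function name is hypothetical, and it assumes a single redo thread in which every sequence between the last log the standby received and the primary's current log has already been archived.

```python
def missing_sequences(last_received_on_standby, current_primary_sequence):
    """List the archived log sequence numbers the standby is missing.

    Assumption: the current sequence is still being written on the primary,
    so only the sequences strictly between the two values form the gap.
    """
    if current_primary_sequence <= last_received_on_standby:
        return []
    return list(range(last_received_on_standby + 1, current_primary_sequence))


if __name__ == "__main__":
    # Made-up example: the standby stopped at sequence 100 and the
    # primary is now writing sequence 105.
    print("gap:", missing_sequences(100, 105))
```

In this made-up case the gap consists of four archived logs; the longer the outage and the higher the redo rate, the longer this list becomes.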

Data Guard uses an ARCH process on the primary database to continuously ping the standby database during the outage. When the standby database eventually comes back, the ARCH process queries the standby control file (via the RFS process) to determine the last complete log file that the standby received from the primary. The ARCH process then transmits the missing files to the standby database using additional ARCH processes. At the very next log switch, the LNS attempts to connect to the standby database and, once it succeeds, begins transmitting the current redo while the ARCH processes resolve the gap in the background. Once the standby apply process has caught up to the current redo, it automatically transitions from reading the archived redo logs to reading the current SRL. The whole process can be seen in the diagram below :

Click Here for Data Guard Architecture Oracle 11g Part-III

Enjoy :-)
