One Can Succeed at Almost Anything For Which He Has Enthusiasm...

Monday, October 17, 2011

Data Guard Architecture Oracle 11g Part-II

The LNS (log-writer network server) and ARCH (archiver) processes running on the primary database ship redo and archived redo logs to the standby database, where the RFS (remote file server) background process within the Oracle instance receives the redo originating from the primary database.

The LNS process supports two modes:
1.) Synchronous and
2.) Asynchronous.

1.) Synchronous Mode : Synchronous transport (SYNC) is also referred to as the "zero data loss" method because the LGWR is not allowed to acknowledge that a commit has succeeded until the LNS can confirm that the redo needed to recover the transaction has been written at the standby site. In the diagram below, the phases of a transaction are :

The user commits a transaction, creating a redo record in the SGA. The LGWR reads the redo record from the log buffer, writes it to the online redo log file and waits for confirmation from the LNS. The LNS reads the same redo record from the buffer and transmits it to the standby database using Oracle Net Services. The RFS receives the redo at the standby database and writes it to the SRL. When the RFS receives a write-complete from the disk, it transmits an acknowledgment back to the LNS process on the primary database, which in turn notifies the LGWR that the transmission is complete. The LGWR then sends a commit acknowledgment to the user.

This setup depends heavily on network performance and can have a dramatic impact on the primary database: high latency on the network will have a big impact on response times. The impact can be seen in the wait event "LNS wait on SENDREQ" found in the v$system_event dynamic performance view.
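As a rough check of how much time the primary spends waiting on synchronous transport, the wait event named above can be queried directly. This is only a sketch: it assumes the event name is exactly as given in the text, and it returns no rows if the event has never been waited on.

```sql
-- Cumulative waits on synchronous redo transport since instance startup.
-- time_waited and average_wait are reported in hundredths of a second.
SELECT event,
       total_waits,
       time_waited,
       average_wait
FROM   v$system_event
WHERE  event = 'LNS wait on SENDREQ';
```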

2.) Asynchronous Mode : Asynchronous transport (ASYNC) differs from SYNC in that it eliminates the requirement that the LGWR wait for an acknowledgment from the LNS, resulting in a "near zero" performance impact on the primary database regardless of the distance between the primary and the standby locations. The LGWR will continue to acknowledge commit success even if limited bandwidth prevents the redo of previous transactions from being sent to the standby database immediately. If the LNS is unable to keep pace and the log buffer is recycled before the redo is sent to the standby, the LNS automatically transitions to reading and sending from the log file instead of the log buffer in the SGA. Once the LNS has caught up, it switches back to reading directly from the buffer in the SGA.
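The transport mode is chosen per destination through the LOG_ARCHIVE_DEST_n parameter. The sketch below shows how the two modes might be configured; the service name standby_tns, the unique name standby_db and the NET_TIMEOUT value are hypothetical and are not taken from the setup described here.

```sql
-- Hypothetical names: standby_tns (Oracle Net service), standby_db (DB_UNIQUE_NAME).

-- Synchronous transport: the commit waits until the standby confirms
-- that the redo has been written to its standby redo log.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=standby_tns SYNC AFFIRM NET_TIMEOUT=30 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standby_db'
  SCOPE = BOTH;

-- Asynchronous transport: the commit does not wait for the standby.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=standby_tns ASYNC NOAFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standby_db'
  SCOPE = BOTH;
```

Only one of the two statements would be used for a given destination; they are shown together for comparison.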

The log buffer hit ratio is tracked via the view X$LOGBUF_READHIST. A low hit ratio indicates that the LNS is reading from the log file instead of the log buffer; if this happens, try increasing the log buffer size.

The drawback with ASYNC is the increased potential for data loss: if a failure destroys the primary database before the transport lag is reduced to zero, any committed transactions that are part of the transport lag are lost. So again, make sure that the network bandwidth is adequate and that you get the lowest latency possible.
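One way to keep an eye on the exposure described above is to look at the lag figures on the standby. A minimal sketch, assuming it is run on the standby database of an 11g configuration:

```sql
-- Run on the standby database.
-- 'transport lag' is redo not yet received from the primary;
-- 'apply lag' is redo received but not yet applied.
SELECT name,
       value,
       time_computed
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag');
```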

A log file gap occurs whenever a primary database continues to commit transactions while the LNS process has ceased transmitting redo to the standby database (for example, because of network issues). The primary database continues writing to the current log file, fills it, and then switches to a new log file. Archiving then kicks in and archives the file. Before we know it, there are a number of archived and online log files that still need to be processed by the LNS, creating a large log file gap.
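Whether such a gap currently exists can be checked on the standby. A small sketch, assuming a physical standby; no rows returned means no gap has been detected:

```sql
-- Run on the standby database.
-- Each row gives the range of log sequences missing for a redo thread.
SELECT thread#,
       low_sequence#,
       high_sequence#
FROM   v$archive_gap;
```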

Data Guard uses an ARCH process on the primary database to continuously ping the standby database during the outage. When the standby database eventually comes back, the ARCH process queries the standby control file (via the RFS process) to determine the last complete log file that the standby received from the primary. The ARCH process then transmits the missing files to the standby database using additional ARCH processes. At the very next log switch, the LNS will attempt and succeed in making a connection to the standby database and will begin transmitting the current redo while the ARCH processes resolve the gap in the background. Once the standby apply process has caught up to the current redo logs, it automatically transitions out of reading the archived redo logs and into reading the current SRL. The whole process can be seen in the diagram below :

Click Here for Data Guard Architecture Oracle 11g Part-III

Enjoy :-)

Data Guard Architecture Oracle 11g Part-I

I have decided to post about the architecture of the standby database. There is a lot of material on the Internet, but most of it is lengthy and not so juicy. I read some good notes on standby database architecture and decided to post them, modifying a few topics to make them clearer and more interesting. I hope you find this helpful and enjoy reading it.

Oracle Data Guard is the most effective and comprehensive data availability, data protection and disaster recovery solution for enterprise databases. It provides a method for customers to actively utilize their disaster recovery configuration for read-only queries and reports while it is in the standby role. Additionally, a standby database can be used to offload backups from production databases, or for Quality Assurance and other test activities that require read-write access to an exact replica of production. These capabilities are unique to Oracle.

Oracle Data Guard is the management, monitoring, and automation software infrastructure that creates, maintains, and monitors one or more standby databases to protect enterprise data from failures, disasters, errors, and corruptions. Data Guard basically ships redo and then applies redo; as we know, redo is the information needed to recover a database transaction. A production database, referred to as the primary database, transmits redo to one or more independent replicas referred to as standby databases. Standby databases are in a continuous state of recovery, validating and applying redo to maintain synchronization with the primary database. A standby database will also automatically re-synchronize if it becomes temporarily disconnected from the primary due to power outages, network problems, etc.

The diagram below shows an overview of Data Guard. First, redo transport services transmit redo data from the primary to the standby as it is generated. Second, apply services apply the redo data and update the standby database files. Third, independently of Data Guard, the database writer process updates the primary database files. Finally, Data Guard automatically re-synchronizes the standby database following power or network outages, using redo data that has been archived at the primary.

Redo records contain all the information needed to reconstruct changes made to a database. During recovery the database reads the change vectors in the redo records and applies the changes to the relevant blocks. Redo records are buffered in a circular fashion in the redo log buffer of the SGA; the log writer process (LGWR) is the background process that handles redo log buffer management. At specific times the LGWR writes redo log entries into a sequential file (the online redo log file) to free space in the buffer. The LGWR writes the following:

1.) A commit record : Whenever a transaction is committed, the LGWR writes the transaction's redo records from the buffer to the log file and assigns a system change number (SCN). Only when this process is complete is the transaction said to be committed.

2.) Redo log buffers : If the redo log buffer becomes one-third full, or if 3 seconds have passed since the last time the LGWR wrote to the log file, all redo entries in the buffer are written to the log file. This means that redo records can be written to the log file before the transaction has been committed; if necessary, media recovery will roll back these changes using undo that is also part of the redo entry.

Remember that the LGWR can write to the log file using "group" commits: the entire list of redo entries of waiting transactions (not yet committed) can be written to disk in one operation, thus reducing I/O. Even though the data buffer cache has not been written to disk, Oracle guarantees that no transaction will be lost, because the redo log has successfully saved the changes.
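The LGWR activity described above can be observed through a few cumulative statistics. This is a sketch only; the values are totals since instance startup, so they are most useful when compared between two points in time.

```sql
-- Cumulative redo statistics since instance startup.
-- Fewer 'redo writes' than 'user commits' over an interval would be
-- consistent with several commits being flushed in one write.
SELECT name,
       value
FROM   v$sysstat
WHERE  name IN ('user commits', 'redo writes', 'redo size', 'redo synch writes');
```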

Data Guard Redo Transport Services coordinate the transmission of redo from the primary database to the standby database. At the same time that the LGWR is processing redo, a separate Data Guard process called the Log Network Server (LNS) reads from the redo buffer in the SGA and passes redo to Oracle Net Services for transmission to a standby database. It is possible to direct the redo data to nine standby databases. Oracle RAC can also be used, and the databases in the configuration do not all need to be RAC setups. The Remote File Server (RFS) process receives the redo from the LNS and writes it to a sequential file called a standby redo log file (SRL).
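The processes and standby redo logs mentioned above can be inspected on the standby. A minimal sketch, assuming a physical standby running 11g:

```sql
-- Run on the standby database.
-- Data Guard related processes (for example RFS, ARCH, MRP0) and the
-- log sequence each one is currently working on.
SELECT process,
       status,
       thread#,
       sequence#
FROM   v$managed_standby;

-- Standby redo log (SRL) groups that the RFS process writes into.
SELECT group#,
       thread#,
       sequence#,
       bytes,
       status
FROM   v$standby_log;
```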

Click Here for Data Guard Architecture Oracle 11g Part-II

Enjoy :-)

Friday, October 14, 2011

ORA-01000: Maximum Open Cursors Exceeded

One of our clients reported the error “ORA-01000: maximum open cursors exceeded” while running an application. As the message suggests, the error is related to the cursor limit, i.e. the number of open cursors has exceeded the configured value. To solve this issue, let's look at the open_cursors parameter: what it is and how it affects the database.

Open cursors take up space in the shared pool, in the library cache. To keep a renegade session from filling up the library cache, or clogging the CPU with millions of parse requests, we set the parameter OPEN_CURSORS.

OPEN_CURSORS sets the maximum number of cursors each session can have open. For example, if OPEN_CURSORS is set to 1000, then each session can have up to 1000 cursors open at one time. If a single session already has OPEN_CURSORS cursors open, it will get an ORA-01000 error when it tries to open one more cursor.

To solve this issue we can either increase the value of open_cursors or kill the inactive session that has opened a large number of cursors. First we connect to the database and check the open_cursors limit :

C:\>sqlplus sys/xxxx@noida as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Fri Oct 14 18:05:30 2011
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> sho parameter open_cursors
NAME TYPE VALUE
----------------- --------- -----------
open_cursors integer 300

The open_cursors limit is 300, so we list the top 10 sessions that currently hold the most open cursors :

SQL> select * from ( select ss.value, sn.name, ss.sid
from v$sesstat ss, v$statname sn
where ss.statistic# = sn.statistic#
and sn.name like '%opened cursors current%'
order by value desc) where rownum < 11 ;

VALUE NAME SID
----- ----------------- ----------
300 opened cursors current 131
300 opened cursors current 125
300 opened cursors current 143
300 opened cursors current 149
300 opened cursors current 17
300 opened cursors current 132
300 opened cursors current 23
300 opened cursors current 1
300 opened cursors current 9
300 opened cursors current 10
10 rows selected.

Now we check what makes session 131 open so many cursors :

SQL> select sid, status, event, seconds_in_wait "wait(s)", state, blocking_session "blk_sesn", prev_sql_id "SQL_ID" from v$session where sid=131;

SID STATUS EVENT WAIT(s) STATE BLK_SESN SQL_ID
--- ---------- ------------------ -------- --------- --------- ---------------
131 INACTIVE rdbms ipc message 8745 WAITING 6mqvntr9ytnga
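To see which statements account for the open cursors of that session, the cursors can be grouped by SQL text. A sketch, using the SID 131 from the output above:

```sql
-- Statements currently held open by session 131, most frequent first.
-- Many copies of the same statement often point to application code
-- that opens cursors without closing them.
SELECT sql_id,
       sql_text,
       COUNT(*) AS open_count
FROM   v$open_cursor
WHERE  sid = 131
GROUP  BY sql_id, sql_text
ORDER  BY open_count DESC;
```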

Since the status of the session is INACTIVE, we can kill the session by using the command below :

SQL> alter system kill session 'sid,serial#' immediate;
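The 'sid,serial#' pair in the command above has to be looked up first. A minimal sketch for session 131; the type column is included as a suggested sanity check that the session is a user session and not a background process before it is killed:

```sql
-- Look up the serial# needed by ALTER SYSTEM KILL SESSION.
-- type is 'USER' for user sessions and 'BACKGROUND' for background processes.
SELECT sid,
       serial#,
       username,
       program,
       type
FROM   v$session
WHERE  sid = 131;

-- Then substitute the returned values, for example:
-- ALTER SYSTEM KILL SESSION '131,<serial#>' IMMEDIATE;
```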

The other alternative is to increase the open_cursors parameter (with scope=spfile, the new value takes effect after the next instance restart) :

SQL> alter system set open_cursors=1500 scope=spfile;

In my case I increased the value of open_cursors and the issue was solved.
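After changing the limit, it can be useful to compare the highest per-session cursor count with the configured value. A sketch of one way to do this:

```sql
-- Highest number of cursors currently open in any one session,
-- shown next to the configured open_cursors limit.
SELECT MAX(ss.value) AS highest_open_cur,
       p.value       AS max_open_cur
FROM   v$sesstat   ss,
       v$statname  sn,
       v$parameter p
WHERE  ss.statistic# = sn.statistic#
AND    sn.name = 'opened cursors current'
AND    p.name  = 'open_cursors'
GROUP  BY p.value;
```

If the first figure keeps climbing toward the second, raising the limit has probably only postponed the error, and the application's cursor handling is worth reviewing.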

Enjoy :-)

Wednesday, October 12, 2011

Open Standby in Read-write Mode When Primary is Lost

There may be a scenario where the primary database is lost and we are left with only the standby database. In this scenario we have to open the standby database in read-write mode. Below are the steps to convert the standby database to a primary database.

1.) Start the standby database in mount state :
SQL> select name,open_mode from v$database;
NAME OPEN_MODE
------ -------------
NOIDA READ ONLY

SQL> shut immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup mount
ORACLE instance started.

Total System Global Area 263639040 bytes
Fixed Size 1373964 bytes
Variable Size 230689012 bytes
Database Buffers 25165824 bytes
Redo Buffers 6410240 bytes
Database mounted.

SQL> select open_mode ,protection_mode , database_role from v$database ;
OPEN_MODE PROTECTION_MODE DATABASE_ROLE
--------- ---------------------- ----------------
MOUNTED MAXIMUM PERFORMANCE PHYSICAL STANDBY

2.) Recover any remaining archive logs :

SQL> recover standby database;
ORA-01153: an incompatible media recovery is active

This error means that managed recovery is still running, so we first cancel it by using the command below :

SQL> alter database recover managed standby database cancel;
Database altered.

SQL> recover standby database
ORA-00279: change 2698969 generated at 10/05/2011 16:46:58 needed for thread
ORA-00289: suggestion : D:\ARCHIVE\ARC0000000133_0761068614.0001
ORA-00280: change 2698969 for thread 1 is in sequence #133

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
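Before finishing recovery, it may help to confirm how far the standby has already applied redo. A small sketch, assuming it is run on the mounted standby:

```sql
-- Run on the standby database.
-- Highest archived log sequence per thread that has been applied.
SELECT thread#,
       MAX(sequence#) AS last_applied
FROM   v$archived_log
WHERE  applied = 'YES'
GROUP  BY thread#;
```

Comparing this with the sequence requested in the ORA-00280 message gives a rough idea of how much redo, if any, is missing.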

3.) Finish the Recovery process :

The command below finishes recovery so that the role transition can happen as quickly as possible, with little or no data loss and without rendering other standby databases unusable. It prepares the standby to be opened in read-write mode :

SQL> alter database recover managed standby database finish;
Database altered.

4.) Activate the Standby Database :

SQL> alter database activate physical standby database ;
Database altered.

5.) Check the new status

SQL> select open_mode ,protection_mode , database_role from v$database ;

OPEN_MODE PROTECTION_MODE DATABASE_ROLE
--------- ---------------------- ---------------------
MOUNTED MAXIMUM PERFORMANCE PHYSICAL STANDBY

6.) Open the Database
SQL> alter database open ;
Database altered.

SQL> select open_mode ,protection_mode , database_role from v$database ;

OPEN_MODE PROTECTION_MODE DATABASE_ROLE
--------- ---------------------- --------------------
READ WRITE MAXIMUM PERFORMANCE PHYSICAL STANDBY

Enjoy :-)

Monday, October 10, 2011

Journeyman On OTN

Hello All

It is a good day for me. Today I became a Journeyman on the OTN site. It is a great pleasure to be a Journeyman. Working with Oracle is exciting, challenging and great fun.

I am passionate about Oracle. I have been working with Oracle for a few years and find that it is a great database. Whenever I ask myself how much I know about Oracle, the answer is "NOTHING". Now my aim is to know "something" about Oracle. There is a lot to learn and the journey is long.

The only advice I would like to give others, and try to follow myself, is to always choose a job that you are enthusiastic or passionate about. If you are motivated and enthusiastic, you will be able to learn more and enjoy your work. I believe that knowledge is not only about learning but also about sharing with others.

Enjoy :-)