Monday, November 28, 2011

ORA-7445 Internal Error


An ORA-7445 is a generic error that can be raised from anywhere in the Oracle code. The precise location of the error is identified by the core file and/or trace file it produces. Whenever an ORA-7445 error is raised, a core file is generated; a trace file may be generated along with the error as well.

Prior to 11g, core files are located in the CORE_DUMP_DEST directory. Starting with 11g, a new advanced fault diagnosability infrastructure manages trace data: diagnostic files are written under a root directory for all diagnostic data called the ADR home, and core files in 11g go to the ADR_HOME/cdump directory.
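
On 11g the exact diagnostic locations can be confirmed from inside the database; a minimal sketch using the V$DIAG_INFO view:

SQL> select name, value
       from v$diag_info
      where name in ('ADR Base','ADR Home','Diag Trace','Default Trace File');
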
For more detail, work through the checks below:

1. Check the Alert Log  :  The alert log may indicate additional errors or other internal errors at the time of the problem. In some cases, the ORA-7445 error occurs along with ORA-600, ORA-3113, or ORA-4030 errors. The ORA-7445 can be a side effect of those other problems, so we should review the first error and its associated core file or trace file and work down the list of errors.

If the ORA-7445 errors are not associated with other error conditions, ensure the trace data is not truncated. If we see the message "MAX DUMP FILE SIZE EXCEEDED" at the end of the file, the MAX_DUMP_FILE_SIZE parameter is not set high enough (or to 'unlimited'). Vital diagnostic information may be missing from the file, making the root issue very difficult to discover. Set MAX_DUMP_FILE_SIZE appropriately and reproduce the error to capture complete trace information.
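For example, the current limit can be checked and raised along these lines (a sketch; use the scope appropriate to your pfile/spfile setup):

SQL> show parameter max_dump_file_size

SQL> alter system set max_dump_file_size = 'UNLIMITED' scope = both;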

2. Search the 600/7445 Lookup Tool  :  Visit My Oracle Support to access the ORA-600/ORA-7445 Lookup tool (Note 7445.1). The Lookup tool may lead you to applicable content in My Oracle Support on the problem; we can investigate using the argument data from the error message, or pull key stack pointers from the associated trace file to match against known bugs.

3. "Fine tune" searches in the Knowledge Base  :  As the ORA-7445 error indicates an unhandled exception in the Oracle source code, our search in the Oracle Knowledge Base needs to focus on the stack data from the core file or the trace file.

Keep in mind that searches on generic argument data will bring back a large result set. The more we can learn about the environment and the code leading to the errors, the easier it will be to narrow the hit list down to our problem.

4. If assistance is required from Oracle  :  Should it become necessary to get assistance from Oracle Support on an ORA-7445 problem, please provide, at a minimum, the following:

  • Alert log
  • Associated trace file(s) or, at 11g, an incident package (see the ADRCI sketch after this list)
  • Patch level information
  • Core file(s)
  • Information about changes in configuration and/or application prior to issues
  • If error is reproducible, a self-contained reproducible testcase: Note.232963.1 How to Build a Testcase for Oracle Data Server Support to Reproduce ORA-600 and ORA-7445 Errors.
  • RDA report or Oracle Configuration Manager information
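
For the incident package mentioned in the list above, ADRCI can bundle an incident and its trace files; a rough sketch (the incident number 12345 and the target directory are placeholders):

$ adrci
adrci> show incident
adrci> ips pack incident 12345 in /tmp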



Enjoy     :-) 




Saturday, November 26, 2011

Estimate the Optimal UNDO Tablespace size in Oracle


We can size the undo tablespace appropriately either by using automatic extension of the undo tablespace or by using the Undo Advisor for a fixed-size tablespace. If we have decided on a fixed-size undo tablespace, the Undo Advisor can help us estimate the needed capacity. The Undo Advisor relies for its analysis on data collected in the Automatic Workload Repository (AWR), so it is important that AWR has adequate workload statistics available for the Undo Advisor to make accurate recommendations.
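
Before relying on the advisor (or on the script below), it is worth checking how much undo history is actually available; a simple check against V$UNDOSTAT, which normally keeps one row per 10-minute interval:

SQL> select min(begin_time) oldest_sample,
            max(end_time)   newest_sample,
            count(*)        samples
       from v$undostat;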

Below are the scripts which estimate the size of the undo tablespace required and also report the number of "ORA-01555 (Snapshot too old)" errors encountered since the last startup of the instance.

SQL> select d.undo_size/(1024*1024)                 "ACTUAL UNDO SIZE [MByte]",
            substr(e.value,1,25)                    "UNDO RETENTION [Sec]",
            (to_number(e.value) * to_number(f.value) *
             g.undo_block_per_sec) / (1024*1024)    "NEEDED UNDO SIZE [MByte]"
       from (select sum(a.bytes) undo_size
               from v$datafile a, v$tablespace b, dba_tablespaces c
              where c.contents = 'UNDO'
                and c.status = 'ONLINE'
                and b.name = c.tablespace_name
                and a.ts# = b.ts#) d,
            v$parameter e,
            v$parameter f,
            (select max(undoblks/((end_time-begin_time)*3600*24)) undo_block_per_sec
               from v$undostat) g
      where e.name = 'undo_retention'
        and f.name = 'db_block_size'
/
set feedback off
set heading off
set lines 132
set serveroutput on
declare
  cursor get_undo_stat is
         select d.undo_size/(1024*1024)                                          "C1",
                substr(e.value,1,25)                                             "C2",
                (to_number(e.value) * to_number(f.value) *
                 g.undo_block_per_sec) / (1024*1024)                             "C3",
                round(d.undo_size / (to_number(f.value) * g.undo_block_per_sec)) "C4"
           from (select sum(a.bytes) undo_size
                   from v$datafile      a,
                        v$tablespace    b,
                        dba_tablespaces c
                  where c.contents = 'UNDO'
                    and c.status = 'ONLINE'
                    and b.name = c.tablespace_name
                    and a.ts# = b.ts#)  d,
                v$parameter e,
                v$parameter f,
                (select max(undoblks/((end_time-begin_time)*3600*24)) undo_block_per_sec
                   from v$undostat)  g
          where e.name = 'undo_retention'
            and f.name = 'db_block_size';
begin
  dbms_output.put_line(chr(10)||chr(10)||chr(10)||chr(10) || 'To optimize UNDO we have two choices :');
  dbms_output.put_line('==========================================' || chr(10));
  for rec1 in get_undo_stat loop
      dbms_output.put_line('A) Adjust UNDO tablespace size according to UNDO_RETENTION :' || chr(10));
      dbms_output.put_line(rpad('ACTUAL UNDO SIZE ',65,'.') || ' : ' ||
                           to_char(rec1.c1,'999999') || ' MEGS');
      dbms_output.put_line(rpad('OPTIMAL UNDO SIZE WITH ACTUAL UNDO_RETENTION (' ||
                           ltrim(to_char(rec1.c2/60,'999999')) || ' MINUTES) ',65,'.') || ' : ' ||
                           to_char(rec1.c3,'999999') || ' MEGS');
      dbms_output.put_line(chr(10));
      dbms_output.put_line('B) Adjust UNDO_RETENTION according to UNDO tablespace size :' || chr(10));
      dbms_output.put_line(rpad('ACTUAL UNDO RETENTION ',65,'.') || ' : ' ||
                           to_char(rec1.c2/60,'999999') || ' MINUTES');
      dbms_output.put_line(rpad('OPTIMAL UNDO RETENTION WITH ACTUAL UNDO SIZE (' ||
                           ltrim(to_char(rec1.c1,'999999')) || ' MEGS) ',65,'.') || ' : ' ||
                           to_char(rec1.c4/60,'999999') || ' MINUTES');
  end loop;
  dbms_output.put_line(chr(10)||chr(10));
end;
/
select 'Number of "ORA-01555 (Snapshot too old)" encountered since the last startup of the instance : ' || sum(ssolderrcnt)
  from v$undostat;


Enjoy   :-) 



How to resize redo log files in Oracle


Once I received an e-mail about resizing the redo log files. The sender wanted the easiest way to resize a redo log file, something like alter database logfile group 1 '?\redo01.log' resize 100m, or some other trick.

We cannot resize redo log files. We must drop the redo log groups and recreate them; this is the only way to resize the redo logs. A database requires at least two groups of redo log files, regardless of the number of members. We cannot drop a redo log group while its status is CURRENT or ACTIVE; only once its status changes to INACTIVE can we drop it.

When a redo log member is dropped from the database, the operating system file is not deleted from disk. Rather, the control files of the associated database are updated to drop the member from the database structure. After dropping a redo log file, make sure the drop completed successfully, and then use the appropriate operating system command to delete the dropped redo log file. In my case I have four redo log groups, each 50 MB in size, and I will resize them to 100 MB. Below are the steps to resize the redo log files.

Step 1 : Check the Status of Redo Logfile 
SQL>  select group#,sequence#,bytes,archived,status from v$log;
    GROUP#  SEQUENCE#      BYTES ARC STATUS
---------- ---------- ---------- --- ----------------
         1          5   52428800 YES INACTIVE
         2          6   52428800 YES ACTIVE
         3          7   52428800 NO  CURRENT
         4          4   52428800 YES INACTIVE

Here, we cannot drop the CURRENT and ACTIVE redo log files.

Step 2 : Forcing a Checkpoint :
The SQL statement ALTER SYSTEM CHECKPOINT explicitly forces Oracle to perform a checkpoint for either the current instance or all instances. Forcing a checkpoint ensures that all changes to the database buffers are written to the datafiles on disk. A global checkpoint is not finished until all instances that require recovery have been recovered.

SQL> alter system checkpoint global ;
System altered.

SQL> select group#,sequence#,bytes,archived,status from v$log;

    GROUP#  SEQUENCE#      BYTES ARC STATUS
---------- ---------- ---------- --- ----------------
         1          5   52428800 YES INACTIVE
         2          6   52428800 YES INACTIVE
         3          7   52428800 NO  CURRENT
         4          4   52428800 YES INACTIVE
Since the status of groups 1, 2 and 4 is INACTIVE, we will drop the group 1 and group 2 redo log files.

Step  3  :  Drop Redo Log File : 
SQL> alter database drop logfile group 1;
Database altered.

SQL> alter database drop logfile group 2;
Database altered.

SQL>  select group#,sequence#,bytes,archived,status from v$log;
    GROUP#  SEQUENCE#      BYTES ARC STATUS
---------- ---------- ---------- --- ----------------
         3          7   52428800 NO  CURRENT
         4          4   52428800 YES INACTIVE

Step  4  : Create new redo log file 
If we do not delete the old redo log file at the operating system level, creating a new log file with the same name fails with the error below. To resolve it, delete the old file with an OS command (see the command shown after the error) and re-run the statement.

SQL> alter database add logfile group 1 'C:\app\neerajs\oradata\orcl\redo01.log' size 100m;
alter database add logfile group 1 'C:\app\neerajs\oradata\orcl\redo01.log' size 100m
*
ERROR at line 1:
ORA-00301: error in adding log file 'C:\app\neerajs\oradata\orcl\redo01.log' - file cannot be created
ORA-27038: created file already exists
OSD-04010: <create> option specified, file already exists
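
On Windows, the leftover file can be removed from the command prompt before re-running the ADD LOGFILE statement (path taken from the error above):

C:\> del C:\app\neerajs\oradata\orcl\redo01.log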

SQL> alter database add logfile group 1 'C:\app\neerajs\oradata\orcl\redo01.log' size 100m;
Database altered.

SQL> alter database add logfile group 2 'C:\app\neerajs\oradata\orcl\redo02.log' size 100m;
Database altered.

SQL>  select group#,sequence#,bytes,archived,status from v$log;
    GROUP#  SEQUENCE#      BYTES ARC STATUS
---------- ---------- ---------- --- ----------------
         1          0  104857600 YES UNUSED
         2          0  104857600 YES UNUSED
         3          7   52428800 NO  CURRENT
         4          4   52428800 YES INACTIVE

Step 5 : Now drop the remaining two old redo log files
SQL> alter system switch logfile ;
System altered.

SQL> alter system switch logfile ;
System altered.

SQL>  select group#,sequence#,bytes,archived,status from v$log;
    GROUP#  SEQUENCE#      BYTES ARC STATUS
---------- ---------- ---------- --- ----------------
         1          8  104857600     YES     ACTIVE
         2          9  104857600     NO      CURRENT
         3          7   52428800     YES     ACTIVE
         4          4   52428800     YES     INACTIVE

SQL> alter system checkpoint global;
System altered.

SQL>  select group#,sequence#,bytes,archived,status from v$log;
    GROUP#  SEQUENCE#      BYTES ARC STATUS
---------- ---------- ---------- --- ----------------
         1          8    104857600     YES     INACTIVE
         2          9    104857600     NO     CURRENT
         3          7     52428800     YES     INACTIVE
         4          4     52428800    YES      INACTIVE

SQL> alter database drop logfile group 3;
Database altered.

SQL> alter database drop logfile group 4;
Database altered.

SQL>  select group#,sequence#,bytes,archived,status from v$log;
    GROUP#  SEQUENCE#      BYTES ARC STATUS
---------- ---------- ---------- --- ----------------
         1          8  104857600      YES      INACTIVE
         2          9  104857600      NO       CURRENT

Step 6 : Create the redo log file 
SQL> alter database add logfile group 3 'C:\app\neerajs\oradata\orcl\redo03.log' size 100m;
Database altered.

SQL> alter database add logfile group 4 'C:\app\neerajs\oradata\orcl\redo04.log' size 100m;
Database altered.

SQL>  select group#,sequence#,bytes,archived,status from v$log;
    GROUP#  SEQUENCE#      BYTES ARC STATUS
---------- ---------- ---------- --- ----------------
         1          8  104857600 YES INACTIVE
         2          9  104857600 NO  CURRENT
         3          0  104857600 YES UNUSED
         4          0  104857600 YES UNUSED


Enjoy    :-) 


Friday, November 25, 2011

How Often Should Redo Log Files Switch?


Redo log file switching has a significant impact on database performance. Frequent log switches can slow the database down, while very infrequent switches increase the amount of data that could be lost if the current redo log is damaged. The Oracle documentation suggests sizing the redo logs so that a log switch happens roughly every 15-30 minutes (depending on the architecture and recovery requirements).
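
To see how often the logs are actually switching, a rough check against V$LOG_HISTORY, counting switches per hour over the last day, could look like this:

SQL> select to_char(first_time,'YYYY-MM-DD HH24') hr,
            count(*)                              switches
       from v$log_history
      where first_time > sysdate - 1
      group by to_char(first_time,'YYYY-MM-DD HH24')
      order by 1;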

But what happens when there is a bulk load? We cannot resize the redo log files every time, since that seems silly. Generally we do not bulk-load data on a regular basis; it is more like twice or thrice a week. So what should the correct size be?

Here is a very good explanation of this question by "howardjr".

One of my databases has very large logs which are not intended to fill up under normal operation. They are actually big enough to cope with a peak load we get every week. Previously, we had two or three log switches recorded under one alert log timestamp! Now they switch every 10 minutes or so, even under the heaviest load.
So big logs are good for slowing things down under load. But I don't want to sit there with 5 hours of redo sitting in my current log during non-peak-load normal running. Therefore, I set archive_lag_target to 1800 (seconds = 30 minutes), and I know that in the worst possible case I will only lose 30 minutes of redo.
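
A sketch of that setting (1800 seconds = 30 minutes; SCOPE=BOTH assumes the instance uses an spfile):

SQL> alter system set archive_lag_target = 1800 scope = both;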

I see LOADS of advantages in using archive_lag_target even for standalone instances; actually, especially for standalone instances. I want logs big enough not to cause rapid log switching, but I have bulk loads, so I have to have enormous logs to prevent rapid log switching during those times. In fact, on one database I am connected to right now, I have 2GB redo logs which nevertheless manage to switch every 8 minutes on a Friday night. We can imagine the frequency of log switches we had when those logs were originally created at 5MB each! And the number of redo allocation retries.

I'd like 8GB logs to get it down to a log switch every 30 minutes or so on a Friday night, but with multiple members and groups, that's just getting silly. And now I have an enormous log that will take forever and a day to fill up and switch when I'm NOT doing bulk loads. Ordinarily, without a forced log switch, my 2GB log takes 3 days to fill up.

How does FAST_START_MTTR_TARGET affect the redo log file in case of recovery?

If I were to have a catastrophic hardware failure, I could lose my current redo log. FAST_START_MTTR_TARGET can't do anything to ameliorate that loss: flushing the dirty buffers to disk regularly doesn't actually protect my data. In fact, there is no way to recover transactions that are sitting in the current redo log if that log is lost. Therefore, having an enormous log full of hours and hours (in my case, about 72 hours' worth) of redo is a massive data loss risk, and not one I'm prepared to take. Forcing log switches is a good thing for everyone to be able to do, when appropriate, even if they're not using Data Guard and standby databases.

So: huge log files are necessary, and a forced log switch thereafter is essential to data security. We can certainly try to minimise the risk: that's what redo log multiplexing is all about. But if we lose all copies of our current log, then we have lost the only copy of that redo, and that means we have lost data.

Frequent checkpoints can help minimise the amount of redo that is vulnerable to loss, but they do nothing to minimise the risk of that loss occurring. Redundant disks (mirroring), redundant controllers, multiplexing: those are the only things that can help protect the current redo log and thus actually reduce the risk of failure occurring in the first place. Frequent checkpointing simply reduces the damage that the loss of all current logs would inevitably cause, but it doesn't (and cannot) reduce it to zero. It's therefore not a protection mechanism at all, in the slightest.

Checkpoints set a limit on potential data loss from redo log damage, absolutely they do. But no matter how frequently we checkpoint, we cannot reduce potential data loss to zero, and reducing the potential cost of a disaster should it strike doesn't count as reducing the risk of the disaster happening. Buying car insurance doesn't reduce our risk of having a car accident: it simply means we can pay the bills when the accident eventually happens. Therefore, checkpoints cannot reasonably be called a "current redo logfile protection mechanism". Mirroring, multiplexing and redundant hardware are the only ways to actually protect the current redo log. Safety and performance always have to be traded off against each other, and we cannot realistically propose going for just one or the other without appreciating the impact on the other.


Enjoy    :-) 


"log file sync" while performing COMMIT and ROLLBACK


When database performance degrades, we often run ADDM, look at the symptoms, and immediately start changing the system to fix those symptoms. In my case, I found that COMMIT and ROLLBACK activity had a major impact on the slowness of the database. The finding says:
Waits on event “log file sync” while performing COMMIT and ROLLBACK operations were consuming significant database time.

The recommended action is:

Investigate application logic for possible reduction in the number of COMMIT operations by increasing the size of transactions. 

Since the application is committing too frequently, we have to reduce the commit rate. Log file sync waits occur when sessions wait for redo data to be written to disk. Typically this is caused by slow redo writes or by committing too frequently in the application. Checking the "user commits" statistic in the AWR report can reveal whether the issue is related to frequent committing.
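
Outside AWR, the same figures can be read (cumulative since instance startup) from the dynamic performance views; a rough check:

SQL> select name, value
       from v$sysstat
      where name in ('user commits','user rollbacks');

SQL> select event, total_waits, time_waited
       from v$system_event
      where event = 'log file sync';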

The following tips may help to reduce log file sync waits when writes are slow:

  • Tune LGWR to get good throughput to disk, e.g. do not put redo logs on RAID 5.
  • If there are lots of short-duration transactions, see if it is possible to BATCH transactions together so there are fewer distinct COMMIT operations. Each commit must confirm that the relevant redo is on disk before it can complete. Although commits can be "piggybacked" by Oracle, reducing the overall number of commits by batching transactions can have a very beneficial effect.
  • Check the size of the log buffer.
  • Check the estimated size of the redo log files, and check how fast the redo logs are switching.
  • Check if any of the processing can use the COMMIT NOWAIT option (see the sketch after this list).
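
As an illustration of the last tip, a non-critical load could give up the synchronous wait on LGWR; a hedged sketch (audit_stage and audit_src are placeholder tables, and this is only acceptable where losing the last few seconds of commits on an instance crash is tolerable):

insert into audit_stage select * from audit_src;
commit write batch nowait;   -- group the redo write and do not wait for LGWR confirmation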

In 11g the COMMIT_WRITE parameter is deprecated. It is retained for backward compatibility only and is replaced by the COMMIT_LOGGING and COMMIT_WAIT parameters. For more details, check MOS note 857576.1.
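
At 11g the same behaviour can also be requested with the replacement parameters at session level; a sketch, again only for work where the durability trade-off is acceptable:

SQL> alter session set commit_logging = 'BATCH';
SQL> alter session set commit_wait    = 'NOWAIT';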

In my case I performed the following steps and found that database performance improved and the impact of "COMMIT and ROLLBACK" became very low. Here are the steps.

1.) Estimate the size of the redo log file :  The value of OPTIMAL_LOGFILE_SIZE is expressed in MB. This value changes frequently based on the DML load on the database. Use the query below to obtain the estimated size of the redo log file.

SQL> select target_mttr          "trgt_mttr",
            estimated_mttr       "est_mttr",
            writes_mttr          "wrt_mttr",
            writes_logfile_size  "wrt_lg_size",
            optimal_logfile_size "opt_lg_size"
       from v$instance_recovery;

 trgt_mttr   est_mttr   wrt_mttr  wrt_lg_size  opt_lg_size
----------  ---------  ---------  -----------  -----------
         0        227          0           52          300


After this I found that the redo log file was switching every 3-4 minutes. Hence, I increased the size of the redo log files to 300 MB and checked the log switches again: the redo log file now switches approximately every 20 minutes.

Since I have set SGA_TARGET, I cannot change the log buffer size; in my case the redo log buffer is approximately 5.5 MB. Some experts also suggest increasing the log buffer to 10 MB or more to reduce the impact of "commit and rollback". I have not tested this yet and will come back to it later.

This is the workaround I performed to reduce the "COMMIT and ROLLBACK" impact.


Enjoy    :-)