Friday, March 4, 2011

Database Monitoring and Checklist

What is monitoring ?
The Monitering of predefined events that generates a message or warning when a certain  threshold has been exceeded.This is done in an effort to ensure that an issue doesn't become a problem.The database monitering is required for the following reason : 
  • Supporting production !!!
  • Keeping an eye on development, i.e. disabled PKs | FKs.
  • Database performance
  • In Support of an SLA (service level agreement)
Here are steps which are required to moniter. 

1.)  Daily Procedures

A.)  Verify all instances are up :  Make sure the database is available. Log into each instance and run daily reports or test scripts. Use the below queries to check where database is up or not.
SQL> select name,open_mode from v$database ;

B.)  Look for any new alert log entries : The alert log file is a best DBA's friend and could be a true lifesaver. Go to the background dump destination or diag (in oracle 11g) and check alert log file . If any ORA- errors have appeared since the previous time we looked note them investigate it and take the steps to resolve the errors .

C.) Verify DBSNMP is running : Log on to each managed machine to check for the 'dbsnmp' process.
For Unix: at the command line, type ps –ef | grep dbsnmp. There should be two dbsnmp processes running. If not, restart DBSNMP. (Some sites have this disabled on purpose; if this is the case, remove this item from our list, or change it to "verify that DBSNMP is NOT running".)

D.) Verify success of database backup : Check the physical location of the database backup and ensure that the database backup is successful .

E.) Verify enough resources for acceptable performance : Check the space status of the tablespace i.e, free spaces and database the below monitoring link to check the tablespaces spaces .

F.) Check the instance status of the database : Check the memory component ie, buffer cache ratio, library hits , shared pool and physical reads and logical reads . To check all run the below monitoring scripts .
Monitoring scripts 

G.) Processes to review contention for CPU, memory, network or disk resources :  To check CPU utilization, go to x:\web\phase2\default.htm =>system metrics=>CPU utilization page. 400 is the maximum CPU utilization because there are 4 CPUs on phxdev and phxprd machine. We need to investigate if CPU utilization keeps above 350 for a while.

II. Nightly Procedures

Most production databases (and many development and test databases) will benefit from having certain nightly batch processes run.

A.)  Collect volumetric data : This example collects table row counts. This can easily be extended to other objects such as indexes, and other data such as average row sizes . Analyze schemas and collect data as in case of oracle 11g the optimizer collect the analyse the whole database and collect the fresh statistics .In case of Oracle 9i , collect fresh statistics by running the below command :
SQL> exec dbms_stats.gather_schemas_stats ;
SQL> exec dbms_stats.gather_schemas_stats('SCOTT') ;

III. Weekly Procedures

A.) Check the invalids objects :  Check the invalid objects and remove them from the database .The the utlrp.sql script to remove the invalid objects . This scripts is present in  "$ORACLE_HOME\rdbms\admin" folder .

B.) Check the Growth of the tablespace in the database : Check the growth of the each tablespace and in the database. Run the scripts to check  the  growth of tablespace . Tablespace Growth Scripts

IV. Monthly Procedures

A.)  Look for Harmful Growth Rates : Review changes in segment growth when compared to previous reports to identify segments with a harmful growth rate.

B.)  Review Tuning Opportunities : Review common Oracle tuning points such as cache hit ratio, latch contention, and other points dealing with memory management. Compare with past reports to identify harmful trends or determine impact of recent tuning adjustments.

C.) Look for I/O Contention : Review database file activity. Compare to past output to identify trends that could lead to possible contention.

D.) Review Fragmentation : Investigate fragmentation (e.g. row chaining, etc.).

E.) Perform Tuning and Maintenance : Make the adjustments necessary to avoid contention for system resources. This may include scheduled down time or request for additional resources.

Enjoy    J J J

No comments: