STATSPACK
Create PERFSTAT Tablespace

The STATSPACK utility requires an isolated tablespace to contain all of its objects and data. For uniformity, it is suggested that the tablespace be called PERFSTAT, the same name as the schema owner for the STATSPACK tables. Watch the STATSPACK data closely to ensure that the stats$sql_summary table does not take up an inordinate amount of space.

SQL> CREATE TABLESPACE perfstat

Run the Create Scripts

Now that the tablespace exists, we can begin installing the STATSPACK software. Note that you must have performed the following before attempting to install STATSPACK.
$ cd $ORACLE_HOME/rdbms/admin

Choose the PERFSTAT user's password
Enter value for perfstat_password: perfstat

Choose the Default tablespace for the PERFSTAT user
Choose the PERFSTAT users's default tablespace. This is the tablespace
TABLESPACE_NAME CONTENTS STATSPACK DEFAULT TABLESPACE
Pressing <return> will result in STATSPACK's recommended default
Enter value for default_tablespace: PERFSTAT

Choose the Temporary tablespace for the PERFSTAT user
Choose the PERFSTAT user's Temporary tablespace.
TABLESPACE_NAME CONTENTS DB DEFAULT TEMP TABLESPACE
Pressing <return> will result in the database's default Temporary
Enter value for temporary_tablespace: TEMP
…..
Package created.
No errors.
Package body created.
No errors.
NOTE: Check the Logfiles: spcpkg.lis, spctab.lis, spcusr.lis

Adjusting the STATSPACK Collection Level

STATSPACK has two types of collection options, level and threshold. The level parameter controls the type of data collected from Oracle, while the threshold parameter acts as a filter for the collection of SQL statements into the stats$sql_summary table.

SQL> SELECT * FROM stats$level_description ORDER BY snap_level;
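As a rough illustration of the threshold side, the sketch below raises two SQL thresholds with statspack.modify_statspack_parameter, which changes the stored defaults without taking a snapshot. The values 10000 and 1000 are arbitrary placeholders, not recommendations, and the parameter and column names should be checked against the spdoc.txt file shipped with your release.

```sql
-- Sketch only: the threshold values are illustrative assumptions.
-- Run as the PERFSTAT user.
BEGIN
  statspack.modify_statspack_parameter(
    i_buffer_gets_th => 10000,  -- capture SQL exceeding this many buffer gets
    i_disk_reads_th  => 1000    -- capture SQL exceeding this many disk reads
  );
END;
/

-- Review the settings now stored for this instance.
SELECT snap_level, buffer_gets_th, disk_reads_th
FROM   stats$statspack_parameter;
```

Higher thresholds keep fewer statements in stats$sql_summary, which is one way to limit the space that table consumes.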
You can change the default level of a snapshot with the statspack.snap function. Setting i_modify_parameter => 'true' makes the new level permanent for all future snapshots.

SQL> exec statspack.snap(i_snap_level => 6, i_modify_parameter => 'true');

Create, View and Delete Snapshots

sqlplus perfstat/perfstat
SQL> @?/rdbms/admin/sppurge;

Create the Report

sqlplus perfstat/perfstat

Statspack at a Glance

What if you have a long STATSPACK report and want to figure out whether everything is running smoothly? Here, we review what we look for in the report, section by section, using an actual STATSPACK report from our own Oracle 10g system.

Statspack Report Header

STATSPACK report for
DB Name  DB Id  Instance  Inst Num  Release  RAC  Host
Snap Id  Snap Time  Sessions  Curs/Sess  Comment
Cache Sizes (end)

Note that this section may appear slightly different depending on your version of Oracle. For example, the Curs/Sess column, which shows the number of open cursors per session, is new with Oracle9i (an 8i Statspack report would not show this data). Here, the item we are most interested in is the elapsed time. We want it to be large enough to be meaningful, but small enough to be relevant (15 to 30 minutes is OK). With longer intervals, the needle gets lost in the haystack.

Statspack Load Profile

Load Profile
% Blocks changed per Read: 151.59    Recursive Call %: 99.56

Here, we are interested in a variety of things, but if we are looking at a "health check", three items are important:
This gives an overall view of the load on the server. In this case, we are looking at a very good hard parse number and a fairly light system load (1 to 4 transactions per second is low).

Statspack Instance Efficiency Percentage

Next, we move on to the Instance Efficiency Percentages section, which includes perhaps the only ratios we look at in any detail:

Instance Efficiency Percentages (Target 100%)
Shared Pool Statistics  Begin  End

Three of these ratios are the most important: Library Hit, Soft Parse % and Execute to Parse. All of them have to do with how well the shared pool is being utilized. Time after time, we find this to be the area of greatest payback, where we can achieve some real gains in performance.

In this report, we are quite pleased with the Library Hit and Soft Parse % values. A low Library Hit ratio could indicate a shared pool that is too small or, just as likely, an application that does not use bind variables correctly. Either would be worth investigating.

OLTP System

The Soft Parse % value is one of the most important ratios (if not the only important one) in the database. For a typical OLTP system, it should be as near to 100% as possible. In a typical transactional or general-purpose database, you quite simply do not hard parse once the database has been up for a while. The way you achieve that is with bind variables. A system like this performs many executions per second, so hard parsing is something to be avoided.

Data Warehouse

In a data warehouse, we would generally like to see a lower Soft Parse ratio. We do not necessarily want to use bind variables in a data warehouse, because data warehouses typically rely on materialized views, histograms, and other features that are easily thwarted by bind variables. In a data warehouse there may be many seconds between executions, so hard parsing is not evil; in fact, it is good in those environments.
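To make the bind-variable point from the OLTP discussion concrete, here is a minimal SQL*Plus sketch. The table emp and its columns ename and empno are hypothetical names chosen for illustration; they are not taken from the report.

```sql
-- Sketch only: emp, ename and empno are hypothetical names.

-- Literal values: every distinct empno yields a new statement text,
-- so each statement has to be hard parsed.
SELECT ename FROM emp WHERE empno = 7369;
SELECT ename FROM emp WHERE empno = 7499;

-- Bind variable: the statement text stays the same, so after the first
-- hard parse later executions can reuse the shared cursor (soft parse).
VARIABLE v_empno NUMBER
EXEC :v_empno := 7369
SELECT ename FROM emp WHERE empno = :v_empno;
EXEC :v_empno := 7499
SELECT ename FROM emp WHERE empno = :v_empno;
```

In an OLTP application the second form is what keeps Soft Parse % close to 100%; in a data warehouse the literal form lets the optimizer see the actual value when it consults histograms.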
The moral of this is to look at these ratios and at how the system operates. Then, using that knowledge, determine whether each ratio is acceptable given the conditions. If we simply said that the execute-to-parse ratio for your system should be 95% or better, that would be unachievable in many web-based systems. If you have a routine that is executed many times to generate a page, you should definitely parse it once per page and execute it over and over, closing the cursor if necessary before your connection is returned to the connection pool.

Statspack Top 5 Timed Events

Moving on, we get to the Top 5 Timed Events section (in Oracle9i Release 2 and later) or Top 5 Wait Events (in Oracle9i Release 1 and earlier).

Top 5 Timed Events
-> s - second

This section is among the most important and relevant sections in the Statspack report. It is where you find out which events (typically wait events) are consuming the most time. In Oracle9i Release 2, this section is renamed and includes a new event: CPU time.
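Outside the report, a similar ranking can be approximated directly from the dynamic performance views. The query below is a sketch under two assumptions: STATSPACK is installed in the PERFSTAT schema (so that stats$idle_event is available to filter idle waits), and the session is allowed to query v$system_event. Unlike the report, it is cumulative since instance startup rather than a delta between two snapshots, and it does not include CPU time.

```sql
-- Sketch: top 5 non-idle wait events since instance startup.
-- time_waited is reported in hundredths of a second.
SELECT *
FROM  (SELECT e.event,
              e.total_waits,
              ROUND(e.time_waited / 100, 1) AS seconds_waited
       FROM   v$system_event e
       WHERE  e.event NOT IN (SELECT i.event
                              FROM   perfstat.stats$idle_event i)
       ORDER  BY e.time_waited DESC)
WHERE  ROWNUM <= 5;
```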
SQL ordered by Gets

Here you will find the SQL statements that consume the most CPU time.

SQL ordered by Gets  DB/Inst: AKI1/AKI1  Snaps: 5-6
CPU  Elapsd  Old

Tablespace IO Stats

Tablespace

Rollback Segment Stats

->A high value for "Pct Waits" suggests more rollback segments may be required
Trans Table  Pct  Undo Bytes

Rollback Segment Storage

->Optimal Size should be larger than Avg Active
RBS No  Segment Size  Avg Active  Optimal Size  Maximum Size

Generate Execution Plan for given SQL statement

If you have identified one or more problematic SQL statements, you may want to check their execution plans. Note the "Old Hash Value" from the report above (1279400914), then run the script to generate the execution plan.

sqlplus perfstat/perfstat

SQL Text
Known Optimizer Plan(s) for this Old Hash Value
First  First  Plan
Plans in shared pool between Begin and End Snap Ids

Resolving Your Wait Events

The following are 10 of the most common causes for wait events, along with explanations and potential solutions:

1. DB File Scattered Read

This generally indicates waits related to full table scans. As full table scans are pulled into memory, they rarely fall into contiguous buffers but instead are scattered throughout the buffer cache. A large number here indicates that your table may have missing or suppressed indexes. Although a full table scan may be more efficient than an index scan in your situation, check that the full table scans are necessary when you see these waits. Try to cache small tables to avoid reading them in over and over again, since a full table scan is put at the cold end of the LRU (Least Recently Used) list.

2. DB File Sequential Read

This event generally indicates a single block read (an index read, for example). A large number of waits here could indicate poor joining orders of tables, or unselective indexing.
It is normal for this number to be large for a high-transaction, well-tuned system, but it can indicate problems in some circumstances. You should correlate this wait statistic with other known issues within the Statspack report, such as inefficient SQL. Check to ensure that index scans are necessary, and check join orders for multiple table joins. The DB_CACHE_SIZE will also be a determining factor in how often these waits show up. Problematic hash-area joins should show up in the PGA memory, but they are also memory hogs that could cause high wait numbers for sequential reads. They can also show up as direct path read/write waits.

3. Free Buffer

This indicates your system is waiting for a buffer in memory, because none is currently available. Waits in this category may indicate that you need to increase the DB_CACHE_SIZE, if all your SQL is tuned. Free buffer waits could also indicate that unselective SQL is flooding the buffer cache with index blocks, leaving none for the statement that is waiting. This normally indicates that there is a substantial amount of DML (insert/update/delete) being done and that the Database Writer (DBWR) is not writing quickly enough; the buffer cache could be full of multiple versions of the same buffer, causing great inefficiency. To address this, you may want to consider accelerating incremental checkpointing, using more DBWR processes, or increasing the number of physical disks.

4. Buffer Busy

This is a wait for a buffer that is being used in an unshareable way or is being read into the buffer cache. Buffer busy waits should not be greater than 1 percent. Check the Buffer Wait Statistics section (or V$WAITSTAT) to find out if the wait is on a segment header. If this is the case, increase the freelist groups or increase the pctused to pctfree gap.
If the wait is on an undo header, you can address this by adding rollback segments; if it is on an undo block, you need to reduce the data density on the table driving this consistent read or increase the DB_CACHE_SIZE. If the wait is on a data block, you can move data to another block to avoid this hot block, increase the freelists on the table, or use Locally Managed Tablespaces (LMTs). If it is on an index block, you should rebuild the index, partition the index, or use a reverse key index. To prevent buffer busy waits related to data blocks, you can also use a smaller block size: fewer records fall within a single block in this case, so it is not as "hot."

When a DML (insert/update/delete) occurs, Oracle Database writes information into the block, including all users who are "interested" in the state of the block (Interested Transaction List, ITL). To decrease waits in this area, you can increase the initrans, which will create the space in the block to allow multiple ITL slots. You can also increase the pctfree on the table where this block exists (this writes the ITL information up to the number specified by maxtrans, when there are not enough slots built with the initrans that is specified).

5. Latch Free

Latches are low-level queuing mechanisms (more accurately referred to as mutual exclusion mechanisms) used to protect shared memory structures in the system global area (SGA). Latches are like locks on memory that are very quickly obtained and released. They are used to prevent concurrent access to a shared memory structure. If the latch is not available, a latch free miss is recorded. Most latch problems are related to the failure to use bind variables (library cache latch), redo generation issues (redo allocation latch), buffer cache contention issues (cache buffers LRU chain), and hot blocks in the buffer cache (cache buffers chains). There are also latch waits related to bugs; check MetaLink for bug reports if you suspect this is the case.
When latch miss ratios are greater than 0.5 percent, you should investigate the issue.

6. Enqueue

An enqueue is a lock that protects a shared resource. Locks protect shared resources, such as data in a record, to prevent two people from updating the same data at the same time. An enqueue includes a queuing mechanism, which is FIFO (first in, first out). Note that Oracle's latching mechanism is not FIFO. Enqueue waits usually point to the ST enqueue, the HW enqueue, the TX4 enqueue, and the TM enqueue.

The ST enqueue is used for space management and allocation for dictionary-managed tablespaces. Use LMTs, or try to preallocate extents or at least make the next extent larger for problematic dictionary-managed tablespaces.

HW enqueues are used with the high-water mark of a segment; manually allocating the extents can circumvent this wait.

TX4s are the most common enqueue waits. TX4 enqueue waits are usually the result of one of three issues. The first issue is duplicates in a unique index; you need to commit/rollback to free the enqueue. The second is multiple updates to the same bitmap index fragment. Since a single bitmap fragment may contain multiple rowids, you need to issue a commit or rollback to free the enqueue when multiple users are trying to update the same fragment. The third and most likely issue is when multiple users are updating the same block. If there are no free ITL slots, a block-level lock could occur. You can easily avoid this scenario by increasing the initrans and/or maxtrans to allow multiple ITL slots and/or by increasing the pctfree on the table.

Finally, TM enqueues occur during DML to prevent DDL to the affected object. If you have foreign keys, be sure to index them to avoid this general locking issue.

7. Log Buffer Space

This wait occurs because you are writing the log buffer faster than LGWR can write it to the redo logs, or because log switches are too slow.
To address this problem, increase the size of the log files, increase the size of the log buffer, or get faster disks to write to. You might even consider using solid-state disks for their high speed.

8. Log File Switch

All commit requests are waiting for "log file switch (archiving needed)" or "log file switch (checkpoint incomplete)." Ensure that the archive disk is not full or slow. DBWR may be too slow because of I/O. You may need to add more or larger redo logs, and you may potentially need to add database writers if the DBWR is the problem.

9. Log File Sync

When a user commits or rolls back data, the LGWR flushes the session's redo from the log buffer to the redo logs. The committing session waits on log file sync until this write completes successfully. To reduce wait events here, try to commit more records at a time (commit a batch of 50 instead of one at a time, for example). Put redo logs on a faster disk, or alternate redo logs on different physical disks, to reduce the archiving effect on LGWR. Don't use RAID 5, since it is very slow for applications that write a lot; potentially consider using file system direct I/O or raw devices, which are very fast at writing information.

10. Idle Event

There are several idle wait events listed after the output; you can ignore them. Idle events are generally listed at the bottom of each section and include such things as SQL*Net message to/from client and other background-related timings. Idle events are listed in the stats$idle_event table.

Remove STATSPACK from the Database

After a STATSPACK session, you may want to remove the STATSPACK tables.

sqlplus "/ as sysdba"
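The removal step can be sketched as follows, assuming a standard installation in which the STATSPACK scripts live in $ORACLE_HOME/rdbms/admin. The final DROP TABLESPACE statement is an optional assumption about your setup and only applies if the PERFSTAT tablespace was created solely for STATSPACK.

```sql
-- Sketch: run from SQL*Plus connected as SYSDBA.
-- spdrop.sql drops the STATSPACK tables and packages and then the
-- PERFSTAT user itself.
@?/rdbms/admin/spdrop.sql

-- Optional (assumption): remove the dedicated tablespace as well.
-- This permanently deletes its datafiles.
DROP TABLESPACE perfstat INCLUDING CONTENTS AND DATAFILES;
```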