Determining the Optimal Number of Redo Log Groups
Oracle requires at least two redo log groups to function, but having just two groups sometimes isn't enough. To understand why, remember that every time a log switch occurs, it initiates a checkpoint. As part of a checkpoint, the database writer writes all modified (dirty) blocks from the SGA to the data files on disk. Also recall that the online redo logs are written to in a round-robin fashion and that eventually the information in a given log is overwritten.
Before the log writer can begin to overwrite information in an online redo log, all modified blocks in the SGA associated with the redo log must first be written to a data file. If not all modified blocks have been written to the data files, you see this message in the alert.log file:
Thread 1 cannot allocate new log, sequence <sequence number> Checkpoint not complete
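A quick way to see whether a checkpoint is holding up a log switch is to look at the STATUS column of V$LOG. A group with a status of ACTIVE is still needed for crash recovery and can't be overwritten yet, whereas INACTIVE groups can be reused. The following query is a minimal sketch; the column list shown is just one reasonable choice:

select group#, thread#, sequence#, archived, status
from   v$log
order  by group#;

If the group that the log writer needs next is still ACTIVE, the database waits for the checkpoint to complete before reusing it.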
Another way to explain this issue is that Oracle needs to retain in the online redo logs any information that would be required to perform crash recovery. To help you visualize this, see Figure 5-3.
Figure 5–3. Redo protected until the modified (dirty) buffer is written to disk
At time 1, Block A is read from Data File AA into the buffer cache and modified. At time 2, the redo-change vector information (how the block changed) is written to the log buffer. At time 3, the log-writer process writes the Block A change-vector information to online redo log 1. At time 4, a log switch occurs, and online redo log 2 becomes the current online redo log.
Now suppose that online redo log 2 fills up quickly and another log switch occurs, at which point the log writer attempts to write to online redo log 1. The log writer isn't allowed to overwrite information in online redo log 1 until the database writer writes Block A to Data File AA. Until Block A is written to Data File AA, Oracle needs the information in the online redo logs to recover this block in the event of a power failure or shutdown abort. Before Oracle overwrites information in the online redo logs, it ensures that the blocks protected by that redo have been written to disk. If those modified blocks haven't been written to disk, Oracle temporarily suspends processing until they have been. There are a few ways to resolve this issue:
• Add more redo log groups.
• Lower the value of FAST_START_MTTR_TARGET. Doing so causes the database-writer process to write older modified blocks to disk in a shorter time frame.
• Tune the database-writer process (modify DB_WRITER_PROCESSES).
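Before deciding which of these approaches to take, it can help to see how frequently log switches are occurring. V$LOG_HISTORY records one row per log switch, so a simple count per hour gives a rough picture; the date format and column aliases below are just one possible choice:

select to_char(first_time,'YYYY-MM-DD HH24') hr, count(*) switch_count
from   v$log_history
group  by to_char(first_time,'YYYY-MM-DD HH24')
order  by 1;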
If you notice that the “Checkpoint not complete” message occurs often (say, several times a day), the recommended fix is to add one or more log groups. An extra redo log group gives the database writer more time to write modified blocks in the database buffer cache to the data files before the redo associated with a block is overwritten.
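As a point of reference, here's the general form of adding a group with two members; the group number, directory paths, file names, and size are hypothetical values you'd adjust for your environment:

alter database add logfile group 3
('/u01/oraredo/db1/redo03a.rdo',
 '/u02/oraredo/db1/redo03b.rdo') size 200m;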
There is little downside to adding more redo log groups. The main concern is that you could bump up against the MAXLOGFILES limit that was set when you created the database. If you need to add more groups and have reached the MAXLOGFILES limit, you must re-create your control file and specify a higher value for this parameter.
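If you don't recall what MAXLOGFILES was set to, one way to check (rather than dumping the control file to trace) is to query V$CONTROLFILE_RECORD_SECTION; the RECORDS_TOTAL value for the REDO LOG section reflects the limit recorded in the control file:

select records_total
from   v$controlfile_record_section
where  type = 'REDO LOG';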
If adding more redo log groups doesn’t resolve the issue, carefully consider lowering the value of FAST_START_MTTR_TARGET. When you lower this value, you can potentially see more I/O, because the database-writer process writes modified blocks to the data files more aggressively. Ideally, verify the impact of modifying FAST_START_MTTR_TARGET in a test environment before making the change in production. You can modify this parameter while your instance is up, which means you can quickly set it back to its original value if there are unforeseen side effects.
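As a rough sketch, lowering the parameter and then checking the instance-recovery statistics might look like the following; the 30-second target is an arbitrary example, and comparing TARGET_MTTR with ESTIMATED_MTTR in V$INSTANCE_RECOVERY shows how close the instance is to the target:

alter system set fast_start_mttr_target=30;

select target_mttr, estimated_mttr
from   v$instance_recovery;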
Finally, consider increasing the value of the DB_WRITER_PROCESSES parameter. Carefully analyze the impact of modifying this parameter in a test environment before you apply it to production. Changing this parameter requires that you stop and start your database; therefore, if there are adverse effects, downtime is required to change the value back to its original setting. If the waits are on the archiving process, LOG_ARCHIVE_MAX_PROCESSES can also be increased.
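For completeness, the changes might look like the following; the values are arbitrary examples, and the SCOPE=SPFILE clause assumes you're running with a server parameter file. DB_WRITER_PROCESSES is static, so the new value takes effect only after a restart, whereas LOG_ARCHIVE_MAX_PROCESSES can be changed on the fly:

-- static parameter; takes effect at the next instance restart
alter system set db_writer_processes=2 scope=spfile;

-- dynamic parameter; takes effect immediately
alter system set log_archive_max_processes=4;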