Introduction:
Recently another environment came under surveillance (or should I say showed up on the radar) because of performance issues within a specific time interval (approximately 20:00 – 22:00). A quick scan showed that various RAC instances were present on that box and that they all had RMAN level 0 and level 1 backups. The customer's first urgent request was to reschedule the various backups so they would no longer run at the same time. After that first adjustment the performance already changed for the better. Yet it was time to put on the thinking hat again and investigate the environment more closely. Fortunately we have AWR running on that box, so it was a nice challenge to go and find out. In this blog you will find the feedback of those analyses.
Summary
Most important in this matter was to have AWR reports running on a frequent (hourly) basis. Check your licences to make sure you are allowed to use the AWR reports. From those AWR reports we saw peak disk I/O during the level 1 backups. Well, that was not supposed to be the case. A dive into the details of the database showed that block change tracking was not enabled. After enabling it, the level 1 backup was much more balanced in its I/O needs. So when you have large databases (mine is approximately 1.2 TB), make sure that you or your fellow DBAs have enabled block change tracking if you do level 1 backups with RMAN.
Details.
The environment we are working with is a Real Application Clusters database on Red Hat Linux with multiple instances on the nodes of the cluster. Backups are performed with RMAN to tape, with level 0 and level 1 backups on a daily basis. The customer complained about performance going down the tube in a specific time interval (20:00 – 22:00). And as always the backups were the prime suspects for causing the performance drain. Frankly, at first I was frowning at the idea: 'Let's blame the backups again for the poor performance of the database.' The investigation, however, gave me a couple of shocks.
First it came to my notice that the database (11.2.0.3) was set up as a Real Application Clusters database and was 1.2 TB in size. Frankly I could not understand why this database was pushed to tape with an RMAN backup instead of using disk backups, or snapshot technology to mount the database on a backup server and avoid all that overhead on the disks. But as they say, once you are in the kitchen, in the short term you need to cook with the tools you have at hand.
Block change tracking:
I came across this blog, which was once again an eye opener: http://arup.blogspot.nl/2008/09/magic-of-block-change-tracking.html
From the official Oracle documentation:
http://docs.oracle.com/cd/E11882_01/backup.112/e10642/rcmbckba.htm#BRADV8125
Once block change tracking is enabled, Oracle uses a file to keep track of the blocks that are changed in the database, with the goal of easing the burden of incremental backups: RMAN only has to read the blocks flagged as changed instead of scanning every block in the database.
This is how you can find out if that is already in place:
select filename, status, bytes from v$block_change_tracking;
If block change tracking is in place you will see something like this (in my case the file is stored in an ASM diskgroup, since I am working with a RAC database):
FILENAME                                             STATUS     BYTES
---------------------------------------------------- ---------- ----------
+MYDB_DATA01/mydb/changetracking/ctf.407.833981253   ENABLED    158401024
The smallest size for this file is 10 MB and it expands in 10 MB increments. From the Oracle documentation: the size of the block change tracking file is proportional to the size of the database and the number of enabled threads of redo. The size of the block change tracking file can increase and decrease as the database changes. The size is not related to the frequency of updates to the database. Typically, the space required for block change tracking for a single instance is about 1/30,000 the size of the data blocks to be tracked. For an Oracle RAC environment, it is 1/30,000 of the size of the database, times the number of enabled threads. For my 1.2 TB RAC database that works out to roughly 40 MB per enabled redo thread, which is in the same order of magnitude as the roughly 151 MB file shown above.
So enabling block change tracking was the first improvement I implemented for that database.
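For reference, enabling it is basically a one-liner; the diskgroup name below is simply the one from my environment, so treat it as an example and adjust it (or rely on db_create_file_dest and omit the USING FILE clause):

-- Enable block change tracking, placing the tracking file in an ASM diskgroup
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '+MYDB_DATA01';

-- Verify afterwards
SELECT filename, status, bytes FROM v$block_change_tracking;

-- And should you ever need to switch it off again:
-- ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;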
AWR Reports.
As I mentioned, I am a lucky DBA in that I am allowed to use the AWR reports to analyse whether the RMAN backups indeed affected the performance of the database in such a dramatic way.
A level 1 RMAN backup should be much faster and cause less overhead on the database, since only the changes since the last backup need to be put to tape, right? WRONG! (Well, at least it is wrong if you have not enabled block change tracking yet.) Ah, but you want proof; seeing is believing, right?
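For completeness: the reports below come from the hourly AWR snapshots on that box. A minimal sketch of how such a text report can be pulled (the dbid and the snapshot ids are placeholders, so adjust them to your own environment and interval):

-- Interactive way: the script prompts for dbid, instance, snapshot ids and report name
-- @?/rdbms/admin/awrrpt.sql

-- Scripted way: first look up the dbid and the snapshots that bracket the 20:00 - 22:00 window
SELECT dbid FROM v$database;
SELECT snap_id, begin_interval_time, end_interval_time
FROM   dba_hist_snapshot
ORDER  BY snap_id;

-- Then generate the text report (dbid 1234567890 and snapshots 1234/1235 are placeholders)
SELECT output
FROM   TABLE(DBMS_WORKLOAD_REPOSITORY.awr_report_text(
               l_dbid     => 1234567890,
               l_inst_num => 1,
               l_bid      => 1234,
               l_eid      => 1235));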
AWR for the hour before the RMAN backup kicks in, without block change tracking enabled:
Top 5 Timed Foreground Events

Event                        | Waits   | Time(s) | Avg wait (ms) | % DB time | Wait Class
db file sequential read      | 92,008  | 723     | 8             | 59.13     | User I/O
DB CPU                       |         | 214     |               | 17.49     |
log file sync                | 192,402 | 194     | 1             | 15.85     | Commit
control file sequential read | 19,000  | 18      | 1             | 1.45      | System I/O
enq: TX - index contention   | 1,604   | 14      | 9             | 1.17      | Concurrency
IOStat by Function summary

Function Name      | Reads: Data | Reqs per sec | Data per sec | Writes: Data | Reqs per sec | Data per sec | Waits: Count | Avg Tm(ms)
Others             | 2G          | 10.03        | .554437      | 1.2G         | 3.35         | .346592      | 36.7K        | 1.00
LGWR               | 1M          | 0.02         | .000277      | 2.1G         | 109.02       | .600501      | 195.4K       | 0.23
DBWR               | 0M          | 0.00         | 0M           | 1.2G         | 24.90        | .342707      | 0            |
Buffer Cache Reads | 697M        | 24.55        | .193414      | 0M           | 0.00         | 0M           | 88.4K        | 7.59
Direct Reads       | 344M        | 2.16         | .095458      | 1M           | 0.02         | .000277      | 0            |
Direct Writes      | 0M          | 0.00         | 0M           | 42M          | 1.49         | .011654      | 0            |
RMAN               | 0M          | 0.01         | 0M           | 0M           | 0.00         | 0M           | 42           | 1.40
TOTAL:             | 3G          | 36.77        | .843588      | 4.6G         | 138.78       | 1.30173      | 320.5K       | 2.35
AWR report DURING the RMAN backup, without block change tracking enabled:
Top 5 Timed Foreground Events with Level 1 Backup without Block Change Tracking

Event                        | Waits   | Time(s) | Avg wait (ms) | % DB time | Wait Class
log file sync                | 119,690 | 30,133  | 252           | 74.69     | Commit
db file sequential read      | 47,120  | 6,113   | 130           | 15.15     | User I/O
enq: TX - index contention   | 5,454   | 1,107   | 203           | 2.74      | Concurrency
direct path write            | 7,679   | 586     | 76            | 1.45      | User I/O
control file sequential read | 4,505   | 436     | 97            | 1.08      | System I/O
As you can see, the average wait times have increased dramatically. Let's take a look at the I/O; this was a real shock to me, since even with a level 1 backup the whole database is being read (RMAN reads 1.2 TB):
IOStat by Function summary

Function Name      | Reads: Data | Reqs per sec | Data per sec | Writes: Data | Reqs per sec | Data per sec | Waits: Count | Avg Tm(ms)
RMAN               | 1.2T        | 350.69       | 349.559      | 42M          | 0.23         | .011712      | 4527         | 16.02
Others             | 1.1G        | 5.50         | .305634      | 819M         | 3.66         | .228389      | 20.2K        | 138.86
LGWR               | 1M          | 0.01         | .000278      | 1.3G         | 33.35        | .380091      | 57.1K        | 57.57
DBWR               | 0M          | 0.00         | 0M           | 834M         | 17.08        | .232572      | 0            |
Buffer Cache Reads | 371M        | 13.04        | .103458      | 0M           | 0.00         | 0M           | 46.6K        | 124.67
Direct Reads       | 341M        | 1.98         | .095092      | 0M           | 0.02         | 0M           | 0            |
Direct Writes      | 0M          | 0.00         | 0M           | 12M          | 0.43         | .003346      | 0            |
Streams AQ         | 0M          | 0.00         | 0M           | 0M           | 0.00         | 0M           | 14           | 56.64
TOTAL:             | 1.2T        | 371.22       | 350.064      | 3G           | 54.76        | .856112      | 128.5K       | 93.26
AWR report for the same time interval, with block change tracking active and the RMAN level 1 backup running:
Top 5 Timed Foreground Events

Event                         | Waits   | Time(s) | Avg wait (ms) | % DB time | Wait Class
db file sequential read       | 89,148  | 675     | 8             | 55.22     | User I/O
DB CPU                        |         | 222     |               | 18.15     |
log file sync                 | 188,028 | 193     | 1             | 15.79     | Commit
SQL*Net more data from client | 125,490 | 81      | 1             | 6.65      | Network
enq: TX - index contention    | 1,779   | 18      | 10            | 1.50      | Concurrency
Average wait times are back to normal.
IOStat by Function summary

Function Name      | Reads: Data | Reqs per sec | Data per sec | Writes: Data | Reqs per sec | Data per sec | Waits: Count | Avg Tm(ms)
Others             | 2.2G        | 10.92        | .629154      | 1.7G         | 5.39         | .480530      | 41.3K        | 0.85
LGWR               | 1M          | 0.02         | .000277      | 2.3G         | 106.90       | .649395      | 191.2K       | 0.24
DBWR               | 0M          | 0.00         | 0M           | 1.2G         | 23.95        | .353535      | 0            |
Buffer Cache Reads | 679M        | 24.03        | .188274      | 0M           | 0.00         | 0M           | 86.7K        | 7.28
Direct Writes      | 0M          | 0.00         | 0M           | 87M          | 3.02         | .024123      | 0            |
RMAN               | 60M         | 0.94         | .016636      | 15M          | 0.20         | .004159      | 4008         | 0.49
Direct Reads       | 28M         | 0.96         | .007763      | 2M           | 0.04         | .000554      | 0            |
Streams AQ         | 0M          | 0.00         | 0M           | 0M           | 0.00         | 0M           | 17           | 5.53
TOTAL:             | 3G          | 36.88        | .842107      | 5.3G         | 139.50       | 1.51229      | 323.2K       | 2.21
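If you want to double-check that the change tracking file is really being used, a query along the lines of the one below (the idea comes from Arup's post mentioned earlier) shows per datafile how many blocks RMAN actually had to read during a level 1 backup compared to the total number of blocks in the file:

-- Per datafile: was change tracking used, and which fraction of the file was read
SELECT file#,
       incremental_level,
       used_change_tracking,
       blocks_read,
       datafile_blocks,
       ROUND(blocks_read / datafile_blocks * 100, 2) AS pct_read
FROM   v$backup_datafile
WHERE  incremental_level = 1
ORDER  BY completion_time DESC;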
Happy reading,
Mathijs