RMAN Level Backups, Block Change Tracking, and the Positive Effects on IO Load

Introduction:

Recently another environment came under surveillance (or rather, showed up on the radar) due to performance issues within a specific time interval (approx. 20:00 – 22:00). A quick scan showed that various RAC instances were present on that box and that they all had RMAN level 0 and level 1 backups. The customer's first urgent request was to reschedule the various backups so they would no longer run at the same time. After that first adjustment the performance already changed for the better. Yet it was time to put on the thinking hat again and investigate the environment more closely. Fortunately AWR was running on that box, so it was a nice challenge to go and find out. In this blog you will find the results of that analysis.

Summary

Most important in this matter was to have AWR reports running on a frequent (hourly) basis. Check your licences to see whether you are allowed to use AWR reports. From those AWR reports we saw peak disk IO during the level 1 backups. Well, that was not supposed to be the case. A dive into the details of the database showed that block change tracking was not enabled. After enabling it, the level 1 backup was much more balanced in its IO needs. So when you have large databases (mine is approx. 1.2 TB), make sure that you or your fellow DBAs have enabled block change tracking if you do level 1 backups with RMAN.

Details.

The environment we are working with is a Real Application Clusters database on Red Hat Linux with multiple instances on the nodes of the cluster. Backups are performed with RMAN to tape, with level 0 and level 1 backups on a daily basis (see the sketch below). The customer complained about performance going down the tube in a specific time interval (20:00 – 22:00), and as always the backups were the prime suspects for causing the performance drain. Frankly, at first I was frowning at the idea: 'Let's blame the backups again for the poor performance of the database.' The investigation, however, delivered a couple of shocks.
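For reference, this is roughly what such a backup setup looks like in RMAN. This is a minimal sketch, assuming a tape (SBT) channel has already been configured; it is not the actual script used in this environment:

# level 0: full baseline backup of all blocks in the database
BACKUP INCREMENTAL LEVEL 0 DATABASE PLUS ARCHIVELOG;

# level 1: only the blocks changed since the previous level 0 or level 1 backup
BACKUP INCREMENTAL LEVEL 1 DATABASE PLUS ARCHIVELOG;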

First it came to my notice that the database (11.2.0.3) was set up as a Real Application Clusters database and was 1.2 TB in size. Frankly, I could not understand why this database was pushed to tape with RMAN instead of using disk backups, or snapshot technology to mount the database on a backup server and keep that overhead off the disks altogether. But as they say, once you are in the kitchen for the short term, you have to cook with the tools you have at hand.

Block change tracking:

I came across this blog, which was once again an eye opener: http://arup.blogspot.nl/2008/09/magic-of-block-change-tracking.html

From the official Oracle documentation:

http://docs.oracle.com/cd/E11882_01/backup.112/e10642/rcmbckba.htm#BRADV8125

Once block change tracking is enabled, Oracle uses a file to keep track of the blocks that are changed in the database, with the goal of easing the burden of incremental backups.

This is how you can find out if that is already in place:

select filename, status, bytes from v$block_change_tracking;

If block change tracking is in place, you will see something like this (in my case the file is stored in an ASM diskgroup, since I am working with a RAC database):

FILENAME
------------------------------------------------------------------------------------------------------------------------------------------------
STATUS                BYTES
---------- ----------
+MYDB_DATA01/mydb/changetracking/ctf.407.833981253
ENABLED     158401024

The smallest size for this file is 10 MB, and it expands in 10 MB increments. From the Oracle documentation: the size of the block change tracking file is proportional to the size of the database and the number of enabled threads of redo. The size of the block change tracking file can increase and decrease as the database changes. The size is not related to the frequency of updates to the database.

Typically, the space required for block change tracking for a single instance is about 1/30,000 the size of the data blocks to be tracked. For an Oracle RAC environment, it is 1/30,000 of the size of the database, times the number of enabled threads.
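As a rough back-of-the-envelope illustration (assuming, purely for the sake of the example, two enabled redo threads for this RAC database): 1.2 TB / 30,000 ≈ 40 MB, times 2 threads ≈ 80 MB. The rule of thumb only gives a ballpark figure; the actual file shown above (158,401,024 bytes, roughly 151 MB) is in the same order of magnitude.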

So enabling block change tracking was the first improvement I implemented for that database.
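Enabling it is a one-liner. As a sketch: if db_create_file_dest points to your ASM diskgroup (an assumption on my part, matching the OMF-style file name shown above), Oracle will place the change tracking file there by itself; otherwise you can name a location explicitly with USING FILE (the path below is only an example):

alter database enable block change tracking;

-- or, with an explicit location (example path, adjust to your environment):
alter database enable block change tracking using file '/u01/oradata/MYDB/rman_change_track.f';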

AWR Reports.

As I mentioned, I am a lucky DBA in that I am allowed to use AWR reports, so I could analyse whether the RMAN backups indeed affected the performance of the database in such a dramatic way.
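For completeness, this is the standard way to pull such a report from SQL*Plus; a sketch, assuming you are licensed for the Diagnostics Pack and have access to the database server:

-- interactive AWR report for the current instance (prompts for snapshot range and format)
@?/rdbms/admin/awrrpt.sql

-- report for a specific instance of the RAC database
@?/rdbms/admin/awrrpti.sql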

A level 1 RMAN backup should be much faster and cause less overhead on the database, since only the changes since the last backup need to go to tape, right? WRONG! (Well, at least it is wrong if you have not enabled block change tracking yet.) Ah, but you want proof. Seeing is believing, right?

AWR without block change tracking enabled, the hour before the RMAN backup kicks in:

Top 5 Timed Foreground Events

Event                          Waits     Time(s)  Avg wait (ms)  % DB time  Wait Class
-----------------------------  --------  -------  -------------  ---------  -----------
db file sequential read        92,008    723      8              59.13      User I/O
DB CPU                                   214                     17.49
log file sync                  192,402   194      1              15.85      Commit
control file sequential read   19,000    18       1              1.45       System I/O
enq: TX - index contention     1,604     14       9              1.17       Concurrency

IOStat by Function summary

Function Name        Reads: Data  Reqs/sec  Data/sec  Writes: Data  Reqs/sec  Data/sec  Waits: Count  Avg Tm(ms)
-------------------  -----------  --------  --------  ------------  --------  --------  ------------  ----------
Others               2G           10.03     .554437   1.2G          3.35      .346592   36.7K         1.00
LGWR                 1M           0.02      .000277   2.1G          109.02    .600501   195.4K        0.23
DBWR                 0M           0.00      0M        1.2G          24.90     .342707   0
Buffer Cache Reads   697M         24.55     .193414   0M            0.00      0M        88.4K         7.59
Direct Reads         344M         2.16      .095458   1M            0.02      .000277   0
Direct Writes        0M           0.00      0M        42M           1.49      .011654   0
RMAN                 0M           0.01      0M        0M            0.00      0M        42            1.40
TOTAL:               3G           36.77     .843588   4.6G          138.78    1.30173   320.5K        2.35

 

AWR report DURING the RMAN backup, without block change tracking enabled:

Top 5 Timed Foreground Events with Level 1 Backup without Block Change Tracking

Event                          Waits     Time(s)  Avg wait (ms)  % DB time  Wait Class
-----------------------------  --------  -------  -------------  ---------  -----------
log file sync                  119,690   30,133   252            74.69      Commit
db file sequential read        47,120    6,113    130            15.15      User I/O
enq: TX - index contention     5,454     1,107    203            2.74       Concurrency
direct path write              7,679     586      76             1.45       User I/O
control file sequential read   4,505     436      97             1.08       System I/O

As you can see, the average wait times have increased dramatically. Let's take a look at the IO. This was a real shock to me, since even with a level 1 backup the entire database is being read (RMAN reading 1.2 TB):

IOStat by Function summary

Function Name        Reads: Data  Reqs/sec  Data/sec  Writes: Data  Reqs/sec  Data/sec  Waits: Count  Avg Tm(ms)
-------------------  -----------  --------  --------  ------------  --------  --------  ------------  ----------
RMAN                 1.2T         350.69    349.559   42M           0.23      .011712   4527          16.02
Others               1.1G         5.50      .305634   819M          3.66      .228389   20.2K         138.86
LGWR                 1M           0.01      .000278   1.3G          33.35     .380091   57.1K         57.57
DBWR                 0M           0.00      0M        834M          17.08     .232572   0
Buffer Cache Reads   371M         13.04     .103458   0M            0.00      0M        46.6K         124.67
Direct Reads         341M         1.98      .095092   0M            0.02      0M        0
Direct Writes        0M           0.00      0M        12M           0.43      .003346   0
Streams AQ           0M           0.00      0M        0M            0.00      0M        14            56.64
TOTAL:               1.2T         371.22    350.064   3G            54.76     .856112   128.5K        93.26

 

AWR report for the same time interval, with block change tracking active and an RMAN level 1 backup running:

Top 5 Timed Foreground Events

Event                          Waits     Time(s)  Avg wait (ms)  % DB time  Wait Class
-----------------------------  --------  -------  -------------  ---------  -----------
db file sequential read        89,148    675      8              55.22      User I/O
DB CPU                                   222                     18.15
log file sync                  188,028   193      1              15.79      Commit
SQL*Net more data from client  125,490   81       1              6.65       Network
enq: TX - index contention     1,779     18       10             1.50       Concurrency

 

Average wait times are back to normal.

IOStat by Function summary

Function Name        Reads: Data  Reqs/sec  Data/sec  Writes: Data  Reqs/sec  Data/sec  Waits: Count  Avg Tm(ms)
-------------------  -----------  --------  --------  ------------  --------  --------  ------------  ----------
Others               2.2G         10.92     .629154   1.7G          5.39      .480530   41.3K         0.85
LGWR                 1M           0.02      .000277   2.3G          106.90    .649395   191.2K        0.24
DBWR                 0M           0.00      0M        1.2G          23.95     .353535   0
Buffer Cache Reads   679M         24.03     .188274   0M            0.00      0M        86.7K         7.28
Direct Writes        0M           0.00      0M        87M           3.02      .024123   0
RMAN                 60M          0.94      .016636   15M           0.20      .004159   4008          0.49
Direct Reads         28M          0.96      .007763   2M            0.04      .000554   0
Streams AQ           0M           0.00      0M        0M            0.00      0M        17            5.53
TOTAL:               3G           36.88     .842107   5.3G          139.50    1.51229   323.2K        2.21


Happy reading,

Mathijs
