Infrastructure at your Service

Franck Pachot

Extended clusters and asm_preferred_read_failure_groups

When you have 2 sites that are not too far you can build an extended cluster. You have one node on each site. And you can also use ASM normal redundancy to store data on each site (each diskgroup has a failure group for each site). Writes are multiplexed, so the latency between the two sites increases the write time. By default, reads can be done from one or the other site. But we can, and should, define that preference goes to local reads.

The setup is easy. In the ASM instance you list the failure groups that are on the same site, with the ‘asm_preferred_read_failure_groups’ parameter. You set that with an ALTER SYSTEM SCOPE=spfile SID=… because you will have different values for each instance. Of course, that supposes that you know the SID of the ASM instance that run on a specific site. If you are in Flex ASM, don’t ask. Wait 12.2 or read Bertrand Drouvot blog post

I’m on an extended cluster where the two sites have between 0.3 and 0.4 milliseconds of latency. I’m checking the storage with SLOB so this is the occasion to check how asm_preferred_read_failure_groups helps in I/O latency.

I use a simple SLOB configuration for physical I/O, read only, single block, and check the wait event histogram for ‘db file sequential read’.
Here is an example of output:

EVENT WAIT_TIME_MICRO WAIT_COUNT WAIT_TIME_FORMAT
------------------------------ --------------- ---------- ------------------------------
db file sequential read 1 0 1 microsecond
db file sequential read 2 0 2 microseconds
db file sequential read 4 0 4 microseconds
db file sequential read 8 0 8 microseconds
db file sequential read 16 0 16 microseconds
db file sequential read 32 0 32 microseconds
db file sequential read 64 0 64 microseconds
db file sequential read 128 0 128 microseconds
db file sequential read 256 538 256 microseconds
db file sequential read 512 5461 512 microseconds
db file sequential read 1024 2383 1 millisecond
db file sequential read 2048 123 2 milliseconds
db file sequential read 4096 148 4 milliseconds
db file sequential read 8192 682 8 milliseconds
db file sequential read 16384 3777 16 milliseconds
db file sequential read 32768 1977 32 milliseconds
db file sequential read 65536 454 65 milliseconds
db file sequential read 131072 68 131 milliseconds
db file sequential read 262144 6 262 milliseconds

It seems that half of the reads are served by the array cache and the other half are above disk latency time.

Now I set the asm_preferred_read_failure_groups to the remote site, to measure reads coming from there.

alter system set asm_preferred_read_failure_groups='DATA1_MIR.FAILGRP_SH' scope=memory;

and here is the result on similar workload:

EVENT WAIT_TIME_MICRO WAIT_COUNT WAIT_TIME_FORMAT
------------------------------ --------------- ---------- ------------------------------
db file sequential read 1 0 1 microsecond
db file sequential read 2 0 2 microseconds
db file sequential read 4 0 4 microseconds
db file sequential read 8 0 8 microseconds
db file sequential read 16 0 16 microseconds
db file sequential read 32 0 32 microseconds
db file sequential read 64 0 64 microseconds
db file sequential read 128 0 128 microseconds
db file sequential read 256 0 256 microseconds
db file sequential read 512 5425 512 microseconds
db file sequential read 1024 6165 1 millisecond
db file sequential read 2048 150 2 milliseconds
db file sequential read 4096 89 4 milliseconds
db file sequential read 8192 630 8 milliseconds
db file sequential read 16384 3598 16 milliseconds
db file sequential read 32768 1903 32 milliseconds
db file sequential read 65536 353 65 milliseconds
db file sequential read 131072 36 131 milliseconds
db file sequential read 262144 0 262 milliseconds
db file sequential read 524288 1 524 milliseconds

The pattern is similar except that I’ve nothing lower than 0.5 millisecond. I/Os served by the storage cache have there the additional latency of 0.3 milliseconds from the remote site. Of course, when we are above the millisecond, we don’t see the difference.

Now let’s set the right setting where preference should go to local reads:

alter system set asm_preferred_read_failure_groups='DATA1_MIR.FAILGRP_VE' scope=memory;

and the result:

EVENT WAIT_TIME_MICRO WAIT_COUNT WAIT_TIME_FORMAT
------------------------------ --------------- ---------- ------------------------------
db file sequential read 1 0 1 microsecond
db file sequential read 2 0 2 microseconds
db file sequential read 4 0 4 microseconds
db file sequential read 8 0 8 microseconds
db file sequential read 16 0 16 microseconds
db file sequential read 32 0 32 microseconds
db file sequential read 64 0 64 microseconds
db file sequential read 128 0 128 microseconds
db file sequential read 256 1165 256 microseconds
db file sequential read 512 9465 512 microseconds
db file sequential read 1024 519 1 millisecond
db file sequential read 2048 184 2 milliseconds
db file sequential read 4096 227 4 milliseconds
db file sequential read 8192 705 8 milliseconds
db file sequential read 16384 3350 16 milliseconds
db file sequential read 32768 1743 32 milliseconds
db file sequential read 65536 402 65 milliseconds
db file sequential read 131072 42 131 milliseconds
db file sequential read 262144 1 262 milliseconds

Here the fast reads are around 0.5 millisecond. And one thousand reads had a service time lower than 0.3 milliseconds, which was not possible when reading from the remote site.

Here is the pattern in in an Excel chart where you see no big difference for latency above 4 milliseconds.

CapturePrefFailGrp

With efficient storage array, extended cluster latency may penalize performance of writes. However, writes should be asynchronous (DBRW) so the latency is not part of the user response time. I’m not talking about redo logs here. For redo you have to choose to put it on a local only diskgroup or on a mirrored one. This depends on availability requirements and latency between the two sites.

So, when you have non uniform latency among failure groups, don’t forget to set asm_preferred_read_failure_groups. And test it with SLOB as I did here. Wat you expect from theorical latencies should be visible in the wait event histogram.

 

One Comment

Leave a Reply


5 + = nine

Franck Pachot
Franck Pachot

Technology Leader