I had the opportunity to participate to POUG day 1 and wanted through this blog to share some of my feedback on the interesting sessions I could follow.

First of all, I would like to mention the great introduction done by the staff team and the great organization. POUG staff team could adapt to the coronavirus situation and organized excellent POUG virtual days. Well done!

I had the chance today to follow a few sessions :

  • Developing Clusterware agents by Michal Wesolowski for which I will provide a few feedback and interesting stuff to know, later in this blog.
  • The Heart of Oracle – how the core RDBMS engine works by Martin Widlake. Martin presented how the heart of oracle works from the instance to the database files going through the redo logs, archive logs. He addressed how blocks are handled, how SQL statements are parsed and optimized. By the way, did you know that during a transaction a single block goes to the SGA (buffer cache) and multiple blocks goes to the PGA, so not shared between other sessions? Same for full table scan. I did not, I was always thinking that all blocks went from the data files to the buffer cache. Also it is good to know that oracle use hashing algorithm to find a block in the SGA.
  • The foundations of understanding Execution Plans by Jonathan Lewis. Great subject! Using concrete examples, Jonathan covered a complex subject : how does an execution plans work.
  • Wait Events and DB time by Franck Pachot. My collegue Franck gave a very interesting session explaining wait events and DB time for which I will provide some of the information provided later in this blog.
  • Analysis of a Couple AWR Reports by Toon Koppelaars. This presentation was a good follow up of Franck’s session. Toon explained how to interpret AWR Report.

Developing Clusterware agents

Michal did a great interactive presentation and demo, having the demos refreshing diagrams displayed in the presentation using webservices development. If you have the opportunity to follow one day one of Michal’s presentation, I would really encourage you to do so. You will enjoy and have fun!

We got a great introduction and explanation on how Clusters are working. From the free solution (clusterware) to payable solution (veritas cluster).

Comparison on some solutions can be found on the next picture :







Grid infrastructure is composed of :

  • Clusterware / Oracle restart
  • ASM

The cluster architecture looks like :







Oracle clusterware : database and clustering

We can use oracle clusterware to make high available application with built-in agents.

The command crsctl status type will provide the configuration information of one or more particular resource types. All prefixes ora are Oracle objects.

To create HA applications, cluster_resource or local_resource should be used.

Oracle tools to deal with clusterware :

  • srvctl : dedicated to ora. resources, to be used form oracle_home and not GI home
  • crsctl : to be used for custom resources, for monitoring all resources, managing OHAS resources, managing CRS itself (crsctl stop/start crs)

Oracle Grid Infrastructures standalone Agents :
HA agents for oracle applications like GoldenGate, peoplesoft, weblogic, mysql, …. Written in Perl, easy to install and manage.

Standard Edition HA :

  • Available from GI 19.7 and DB SE2 19.7
  • No HA/GI for SE2 from 19.3 to 19.6
  • Need ASM or ACFS

As seen in the next picture, we can understand that clusterware is more complex that we could imagine :













Dependencies between ressources (node 1 -> RES A -> RESB):
To display dependency tree we will use crsctl stat res -dependency [-stop | -pullup].
To force to stop resource and all dependencies : crsctl stop res resA -f.
To start resB and automatically resA first : crsctl start res resB.
To relocate the whole chain on new node : crsctl relocate res resA -n node3 -f.
With hard dependency : both resources should be started on same node.
Pullup dependency is very important and needs to be used when having hard dependency : If resource B depends on resource A and resource A fails and then recovers, then resource B is restarted.

Resource states / return codes are displayed on next picture :







Wait Events and DB time

Wait events have been implemented since 1992 to see where the time is spent when DB is busy, to verify resource utilization, to know load profile, to see which wait can scale or not. Otherwise, without wait events tuning might be done with blinded eyes.

System calls are wait events.

Wait events can be seen in sql_trace, v$ views, statspatck, ASH, AWR, or any other tool like tkprof…

Between the fetch() and resultset, database needs to do some work : CPU work, read blocks,… Between CPU work the server process is just waiting (idle, I/O,…). Idea is then to instrument this time and do profiling.

Idle is a system call as well, waiting for network. Between fetch() and resultset it is user response time. Tuning will try to reduce this response time.

DB time (session is active) = user response time = cpu and/or wait events

Wait events can be SQL net message to client or from client, PL/SQL lock timer.

cpu time = parse queries, read blocks or sort rows.

  • Tuning idea, investigate execution plans.
  • Reduce parse queries : use bind variables for similar queries.
  • Reduce read blocks : Use indexes for better selectivity for the predicates or use hash join to join many rows.
  • Reduce sort row : Do not selet all columns (select *) if not needed.

wait events :
Only count wait events for DB time on foreground sessions. Other processes wait event can be troubleshoot further if needed.

I/O from user session process to shared buffer cache are named db file and the one to PGA are named direct path :

  • db file read
    • read one block to buffer cache : db file sequential read
    • read multiple blocks to buffer cache : db file scattered read, db file parallel read
  • db file sequential read
    • single block read. waits is the throughtput of single block reads (divided by elapsed time for IOPS). wait average time is the latency to read 8k.
    • physical reads si the number of blocks, physical IO request is the number of IO calls

For average time : look at the storage, get faster disk nvme, …
If count is too high : better execution plan, larger buffer cache or PGA, …

ASH viewer can be downloaded if no diagnostic license.

application wait class : locks

log file sync = commit wait

  • average time is too high : reduce queue length, get faster disks
  • count is too high : avoid row by row commit, use no logging operation, look at nowait

system I/O comes in major cases from background processes, backups running, contention on control file write (due to multiplexing), too many log switches

Tools : SQL trace, tkprof, v$ views

Following picture is a nice conclusion to summary wait events. I like it… 😉







Conclusion

This POUG event was a really great event and I would encourage anybody to participate to the next one. Sessions were really interesting with high technical level. Been busy tomorrow I will unfortunately not be able to participate to day 2. Thanks POUG staff team to organize this event! Well done!