By Franck Pachot

.
At Oracle Open World 2014, or rather the Oaktable World, Chris Antognini has presented ‘Indexes: Structure, Splits and Free Space Management Internals’. It’s not something new, but it’s still something that is not always well understood: how index space is managed, block splits, fragmentation, coalesce and rebuilds. Kyle Hailey has made a video of it available here.
For me, it is the occasion to share the script I use to see if an index is fragmented or not.

First, forget about those ‘analyze index validate structure’ which locks the table, and the DEL_LEAF_ROWS that counts only the deletion flags that are transient. The problem is not the amount of free space. The problem is where is that free space. Because if you will insert again in the same range of values, then that space will be reused. Wasted space occurs only when lot of rows were deleted in a range where you will not insert again. For exemple, when you purge old ORDERS, then the index on the ORDER_DATE – or on the ORDER_ID coming from a sequence – will be affected. Note that the problem occurs only for sparse purges because full blocks are reclaimed when needed and can get rows from an other range of value.

I have a script that shows the number of rows per block, as well as used and free space per block, and aggregates that by range of values.

First, let’s create a table with a date and an index on it:

drop table DEMOTABLE;
create table DEMOTABLE as select sysdate-900+rownum/1000 order_date,decode(mod(rownum,100),0,'N','Y') delivered , dbms_random.string('U',16) cust_id  from (select * from dual connect by level <= 1e4 );
create index DEMOINDEX on DEMOTABLE(ORDER_DATE) pctfree 90;

My script shows 10 buckets with begin and end value and for each of them the averge number of rows per block and the free space:

SQL> @index_fragmentation 
 
ORDER_DAT -> ORDER_DAT rows/block bytes/block %free space     blocks free                   
--------- -- --------- ---------- ----------- ----------- ---------- -----                     
24-APR-12 -> 02-AUG-12        377        7163          11        266                        
03-AUG-12 -> 11-NOV-12        377        7163          11        266                        
11-NOV-12 -> 19-FEB-13        377        7163          11        266                        
19-FEB-13 -> 30-MAY-13        377        7163          11        265                        
30-MAY-13 -> 07-SEP-13        377        7163          11        265                        
07-SEP-13 -> 16-DEC-13        377        7163          11        265                        
16-DEC-13 -> 26-MAR-14        377        7163          11        265                        
26-MAR-14 -> 03-JUL-14        377        7163          11        265                        
04-JUL-14 -> 11-OCT-14        377        7163          11        265                        
12-OCT-14 -> 19-JAN-15        376        7150          11        265                        

Note that the script reads all the table (it can do a sample but here it is 100%). Not exactly the table but only the index. It counts the index leaf blocks with the undocumented function sys_op_lbid() which is used by oracle to estimate the clustering factor.

So here I have no fragmentation. All blocks have about 377 rows and no free space. This is because I inserted them in increasing order and the so colled ’90-10′ block split occured.

Let’s see what I get if I delete most of the rows before the 01-JAN-2014:

SQL> delete from DEMOTABLE where order_dateSQL> @index_fragmentation 
 
ORDER_DAT -> ORDER_DAT rows/block bytes/block %free space     blocks free                   
--------- -- --------- ---------- ----------- ----------- ---------- -----                     
25-APR-12 -> 02-AUG-12          4          72          99        266 oooo                   
03-AUG-12 -> 11-NOV-12          4          72          99        266 oooo                   
11-NOV-12 -> 19-FEB-13          4          72          99        266 oooo                   
19-FEB-13 -> 30-MAY-13          4          72          99        265 oooo                   
30-MAY-13 -> 07-SEP-13          4          72          99        265 oooo                   
07-SEP-13 -> 16-DEC-13          4          72          99        265 oooo                   
16-DEC-13 -> 26-MAR-14          4          72          99        265 oooo                   
26-MAR-14 -> 03-JUL-14          4          72          99        265 oooo                   
04-JUL-14 -> 11-OCT-14         46         870          89        265 oooo                   
12-OCT-14 -> 19-JAN-15        376        7150          11        265                        

I have the same buckets, and same number of blocks. But blocks which are in the range below 01-JAN-2014 have only 4 rows and a lot of free space. This is exactly what I want to detect: I can check if that free space will be reused.

Here I know I will not enter any orders with a date in the past, so those blocks will never have an insert into them. I can reclaim that free space with a COALESCE:

SQL> alter index DEMOINDEX coalesce;
Index altered.
SQL> @index_fragmentation
ORDER_DAT to ORDER_DAT rows/block bytes/block %free space blocks free
 --------- -- --------- ---------- ----------- ----------- ---------- -----
 25-APR-12 -> 03-OCT-14 358 6809 15 32
 03-OCT-14 -> 15-OCT-14 377 7163 11 32
 15-OCT-14 -> 27-OCT-14 377 7163 11 32
 27-OCT-14 -> 08-NOV-14 377 7163 11 32
 08-NOV-14 -> 20-NOV-14 377 7163 11 32
 20-NOV-14 -> 02-DEC-14 377 7163 11 32
 02-DEC-14 -> 14-DEC-14 377 7163 11 32
 14-DEC-14 -> 26-DEC-14 377 7163 11 32
 27-DEC-14 -> 07-JAN-15 377 7163 11 32
 08-JAN-15 -> 19-JAN-15 371 7056 12 32

I still have 10 buckets because this is defined in my script, but each bucket noew has less rows. I’ve defragmented the blocks and reclaimed the free blocks.

Time to share the script now. Here it is: index_fragmentation.zip


set termout off verify off

define owner='DEMO'         -- table owner
define table='DEMOTABLE'    -- table name
define index='DEMOINDEX'    -- index name
define buckets=10           -- number of buckets
define sample=100           -- 100% scans all the index

column "free" format A5

variable c refcursor;

declare
 o all_indexes.owner%TYPE:='&owner';
 t all_indexes.table_name%TYPE:='&table';
 i all_indexes.table_name%TYPE:='&index';
 oid all_objects.object_id%TYPE;
 hsz varchar2(2000);
 n number:=&buckets;
 p number:=&sample;
 s varchar2(2000):='';
 k_min varchar2(2000);
 k_max varchar2(2000);
 k_lst varchar2(2000);
 k_nul varchar2(2000);
 k_vsz varchar2(2000);
 p_sam varchar2(2000):='';
 cursor cols is select i.column_name,i.column_position,case when data_type in ('VARCHAR2','RAW') then 3 else 1 end length_bytes
  from dba_ind_columns i join dba_tab_columns t 
  on (t.owner=i.table_owner and t.table_name=i.table_name and t.column_name=i.column_name)
  where i.table_owner=o and i.table_name=t and i.index_name=i order by column_position;
 procedure add(l in varchar2,i number default 0) is begin s:=s||chr(10)||rpad(' ',i)||l; end;
begin
 select object_id into oid from dba_objects where object_type='INDEX' and owner=o and object_name=i;
 /* Note:10640.1: block header size = fixed header (113 bytes) + variable transaction header (23*initrans) */
 select nvl(to_char(block_size - 113 - ini_trans*23),'null') header_size into hsz 
  from dba_indexes left outer join dba_tablespaces using (tablespace_name) where owner=o and index_name=i;
 for c in cols loop
  if ( c.column_position > 1 ) then k_lst:=k_lst||',' ; k_min:=k_min||',';k_max:=k_max||','; k_nul:=k_nul||' and ' ; k_vsz:=k_vsz||'+' ; end if;
  k_lst:=k_lst||c.column_name;
  k_nul:=k_nul||c.column_name|| ' is not null';
  k_min:=k_min||'min('||c.column_name||') '||c.column_name;
  k_max:=k_max||'max('||c.column_name||') '||c.column_name;
  k_vsz:=k_vsz||'nvl(vsize('||c.column_name||'),1)+'||c.length_bytes;
 end loop;
 if p != 100 then p_sam:='sample block('||p||')'; end if;
 add('with leaf_blocks as (',0);
 add('select /* cursor_sharing_exact dynamic_sampling(0) no_monitoring',1);
 add(' no_expand index_ffs('||t||','||i||') noparallel_index('||t||','||i||') */',10);
 add(k_min||','||1/(p/100)||'*count(rowid) num_rows',1);
 add(','||1/(p/100)||'*sum(1+vsize(rowid)+'||k_vsz||') vsize',1);
 add('from '||o||'.'||t||' '||p_sam||' '||t,1);
 add('where '||k_nul,1);
 add('group by sys_op_lbid('||oid||',''L'',rowid)',1);
 add('),keys as (',0);
 add('select ntile('||n||') over (order by '||k_lst||') bucket,',1);
 add(k_min||',',2);
 add('count(*) leaf_blocks, count(*)*'||hsz||' tsize,',2);
 add('sum(num_rows) num_rows,sum(vsize) vsize',2);
 add('from leaf_blocks group by '||k_lst,1);
 add(')',0);
 add('select '||k_min||',''->'' "->",'||k_max||',round(sum(num_rows)/sum(leaf_blocks)) "rows/block"',0);
 add(',round(sum(vsize)/sum(leaf_blocks)) "bytes/block",',1);
 add('case when sum(vsize)<=sum(tsize) then 100*round(1- sum(vsize) / (sum(tsize)),2) else null end "%free space",',1);
 add(' sum(leaf_blocks) "blocks",');
 --add('case when sum(vsize)<=sum(tsize)/2 then substr(rpad(''o'',5*round(1- sum(vsize) / (sum(tsize)),2),''o''),1,5) end "free"',1);
 add('substr(rpad(''o'',5*round(1- sum(vsize) / (sum(tsize)),2),''o''),1,5) "free"',1);
 add('from keys group by bucket order by bucket',0);
 begin open :c for s ; exception when others then dbms_output.put_line(s); raise; end ;
 dbms_output.put_line(s);
end;
/
set termout on 
print c

The script is quite ugly. It’s SQL generated by PL/SQL. It’s generated because it selects the index columns. And as I don’t want to have it too large it is not indented nor commented. However, if you run it with set servertoutput on you will see the generated query.

How to use it? Just change the owner, table_name, and index name. It reads the whole index so if you have a very large index you may want to change the sample size.