Fast refresh of aggregate-only materialized views with SUM – algorithm

In this post I will illustrate the algorithm used by Oracle (in 11.2.0.3) to fast refresh a materialized view (MV) containing only the SUM aggregate function:

create materialized view test_mv

build immediate

refresh fast on demand

with rowid

as

select gby as mv_gby,

count(*) as mv_cnt_star,

sum (dat) as mv_sum_dat,

count(dat) as mv_cnt_dat

from test_master

where whe = 0

group by gby

;

Note that count(dat) is specified – you could avoid that if column dat is constrained to be not-null (as stated in the documentation), but I’m not covering that corner case here.

The MV log is configured to “log everything”:

create materialized view log on test_master

with rowid ( whe, gby, dat ), sequence

including new values;

In the general introduction to aggregate-only MVs we have seen how the refresh engine first marks the log rows, then inspects TMPDLT (loading its rows into the result cache at the same time) to classify its content as insert-only (if it contains only new values), delete-only (if it contains only old values) or general (if it contains a mix of new/old values). Here we illustrate the refreshing SQL in all three scenarios, extracted from the supporting test case.

refresh for insert-only TMPDLT

The refresh is made using this single merge statement:

/* MV_REFRESH (MRG) */

merge into test_mv

using (

with tmpdlt$_test_master as (

-- check introduction post for statement

)

select gby,

sum( 1 ) as cnt_star,

sum( 1 * decode(dat, null, 0, 1) ) as cnt_dat,

nvl( sum( 1 * dat), 0 ) as sum_dat

from (select gby, whe, dat

from tmpdlt$_test_master

) as of snapshot(:b_scn)

where whe = 0

group by gby

) deltas

on ( sys_op_map_nonnull(test_mv.mv_gby) = sys_op_map_nonnull(deltas.gby) )

when matched then

update set

test_mv.mv_cnt_star = test_mv.mv_cnt_star + deltas.cnt_star,

test_mv.mv_cnt_dat = test_mv.mv_cnt_dat + deltas.cnt_dat,

test_mv.mv_sum_dat = decode( test_mv.mv_cnt_dat + deltas.cnt_dat,

0, null,

nvl(test_mv.mv_sum_dat,0) + deltas.sum_dat

)

when not matched then

insert ( test_mv.mv_gby, test_mv.mv_cnt_dat, test_mv.mv_sum_dat, test_mv.mv_cnt_star )

values ( deltas.gby, deltas.cnt_dat, decode (deltas.cnt_dat, 0, null, deltas.sum_dat), deltas.cnt_star)

It simply calculates the delta values to be propagated by grouping-and-summing the new values contained in TMPDLT that satisfy the where-clause (essentially, it executes the MV statement on the mv log without redundant values), and then looks for matches over the grouped-by expression (using the null-aware function sys_op_map_nonnull, more on this later). It then applies the deltas to the MV, or simply inserts them if no match is found.

Note that mv_sum_dat (that materializes sum(dat)) is set to null if, and only if, (the updated value of) mv_cnt_dat (that materializes count(dat)) is zero (signaling that for this value of mv_gby, all values of dat in the master table are null). This is done in all three scenarios of the algorithm.

The matching function sys_op_map_nonnull() is there to match null values with null values, since aggregating by null is perfectly legal, yet you cannot match null with null in a merge/join. This function returns a raw value that is never null, and set to 0xFF when the input is null and to the binary representation of the input postfixed with 0×00 for other input values. Note that a function-based index, named I_SNAP$_TEST_MV in our case, is automatically created on sys_op_map_nonnull(mv_gby) to give the CBO the opportunity to optimize the match (unless the MV is created specifying USING NO INDEX, which is probably almost never a good idea when you need to fast refresh).

Note also that the master table, test_master, is not accessed at all, as it is always the case for SUM (but not necessarily for MAX, as we will see in the next post). This elegant decoupling (possible thanks to the mathematical properties of the addition, of course) of the master table from the MV greatly improves performance and also simplifies performance tuning.

refresh for delete-only TMPDLT

The refresh is made in two steps, the first being this update statement:

/* MV_REFRESH (UPD) */

update /*+ bypass_ujvc */ (

select test_mv.mv_cnt_dat,

deltas .cnt_dat,

test_mv.mv_sum_dat,

deltas .sum_dat,

test_mv.mv_cnt_star,

deltas .cnt_star

from test_mv,

( with tmpdlt$_test_master as (

-- check introduction post for statement

)

select gby,

sum( -1 ) as cnt_star

sum( -1 * decode(dat, null, 0, 1) ) as cnt_dat,

nvl( sum(-1 * dat), 0) as sum_dat

from (select gby, whe, dat

from tmpdlt$_test_master mas$

) as of snapshot(:b_scn)

where whe = 0

group by gby

) deltas

where sys_op_map_nonnull(test_mv.mv_gby) = sys_op_map_nonnull(deltas.gby)

)

set mv_cnt_star = mv_cnt_star + cnt_star,

mv_cnt_dat = mv_cnt_dat + cnt_dat,

Page 1 of 2 | Next page