Fast refresh of aggregate-only materialized views with SUM – algorithm

mv_sum_dat = decode(mv_cnt_dat + cnt_dat, 0, null, nvl(mv_sum_dat,0) + sum_dat)

this calculates the same deltas as the insert-only case, just with signs reversed since, of course, we are propagating deletes instead of inserts; it then applies them using an updatable join in-line view instead of a merge.

Then, this delete statement is issued:

/* MV_REFRESH (DEL) */

delete from test_mv where mv_cnt_star = 0;

this is because, when mv_cnt_star (that materializes count(*)) is zero after the deltas application, it means that all the rows belonging to that value of mv_gby have been deleted in the master table, and hence that value must be removed from the MV as well.

Note that an index on mv_cnt_star is NOT automatically created (as of 11.2.0.3) – it might be a very good idea to create it, to avoid a full scan of the MV at every refresh, which is O(mv size) and not O(modifications) as the other steps (thus rendering the whole refresh process O(mv size)).

refresh for mixed-DML TMPDLT

The refresh is accomplished using a single merge statement, which is an augmented version of the insert-only statement plus a delete clause that implements the last part of the delete-only refresh:

/* MV_REFRESH (MRG) */

merge into test_mv

using (

select gby,

sum( decode(dml$$, 'I', 1, -1) ) as cnt_star,

sum( decode(dml$$, 'I', 1, -1) * decode(dat, null, 0, 1)) as cnt_dat,

nvl( sum(decode(dml$$, 'I', 1, -1) * dat), 0) as sum_dat

from (select chartorowid(m_row$$) rid$, gby, whe, dat,

decode(old_new$$, 'N', 'I', 'D') as dml$$,

dmltype$$

from mlog$_test_master

where snaptime$$ > :b_st0

) as of snapshot(:b_scn)

where whe = 0

group by gby

) deltas

on ( sys_op_map_nonnull(test_mv.mv_gby) = sys_op_map_nonnull(deltas.gby) )

when matched then

update set

test_mv.mv_cnt_star = test_mv.mv_cnt_star + deltas.cnt_star,

test_mv.mv_cnt_dat = test_mv.mv_cnt_dat + deltas.cnt_dat,

test_mv.mv_sum_dat = decode( test_mv.mv_cnt_dat + deltas.cnt_dat,

0, null,

nvl(test_mv.mv_sum_dat,0) + deltas.sum_dat

),

delete where ( test_mv.mv_cnt_star = 0 )

when not matched then

insert ( test_mv.mv_gby, test_mv.mv_cnt_dat, test_mv.mv_sum_dat, test_mv.mv_cnt_star )

values ( deltas.gby, deltas.cnt_dat, decode (deltas.cnt_dat, 0, null, deltas.sum_dat), deltas.cnt_star)

where (deltas.cnt_star > 0)

Here the deltas are calculated by reversing the sign of “old” values (dml$$ not equal to ‘I’, which is the same as old_new$$ not equal to ‘N’; note in passing that it does not distinguish between old_new$$ equal to ‘O’ or ‘U’, as stated in the introduction post), of course adjusting cnt_star and cnt_dat accordingly.

The removal of rows that get their mv_cnt_star set to zero is performed as a side case of the update, which is very nice since it does not call for an index on that column.

Surprisingly, this statement does not use TMPDLT, but reads straight from the log; I don’t know the reason behind this, and whether this is always the case or if TMPDLT is sometimes used, depending, perhaps, on some heuristic decision. Surely, while using TMPDLT is mandatory in the other two cases (since the statements work only if their input is insert/delete only, and that is checked over TMPDLT only), it is just a possible optimization choice here.

optimizations

Knowing the “internal” workings presented here makes it vastly easier (I hope) to optimize the refresh process and avoid pitfalls; it is of course unfeasible to cover all possible real-life scenarios, but I can offer some general high-level considerations.

Obviously, fast refreshing is better than complete refreshing only when the number of modifications stored in the MV log is “small” compared to the MV size. In this scenario the usual optimal values propagation plan reads from the log, computes TMPDLT (when used), and joins the MV using NESTED LOOPS using the (automatically created) index on sys_op_map_nonnull(mv_gby) – or possibly using HASH JOIN, or SORT MERGE JOIN, again using the same index.

Hence the only optimization worth making is to create the index on mv_cnt_star, unless you can be absolutely sure that you will never be in the delete-only scenario, or unless you don’t care about the resulting full MV scan.

Since the master table is never read during the refresh, it can be left alone. This is great.

The best access method for the log is usually a full table scan, since all rows are normally read, and hence usually nothing has to be done on the log. I can imagine that in rare cases one might consider optimizing the log by e.g. creating an index – for example a covering index on all the referenced columns; or one to speed up the analytic functions computations of TMPDLT avoiding the sort; or one prefixed by snaptime$$ if other MVs read from the log and refreshes at different times, etc.

Maybe, sometimes, it might prove beneficial to disable TMPDLT, as discussed in the introduction post.

Page 2 of 2 | Previous page

Leave a comment