Fast refresh of aggregate-only materialized views with MAX – algorithm

where snaptime$$ > :b_st0

) as of snapshot(:b_scn)

)

and an insert that recalculates them reading from the master table:

/* MV_REFRESH (INS) */

insert into test_mv

select gby, count(*), max(dat)

from (select gby, dat

from test_master

where ( sys_op_map_nonnull(gby) ) in (

select sys_op_map_nonnull(gby)

from (select gby, dat

from mlog$_test_master

where snaptime$$ > :b_st0

) as of snapshot(:b_scn)

)

)

group by gby

This is necessary since a delete(old value) might remove the max value present in the MV, and to know the new max value we must necessarily visit the master table. This might not happen for all log values, but the refresh engine takes the easiest (and drastic) option of deleting and recreating all anyway.

Note that this might result in massive degradation of performance – this algorithm in not O(modifications) but O(modifications * V), where V is the average number of rows per distinct value of gby, which is generally O(mv size). For example: if your order table doubles in size, you must expect to double the refresh time, even if the number of orders modified is always the same.

As in the SUM case, a bit surprisingly, this statement does not use TMPDLT, but reads straight from the log; the same observations made for the SUM case apply here as well.

optimizations

The insert-only case is very similar to the SUM case, and thus please refer to the high-level discussion presented there if interested (but, obviously, the creation of the index on mv_cnt_star is not needed in the MAX case). As a side note, one might notice that insert-only algorithms for aggregation are a class of their own, vastly simpler and vastly more performant (and that does not come as a surprise).

The mixed-DML case is another story altogether – to optimize it you must (almost always) create a proper index on the master table. At least the expression sys_op_map_nonnull(gby) must be indexed, but I would strongly advise to create this covering index:

create index test_master_covering on test_master ( sys_op_map_nonnull(gby), gby, dat )

this way everything needed for recalculating a given gby value is neatly clustered in some leaf blocks, instead of being spread out across all the table. You might spare thousands of table block visits if, as it is quite often the case, you have thousands of rows for each gby values and a bad clustering_factor.

Note also that this index is probably highly compressible, thus adding compress 2 (or even 3, depending on the “dat” column statistic distribution) is a great thing to do as well.

Script gby_max_with_covering_index.sql shows the possible index-only resulting plan:

--------------------------------------------------

|Id|Operation |Name |

--------------------------------------------------

| 0|INSERT STATEMENT | |

| 1| LOAD TABLE CONVENTIONAL| |

| 2| HASH GROUP BY | |

| 3| NESTED LOOPS | |

| 4| SORT UNIQUE | |

| 5| TABLE ACCESS FULL |MLOG$_TEST_MASTER |

| 6| INDEX RANGE SCAN |TEST_MASTER_COVERING|

--------------------------------------------------

For optimizing the log, and disabling TMPDLT, the same considerations made for the SUM case hold.

Page 2 of 2 | Previous page

Leave a comment