Fast refresh of aggregate-only materialized views – introduction

This post introduces a series about the algorithm used by Oracle (in 11.2.0.3) to fast refresh a materialized view (MV) containing only an aggregate:

create materialized view test_mv

build immediate

refresh fast on demand

with rowid

as

select gby as mv_gby,

count(*) as mv_cnt_star,

AGG (dat) as mv_AGG_dat,

count(dat) as mv_cnt_dat

from test_master

where whe = 0

group by gby

;

Where AGG is either SUM or MAX, the most important aggregates.

In the next posts, I will illustrate the algorithms used to propagate conventional (not direct-load) inserts, updates and deletes on the master table; I will illustrate also the specialized versions of the algorithms used when only one type of DML has been performed (if they exist).

In this post, we sets the stage, make some general observations, and illustrate the very first steps of the algorithm that are common to all scenarios. Everything is supported by the usual test case.

Materialized view logs configuration

I have configured the materialized view log on the master table to “log everything”, to give the most complete information possible to the MV refresh engine:

create materialized view log on test_master

with rowid ( whe, gby, dat ), sequence

including new values;

With this configuration, each modification to the master table logs the rowid of the affected rows (in column m_row$$), and it is labeled with an increasing value (in sequence$$) that enables the MV refresh engine to reconstruct the order in which the modifications happened. In detail, let’s see what’s inside the logs after we modify a single row (from mvlog_examples.sql):

After an INSERT:

SEQUENCE$$ M_ROW$$ DMLTYPE$$ OLD_NEW$$ WHE GBY DAT

---------- -------------------- --------- --------- ------ ------ ------

10084 AAAWK0AAEAAAxTHAD6 I N 10 10 10

This logs the new values (old_new$$=’N’) of an Insert (dmltype$$=’I’).

After a DELETE:

SEQUENCE$$ M_ROW$$ DMLTYPE$$ OLD_NEW$$ WHE GBY DAT

---------- -------------------- --------- --------- ------ ------ ------

10085 AAAWK0AAEAAAxTFAAA D O 0 0 1

This logs the old values (old_new$$=’O’) of a Delete (dmltype$$=’D’).

After an UPDATE:

SEQUENCE$$ M_ROW$$ DMLTYPE$$ OLD_NEW$$ WHE GBY DAT

---------- -------------------- --------- --------- ------ ------ ------

10086 AAAWK0AAEAAAxTHAD6 U U 10 10 10

10087 AAAWK0AAEAAAxTHAD6 U N 10 10 99

This logs both the old values (old_new$$=’U’) and the the new values (old_new$$=’N’) of an Update (dmltype$$=’U’). So we see that the update changed DAT from 10 to 99, without changing the other columns.

Note that the update log format is the same as a delete (at sequence 10086) immediately followed by an insert (at sequence 10087) at the same location on disk (AAAWK0AAEAAAxTHAD6), the only differences being dmltype$$=’U’ and old_new$$ set to ’U’ instead of ‘O’ for the old values.

But if you ignore these differences, you can consider the log a sequence of deletes/inserts, or if you prefer, a stream of old/new values. And this is exactly what the refresh engine does – it does not care whether an old value is present because it logs a delete or the “erase side” of an update, and ditto for new values. It “sees” the log as a stream of old/new values, as we will demonstrate.

Log snapshots

When the MV fast refresh is started, the first step is to “mark” the logged modifications to be propagated to the MV by setting snaptime$$ equal to the current time – check the description contained in this post for details (note also another possible variant with “commit-scn mv logs”). MV log purging (at the end of the refresh) is the same as well.

TMPDLT (deleting the redundant log values)

The stream of old/new values marked in the log might contain pairs of redundant values, each pair being composed of a new value (insert) immediately followed by an old value (delete) on the same row; every such pair can be ignored without affecting the refresh result. Filtering out these pairs is the job of this SQL fragment (nicknamed “TMPDLT”), heavily edited for readability:

with tmpdlt$_test_master as (

select /*+ result_cache(lifetime=session) */

rid$, gby, dat, whe,

decode(old_new$$, 'N', 'I', 'D') dml$$,

old_new$$, snaptime$$,

dmltype$$

from (select log.*,

min( sequence$$ ) over (partition by rid$) min_sequence$$,

max( sequence$$ ) over (partition by rid$) max_sequence$$

from (select chartorowid(m_row$$) rid$, gby, dat, whe,

Page 1 of 3 | Next page