fast refresh of join-only materialized views – algorithm summary

This post investigates how Oracle fast refreshes materialized views containing only joins of master tables:

create materialized view test_mv
build immediate
refresh fast on demand
as
select test_t1.*, test_t1.rowid as test_t1_rowid,
       test_t2.*, test_t2.rowid as test_t2_rowid,
       test_t3.*, test_t3.rowid as test_t3_rowid
  from test_t1, test_t2, test_t3
 where test_t1.j1_2 = test_t2.j2_1
   and test_t2.j2_3 = test_t3.j3_2
;

The fast refresh algorithm is simple and very easy to understand – so trivial in fact that once examined and understood, the possible tuning techniques follow naturally.

The test case traces the fast refresh of the above materialized view (MV) using the 10046 event (aka “sql trace”). The test case has been run on 9.2.0.8, 10.2.0.4 and 11.1.0.7 (the latest versions of 9i, 10g and 11g available as of today), and on all of these versions the algorithm used by the refreshing engine (run by invoking dbms_mview.refresh) appears to be the same, with only a few implementation differences.
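
As a rough sketch of how such a trace can be collected from SQL*Plus (level 12, which adds bind values and wait events, is an assumption here and not necessarily the level used by the test case scripts):

```sql
-- switch on extended SQL trace for the current session
-- (level 12 = binds + waits; an assumption, any 10046 level would do)
alter session set events '10046 trace name context forever, level 12';

-- fast ("F") refresh of the MV defined above
exec dbms_mview.refresh('TEST_MV', method => 'F')

-- switch the trace off again
alter session set events '10046 trace name context off';
```

The recursive statements issued by the refresh engine then show up in the session trace file.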

The test case explores the most general case: it performs inserts, updates and deletes on all the three master tables (the inserts being conventional; I will explore direct-path inserts another time).

Materialized view logs configuration

In the test case, I have configured the materialized view logs to “log everything”, in order to check whether more information in the logs could trigger some special kernel code designed to take advantage of it:

create materialized view log on test_t1
with sequence, rowid, primary key (j1_2, x1)
including new values;

but the engine uses only the rowid information even in 11.1.0.7, so you are better off logging only the rowid if the master table feeds join-only materialized views exclusively:

create materialized view log on test_t1 with rowid;
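
Assuming the other two master tables also feed join-only MVs exclusively, the same minimal configuration would simply be repeated for them (a sketch, not taken from the test case scripts):

```sql
-- rowid-only logs for the remaining master tables of test_mv
create materialized view log on test_t2 with rowid;
create materialized view log on test_t3 with rowid;
```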

Minimal logging obviously improves the performance of DML against the master tables, but it also optimizes the fast refresh, since the latter, as we are going to see in a moment, reads each log twice, and of course the less you log, the more compact the logs will be.

Log snapshots

After some preliminary visits to the data dictionary, the first operation performed by the fast refresh engine is to “mark” the modifications (recorded in the materialized view logs) to be propagated to the MV. Only the marked log rows are then fed by the fast refresh engine as input to the next steps.

The “flag” used to mark the rows is the column snaptime$$. When the refresh starts, the engine performs a “snapshot” of the materialized view logs by setting the snaptime$$ of all the new rows (those with snaptime$$ = ‘01/01/4000’) of each log in turn to the current time (SYSDATE).

In detail, the snapshot is performed by issuing this SQL statement (slightly edited for readability) in 9.2.0.8 and 10.2.0.4:

update MLOG$_TEST_T1
   set snaptime$$ = :1
 where snaptime$$ > to_date('2100-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS')

The bind variable :1 is a DATE whose value is equal to SYSDATE.
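
To get a feel for how many rows the snapshot is going to mark, a query along these lines could be run just before the refresh. It is only an illustration that reuses the predicate of the update above, not a statement issued by the engine:

```sql
-- log rows not yet marked by any refresh
-- (their snaptime$$ still holds the "far future" default)
select count(*) as unmarked_rows
  from MLOG$_TEST_T1
 where snaptime$$ > to_date('2100-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS');
```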

Note: In 11.1.0.7, the statement is slightly different but does the same thing, probably in a more scalable way concurrency-wise (check the script spools if you’re interested).

You might have noticed the where condition on snaptime$$; it is necessary because a log might be used by more than one materialized view. When a refresh ends, in fact, the engine checks whether other MVs might still need each log row, and deletes only the log rows that have been processed by all dependent MVs; the others are left unchanged (and hence keep the snaptime$$ that was set when the fast refresh started). The where condition avoids overwriting those snaptime$$ values, so that only the brand new rows (those with snaptime$$ = ‘01/01/4000’) are marked with the current time.
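
To check which MVs share a given log, one possible query is sketched below. It assumes that the dictionary view user_base_table_mviews, which lists the MVs registered against each logged master table, is available in your version:

```sql
-- one row per MV that depends on the log of TEST_T1
select master, mview_id, mview_last_refresh_time
  from user_base_table_mviews
 where master = 'TEST_T1';
```

If more than one row comes back, a log row can outlive a refresh of test_mv until the other MVs have consumed it as well.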

So, at the end of the snapshot, the log rows that must be examined by the refresh engine will be the ones that are marked by having their snaptime$$ between the date of the last refresh (excluded) and :1 (included). All the other log rows must be ignored.
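
Expressed as a query, the set of marked rows would look roughly like this. It is a sketch of the filter only, not the engine's actual statement, and the bind names :last_refresh and :snap_time are hypothetical:

```sql
-- :last_refresh = time of the previous refresh (excluded)
-- :snap_time    = value that was bound to :1 in the snapshot update (included)
select m_row$$, dmltype$$, old_new$$, snaptime$$
  from MLOG$_TEST_T1
 where snaptime$$ >  :last_refresh
   and snaptime$$ <= :snap_time;
```

Here m_row$$ is the rowid of the modified master table row, which is the only piece of information the engine needs for join-only MVs.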

Side note: marking data at a certain point in time and then replicating the marked data is the only replication strategy that can work when you cannot “freeze” the master tables, which is definitely the case here. This is a general topic worth blogging about in the future.

The marked log rows are then inspected to count the number and type of the logged modifications, in order to check whether any of the replication steps (i.e. the DEL and INS steps that we are going to discuss in a moment) can be skipped. The number of modifications is also used (in some versions) to inject hints into the SQL statements of the replication steps, a topic that falls outside the scope of this post.
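
The kind of information gathered by this inspection can be approximated with a simple aggregate over the marked rows. This is an illustrative sketch rather than the statement found in the trace, and the bind names :last_refresh and :snap_time are hypothetical:

```sql
-- :last_refresh = time of the previous refresh (excluded)
-- :snap_time    = snapshot time of the current refresh (included)
-- dmltype$$ is 'I', 'U' or 'D' for inserts, updates and deletes
select dmltype$$, count(*) as modifications
  from MLOG$_TEST_T1
 where snaptime$$ >  :last_refresh
   and snaptime$$ <= :snap_time
 group by dmltype$$;
```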

Core algorithm: the INS and DEL steps
