fast refresh of join-only materialized views – algorithm summary

This post investigates how Oracle fast refreshes materialized views containing only joins of master tables:

create materialized view test_mv
build immediate
refresh fast on demand
as
select test_t1.*, test_t1.rowid as test_t1_rowid,
       test_t2.*, test_t2.rowid as test_t2_rowid,
       test_t3.*, test_t3.rowid as test_t3_rowid
  from test_t1, test_t2, test_t3
 where test_t1.j1_2 = test_t2.j2_1
   and test_t2.j2_3 = test_t3.j3_2
;

The fast refresh algorithm is simple and very easy to understand - so trivial in fact that once examined and understood, the possible tuning techniques follow naturally.

The test case traces the fast refresh of the above materialized view (MV) using the 10046 event (aka "sql trace"). The test case has been run on 9.2.0.8, 10.2.0.4 and 11.1.0.7 (the latest versions of 9i, 10g and 11g available as of today), and on all of these versions the algorithm used by the refreshing engine (run by invoking dbms_mview.refresh) appears to be the same, with only a few implementation differences.
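
For reference, a minimal sketch of how such a trace can be produced around the refresh call (level 12, which adds binds and waits to the trace, is just a common choice, not a requirement):

alter session set events '10046 trace name context forever, level 12';
exec dbms_mview.refresh('TEST_MV', method => 'F')
alter session set events '10046 trace name context off';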

The test case explores the most general case: it performs inserts, updates and deletes on all three master tables (the inserts being conventional; I will explore direct-path inserts another time).

Materialized view logs configuration

In the test case, I have configured the materialized view logs to "log everything", in order to check whether more information in the logs could trigger some special kernel code designed to take advantage of it:

create materialized view log on test_t1
with sequence, rowid, primary key (j1_2, x1)
including new values;

but the engine uses only the rowid information even in 11.1.0.7, so you are better off logging only the rowid if the master table feeds join-only materialized views exclusively:

create materialized view log on test_t1 with rowid;

Minimal logging obviously improves the performance of DML against the master tables, but it also optimizes the fast refresh, since the latter, as we are going to see in a moment, reads each log twice, and of course the less you log, the more compact the logs will be.

Log snapshots

After some preliminary visits to the data dictionary, the first operation performed by the fast refresh engine is to "mark" the modifications (recorded in the materialized view logs) to be propagated to the MV. Only the marked log rows are then fed by the fast refresh engine as input to the next steps.

The "flag" used to mark the rows is the column snaptime$$. When the refresh starts, the engine performs a "snapshot" of the materialized view logs by setting the snaptime$$ of all the new rows (those with snaptime$$ = '01/01/4000') of each log in turn to the current time (SYSDATE).

In detail, the snapshot is performed by issuing this SQL statement (slightly edited for readability) in 9.2.0.8 and 10.2.0.4:

update MLOG$_TEST_T1
   set snaptime$$ = :1
 where snaptime$$ > to_date('2100-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS')

The bind variable :1 is a DATE whose value is equal to SYSDATE.

Note: In 11.1.0.7, the statement is slightly different but does the same thing, probably in a more scalable way concurrency-wise (check the script spools if you're interested).

You might have noticed the where condition on snaptime$$; that is necessary since the logs might be used by more than one materialized view. When a refresh ends, in fact, the engine checks whether other MVs might still need each log row, and deletes only the log rows that have been processed by all dependent MVs; the other ones are left unchanged (and hence keep the snaptime$$ that was set when the fast refresh started). The where condition is needed to avoid overwriting that snaptime$$, and to mark with the current time only the brand new rows (those with snaptime$$ = '01/01/4000').

So, at the end of the snapshot, the log rows that must be examined by the refresh engine will be the ones that are marked by having their snaptime$$ between the date of the last refresh (excluded) and :1 (included). All the other log rows must be ignored.
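
To make this concrete, a query along these lines (not a statement issued by the engine; the bind names are mine) returns the log rows that a given refresh will process:

select m_row$$, dmltype$$, snaptime$$
  from mlog$_test_t1
 where snaptime$$ >  :last_refresh_date  -- excluded
   and snaptime$$ <= :snapshot_date      -- included (the :1 of the snapshot update)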

Side note: marking data at a certain point in time and then replicating the marked data is the only replication strategy that can work when you cannot "freeze" the master tables, which is definitely our case here. This is a general topic worth blogging about in the future.

The marked log rows are then inspected to count the number and type of the logged modifications. This is to check whether any of the replication steps (i.e. the DEL and INS steps that we are going to discuss in a moment) can be skipped. Also, the number of modifications is used (in some versions) to inject some hints into the SQL statements of the replication steps, a topic that falls outside the scope of this post.
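
A rough equivalent of that counting step (again, not the exact statement issued by the engine; dmltype$$ is 'I', 'U' or 'D' for inserts, updates and deletes respectively) could be:

select dmltype$$, count(*)
  from mlog$_test_t1
 where snaptime$$ >  :last_refresh_date
   and snaptime$$ <= :snapshot_date
 group by dmltype$$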

Core algorithm: the INS and DEL steps

Then, the core replication starts. The replication considers each master table in turn, and for each table, propagates the modifications to the MV. So we have essentially one single algorithm that propagates from a single master table, just repeated once for each master table.

The propagation for each master table is made of two simple steps, which I'm going to name, after the comments in the SQL, a DEL (DELETE) step followed by an INS (INSERT) step.

The DEL step is (editing for readability: removing hints, unnecessary aliases, etc):

/* MV_REFRESH (DEL) */
delete from test_mv
 where test_t1_rowid in
       (
select * from
       (
select chartorowid (m_row$$)
  from mlog$_test_t1
 where snaptime$$ > :1
       ) as of snapshot (:2)
       )

The subquery simply fetches the rowid of all marked rows, since :1 is the date of the previous refresh of the materialized view, and :2 is the SCN (coded as a RAW variable) of the time when the snapshot was performed. So, this step deletes from the MV all the rows that record the result of the MV-defining join of any of the marked rows (of the current master table) with the other master tables.

This is the step that can benefit from the index on the column that stores the master table rowid (here, test_t1_rowid) that the documentation suggests creating. Note that in order to optimize this step, you need three separate single-column indexes (here, on test_t1_rowid, test_t2_rowid and test_t3_rowid), not a single composite index spanning the (here, three) columns, as is sometimes wrongly stated.
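
For our test MV, the supporting indexes could hence be (the index names are of course just a possible choice of mine):

create index test_mv_t1_rowid_idx on test_mv (test_t1_rowid);
create index test_mv_t2_rowid_idx on test_mv (test_t2_rowid);
create index test_mv_t3_rowid_idx on test_mv (test_t3_rowid);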

The INS step is (again editing for readability):

/* MV_REFRESH (INS) */
insert into test_mv
select jv.j1_2, jv.x1, jv.pk1, jv.rid$,
       mas2.j2_1, mas2.j2_3, mas2.x2, mas2.pk2, mas2.rowid,
       mas3.j3_2, mas3.x3, mas3.pk3, mas3.rowid
  from (
select log.rowid rid$, log.*
  from test_t1 log
 where rowid in
       (
select chartorowid(log.m_row$$)
  from mlog$_test_t1
 where snaptime$$ > :1
       )
       ) as of snapshot (:2) jv,
       test_t2 as of snapshot (:2)  mas2,
       test_t3 as of snapshot (:2)  mas3
 where   jv.j1_2 = mas2.j2_1
   and mas2.j2_3 = mas3.j3_2

The subquery is the same as the DEL step and serves the very same function. So, this statement replays the SQL statement of the MV definition, but limited to the marked rows only. Note that all tables are read at the same point in time in the past, the time when the snapshot of the log was performed, thanks to the argument of the "as of snapshot" clause being the same.

In order to speed up the INS step, indexes on the join columns can be created on the master tables (not on the MV!). This is because, special cases aside, it is well known that the "fast refresh" (the name itself is quite horrible; many people prefer the adjective "incremental") can actually be "fast" only if a small fraction of the master tables is modified (otherwise, a complete refresh is better); in this scenario, the optimal plan is almost certainly a series of NESTED LOOPS that has the current table (test_t1 in this case) as the outermost table, a series that usually benefits a lot from indexes on the join columns of the inner tables. Of course, you must remember that every table, in turn, acts as the outermost table, hence you should index every possible join permutation.
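
Given the two join conditions of our test MV, a possible set of master-table indexes covering every permutation could be (index names are again just a choice of mine):

create index test_t1_j1_2_idx on test_t1 (j1_2);
create index test_t2_j2_1_idx on test_t2 (j2_1);
create index test_t2_j2_3_idx on test_t2 (j2_3);
create index test_t3_j3_2_idx on test_t3 (j3_2);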

So here is what the algorithm is all about: the DEL and INS steps, together, simply delete and recreate the "slice" of the MV that is referenced by the marked rows, whatever the modification type. The algorithm is as simple (and brutal) as it seems.

Algorithm optimizations

The only optimizations performed are the skipping of some steps when they are obviously unnecessary. For every master table, the DEL step is skipped when only INSERTs are marked in the logs; the INS is skipped when only DELETEs are marked, and of course both are skipped if no row is marked. I have not been able to spot any other optimization.

Note that this means that UPDATEs always turn into a delete+insert of the entire "slice". For example, consider the typical case of a parent table (say, CUSTOMER), with a child (say, ORDER) and a grandchild (say, ORDER_LINE); if you update a column of a row of the parent (say, ORDERS_TOTAL_AMOUNT), the parent row and its whole progeny (the "slice") will be deleted and then recreated. This was a quite surprising discovery for me - a fact that I have now committed to memory.
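
For instance, sticking to the test tables of this post, even a single-column update of one test_t1 row is propagated by deleting and re-inserting its whole slice (the predicate value is just a placeholder):

update test_t1 set x1 = x1 where pk1 = 1;  -- update one master row
commit;
exec dbms_mview.refresh('TEST_MV', method => 'F')
-- the DEL step removes every test_mv row whose test_t1_rowid points to the
-- updated row, and the INS step recreates them by re-joining with test_t2 and test_t3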


Comments

  1. Cristian

    Friday, August 7, 2009

    Very interesting post. I’ve read it and re-read it, and I’ve made some tests myself. I have a question: why, in your opinion, with simple MVs (on a single table, without joins) does Oracle not use the same algorithm, but instead propagates updates as updates? (I’ve made a test on 10.2.0.4)

  2. Alberto Dell'Era

    Friday, August 7, 2009

    @Cristian

    what do you mean by “simple MV”, with or without aggregates ?

    You might want to post the defining SQL statement of the MV …

  3. Cristian

    Friday, August 7, 2009

    I meant without aggregates:

    update testmv1 set fld1='a2' where pk1=1;
    commit;
    EXEC DBMS_MVIEW.REFRESH('TEST_MV1');
    SELECT * FROM TESTMV1_LOG;
    

    This is because we sometimes use MVs as a data replication tool.

  4. Alberto Dell'Era

    Friday, August 7, 2009

    Ok, but what is the defining SQL statement of the MV TEST_MV1 ?

  5. Cristian

    Friday, August 7, 2009

    Sorry, the sun on my head :)

    create table testmv1 (pk1 number,fld1 varchar2(10));
    create materialized view log on testmv1 with rowid;
    
    create materialized view test_mv1
    build immediate
    refresh fast with rowid
    on demand
    as
     select
      testmv1.*
     from testmv1;
    

    I’ve not tried with primary key based refresh.

  6. Alberto Dell'Era

    Friday, August 7, 2009

    I know that a “rowid MV”, as yours is, can contain neither joins nor aggregates and must be based on a single table, so it probably uses a different algorithm than the one examined in this post (which is for join-only MVs). They are in fact MVs used almost exclusively for replication, usually from a remote database to a local one through a dblink – the old name was “snapshots”, a technology “now merged” with the “materialized view” one. So it is outside the scope of this post, although I’ve added an investigation about this to my to-do list. It makes sense, anyway, that in a scenario as simple as this one, they have tried to optimize the propagation (hence using UPDATEs) as much as possible (especially to minimize the traffic over the dblink).

  7. Alberto Dell’Era’s Oracle blog » fast refresh of single-table materialized views - algorithm summary

    Tuesday, August 11, 2009

    [...] might be considered a degenerate case of a join-only MV, a topic that we investigated in an earlier post, and one could expect the same algorithm. But that is not the case: the test case shows that the [...]

  8. Alberto Dell'Era

    Tuesday, August 11, 2009

    @Cristian

    I have posted an investigation about your scenario.

  9. Blogroll Report 31/07/2009 – 07/08/2009 « Coskan’s Approach to Oracle

    Wednesday, August 12, 2009

    [...] Alberto Dell’Era- fast refresh of join-only materialized views -algorithm summary [...]

  10. Alberto Dell’Era’s Oracle blog » 11gR2: materialized view logs changes

    Tuesday, November 3, 2009

    [...] the refresh is performed by using a “mark-and-propagate” algorithm, which is essentially (check this post for some additional details): 1) new log rows are inserted with snaptime$$=4000 A.D; 2) at refresh [...]

  11. Alberto Dell’Era’s Oracle blog » fast refresh of join-only MVs: _mv_refresh_use_stats and locking log stats

    Thursday, March 11, 2010

    [...] understand the importance of points 5 and 6, please check this post of mine; note how those indexes are a necessary prerequisite for the sanity of the DEL and INS steps of the [...]

  12. Shuai

    Tuesday, November 1, 2011

    I’m very glad to read your article about the algorithm of join-only MV fast refresh. It’s very useful. I just want to know more about this: how about a joined MV with aggregation such as sum or count? I found it’s very difficult to explain the aggregation MV’s behavior while doing fast refresh because there is no rowid column. Do you have any idea? Many thanks!

  13. Alberto Dell'Era

    Tuesday, November 1, 2011

    Hi Shuai,

    I have never investigated that scenario – but you can do it yourself by modifying one of my test cases … just try to see the SQL statements that the refresh produces, that’s all it takes to get a rough model of the process ;)

  14. Alberto Dell’Era’s Oracle blog » fast refresh of outer-join-only materialized views – algorithm, part 1

    Monday, April 22, 2013

    [...] only 11.2.0.3. We will use the very same scenario (MV log configuration, DML type, etc) as in the inner join case, "just" turning the inner join into an outer [...]

  15. Alberto Dell’Era’s Oracle blog » Fast refresh of aggregate-only materialized views – introduction

    Monday, August 5, 2013

    [...] to the MV by setting snaptime$$ equal to the current time – check the description contained in this post for details (note also another possible variant with "commit-scn mv logs"). MV log purging (at the [...]
