fast refresh of join-only materialized views – algorithm summary

This post investigates how Oracle fast refreshes materialized views containing only joins of master tables:
[sql]
create materialized view test_mv
build immediate
refresh fast on demand
as
select test_t1.*, test_t1.rowid as test_t1_rowid,
test_t2.*, test_t2.rowid as test_t2_rowid,
test_t3.*, test_t3.rowid as test_t3_rowid
from test_t1, test_t2, test_t3
where test_t1.j1_2 = test_t2.j2_1
and test_t2.j2_3 = test_t3.j3_2
;
[/sql]
The fast refresh algorithm is simple and very easy to understand – so trivial in fact that once examined and understood, the possible tuning techniques follow naturally.

The test case traces the fast refresh of the above materialized view (MV) using the 10046 event (aka “sql trace”). The test case has been run on 9.2.0.8, 10.2.0.4 and 11.1.0.7 (the latest versions of 9i, 10g and 11g available as of today), and on all of these versions the algorithm used by the refreshing engine (run by invoking dbms_mview.refresh) appears to be the same, with only a few implementation differences.

The test case explores the most general case: it performs inserts, updates and deletes on all the three master tables (the inserts being conventional; I will explore direct-path inserts another time).

Materialized view logs configuration

In the test case, I have configured the materialized view logs to “log everything”, in order to check whether more information in the logs could trigger some special kernel code designed to take advantage of it:
[sql]
create materialized view log on test_t1
with sequence, rowid, primary key (j1_2, x1)
including new values;
[/sql]
but the engine uses only the rowid information even in 11.1.0.7, so you are better off logging only the rowid if the master table feeds join-only materialized views exclusively:
[sql]
create materialized view log on test_t1 with rowid;
[/sql]
Minimal logging obviously improves the performance of DML against the master tables, but it also optimizes the fast refresh, since the latter, as we are going to see in a moment, reads each log twice, and of course the less you log, the more compact the logs will be.

Log snapshots

After some preliminary visits to the data dictionary, the first operation performed by the fast refresh engine is to “mark” the modifications (recorded in the materialized view logs) to be propagated to the MV. Only the marked log rows are then fed by the fast refresh engine as input to the next steps.

The “flag” used to mark the rows is the column snaptime$$. When the refresh starts, the engine performs a “snapshot” of the materialized view logs by setting the snaptime$$ of all the new rows (those with snaptime$$ = ’01/01/4000′) of each log in turn to the current time (SYSDATE).

In detail, the snapshot is performed by issuing this SQL statement (slightly edited for readability) in 9.2.0.8 and 10.2.0.4:
[sql]
update MLOG$_TEST_T1
set snaptime$$ = :1
where snaptime$$ > to_date(‘2100-01-01:00:00:00′,’YYYY-MM-DD:HH24:MI:SS’)
[/sql]
The bind variable :1 is a DATE whose value is equal to SYSDATE.

Note: In 11.1.0.7, the statement is slightly different but makes the same thing, probably in a more scalable way concurrency-wise (check the script spools if you’re interested).

You might have noticed the where condition on snaptime$$; that is necessary since the logs might be used by more than one materialized view. When a refresh ends, in fact, the engine checks whether other MVs might need each log row, and deletes only the log rows that have been processed by all dependant MVs; the other ones are left unchanged (and hence keep the snaptime$$ that was set when the fast refresh started). The where condition is needed to avoid overwriting the snaptime$$, and mark with the current time only the brand new rows (those with snaptime$$ = ’01/01/4000′).

So, at the end of the snapshot, the log rows that must be examined by the refresh engine will be the ones that are marked by having their snaptime$$ between the date of the last refresh (excluded) and :1 (included). All the other log rows must be ignored.

Side note: marking data at a certain point in time and then replicating the marked data is the only replication strategy that can work when you cannot “freeze” the master tables, as this is definitely our case. This is a general topic worth blogging about in the future.

The marked log rows are then inspected to count the number and type of the logged modifications. This is to check whether any of the replication steps (i.e. the DEL and INS steps that we are going to discuss in a moment) could be skipped. Also, the number of modifications is used (in some versions) to inject some hints in the SQL statements of the replication steps, a topic that falls out of the scope of this post.

Core algorithm: the INS and DEL steps

Then, the core replication starts. The replication considers each master table in turn, and for each table, propagates the modifications to the MV. So we have essentially one single algorithm that propagates from a single master table, just repeated once for each master table.

The propagation for each master table is made of two simple steps, steps that I’m going to name after the comments of the SQL as a DEL (DELETE) step followed by an INS (INSERT) step.

The DEL step is (editing for readability: removing hints, unnecessary aliases, etc):
[sql]
/* MV_REFRESH (DEL) */
delete from test_mv
where test_t1_rowid in
(
select * from
(
select chartorowid (m_row$$)
from mlog$_test_t1
where snaptime$$ > :1
) as of snapshot (:2)
)
[/sql]
The subquery simply fetches the rowid of all marked rows, since :1 is the date of the previous refresh of the materialized view, and :2 is the SCN (coded as a RAW variable) of the time when the snapshot was performed. So, this step deletes from the MV all the rows that record the result of the MV-defining join of any of the marked rows (of the current master table) with the other master tables.

This is the step that can benefit from the index on the column that stores the master table rowid (here, test_t1_rowid) that the documentation suggests to create. Note that in order to optimize this step, you need three separate single-column indexes (here, on test_t1_rowid, test_t2_rowid, test_t3_rowid), not a single composite index spanning the (here, three) columns, as it is sometimes wrongly stated.

The INS step is (again editing for readability):
[sql]
/* MV_REFRESH (INS) */
insert into test_mv
select jv.j1_2, jv.x1, jv.pk1, jv.rid$,
mas2.j2_1, mas2.j2_3, mas2.x2, mas2.pk2, mas2.rowid,
mas3.j3_2, mas3.x3, mas3.pk3, mas3.rowid
from (
select log.rowid rid$, log.*
from test_t1 log
where rowid in
(
select chartorowid(log.m_row$$)
from mlog$_test_t1
where snaptime$$ > :1
)
) as of snapshot (:2) jv,
test_t2 as of snapshot (:2) mas2,
test_t3 as of snapshot (:2) mas3
where jv.j1_2 = mas2.j2_1
and mas2.j2_3 = mas3.j3_2
[/sql]
The subquery is the same as the DEL step and serves the very same function. So, this statement replays the SQL statement of the MV definition, but limited to the marked rows only. Note that all tables are read at the same point in time in the past, the time when the snapshot of the log was performed, thanks to the argument of the “as of snapshot” clause being the same.

In order to speed up the INS step, indexes on the joined columns can be created on the master tables (not on the MV!). This is because, special cases aside, it is well known that the “fast refresh” (the name itself is quite horrible, many people prefer the adjective “incremental”) can be actually “fast” only if a small fraction of the master tables is modified (otherwise, a complete refresh is better); in this scenario, almost certainly the optimal plan is a series of NESTED LOOPs that has the current table (test_t1 in this case) as the most outer table, series that can usually benefit a lot by an index on the inner tables join columns. Of course, you must remember that every table, in turn, acts as the most outer table, hence you should index every possible join permutation.

So here what the algorithm is all about: the DEL and INS steps, together, simply delete and recreate the “slice” of the MV that is referenced by the marked rows, whatever the modification type. The algorithm is as simple (and brutal) as it seems.

Algorithm optimizations

The only optimizations performed are the skipping of some steps when they are obviously unnecessary. For every master table, the DEL step is skipped when only INSERTs are marked in the logs; the INS is skipped when only DELETEs are marked, and of course both are skipped if no row is marked. I have not been able to spot any other optimization.

Note that this means that UPDATEs always turn into a delete+insert of the entire “slice”. For example, consider the typical case of a parent table (say, CUSTOMER), with a child (say, ORDER) and a grandchild (say, ORDER_LINE); if you update a column of a row of the parent (say, ORDERS_TOTAL_AMOUNT), the parent row and its whole progeny (the “slice”) will be deleted and then recreated. This was a quite surprising discovery for me – a fact that I have now committed to memory.

17 Responses to "fast refresh of join-only materialized views – algorithm summary"

Cristian says:

August 7, 2009 at 14:08

Very interesting post. I’ve read it and re-read and i’ve made same tests by myself. I’ve a question: why in your opinion whith simple mv’s (on a single table, without joins) Oracle does not use the same algorithm but makes updates as updates? (i’ve made a test on 10.2.0.4)

Reply
1. Alberto Dell'Era says:
  
  August 7, 2009 at 15:24
  
  @Cristian
  
  what do you mean by “simple MV”, with or without aggregates ?
  
  You might want to post the defining SQL statement of the MV …
  
  Reply
Cristian says:

August 7, 2009 at 16:12

I meant without aggregates:
[sql]
update testmv1 set fld1=’a2′ where pk1=1;
commit;
EXEC DBMS_MVIEW.REFRESH(‘TEST_MV1’);
SELECT * FROM TESTMV1_LOG;
[/sql]
This because we sometimes use MV’s as data replication tool.

Reply
1. Alberto Dell'Era says:
  
  August 7, 2009 at 16:16
  
  Ok, but what is the defining SQL statement of the MV TEST_MV1 ?
  
  Reply
Cristian says:

August 7, 2009 at 16:30

Sorry, the sun on my head 🙂
[sql]
create table testmv1 (pk1 number,fld1 varchar2(10));
create materialized view log on testmv1 with rowid;

create materialized view test_mv1
build immediate
refresh fast with rowid
on demand
as
select
testmv1.*
from testmv1;
[/sql]

I’ve not tried with primary key based refresh.

Reply
1. Alberto Dell'Era says:
  
  August 7, 2009 at 19:07
  
  I know a “rowid MV”, as your one is, cannot contain neither joins neither aggregates and must be based on a single table, so they probably use a different algorithm than the one examined in this post (which is for join-only MVs). They are in fact MVs used for replication almost exclusively, usually from a remote database to a local one through a dblink – the old name was “snapshots”, a technology “now merged” with the “materialized view” one. So it is outside the scope of this post, albeit I’ve added an investigation about this on my to-do list. It makes sense, anyway, that on a scenario as simple as this one, they have tried to optimize the propagation (hence using UPDATEs) as much as possible (especially to minimize the traffic over the dblink).
  
  Reply
2. Alberto Dell'Era says:
  
  August 11, 2009 at 17:58
  
  @Cristian
  
  I have posted an investigation about your scenario.
  
  Reply
Alberto Dell’Era’s Oracle blog » fast refresh of single-table materialized views - algorithm summary says:

August 11, 2009 at 17:57

[…] might be considered a degenerate case of a join-only MV, a topic that we investigated in an earlier post, and one could expect the same algorithm. But that is not the case: the test case shows that the […]

Reply
Blogroll Report 31/07/2009 – 07/08/2009 « Coskan’s Approach to Oracle says:

August 12, 2009 at 16:32

[…] Alberto Dell’Era- fast refresh of join-only materialized views -algorithm summary […]

Reply
Alberto Dell’Era’s Oracle blog » 11gR2: materialized view logs changes says:

November 3, 2009 at 19:20

[…] the refresh is performed by using a “mark-and-propagate” algorithm, which is essentially (check this post for some additional details): 1) new log rows are inserted with snaptime$$=4000 A.D; 2) at refresh […]

Reply
Alberto Dell’Era’s Oracle blog » fast refresh of join-only MVs: _mv_refresh_use_stats and locking log stats says:

March 11, 2010 at 22:46

[…] understand the importance of points 5 and 6, please check this post of mine; note how those indexes are a necessary prerequisite for the sanity of the DEL and INS steps of the […]

Reply
Shuai says:

November 1, 2011 at 17:19

I’m very glad to read your article about the algorithm of join-only MV fast refresh. It’s very useful. I just want to know more about this: how about a joined MV with aggregation such as sum or count? I found it’s very difficult to explain the aggregation MV’s behavior while doing fast refresh because there is no rowid column. Do you have any idea? Many thanks!

Reply
1. Alberto Dell'Era says:
  
  November 1, 2011 at 20:57
  
  Hi Shuai,
  
  I have never investigated that scenario – but you can do it yourself by modifying one of my test cases … just try to see the SQL statements that the refresh produces, that’s all it takes to get a rough model of the process 😉
  
  Reply
Alberto Dell’Era’s Oracle blog » fast refresh of outer-join-only materialized views – algorithm, part 1 says:

April 22, 2013 at 09:40

[…] only 11.2.0.3. We will use the very same scenario (MV log configuration, DML type, etc) as in the inner join case, "just" turning the inner join into an outer […]

Reply
Alberto Dell’Era’s Oracle blog » Fast refresh of aggregate-only materialized views – introduction says:

August 5, 2013 at 14:49

[…] to the MV by setting snaptime$$ equal to the current time – check the description contained in this post for details (note also another possible variant with "commit-scn mv logs"). MV log purging (at the […]

Reply
The mess that is fast-refresh join-only Materialized Views | OraStory says:

November 27, 2014 at 17:23

[…] http://www.adellera.it/blog/2009/08/04/fast-refresh-of-join-only-materialized-views-algorithm-summar… […]

Reply
MV Refresh | Oracle Scratchpad says:

March 26, 2015 at 13:27

[…] (Inner) Join View refresh (2009) […]

Reply

Leave a Reply Cancel reply