fast refresh of outer-join-only materialized views – algorithm, part 1

In this series of posts we will discuss how Oracle refreshes materialized views (MV) containing only OUTER joins, covering only 11.2.0.3. We will use the very same scenario (MV log configuration, DML type, etc) as in the inner join case, “just” turning the inner join into an outer join:

create materialized view test_mv

build immediate

refresh fast on demand

as

select test_outer.*, test_outer.rowid as test_outer_rowid,

test_inner.*, test_inner.rowid as test_inner_rowid

from test_outer, test_inner

where test_outer.jouter = test_inner.jinner(+)

;

For the outer case, the overall strategy is the same we already saw for inner joins: modifications from each master table are propagated separately to the MV, and by still performing, for each master table, the same two macro steps (first the DEL, and then the INS one).

Actually, propagation from the outer table is exactly the same (with the obvious slight difference of performing an outer join to recalculate the “slice” in the INS step), and hence we will discuss only the propagation from the inner table, which is considerably more complex.

There are actually two different propagation algorithms, one much simpler and less resource-intensive that requires a unique constraint on the joined columns (here, “jinner” alone); in this post I will discuss the details of this specialized algorithm only, leaving the details of the “general” other one for the next post.

Does the existence of a a unique constraint on the joined colum(s) enable such a dramatic simplification of the propagation that justifies a specialized algorithm? Yes, absolutely – and it is interesting to understand the reason since it comes out naturally from the very semantic of the outer join SQL construct, and hence we can also improve our understanding of this important component of the SQL syntax as a by-product.

Let’s start by remembering that every row of the outer table is always represented in the MV, possibly with all columns coming from the inner table set to null (including, most importantly, test_inner_rowid) if no match is found in the inner table. If M matches are found for a given outer row, M rows will be present in the MV; e.g. for M=2, we will see the following “outer slice” (my definition) corresponding to outer row ooo:

ooo inn1

ooo inn2

Now, if a unique constraint exists on the joined column(s), M can be at most 1, and hence only two possible states are possible for our outer slice:

(a) ooo inn1

(b) ooo *null*

Hence, if inn1 is marked in the log, propagating its deletion in the DEL step is just a matter of simply switching the slice from state (a) to (b) using an update statement, and conversely, propagating its insertion in the INS step is just a matter of updating the slice from state (b) to state (a). In other words, the possible matching rows of the outer table are already there, in the MV, and all we need to do is to “flip their state” if necessary. Thus it is possible to propagate using only quite efficient update statements – no delete or insert needs to be performed at all.

Now, consider how the absence of unique constraint adds additional complexity. In this case this scenario is possible:

ooo inn1

ooo inn2

if only one of (inn1, inn2) is marked in the log, the DEL step can simply delete only the corresponding row in the MV, but if both are marked, it must leave a single row with all the columns coming from the inner table set to null:

ooo *null*

conversely, the INS step must remember to remove the above row if it finds at least a match in the outer table.

In other words, the whole “outer slice” must be considered and examined by both steps; it is not enough to consider only marked rows “in isolation”, as it was the case in the inner join scenario and the constrained outer join scenario. This is considerably more complex, and in fact, the “general” algorithm was designed and implemented only in 10g – before 10g it was mandatory to have a unique constraint on the joined columns to enable fast refresh.

conventions and investigation scope

To reduce the visual clutter, instead of this log reading fragment (whose meaning we already discussed in the previous post)

(select rid$

from (select chartorowid(mas$.m_row$$) rid$

from mlog$_test_inner mas$

where mas$.snaptime$$ > :b_st0

)

) as of snapshot(:b_scn) mas$

I will use the following simplified notation

(select rid$ from mlog$_test_inner)

And of course, I will restructure and simplify the SQL statements to increase readability (the original statements are included in the test case of course).

Page 1 of 2 | Next page