I really like (besides the light and modern look) the aggregator of the OakTable members' blogs - a window on high-quality news and investigations about Oracle ...
Xtrace is a graphical tool that can navigate Oracle trace files, manipulate them, and optionally get them back as a text file. It actually makes (much) more, but in this first post we are going to focus on its basic browsing capabilities.
Let’s see the tool in action on the trace file produced by this simple PL/SQL block:
begin for r in (select * from t) loop null; end loop; end;
The resulting trace file is
WAIT #2: nam='SQL*Net message from client' ela= 61126 driver id=1413697536 #bytes=1 p3=0 obj#=76357 tim=5789636384898 ===================== PARSING IN CURSOR #26 len=66 dep=0 uid=73 oct=47 lid=73 tim=5789636385129 hv=3421439103 ad='aeb809c8' begin for r in (select * from t) loop null; end loop; end; END OF STMT PARSE #26:c=0,e=153,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=5789636385122 BINDS #26: ===================== PARSING IN CURSOR #28 len=15 dep=1 uid=73 oct=3 lid=73 tim=5789636386184 hv=1406298530 ad='a0503300' SELECT * FROM T END OF STMT PARSE #28:c=0,e=804,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,tim=5789636386181 BINDS #28: EXEC #28:c=0,e=64,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=5789636386284 WAIT #28: nam='db file sequential read' ela= 19 file#=4 block#=59 blocks=1 obj#=76357 tim=5789636386383 WAIT #28: nam='db file sequential read' ela= 11 file#=4 block#=60 blocks=1 obj#=76357 tim=5789636386457 FETCH #28:c=0,e=243,p=2,cr=3,cu=0,mis=0,r=100,dep=1,og=1,tim=5789636386566 FETCH #28:c=0,e=54,p=0,cr=1,cu=0,mis=0,r=100,dep=1,og=1,tim=5789636386663 FETCH #28:c=0,e=3,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=5789636386693 EXEC #26:c=0,e=1543,p=2,cr=4,cu=0,mis=0,r=1,dep=0,og=1,tim=5789636386746 WAIT #26: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=76357 tim=5789636387057 WAIT #26: nam='SQL*Net message from client' ela= 42743 driver id=1413697536 #bytes=1 p3=0 obj#=76357 tim=5789636429824 STAT #28 id=1 cnt=200 pid=0 pos=1 obj=76357 op='TABLE ACCESS FULL T (cr=4 pr=2 pw=0 time=363 us)'
Even for this artificially simple trace file, it takes a lot of effort to read and understand it; for example, it takes a while to associate the recursive SQL lines to the execution of the PL/SQL blocks (the “EXEC #26” line).
With Xtrace, the trace reading experience is remarkably much better:
Note the indentation by recursive level (which is provided out-of-the -box) and the color of the lines by statement (that takes perhaps a minute in order to be set up).
You can try this example live by pressing the “Launch” button above if you are interested; in particular, try the “Options” button of the middle pane, and the “set color” popup menus of the top pane.
Suggestion: you might even check the hyperlinks that links together the lines; for example, the xct pointer that links the SQL recursive calls to the parent “EXEC #26” (check the interactive manual for more information).
You can also get the trace back as a text file, if so desired:
000 line zero 001 xtrace: log file 'E:\localCVS30\TrilogyLectures\MioSitoWeb\xtrace\dist\xtrace.log' 002 VIRTUAL CALL #-4: 'null call - ignore this' 003 VIRTUAL CALL #-4: 'null call - ignore this' 004 +WAIT #2: nam='SQL*Net message from client' xe=ela=61126 p1='driver id'=1413697536 p2='#bytes'=1 p3=''=0 xphy=0 obj#=76357 tim=5789636384898 005 VIRTUAL CALL #-8: 'wait-for-client' 006 VIRTUAL CALL #-5: 'client-message-received' 007 ---------------------PARSING IN CURSOR #26: len=66 dep=0 uid=73 oct=47 lid=73 tim=5789636385129 hv=3421439103 ad='0eb809c8' begin for r in (select * from t) loop null; end loop; end; END OF STMT 008 PARSE #26: mis=0 r=0 dep=0 og=1 tim=5789636385122 e=153 c=0 p=0 cr=0 cu=0 009 BINDS #26: 010 ---------------------PARSING IN CURSOR #28: len=15 dep=1 uid=73 oct=3 lid=73 tim=5789636386184 hv=1406298530 ad='00503300' SELECT * FROM T END OF STMT 011 PARSE #28: mis=1 r=0 dep=1 og=1 tim=5789636386181 e=804 c=0 p=0 cr=0 cu=0 012 BINDS #28: 013 EXEC #28: mis=0 r=0 dep=1 og=1 tim=5789636386284 e=64 c=0 p=0 cr=0 cu=0 014 +WAIT #28: nam='db file sequential read' xe=ela=19 p1='file#'=4 p2='block#'=59 p3='blocks'=1 xphy=1 obj#=76357 tim=5789636386383 015 +WAIT #28: nam='db file sequential read' xe=ela=11 p1='file#'=4 p2='block#'=60 p3='blocks'=1 xphy=1 obj#=76357 tim=5789636386457 016 FETCH #28: mis=0 r=100 dep=1 og=1 tim=5789636386566 e=243 c=0 p=2 cr=3 cu=0 017 FETCH #28: mis=0 r=100 dep=1 og=1 tim=5789636386663 e=54 c=0 p=0 cr=1 cu=0 018 FETCH #28: mis=0 r=0 dep=1 og=1 tim=5789636386693 e=3 c=0 p=0 cr=0 cu=0 019 EXEC #26: mis=0 r=1 dep=0 og=1 tim=5789636386746 e=1543 c=0 p=2 cr=4 cu=0 020 -WAIT #26: nam='SQL*Net message to client' xe=ela=2 p1='driver id'=1413697536 p2='#bytes'=1 p3=''=0 xphy=0 obj#=76357 tim=5789636387057 021 +WAIT #26: nam='SQL*Net message from client' xe=ela=42743 p1='driver id'=1413697536 p2='#bytes'=1 p3=''=0 xphy=0 obj#=76357 tim=5789636429824 022 VIRTUAL CALL #-8: 'wait-for-client' 023 VIRTUAL CALL #-5: 'client-message-received' 024 STAT #28: id=1 pid=0 pos=1 obj=76357 op='TABLE ACCESS FULL T' cnt=200 avg(cnt)=200.0 card=n/a cr=4 avg(cr)=4.0 cost=n/a pr=2 pw=0 time=363 size=n/a xnexecs=1 xstatn=0 xplannum=0 025 026 VIRTUAL CALL #-4: 'null call - ignore this'
This can be obtained using the “save as text“ popup menu of the middle pane.
We are going to keep exploring Xtrace in the upcoming posts.
A devastating performance degradation of materialized view fast refreshes can happen in versions after 9i - and can be healed rather easily by simply setting the hidden parameter _mv_refresh_use_stats or, a bit surprisingly, by locking statistics on the logs. The problem can manifest at least in the currently-latest patchsets of 10g, 11gR1 and 11gR2 (10.2.0.4, 126.96.36.199 and 188.8.131.52), seems to hit a lot of people, and its root cause are the utilization of wrong hints by the Oracle refresh engine.
We will investigate the join-only MV case only, since this is the case I have investigated after a question by Christo Kutrovsky, factoring in some observations by Taral Desai and some Support notes; I have some clues that something similar may happen for other types of MVs.
The test case sets up this very common scenario for fast refreshes:
1 - two big base tables joined together by the MV;
2 - only a small fraction of rows modified (actually one deleted, two updated, one inserted);
3 - all tables and indexes with fresh statistics collected;
4 - MV logs with no statistic collected AND with not-locked statistics;
5 - indexes present on the joined columns;
6 - indexes present on the rowid columns of the MV.
Points 1 and 2 make for the ideal scenario for incremental ("fast") refreshes to be effective; 3 is very common as well, since you normally have many other statements issued on the tables; the relevance of 4 will be clear later, but it happens very often in real life, since people might perhaps consider collecting stats on the log, but locking their statistics is usually not made, at least in my experience.
To understand the importance of points 5 and 6, please check this post of mine; note how those indexes are a necessary prerequisite for the sanity of the DEL and INS steps of the MV process. Without them, the refresh cannot be incremental since it has no physical way to read and propagate only the modified rows and those related to them, but it must scan (uselessly) most of the base tables and MV. But in other for the refresh to be incremental ("fast"), those indexes have to be actually used...
Let's illustrate the issue focusing on the DEL step (the easier to discuss about). In the above mentioned post, we have seen that the DEL step uses a single SQL statement whose text, leaving out minor technical details and hints, is:
/* MV_REFRESH (DEL) */ delete from test_mv where test_t1_rowid in ( select * from ( select chartorowid (m_row$$) from mlog$_test_t1 where snaptime$$ > :1 ) as of snapshot (:2) )
In 184.108.40.206, we get this very healthy plan:
------------------------------------------------- |Id|Operation |Name | ------------------------------------------------- | 0|DELETE STATEMENT | | | 1| DELETE | | | 2| NESTED LOOPS | | | 3| VIEW | | | 4| SORT UNIQUE | | | 5| TABLE ACCESS FULL|MLOG$_TEST_T1 | | 6| INDEX RANGE SCAN |TEST_MV_TEST_T1_ROWID| -------------------------------------------------
That is: get the rowid of all modified rows from the log, and use the rowid-based index to delete the "old image" of them from the MV (inserting their "new image" is the job of the INS step). This is truly incremental, since the resource usage and elapsed time are proportional to the number of rows logged in the MV log, not to the dimension of the tables.
In 10.2.0.4, 220.127.116.11 and 18.104.22.168 the plan becomes:
------------------------------------------ |Id|Operation |Name | ------------------------------------------ | 0|DELETE STATEMENT | | | 1| DELETE |TEST_MV | | 2| HASH JOIN RIGHT SEMI | | | 3| TABLE ACCESS FULL |MLOG$_TEST_T1| | 4| MAT_VIEW ACCESS FULL|TEST_MV | ------------------------------------------
Oops, the indexes are not used ... hence the DEL step overhead is proportional to the size of the MV, and that can be definitely unacceptable.
That is due to the engine injecting an HASH_SJ hint in the outermost nested subquery:
... WHERE "TEST_T1_ROWID" IN (SELECT /*+ NO_MERGE HASH_SJ */ ...
This is recognized as a bug in many scenarios (start from Oracle Support note 578720.1 and follow the references to explore some of them) even if I have not found a clear and exhaustive note that documents the behaviour.
remedy one: set "_mv_refresh_use_stats"
To get back to the healthy plan, simply set "_mv_refresh_use_stats" to "true" (ask Oracle Support first of course for permission); this makes for a set of hint much more adequate for a fast refresh:
... WHERE "TEST_T1_ROWID" IN (SELECT /*+ NO_MERGE NO_SEMIJOIN */ ...
Note: The root cause for this bug is probably due to a change hinted in note 875532.1 - in 10.2.0.3 the meaning of _mv_refresh_use_stats was reversed, but not the default, hence (by mistake?) activating a different piece of the engine code.
The very same problem happens for the INS step; I won't go into much details here (please check the test case spools provided above if interested), but in 22.214.171.124 the base table modified rows are directly fetched using the rowid contained in the log:
----------------------------------------------------- |Id|Operation |Name | ----------------------------------------------------- | 0|INSERT STATEMENT | | | 1| TABLE ACCESS BY INDEX ROWID |TEST_T2 | | 2| NESTED LOOPS | | | 3| VIEW | | | 4| NESTED LOOPS | | | 5| VIEW | | | 6| SORT UNIQUE | | | 7| TABLE ACCESS FULL |MLOG$_TEST_T1 | | 8| TABLE ACCESS BY USER ROWID|TEST_T1 | | 9| INDEX RANGE SCAN |TEST_T2_J2_1_IDX| -----------------------------------------------------
Instead, in 10.2.0.4, 126.96.36.199 and 188.8.131.52 we get the following plan:
-------------------------------------------------- |Id|Operation |Name | -------------------------------------------------- | 0|INSERT STATEMENT | | | 1| TABLE ACCESS BY INDEX ROWID|TEST_T2 | | 2| NESTED LOOPS | | | 3| VIEW | | | 4| HASH JOIN RIGHT SEMI | | | 5| TABLE ACCESS FULL |MLOG$_TEST_T1 | | 6| TABLE ACCESS FULL |TEST_T1 | | 7| INDEX RANGE SCAN |TEST_T2_J2_1_IDX| --------------------------------------------------
Whose resource consumption is, of course, proportional to the size of the base table.
Even in this case, this is due to the nasty HASH_SJ hint:
... FROM "TEST_T1" "MAS$" WHERE ROWID IN (SELECT /*+ HASH_SJ */ ...
If you set _mv_refresh_use_stats, you get back the 184.108.40.206 plan - and thus you are back to incremental for both the DEL and INS steps. As a side note, a cardinality hint is used, where the cardinality is set to the correct value (6 in my test case):
... FROM "TEST_T1" "MAS$" WHERE ROWID IN (SELECT /*+ CARDINALITY(MAS$ 6) NO_SEMIJOIN ...
remedy two: collect and lock statistics on the logs
Very interestingly, instead of setting the hidden parameter, you have another way to get back to the healthy plan: gather statistics on the MV logs when they are empty AND lock them (as suggested in note 578720.1, albeit not in this scenario and even if setting the parameter is not necessary; thanks to Taral Desai for pointing me to the note). In this case, no hint at all is injected beside a NO_MERGE for the DEL step:
... WHERE "TEST_T1_ROWID" IN (SELECT /*+ NO_MERGE */ ... ... FROM "TEST_T1" "MAS$" WHERE ROWID IN (SELECT ...
So, the engine is confident that the CBO will come out with a good plan, and it does not inject any "intelligent" hint. Possibly, and intriguing, this is because by locking the statistics, I am assuring the engine that these statistics are representative of the data anytime. So, locking the statistics is not meant only as a way to prevent dbms_stats from changing them ... it is deeper than that. At least in this case, you are taking responsibility for them, and Oracle will take that in consideration.
As a minor but useful new feature, xplan is now able to integrate into its report the DDL of tables (and indexes) referenced by the plan, calling dbms_metadata.get_ddl transparently.
This is mostly useful to get more details about referenced tables' constraints and partitions definition - to complement their CBO-related statistics that xplan reports about.
This feature can be activated by specifing dbms_metadata=y or dbms_metadata=all (check xplan.sql header of xplan.sql for more informations).
We spoke about xplan in general here.
For people that have asked - I'm not blogging simply because I'm feverishly working on a tool of mine that I'm very fond of :)
Guess what it does from the following next two screenshots ...
This post investigates the improvements that have been made in 11gR2 to the fast refresh engine of materialized views (MVs) that are set to be automatically refreshed at commit time. We speak about join-only materialized views only in this post, as always with the help of a test case.
As noted in the post of mine "11gR2: materialized view logs changes", in 11gR2 a new column, xid$$, is now part of materialized view logs; this column records the id of the transaction that logged the changes of the base table which the log is defined on. It is important to stress that this column is added regardless of the type of the MV log, that is, to both the brand-new "commit SCN-based" logs and the old fashioned "timestamp-based" ones. That means that both types of MV logs can take advantage of the new improvements - albeit I haven't tested whether MVs (logs) migrated from a previous version are automatically upgraded by the migration scripts and get the new xid$$ column added.
algorithm before 11gR2
In versions before 11gR2, the refresh algorithm for on-commit MVs was the same as the one for on-demand ones, with only minor variants. That is, the algorithm was almost completely the same, just triggered by the commit event instead of by the user.
For an in-depth analysis of the algorithm, I will refer the reader to the discussion about the on-demand algorithm in the post "fast refresh of join-only materialized views - algorithm summary"; in passing, the test case for this post is in fact the very same three-table join MV, just redefined as "on commit" instead of "on demand". To recap, the "old" algorithm (until 220.127.116.11) was:
1) new log rows are inserted with snaptime$$=4000 A.D;
2) at refresh time (commit time), a snapshot of the new rows is taken, that is, all new rows are marked with snaptime$$= "commit time", using the statement
update MLOG$_TEST_T1 set snaptime$$ = :1 where snaptime$$ > to_date('2100-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS')
3) all modifications whose snaptime$$ is between the date of the last refresh (excluded) and the commit date(included) are propagated to the MV. The propagation consists of two steps.
First a DEL step:
/* MV_REFRESH (DEL) */ delete from test_mv where test_t1_rowid in ( select * from ( select chartorowid (m_row$$) from mlog$_test_t1 where snaptime$$ > :1 ) -- no "as of snapshot (:2)" clause )
Then an INS one:
/* MV_REFRESH (INS) */ insert into test_mv select jv.j1_2, jv.x1, jv.pk1, jv.rid$, mas2.j2_1, mas2.j2_3, mas2.x2, mas2.pk2, mas2.rowid, mas3.j3_2, mas3.x3, mas3.pk3, mas3.rowid from ( select log.rowid rid$, log.* from test_t1 log where rowid in ( select chartorowid(log.m_row$$) from mlog$_test_t1 where snaptime$$ > :1 ) ) jv, -- no "as of snapshot (:2) jv" clause test_t2 as of snapshot (:2) mas2, test_t3 as of snapshot (:2) mas3 where jv.j1_2 = mas2.j2_1 and mas2.j2_3 = mas3.j3_2
Note that the only small difference from the on-demand case is the absence of the "as of snapshot" clause, but the statements are otherwise identical. Note also that the rows in the MV log are identified in both statements by snaptime, using the subquery
select chartorowid(log.m_row$$) from mlog$_test_t1 where snaptime$$ > :1
4) all obsolete log rows are deleted, that is, all rows whose snaptime$$ is less than or equal the lowest of all refresh times are removed from the log, using the the statement
delete from mlog$_test_t1 where snaptime$$ <= :1
algorithm starting from 11gR2
In 11gR2, the on-commit algorithm is still almost the same as the on-demand one; the "only" change is how modified rows to be propagated are identified, and in general, how logs are managed. Not surprisingly, log rows are now directly identified by the transaction id, which is logged in xid$$. In detail:
1) new log rows are inserted with xid$$ = transaction id;
2) at refresh time (commit time), no snapshot is taken, that is, the MV log is not updated at all;
3) all modifications made by the committing transaction are propagated to the MV, still using the same two steps.
The DEL step is now:
/* MV_REFRESH (DEL) */ delete from test_mv where test_t1_rowid in ( select * from ( select chartorowid (m_row$$) from mlog$_test_t1 where xid$$ = :1 ) )
The INS one is:
/* MV_REFRESH (INS) */ insert into test_mv select jv.j1_2, jv.x1, jv.pk1, jv.rid$, mas2.j2_1, mas2.j2_3, mas2.x2, mas2.pk2, mas2.rowid, mas3.j3_2, mas3.x3, mas3.pk3, mas3.rowid from ( select log.rowid rid$, log.* from test_t1 log where rowid in ( select chartorowid(log.m_row$$) from mlog$_test_t1 where xid$$ = :1 ) ) jv, -- no "as of snapshot (:2) jv" clause test_t2 as of snapshot (:2) mas2, test_t3 as of snapshot (:2) mas3 where jv.j1_2 = mas2.j2_1 and mas2.j2_3 = mas3.j3_2
Hence, the big difference from the previous versions case is that rows in the MV log are identified very simply by the transaction that logged them (the committing transaction, of course), by the subquery
select chartorowid(log.m_row$$) from mlog$_test_t1 where xid$$ = :1
4) all obsolete log rows are deleted, that is, the rows logged by the committing transaction are removed, using the the statement
delete from mlog$_test_t1 where where xid$$ = :1
The new algorithm is for sure much simpler and more elegant. Performance is improved since the snapshot step has been removed, and the other steps are more or less as expensive as before.
practical implications: an example
I strongly believe that studying the internals is the best way to learn how to make the best use of any feature. Let's see an example of how the few bits of "internal knowledge" I shared here can be used in practice - that is, how a little investment in investigation makes for huge savings in effort afterwards, and huge gains in effectiveness of your work as well.
It is well-known that it can be sometimes beneficial, in pre-11gR2, to place an index on the log (indexing the log is even suggested by support note 258252 "MATERIALIZED VIEW REFRESH: Locking, Performance, Monitoring"). The scenario that benefits the most from such an index is when the log is composed of mostly-empty blocks, and hence an index access is preferable over a full table(log) scan; you get mostly-empty blocks, for example, when there are peeks in activity on the master tables that keep the log High Water Mark very high.
From the above discussion, it is obvious that in pre-11gR2, the best index for join-only MVs was on (snaptime$$, m_row$$) - not on snaptime$$ alone as it is sometimes suggested - to make the refresh operation an index-only one.
Starting from 11gR2, the best index is now on (xid$$, m_row$$). Not only that, but having no snapshot step, and hence no update on the index, makes the indexing option even more attractive.
Could you see these implications so easily, without knowing the internals? I don't.
In this post we are going to discuss some 11gR2 changes to materialized view logs that are aimed at increasing the performance of the fast-refresh engine of materialized views (MVs), especially the on-commit variant.
The MV logs, in 10gr2, now comes in two flavours: the traditional (and still the default) timestamp-based one and the brand new commit SCN-based one; you choose the latter type by specifing the "WITH COMMIT SCN" clause at MV log creation time. Interestingly, the "old" timestamp-based implementation has been changed as well. Let's examine both with the help, as usual, of a test case.
Timestamp-based MV logs (the "old" type)
The test case configures an MV log as "log everything", that is, it activates all the logging options:
create materialized view log on test_t1 with sequence, rowid, primary key (x1) including new values;
In pre-11gR2 (e.g. in 18.104.22.168, 10.2.0.4), the MV log columns were:
pk1 number(22) x1 number(22) m_row$$ varchar2(255) sequence$$ number(22) snaptime$$ date(7) dmltype$$ varchar2(1) old_new$$ varchar2(1) change_vector$$ raw(255)
now in 11gR2 (10.2.0.1):
pk1 number(22) x1 number(22) m_row$$ varchar2(255) sequence$$ number(22) snaptime$$ date(7) dmltype$$ varchar2(1) old_new$$ varchar2(1) change_vector$$ raw(255) xid$$ number(22)
the only difference is the new column xid$$ (transaction id) that uniquely identifies the transaction that made the changes to the row. For the curious, the number is a combination of the elements of the triplet (undo segment number, undo slot, undo sequence); it is simply the binary concatenation of the three numbers shifted by (48, 32, 0) bits respectively (as checked in the script).
The xid$$ column is used by the 11gR2 on-commit fast refresh engine, which can now easily retrieve the changes made by the just-committed transaction by its xid; at the opposite, the on-demand fast refresh one keeps using snaptime$$ as it did in previous versions. I will speak about this in more detail in an upcoming post.
Commit SCN-based MV logs (the "new" type in 11gR2)
Let's recreate the same MV log, this time adding the commit SCN clause (new in 11GR2):
create materialized view log on test_t1 with sequence, rowid, primary key (x1), COMMIT SCN including new values;
The columns of the MV log are:
pk1 number(22) x1 number(22) m_row$$ varchar2(255) sequence$$ number(22) dmltype$$ varchar2(1) old_new$$ varchar2(1) change_vector$$ raw(255) xid$$ number(22)
so, the only difference from the 11gR2 timestamp-based case is that snaptime$$ is no longer a column of the MV log; the only difference from the pre-11gR2 is that snaptime$$ has been replaced with xid$$.
For this log flavour only, the mapping between the xid that modified the table and its commit-time SCN is now tracked in a new view, all_summap (probably named after "SUMmary MAP", "summary" being yet another synonym for "MV"), which is (as of 22.214.171.124) a straight "select *" of the dictionary table sys.snap_xcmt$. To illustrate, the script makes one insert, one update and one delete on the base table, which translates into 4 rows inside the MV log with the same xid:
SQL> select distinct xid$$ from mlog$_test_t1; XID$$ --------------------- 1126024460895690
after the commit, we get
SQL> select * from all_summap where xid in (select xid$$ from mlog$_test_t1); XID COMMIT_SCN --------------------- ---------- 1126024460895690 2885433
hence, it is now possible to know the infinite-precision time (the SCN) when every modification became visible to an external observer (the commit SCN) by simply joining the MV log and all_summap (or sys.snap_xcmt$). Note that the commit SCN is not propagated to the MV log at all.
commit SCN-based MV logs for on-demand fast refresh
This new xid$$ column and commit-SCN mapping table are leveraged by the fast refresh of on-demand MVs as follows (on-commit ones do not need the SCN as they know exactly the xid of the committed transaction; again we will see that in an upcoming post).
With "old style" timestamp-based MV logs, the refresh is performed by using a "mark-and-propagate" algorithm, which is essentially (check this post for some additional details):
1) new log rows are inserted with snaptime$$=4000 A.D;
2) at refresh time, a snapshot of the new rows is taken, that is, all new rows are marked with snaptime$$=sysdate;
3) all modifications whose snaptime$$ is between the date of the last refresh (excluded) and sysdate(included) are propagated to the MV;
4) all obsolete log rows are deleted, that is, all rows whose snaptime$$ is less than or equal the lowest of all refresh times are removed from the log.
With "new style" SCN-based MV logs, the algorithm is, instead:
1) new log rows are inserted with xid$$=transaction id of modifing transaction;
2) at refresh time, the current SCN is retrieved (no snapshot is performed);
3) all modifications whose xid maps to a row in all_summap whose commit_scn is between the SCN of the last refresh (excluded) and the retrieved current SCN(included) are propagated to the MV;
4) obsolete rows are removed from the log as before, this time using the SCN instead of snaptime$$.
The main advantage is that the snapshot is not performed, thus removing the redo and undo generated by the update, and obviously the log visit (usually a full table scan) as well - at the cost of an additional join with all_summap (or sys.snap_xcmt$) later; if the join is calculated efficiently, that is very likely advantageous "in general" (but as always, it depends on your scenario).
It might be (rarely) beneficial to index xid$$, as it is (rarely) beneficial to index snaptime$$. In that case, having no snapshot performed reduces both the undo and redo generated for the index maintenance.
As a side and "philosophical" note, it is also worth noting that the new logging mechanism records more information - now we know which transactions modified the table and the infinite-precision time (the SCN) of modifications, and this is much more informative about the history of the logged table than the mostly meaningless refresh time contained in snaptime$$. This is definitely a better utilization of storage.
I plan to blog about how the new MV log impact fast refreshes in 11gR2 in the near future, focusing on join-only MVs; so stay tuned if you're interested.
As we have seen in the previous posts of this series, in 11g a new figure named "NewDensity" has been introduced as a replacement for the "density" column statistic for columns whose histogram has been collected; this change has been backported in 10.2.0.4 also.
In the previous post we discussed how NewDensity influences the CBO cardinality estimate for Height-Balanced histograms; in this one we are going to investigate the same for Frequency Histograms. We will see that the most important change is the introduction of the “half the least popular" rule (see the "Histogram change" post by Jonathan Lewis, which distills the findings of Randolf Geist and Riyaj Shamsudeen) - a surprising rule that might easily cause trouble (in fact as Jonathan reports in the comments - bug 6082745 was opened against this rule).
The test case (script density_post_freq.sql) considers the same test statement we focused on in the previous post (a single equality filter predicate which asks for a value inside the min-max range of the column):
select ... from t where value = 64.5;
Of course we compute a Frequency instead of an Height-Balanced histogram, and use a slightly different value distribution in order to highlight the new rule:
SQL> select value, count(*) 2 from t 3 group by value 4 order by value; VALUE COUNT(*) ---------- ---------- 8 8 16 16 64 64 128 128
The histogram generated by the test case is (from DBA_HISTOGRAMS):
VALUE EP BKT ---------- ---------- ---------- 8 8 8 16 24 16 64 88 64 128 216 128
VALUE is an abbreviation for ENDPOINT_VALUE, EP for ENDPOINT_NUMBER.
BKT is the number of buckets covered by the value (i.e.: EP minus the previous EP), that is, the number of rows whose column value was equal to VALUE at statistics collection time.
When the filter predicate selects a value contained in the histogram, the new releases behave the same as the old ones (but check the "bottom note about singleton values" at the bottom for a a minor but interesting detail): neither density nor NewDensity is used, and the cardinality estimate is the usual intuitive one. In the complementary case of a value not contained in the histogram (but still inside the min-max interval), the cardinality used to be calculated as density*num_rows and it is now NewDensity*num_rows. Note the simmetry with the Height-Balanced case: the formula is the same, with NewDensity simply replacing density.
NewDensity with the “half the least popular" rule active
By default the rule is active, and in this case, NewDensity is set to
NewDensity = 0.5 * bkt(least_popular_value) / num_rows
and hence, for non-existent values:
E[card] = (0.5 * bkt(least_popular_value) / num_rows) * num_rows = 0.5 * bkt(least_popular_value)
For our test case, the least_popular_value is 8 and bkt(8) = 8, hence E[card] = 0.5 * 8 = 4 thanks to NewDensity being equal to 0.5 * 8 / 216 = 0.018518519. In fact, we can verify in the 10053 traces (in 10.2.0.4, 126.96.36.199, 188.8.131.52) for our statement, that asks for a not-existent value (64.5), that E[card] and NewDensity are set as above:
NewDensity:0.018519, OldDensity:0.002315 BktCnt:216, PopBktCnt:216, PopValCnt:4, NDV:4 Using density: 0.018519 of col #1 as selectivity of unpopular value pred Table: T Alias: NOT_EXISTENT Card: Original: 216.000000 Rounded: 4 Computed: 4.00 Non Adjusted: 4.00
As another check, let's see what happens if bkt(least_popular_value) = 1, that is, if there is (at least) one value that occurred exactly one time (a singleton value) at statistics collection time. Adding such a row to our test case is trivial (just uncomment the first insert row in the script); in this scenario, our formula above predicts E[card] = 0.5 with NewDensity = 0.5 / 217 = .002304147, and in fact (check the *_least_is_one.trc traces):
NewDensity:0.002304, OldDensity:0.002304 BktCnt:217, PopBktCnt:216, PopValCnt:4, NDV:5 Using density: 0.002304 of col #1 as selectivity of unpopular value pred Table: T Alias: NOT_EXISTENT Card: Original: 217.000000 Rounded: 1 Computed: 0.50 Non Adjusted: 0.50
note that E[card] gets rounded up from 0.5 to 1 (as usual).
What is the rationale behind this rule? Thanks to Randolf Geist (see the comment in Jonathan's blog entry above), we know that it was introduced as a patch to solve one particular scenario (see bug 5483301) and then included in the main release, for some reason. Luckily, the rule can be disabled and the old sane behaviour can be restored.
NewDensity with the “half the least popular" rule disabled
To disable the new rule, just switch off the patch 5483301:
alter session set "_fix_control"='5483301:off';
(or alter system if you want to make it permanent)
with this setting, NewDensity becomes simply
NewDensity = 0.5 / num_rows
and hence, again for non-existent values:
E[card] = 0.5
which is exactly what we got in pre-10.2.0.4, where density was used (and density was, and is still, set to 0.5 / num_rows by dbms_stats). So the cardinality estimate is 0.5 (rounded up to 1).
For our test case, we predict NewDensity = 0.5 / 216 = 0.002314815. In fact our 10053 traces tell us:
NewDensity:0.002315, OldDensity:0.002315 BktCnt:216, PopBktCnt:216, PopValCnt:4, NDV:4 Table: T Alias: NOT_EXISTENT_OFF Card: Original: 216.000000 Rounded: 1 Computed: 0.50 Non Adjusted: 0.50
The rationale for this behaviour is sound; the CBO knows that no row with the requested value existed at statistics collection time, hence it returns the minimal cardinality estimate compatible with the non empty result set assumption (check this post for the importance of this assumption). If the statistics are reasonably fresh, this is the only sane estimate that can be made.
Playing with density - a warning
If you set your own column stats using dbms_stats.set_column_stats, the behaviour is different; I haven't made any extensive investigations but as far as I can tell, the value you provide for density is used instead of NewDensity. User-provided column statistics are flagged with dba_tab_cols.user_stats = 'YES'. You can disguise your user statistics as non-user by setting the flags parameter of dbms_stats.set_column_stats to 2 - but since the latter parameter is labeled as "for Oracle internal use only", I would do it only for investigations purposes - that is, never in production.
Bottom note about singleton values: actually in pre-10.2.0.4 versions, if the value was present in the Frequency histogram but covering a single bucket (hence it was present in the table exactly one time at statistic collection time), it used to be classified as "unpopular" and hence used to get the same treatment as a value not in the histogram - the end result being that the cardinality was estimated as 0.5 rounded up to 1; now it is 1 before rounding as one would intuitively expects. I hope to be able to investigate whether this change fixes the issues about join cardinality estimation I investigated - see "the mystery of halving" in this investigation of mine if interested.
In this post we are going to explore and explain the rationale for the formula used by the CBO to compute the "NewDensity" figure that replaces, from 10.2.0.4 onwards, the "density" column statistic in the cardinality estimation formulae for columns with height-balanced (HB) histograms defined.
In a previous post, we already discussed the pre-10.2.0.4 scenario: we saw how and when the "density" column statistic is used in the cardinality formula for equality filter predicates, we explained its statistical rationale and defining formula, introduced the concept of the NPS (Not Popular Subtable), and built a test case. Now we are going to use the very same test case and explain the differences in the most recent versions (the previous post zip file contains logs for them also).
To summarize the test case - we have a table T with a single column VALUE, exponentially distributed, and with a SIZE 5 Height-Balanced histogram collected on. The histogram is:
SQL> select ep, value, popularity from formatted_hist; EP VALUE POPULARITY ---------- ---------- ---------- 0 1 0 1 16 0 5 64 1
Thus, we have a single popular value, 64; all the others are unpopular.
In this "densities" series of post, we focus on a SQL statement that contains only an equality filter predicate on table T:
select ... from t where value = 2.4;
the literal value is not a popular value (but inside the 1-64 interval) and hence, in pre-10.2.0.4, the formula used for the expected cardinality calculation is equal to:
E[card] = density * num_rows;
We discussed, in the previous post, how density is carefully calculated by dbms_stats to get back the expected cardinality of the family (class) of all possible equality filter predicate statements that hit the NPS, under the usual "non-empty result set assumption" and the further (strange and strong) assumption that the more a value is represented in the NPS, the higher the probability that the value is used as the literal of the equality predicate (an assumption that mathematically translates into the formula "w(:x) = count(:x) / num_rows_nps").
Switching to 10.2.0.4 - the formula for E[card] is still the same, but with "density" replaced by "NewDensity" (as hinted by the fact that "density" is reported as "OldDensity" in the 10053 trace files, as we are going to see in a moment):
E[card] = NewDensity * num_rows;
NewDensity is not stored anywhere in the data dictionary, but it is computed at query optimization time by the CBO (note that density is still computed by dbms_stats using the old formula, but then it is ignored by the CBO). The NewDensity formula is based mainly on some histogram-derived figures; using the same names found in 10053 traces:
NewDensity = [(BktCnt - PopBktCnt) / BktCnt] / (NDV - PopValCnt)
Where BktCnt ("Bucket Count") is the number of buckets (the "N" in the "SIZE N" clause);
PopBktCnt ("Popular Bucket Count") the number of buckets covered by the popular values;
PopValCnt ("Popular Value Count") is the number of popular values; NDV ("Number of Distinct Values") is the traditional name used by CBO developers for the num_distinct column statistic. With the exception of NDV, all these values are derived from the histogram.
Side note: if the numerator is equal to zero, NewDensity is set to 0.5 / num_rows, thus giving an E[card] = 0.5, as far as I have seen (not exaustively) in a few test cases; it looks like a lower-bound "sanity check". The denominator cannot be zero for HB histograms.
To illustrate the formula: the histogram of our test case has 5 buckets, hence BktCnt=5; 64 is the only popular value, hence PopValCnt =1; this popular value covers 4 buckets (since its EP is 5 and the previous EP is 1), hence PopBktCnt=4; we know that the column has num_distinct=6, hence NDV=6. This is in fact what we see in the 10053 trace file (in 184.108.40.206 and 220.127.116.11):
SINGLE TABLE ACCESS PATH Single Table Cardinality Estimation for T[T] Column (#1): NewDensity:0.040000, OldDensity:0.115789 BktCnt:5, PopBktCnt:4, PopValCnt:1, NDV:6 Using density: 0.040000 of col #1 as selectivity of unpopular value pred Table: T Alias: T Card: Original: 95.000000 Rounded: 4 Computed: 3.80 Non Adjusted: 3.80
So NewDensity = [(5-4)/5] / (6-1) = 1/25 = 0.04 and E[card]=0.04*95=3.8, which is exactly what we see in the above trace fragment.
The formula is statistically based on replacing the previous versions' assumption (that we labeled "strange and strong") about w(:x) with the standard assumption that the client will ask for the values in the NPS with the same probability; mathematically, that means replacing the formula "w(:x) = count(:x) / num_rows_nps" with the standard "w(:x) = 1 / num_distinct_nps" (where num_distinct_nps is of course the number of distinct values of the NPS). If you plug this shape of w(:x) into the formula for E[card], you get
E[card] = sum ( w(:x) * E[count(:x)] ) = = sum (E[count(:x)] ) / num_distinct_nps for all values of :x (belonging to the NPS)
E[card] = num_rows_nps / num_distinct_nps
which is, not surprising, the standard formula used for columns without histograms, but applied to the NPS, not the whole table.
One possibility for producing the above E[card] value at run-time could have been to change dbms_stats to compute a value for "density" equal to (num_rows_nps / num_distinct_nps) / num_rows; but forcing users to recompute statistics for all their tables in their upgraded databases is not really a viable option. Hence, the CBO designers chose to simply ignore "density" and calculate the above formula at run-time, mining the histogram, at the cost of reduced precision. In fact, the easy part is num_distinct_nps, which is obviously exactly equal to num_distinct minus the number of popular values; but num_rows_nps can only calculated approximately, since the histogram is a (deterministic) sample of the column values obtained by first sorting the column values and then sampling on a uniform grid (for more information and illustration, see the first part of this article of mine). Using the histogram, the best approximation for num_rows_nps is num_rows times the fraction of buckets not covered by popular values. Hence, using the 10053 terminology
num_distinct_nps = NDV - PopValCnt (exactly) num_rows_nps = [(BktCnt - PopBktCnt) / BktCnt] * num_rows (approximately)
which gets back (again, approximately) the E[card] formula above, as can be trivially checked.
It might be desirable that one day, NewDensity gets calculated exactly by dbms_stats and stored in the data dictionary, at least for columns with new statistics, albeit the precision reduction is probably more than acceptable (that is, I have never seen a case where that has been an issue). The test case script, just for the sake of completeness, calculates the exact figure as well; it gets back an E[card] of 6.2 instead of 3.8.
For a summary of the above discussion and some more discussion, check back this investigation of mine. By the way, NewDensity replaces "density" also in join cardinality formulae, even if I have not run a complete investigation - but that is not surprising at all.
As a final nore - NewDensity is used also for Frequency Histograms, and in a very creative way; we will discuss this in part IV of this series.
In this post we are going to explore and explain the rationale for the formula used by dbms_stats to compute the "density" column statistic, used by the CBO in versions less than 10.2.0.4 to estimate the cardinality of a class of SQL statements. In the next post, we will speak about its replacement, named "NewDensity" in 10053 trace files.
We will consider only the non-trivial case of Height-Balanced histograms, since for Frequency Histograms density is a constant (0.5 / num_rows) and for columns without histogram, it is simply 1/num_distinct.
Let's illustrate the test case on which we will base our discussion, contained in this zip file.
First, a table T is created with the following exponential value distribution:
SQL> select value, count(*) 2 from t 3 group by value 4 order by value; VALUE COUNT(*) ---------- ---------- 1 1 2 2 4 4 8 8 16 16 64 64
The test case then computes a SIZE 5 Height-Balanced histogram. The resulting histogram (from dba_histograms) is as follows (note that I have added the column POPULARITY that marks popular values with "1"; EP is shorthand for column ENDPOINT_NUMBER, VALUE for column ENDPOINT_VALUE):
SQL> select ep, value, popularity from formatted_hist; EP VALUE POPULARITY ---------- ---------- ---------- 0 1 0 1 16 0 5 64 1
The test case then issues this SQL statement that contains only an equality filter predicate on table T:
select ... from t where value = 2.4;
The literal value 2.4 is not contained in the table (and hence in the histogram), in order to make the CBO factor in "density" in its estimate of the expected cardinality - in fact, as it might be known, density is used when the literal is not popular (that is, not equal to 64 in our case), and it doesn't matter whether the literal is not contained in the histogram, or contained as an unpopular value (1 and 16 in our case), or even contained in the table or not. All it takes is its being not popular.
Side note: I'm assuming the literal is inside the closed min-max interval (1-64 in this case); when outside, it depends on the version.
When the literal is not popular, the formula used for the expected cardinality calculation is equal to
E[card] = density * num_rows;
That is easy to verify from the test case logs; in 9i we can see that density = 0.115789474 and num_rows=95, hence 0.115789474 * 95 = 11.000000000 which is exactly equal to the CBO estimate for our statement.
The formula used by dbms_stats to compute "density" was published in Jonathan Lewis' book Cost Based Oracle (page 172) and Wolfgang Breitling's presentation Histograms - Myths and Facts. The key fact is that the formula takes as input the rows of what I've nicknamed the not-popular subtable (NPS), that is, the original table without the rows whose values are popular values (in this case, 64 is the only popular value). Letting num_rows_nps the number of rows of the NPS (for our example, num_rows_nps=1+2+4+8+16=31), we have:
density = (1 / num_rows) * sum (count (value) ^ 2) / num_rows_nps for "value" belonging to the NPS
The script performs this calculation automatically; it is anyway instructive to perform the calculation manually at least one time:
density = (1/95) * (1*1+2*2+4*4+8*8+16*16) / 31 = .115789474
that matches perfectly the density we observed in the script log before.
What is the statistical rationale for this seemingly strange computation?
If we plug it inside the formula for E[card], we can see that num_rows is cancelled:
E[card] = sum (count (value) ^ 2) / num_rows_nps summed over all "values" belonging to the NPS
Now we must reference the statistical concepts introduced in this post, and consider the family of all statements of our kind that can reference the NPS:
select ... from t where x = :x; :x being a value belonging to the NPS
its E[card] is
E[card] = sum ( w(:x) * E[count(:x)] ) for all values of :x (belonging to the NPS)
dbms_stats takes count(:x) as the best estimate for E[count(:x)] (for example, E[count(4)] = count(4) = 4 in our case). All we have to do in order to obtain the observed formula, is to assume w(:x) = count(:x) / num_rows_nps:
E[card] = sum ( (count(:x) / num_rows_nps) * count(:x) ) = sum ( count(:x) ^ 2 ) / num_rows_nps for all values of :x (belonging to the NPS)
The meaning of the above particular shape of w(:x) is that the probability that the client submits a certain value for :x is proportional to the number of rows (in the NPS) that has that value; more precisely, that if X% of rows has a certain common value, X% of user-submitted statements that "hit" the NPS will ask for that value. Under this assumption, dbms_stats precomputes "density" to give back the above E[card] when the literal is known to be not popular, hence hitting the NPS - remember that the CBO operates under the "non-empty result set assumption", hence if the literal does not hit a popular value, it must hit a value of the NPS.
The above assumption for w(:x) is quite a strange assumption - and in fact, we will see in the next post that in 11g (and 10.2.0.4), this assumption has been dropped and replaced with a more standard one. The "density" column statistics is in fact ignored in 10.2.0.4+ and a value computed at run-time, named "newDensity" in 10053 trace files, is used instead.