Xtrace: an Oracle session trace browser – exec flow
Tracing a session is extremely useful when you need to investigate how a client interacts with the database - the client could be an application of yours, a third-party application, or an Oracle module such as dbms_stats or dbms_mview. To get the perfect picture of the client-server dialogue, you "simply" need to consider all EXEC lines in the trace file, and associate to each line the executed statement and the bind variable values; a very tedious and error-prone task when done manually, that Xtrace can make for you (and for free).
Let's see the tool in action. Consider tracing a call to this stored procedure, that executes a recursive SQL statement :
create or replace procedure test_proc( p_x int )
is
begin
for i in 1..p_x loop
for k in (select count(*) from t where x > i) loop
null;
end loop;
end loop;
end;
Here is the output of Xtrace:

Reading it bottom-up, you can see that the client called the SP, which in turn executed recursively (note the indentation) the SQL statement twice.
You can also ask Xtrace to display the bind variable values used for each execution:

So - the client passed the value "2" for :p_x to the SP, which in turn executed the SQL statement first passing "1" for :B1, and then passing "2".
Interested ? Try it live (requires Java Web Start):
When Xtrace opens up, press the "options" button and then the "EXEC FLOW analysis" button. Enable/disable the bind variable values using the "display BINDS under EXEC" checkbox; color the statements as you like.
We introduced Xtrace in this post; the Xtrace home page contains the tool (which can be used online or downloaded) - and a manual for advanced uses.
- Monday, May 17, 2010 xtrace
- Comment now »
An improved OakTable web site
Today the new OakTable web site, www.oaktable.net, has been published: many thanks to Kurt Van Meerbeeck (that I'm told worked the most on the site), James Morle and Marco Gralike!
I really like (besides the light and modern look) the aggregator of the OakTable members' blogs - a window on high-quality news and investigations about Oracle ...
- Friday, April 23, 2010 Uncategorized
- Comment now »
Xtrace: an Oracle session trace browser (introduction)
Xtrace is a graphical tool that can navigate Oracle trace files, manipulate them, and optionally get them back as a text file. It actually makes (much) more, but in this first post we are going to focus on its basic browsing capabilities.
Let’s see the tool in action on the trace file produced by this simple PL/SQL block:
begin
for r in (select * from t) loop
null;
end loop;
end;
The resulting trace file is
WAIT #2: nam='SQL*Net message from client' ela= 61126 driver id=1413697536 #bytes=1 p3=0 obj#=76357 tim=5789636384898
=====================
PARSING IN CURSOR #26 len=66 dep=0 uid=73 oct=47 lid=73 tim=5789636385129 hv=3421439103 ad='aeb809c8'
begin
for r in (select * from t) loop
null;
end loop;
end;
END OF STMT
PARSE #26:c=0,e=153,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=5789636385122
BINDS #26:
=====================
PARSING IN CURSOR #28 len=15 dep=1 uid=73 oct=3 lid=73 tim=5789636386184 hv=1406298530 ad='a0503300'
SELECT * FROM T
END OF STMT
PARSE #28:c=0,e=804,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,tim=5789636386181
BINDS #28:
EXEC #28:c=0,e=64,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=5789636386284
WAIT #28: nam='db file sequential read' ela= 19 file#=4 block#=59 blocks=1 obj#=76357 tim=5789636386383
WAIT #28: nam='db file sequential read' ela= 11 file#=4 block#=60 blocks=1 obj#=76357 tim=5789636386457
FETCH #28:c=0,e=243,p=2,cr=3,cu=0,mis=0,r=100,dep=1,og=1,tim=5789636386566
FETCH #28:c=0,e=54,p=0,cr=1,cu=0,mis=0,r=100,dep=1,og=1,tim=5789636386663
FETCH #28:c=0,e=3,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=5789636386693
EXEC #26:c=0,e=1543,p=2,cr=4,cu=0,mis=0,r=1,dep=0,og=1,tim=5789636386746
WAIT #26: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=76357 tim=5789636387057
WAIT #26: nam='SQL*Net message from client' ela= 42743 driver id=1413697536 #bytes=1 p3=0 obj#=76357 tim=5789636429824
STAT #28 id=1 cnt=200 pid=0 pos=1 obj=76357 op='TABLE ACCESS FULL T (cr=4 pr=2 pw=0 time=363 us)'
Even for this artificially simple trace file, it takes a lot of effort to read and understand it; for example, it takes a while to associate the recursive SQL lines to the execution of the PL/SQL blocks (the “EXEC #26” line).
With Xtrace, the trace reading experience is remarkably much better:

Note the indentation by recursive level (which is provided out-of-the -box) and the color of the lines by statement (that takes perhaps a minute in order to be set up).
You can try this example live by pressing the “Launch” button above if you are interested; in particular, try the “Options” button of the middle pane, and the “set color” popup menus of the top pane.
Suggestion: you might even check the hyperlinks that links together the lines; for example, the xct pointer that links the SQL recursive calls to the parent “EXEC #26” (check the interactive manual for more information).
You can also get the trace back as a text file, if so desired:
000 line zero
001 xtrace: log file 'E:\localCVS30\TrilogyLectures\MioSitoWeb\xtrace\dist\xtrace.log'
002 VIRTUAL CALL #-4: 'null call - ignore this'
003 VIRTUAL CALL #-4: 'null call - ignore this'
004 +WAIT #2: nam='SQL*Net message from client' xe=ela=61126 p1='driver id'=1413697536 p2='#bytes'=1 p3=''=0 xphy=0 obj#=76357 tim=5789636384898
005 VIRTUAL CALL #-8: 'wait-for-client'
006 VIRTUAL CALL #-5: 'client-message-received'
007 ---------------------PARSING IN CURSOR #26: len=66 dep=0 uid=73 oct=47 lid=73 tim=5789636385129 hv=3421439103 ad='0eb809c8'
begin
for r in (select * from t) loop
null;
end loop;
end;
END OF STMT
008 PARSE #26: mis=0 r=0 dep=0 og=1 tim=5789636385122 e=153 c=0 p=0 cr=0 cu=0
009 BINDS #26:
010 ---------------------PARSING IN CURSOR #28: len=15 dep=1 uid=73 oct=3 lid=73 tim=5789636386184 hv=1406298530 ad='00503300'
SELECT * FROM T
END OF STMT
011 PARSE #28: mis=1 r=0 dep=1 og=1 tim=5789636386181 e=804 c=0 p=0 cr=0 cu=0
012 BINDS #28:
013 EXEC #28: mis=0 r=0 dep=1 og=1 tim=5789636386284 e=64 c=0 p=0 cr=0 cu=0
014 +WAIT #28: nam='db file sequential read' xe=ela=19 p1='file#'=4 p2='block#'=59 p3='blocks'=1 xphy=1 obj#=76357 tim=5789636386383
015 +WAIT #28: nam='db file sequential read' xe=ela=11 p1='file#'=4 p2='block#'=60 p3='blocks'=1 xphy=1 obj#=76357 tim=5789636386457
016 FETCH #28: mis=0 r=100 dep=1 og=1 tim=5789636386566 e=243 c=0 p=2 cr=3 cu=0
017 FETCH #28: mis=0 r=100 dep=1 og=1 tim=5789636386663 e=54 c=0 p=0 cr=1 cu=0
018 FETCH #28: mis=0 r=0 dep=1 og=1 tim=5789636386693 e=3 c=0 p=0 cr=0 cu=0
019 EXEC #26: mis=0 r=1 dep=0 og=1 tim=5789636386746 e=1543 c=0 p=2 cr=4 cu=0
020 -WAIT #26: nam='SQL*Net message to client' xe=ela=2 p1='driver id'=1413697536 p2='#bytes'=1 p3=''=0 xphy=0 obj#=76357 tim=5789636387057
021 +WAIT #26: nam='SQL*Net message from client' xe=ela=42743 p1='driver id'=1413697536 p2='#bytes'=1 p3=''=0 xphy=0 obj#=76357 tim=5789636429824
022 VIRTUAL CALL #-8: 'wait-for-client'
023 VIRTUAL CALL #-5: 'client-message-received'
024 STAT #28: id=1 pid=0 pos=1 obj=76357 op='TABLE ACCESS FULL T' cnt=200 avg(cnt)=200.0 card=n/a cr=4 avg(cr)=4.0 cost=n/a pr=2 pw=0 time=363 size=n/a xnexecs=1 xstatn=0 xplannum=0
025
026 VIRTUAL CALL #-4: 'null call - ignore this'
This can be obtained using the “save as text“ popup menu of the middle pane.
We are going to keep exploring Xtrace in the upcoming posts.
- Thursday, April 8, 2010 xtrace
- 4 Comments »
fast refresh of join-only MVs: _mv_refresh_use_stats and locking log stats
A devastating performance degradation of materialized view fast refreshes can happen in versions after 9i - and can be healed rather easily by simply setting the hidden parameter _mv_refresh_use_stats or, a bit surprisingly, by locking statistics on the logs. The problem can manifest at least in the currently-latest patchsets of 10g, 11gR1 and 11gR2 (10.2.0.4, 11.1.0.7 and 11.2.0.1), seems to hit a lot of people, and its root cause are the utilization of wrong hints by the Oracle refresh engine.
We will investigate the join-only MV case only, since this is the case I have investigated after a question by Christo Kutrovsky, factoring in some observations by Taral Desai and some Support notes; I have some clues that something similar may happen for other types of MVs.
The test case sets up this very common scenario for fast refreshes:
1 - two big base tables joined together by the MV;
2 - only a small fraction of rows modified (actually one deleted, two updated, one inserted);
3 - all tables and indexes with fresh statistics collected;
4 - MV logs with no statistic collected AND with not-locked statistics;
5 - indexes present on the joined columns;
6 - indexes present on the rowid columns of the MV.
Points 1 and 2 make for the ideal scenario for incremental ("fast") refreshes to be effective; 3 is very common as well, since you normally have many other statements issued on the tables; the relevance of 4 will be clear later, but it happens very often in real life, since people might perhaps consider collecting stats on the log, but locking their statistics is usually not made, at least in my experience.
To understand the importance of points 5 and 6, please check this post of mine; note how those indexes are a necessary prerequisite for the sanity of the DEL and INS steps of the MV process. Without them, the refresh cannot be incremental since it has no physical way to read and propagate only the modified rows and those related to them, but it must scan (uselessly) most of the base tables and MV. But in other for the refresh to be incremental ("fast"), those indexes have to be actually used...
the issue
Let's illustrate the issue focusing on the DEL step (the easier to discuss about). In the above mentioned post, we have seen that the DEL step uses a single SQL statement whose text, leaving out minor technical details and hints, is:
/* MV_REFRESH (DEL) */
delete from test_mv
where test_t1_rowid in
(
select * from
(
select chartorowid (m_row$$)
from mlog$_test_t1
where snaptime$$ > :1
) as of snapshot (:2)
)
In 9.2.0.8, we get this very healthy plan:
------------------------------------------------- |Id|Operation |Name | ------------------------------------------------- | 0|DELETE STATEMENT | | | 1| DELETE | | | 2| NESTED LOOPS | | | 3| VIEW | | | 4| SORT UNIQUE | | | 5| TABLE ACCESS FULL|MLOG$_TEST_T1 | | 6| INDEX RANGE SCAN |TEST_MV_TEST_T1_ROWID| -------------------------------------------------
That is: get the rowid of all modified rows from the log, and use the rowid-based index to delete the "old image" of them from the MV (inserting their "new image" is the job of the INS step). This is truly incremental, since the resource usage and elapsed time are proportional to the number of rows logged in the MV log, not to the dimension of the tables.
In 10.2.0.4, 11.1.0.7 and 11.2.0.1 the plan becomes:
------------------------------------------ |Id|Operation |Name | ------------------------------------------ | 0|DELETE STATEMENT | | | 1| DELETE |TEST_MV | | 2| HASH JOIN RIGHT SEMI | | | 3| TABLE ACCESS FULL |MLOG$_TEST_T1| | 4| MAT_VIEW ACCESS FULL|TEST_MV | ------------------------------------------
Oops, the indexes are not used ... hence the DEL step overhead is proportional to the size of the MV, and that can be definitely unacceptable.
That is due to the engine injecting an HASH_SJ hint in the outermost nested subquery:
... WHERE "TEST_T1_ROWID" IN (SELECT /*+ NO_MERGE HASH_SJ */ ...
This is recognized as a bug in many scenarios (start from Oracle Support note 578720.1 and follow the references to explore some of them) even if I have not found a clear and exhaustive note that documents the behaviour.
remedy one: set "_mv_refresh_use_stats"
To get back to the healthy plan, simply set "_mv_refresh_use_stats" to "true" (ask Oracle Support first of course for permission); this makes for a set of hint much more adequate for a fast refresh:
... WHERE "TEST_T1_ROWID" IN (SELECT /*+ NO_MERGE NO_SEMIJOIN */ ...
Note: The root cause for this bug is probably due to a change hinted in note 875532.1 - in 10.2.0.3 the meaning of _mv_refresh_use_stats was reversed, but not the default, hence (by mistake?) activating a different piece of the engine code.
The very same problem happens for the INS step; I won't go into much details here (please check the test case spools provided above if interested), but in 9.2.0.8 the base table modified rows are directly fetched using the rowid contained in the log:
----------------------------------------------------- |Id|Operation |Name | ----------------------------------------------------- | 0|INSERT STATEMENT | | | 1| TABLE ACCESS BY INDEX ROWID |TEST_T2 | | 2| NESTED LOOPS | | | 3| VIEW | | | 4| NESTED LOOPS | | | 5| VIEW | | | 6| SORT UNIQUE | | | 7| TABLE ACCESS FULL |MLOG$_TEST_T1 | | 8| TABLE ACCESS BY USER ROWID|TEST_T1 | | 9| INDEX RANGE SCAN |TEST_T2_J2_1_IDX| -----------------------------------------------------
Instead, in 10.2.0.4, 11.1.0.7 and 11.2.0.1 we get the following plan:
-------------------------------------------------- |Id|Operation |Name | -------------------------------------------------- | 0|INSERT STATEMENT | | | 1| TABLE ACCESS BY INDEX ROWID|TEST_T2 | | 2| NESTED LOOPS | | | 3| VIEW | | | 4| HASH JOIN RIGHT SEMI | | | 5| TABLE ACCESS FULL |MLOG$_TEST_T1 | | 6| TABLE ACCESS FULL |TEST_T1 | | 7| INDEX RANGE SCAN |TEST_T2_J2_1_IDX| --------------------------------------------------
Whose resource consumption is, of course, proportional to the size of the base table.
Even in this case, this is due to the nasty HASH_SJ hint:
... FROM "TEST_T1" "MAS$" WHERE ROWID IN (SELECT /*+ HASH_SJ */ ...
If you set _mv_refresh_use_stats, you get back the 9.2.0.8 plan - and thus you are back to incremental for both the DEL and INS steps. As a side note, a cardinality hint is used, where the cardinality is set to the correct value (6 in my test case):
... FROM "TEST_T1" "MAS$" WHERE ROWID IN (SELECT /*+ CARDINALITY(MAS$ 6) NO_SEMIJOIN ...
remedy two: collect and lock statistics on the logs
Very interestingly, instead of setting the hidden parameter, you have another way to get back to the healthy plan: gather statistics on the MV logs when they are empty AND lock them (as suggested in note 578720.1, albeit not in this scenario and even if setting the parameter is not necessary; thanks to Taral Desai for pointing me to the note). In this case, no hint at all is injected beside a NO_MERGE for the DEL step:
... WHERE "TEST_T1_ROWID" IN (SELECT /*+ NO_MERGE */ ... ... FROM "TEST_T1" "MAS$" WHERE ROWID IN (SELECT ...
So, the engine is confident that the CBO will come out with a good plan, and it does not inject any "intelligent" hint. Possibly, and intriguing, this is because by locking the statistics, I am assuring the engine that these statistics are representative of the data anytime. So, locking the statistics is not meant only as a way to prevent dbms_stats from changing them ... it is deeper than that. At least in this case, you are taking responsibility for them, and Oracle will take that in consideration.
- Thursday, March 11, 2010 materialized views, performance tuning
- 3 Comments »
xplan: dbms_metadata.get_ddl for tables referenced by the plan
As a minor but useful new feature, xplan is now able to integrate into its report the DDL of tables (and indexes) referenced by the plan, calling dbms_metadata.get_ddl transparently.
This is mostly useful to get more details about referenced tables' constraints and partitions definition - to complement their CBO-related statistics that xplan reports about.
This feature can be activated by specifing dbms_metadata=y or dbms_metadata=all (check xplan.sql header of xplan.sql for more informations).
We spoke about xplan in general here.
- Tuesday, February 9, 2010 performance tuning, tools, xplan
- 1 Comment »
tweet …
For people that have asked - I'm not blogging simply because I'm feverishly working on a tool of mine that I'm very fond of :)
Guess what it does from the following next two screenshots ...


- Wednesday, January 6, 2010 Uncategorized
- 2 Comments »
11gR2: new algorithm for fast refresh of on-commit materialized views
This post investigates the improvements that have been made in 11gR2 to the fast refresh engine of materialized views (MVs) that are set to be automatically refreshed at commit time. We speak about join-only materialized views only in this post, as always with the help of a test case.
As noted in the post of mine "11gR2: materialized view logs changes", in 11gR2 a new column, xid$$, is now part of materialized view logs; this column records the id of the transaction that logged the changes of the base table which the log is defined on. It is important to stress that this column is added regardless of the type of the MV log, that is, to both the brand-new "commit SCN-based" logs and the old fashioned "timestamp-based" ones. That means that both types of MV logs can take advantage of the new improvements - albeit I haven't tested whether MVs (logs) migrated from a previous version are automatically upgraded by the migration scripts and get the new xid$$ column added.
algorithm before 11gR2
In versions before 11gR2, the refresh algorithm for on-commit MVs was the same as the one for on-demand ones, with only minor variants. That is, the algorithm was almost completely the same, just triggered by the commit event instead of by the user.
For an in-depth analysis of the algorithm, I will refer the reader to the discussion about the on-demand algorithm in the post "fast refresh of join-only materialized views - algorithm summary"; in passing, the test case for this post is in fact the very same three-table join MV, just redefined as "on commit" instead of "on demand". To recap, the "old" algorithm (until 11.1.0.7) was:
1) new log rows are inserted with snaptime$$=4000 A.D;
2) at refresh time (commit time), a snapshot of the new rows is taken, that is, all new rows are marked with snaptime$$= "commit time", using the statement
update MLOG$_TEST_T1
set snaptime$$ = :1
where snaptime$$ > to_date('2100-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS')
3) all modifications whose snaptime$$ is between the date of the last refresh (excluded) and the commit date(included) are propagated to the MV. The propagation consists of two steps.
First a DEL step:
/* MV_REFRESH (DEL) */
delete from test_mv
where test_t1_rowid in
(
select * from
(
select chartorowid (m_row$$)
from mlog$_test_t1
where snaptime$$ > :1
) -- no "as of snapshot (:2)" clause
)
Then an INS one:
/* MV_REFRESH (INS) */
insert into test_mv
select jv.j1_2, jv.x1, jv.pk1, jv.rid$,
mas2.j2_1, mas2.j2_3, mas2.x2, mas2.pk2, mas2.rowid,
mas3.j3_2, mas3.x3, mas3.pk3, mas3.rowid
from (
select log.rowid rid$, log.*
from test_t1 log
where rowid in
(
select chartorowid(log.m_row$$)
from mlog$_test_t1
where snaptime$$ > :1
)
) jv, -- no "as of snapshot (:2) jv" clause
test_t2 as of snapshot (:2) mas2,
test_t3 as of snapshot (:2) mas3
where jv.j1_2 = mas2.j2_1
and mas2.j2_3 = mas3.j3_2
Note that the only small difference from the on-demand case is the absence of the "as of snapshot" clause, but the statements are otherwise identical. Note also that the rows in the MV log are identified in both statements by snaptime, using the subquery
select chartorowid(log.m_row$$) from mlog$_test_t1 where snaptime$$ > :1
4) all obsolete log rows are deleted, that is, all rows whose snaptime$$ is less than or equal the lowest of all refresh times are removed from the log, using the the statement
delete from mlog$_test_t1 where snaptime$$ <= :1
algorithm starting from 11gR2
In 11gR2, the on-commit algorithm is still almost the same as the on-demand one; the "only" change is how modified rows to be propagated are identified, and in general, how logs are managed. Not surprisingly, log rows are now directly identified by the transaction id, which is logged in xid$$. In detail:
1) new log rows are inserted with xid$$ = transaction id;
2) at refresh time (commit time), no snapshot is taken, that is, the MV log is not updated at all;
3) all modifications made by the committing transaction are propagated to the MV, still using the same two steps.
The DEL step is now:
/* MV_REFRESH (DEL) */
delete from test_mv
where test_t1_rowid in
(
select * from
(
select chartorowid (m_row$$)
from mlog$_test_t1
where xid$$ = :1
)
)
The INS one is:
/* MV_REFRESH (INS) */
insert into test_mv
select jv.j1_2, jv.x1, jv.pk1, jv.rid$,
mas2.j2_1, mas2.j2_3, mas2.x2, mas2.pk2, mas2.rowid,
mas3.j3_2, mas3.x3, mas3.pk3, mas3.rowid
from (
select log.rowid rid$, log.*
from test_t1 log
where rowid in
(
select chartorowid(log.m_row$$)
from mlog$_test_t1
where xid$$ = :1
)
) jv, -- no "as of snapshot (:2) jv" clause
test_t2 as of snapshot (:2) mas2,
test_t3 as of snapshot (:2) mas3
where jv.j1_2 = mas2.j2_1
and mas2.j2_3 = mas3.j3_2
Hence, the big difference from the previous versions case is that rows in the MV log are identified very simply by the transaction that logged them (the committing transaction, of course), by the subquery
select chartorowid(log.m_row$$) from mlog$_test_t1 where xid$$ = :1
4) all obsolete log rows are deleted, that is, the rows logged by the committing transaction are removed, using the the statement
delete from mlog$_test_t1 where where xid$$ = :1
The new algorithm is for sure much simpler and more elegant. Performance is improved since the snapshot step has been removed, and the other steps are more or less as expensive as before.
practical implications: an example
I strongly believe that studying the internals is the best way to learn how to make the best use of any feature. Let's see an example of how the few bits of "internal knowledge" I shared here can be used in practice - that is, how a little investment in investigation makes for huge savings in effort afterwards, and huge gains in effectiveness of your work as well.
It is well-known that it can be sometimes beneficial, in pre-11gR2, to place an index on the log (indexing the log is even suggested by support note 258252 "MATERIALIZED VIEW REFRESH: Locking, Performance, Monitoring"). The scenario that benefits the most from such an index is when the log is composed of mostly-empty blocks, and hence an index access is preferable over a full table(log) scan; you get mostly-empty blocks, for example, when there are peeks in activity on the master tables that keep the log High Water Mark very high.
From the above discussion, it is obvious that in pre-11gR2, the best index for join-only MVs was on (snaptime$$, m_row$$) - not on snaptime$$ alone as it is sometimes suggested - to make the refresh operation an index-only one.
Starting from 11gR2, the best index is now on (xid$$, m_row$$). Not only that, but having no snapshot step, and hence no update on the index, makes the indexing option even more attractive.
Could you see these implications so easily, without knowing the internals? I don't.
- Sunday, November 22, 2009 materialized views, performance tuning
- 8 Comments »
11gR2: materialized view logs changes
In this post we are going to discuss some 11gR2 changes to materialized view logs that are aimed at increasing the performance of the fast-refresh engine of materialized views (MVs), especially the on-commit variant.
The MV logs, in 10gr2, now comes in two flavours: the traditional (and still the default) timestamp-based one and the brand new commit SCN-based one; you choose the latter type by specifing the "WITH COMMIT SCN" clause at MV log creation time. Interestingly, the "old" timestamp-based implementation has been changed as well. Let's examine both with the help, as usual, of a test case.
Timestamp-based MV logs (the "old" type)
The test case configures an MV log as "log everything", that is, it activates all the logging options:
create materialized view log on test_t1 with sequence, rowid, primary key (x1) including new values;
In pre-11gR2 (e.g. in 11.1.0.7, 10.2.0.4), the MV log columns were:
pk1 number(22) x1 number(22) m_row$$ varchar2(255) sequence$$ number(22) snaptime$$ date(7) dmltype$$ varchar2(1) old_new$$ varchar2(1) change_vector$$ raw(255)
now in 11gR2 (10.2.0.1):
pk1 number(22) x1 number(22) m_row$$ varchar2(255) sequence$$ number(22) snaptime$$ date(7) dmltype$$ varchar2(1) old_new$$ varchar2(1) change_vector$$ raw(255) xid$$ number(22)
the only difference is the new column xid$$ (transaction id) that uniquely identifies the transaction that made the changes to the row. For the curious, the number is a combination of the elements of the triplet (undo segment number, undo slot, undo sequence); it is simply the binary concatenation of the three numbers shifted by (48, 32, 0) bits respectively (as checked in the script).
The xid$$ column is used by the 11gR2 on-commit fast refresh engine, which can now easily retrieve the changes made by the just-committed transaction by its xid; at the opposite, the on-demand fast refresh one keeps using snaptime$$ as it did in previous versions. I will speak about this in more detail in an upcoming post.
Commit SCN-based MV logs (the "new" type in 11gR2)
Let's recreate the same MV log, this time adding the commit SCN clause (new in 11GR2):
create materialized view log on test_t1 with sequence, rowid, primary key (x1), COMMIT SCN including new values;
The columns of the MV log are:
pk1 number(22) x1 number(22) m_row$$ varchar2(255) sequence$$ number(22) dmltype$$ varchar2(1) old_new$$ varchar2(1) change_vector$$ raw(255) xid$$ number(22)
so, the only difference from the 11gR2 timestamp-based case is that snaptime$$ is no longer a column of the MV log; the only difference from the pre-11gR2 is that snaptime$$ has been replaced with xid$$.
For this log flavour only, the mapping between the xid that modified the table and its commit-time SCN is now tracked in a new view, all_summap (probably named after "SUMmary MAP", "summary" being yet another synonym for "MV"), which is (as of 11.2.0.1) a straight "select *" of the dictionary table sys.snap_xcmt$. To illustrate, the script makes one insert, one update and one delete on the base table, which translates into 4 rows inside the MV log with the same xid:
SQL> select distinct xid$$ from mlog$_test_t1;
XID$$
---------------------
1126024460895690
after the commit, we get
SQL> select * from all_summap where xid in (select xid$$ from mlog$_test_t1);
XID COMMIT_SCN
--------------------- ----------
1126024460895690 2885433
hence, it is now possible to know the infinite-precision time (the SCN) when every modification became visible to an external observer (the commit SCN) by simply joining the MV log and all_summap (or sys.snap_xcmt$). Note that the commit SCN is not propagated to the MV log at all.
commit SCN-based MV logs for on-demand fast refresh
This new xid$$ column and commit-SCN mapping table are leveraged by the fast refresh of on-demand MVs as follows (on-commit ones do not need the SCN as they know exactly the xid of the committed transaction; again we will see that in an upcoming post).
With "old style" timestamp-based MV logs, the refresh is performed by using a "mark-and-propagate" algorithm, which is essentially (check this post for some additional details):
1) new log rows are inserted with snaptime$$=4000 A.D;
2) at refresh time, a snapshot of the new rows is taken, that is, all new rows are marked with snaptime$$=sysdate;
3) all modifications whose snaptime$$ is between the date of the last refresh (excluded) and sysdate(included) are propagated to the MV;
4) all obsolete log rows are deleted, that is, all rows whose snaptime$$ is less than or equal the lowest of all refresh times are removed from the log.
With "new style" SCN-based MV logs, the algorithm is, instead:
1) new log rows are inserted with xid$$=transaction id of modifing transaction;
2) at refresh time, the current SCN is retrieved (no snapshot is performed);
3) all modifications whose xid maps to a row in all_summap whose commit_scn is between the SCN of the last refresh (excluded) and the retrieved current SCN(included) are propagated to the MV;
4) obsolete rows are removed from the log as before, this time using the SCN instead of snaptime$$.
The main advantage is that the snapshot is not performed, thus removing the redo and undo generated by the update, and obviously the log visit (usually a full table scan) as well - at the cost of an additional join with all_summap (or sys.snap_xcmt$) later; if the join is calculated efficiently, that is very likely advantageous "in general" (but as always, it depends on your scenario).
It might be (rarely) beneficial to index xid$$, as it is (rarely) beneficial to index snaptime$$. In that case, having no snapshot performed reduces both the undo and redo generated for the index maintenance.
As a side and "philosophical" note, it is also worth noting that the new logging mechanism records more information - now we know which transactions modified the table and the infinite-precision time (the SCN) of modifications, and this is much more informative about the history of the logged table than the mostly meaningless refresh time contained in snaptime$$. This is definitely a better utilization of storage.
I plan to blog about how the new MV log impact fast refreshes in 11gR2 in the near future, focusing on join-only MVs; so stay tuned if you're interested.
- Tuesday, November 3, 2009 materialized views
- 3 Comments »
CBO: NewDensity for Frequency Histograms,11g-10.2.0.4 (densities part IV)
As we have seen in the previous posts of this series, in 11g a new figure named "NewDensity" has been introduced as a replacement for the "density" column statistic for columns whose histogram has been collected; this change has been backported in 10.2.0.4 also.
In the previous post we discussed how NewDensity influences the CBO cardinality estimate for Height-Balanced histograms; in this one we are going to investigate the same for Frequency Histograms. We will see that the most important change is the introduction of the “half the least popular" rule (see the "Histogram change" post by Jonathan Lewis, which distills the findings of Randolf Geist and Riyaj Shamsudeen) - a surprising rule that might easily cause trouble (in fact as Jonathan reports in the comments - bug 6082745 was opened against this rule).
The test case (script density_post_freq.sql) considers the same test statement we focused on in the previous post (a single equality filter predicate which asks for a value inside the min-max range of the column):
select ... from t where value = 64.5;
Of course we compute a Frequency instead of an Height-Balanced histogram, and use a slightly different value distribution in order to highlight the new rule:
SQL> select value, count(*)
2 from t
3 group by value
4 order by value;
VALUE COUNT(*)
---------- ----------
8 8
16 16
64 64
128 128
The histogram generated by the test case is (from DBA_HISTOGRAMS):
VALUE EP BKT
---------- ---------- ----------
8 8 8
16 24 16
64 88 64
128 216 128
VALUE is an abbreviation for ENDPOINT_VALUE, EP for ENDPOINT_NUMBER.
BKT is the number of buckets covered by the value (i.e.: EP minus the previous EP), that is, the number of rows whose column value was equal to VALUE at statistics collection time.
When the filter predicate selects a value contained in the histogram, the new releases behave the same as the old ones (but check the "bottom note about singleton values" at the bottom for a a minor but interesting detail): neither density nor NewDensity is used, and the cardinality estimate is the usual intuitive one. In the complementary case of a value not contained in the histogram (but still inside the min-max interval), the cardinality used to be calculated as density*num_rows and it is now NewDensity*num_rows. Note the simmetry with the Height-Balanced case: the formula is the same, with NewDensity simply replacing density.
NewDensity with the “half the least popular" rule active
By default the rule is active, and in this case, NewDensity is set to
NewDensity = 0.5 * bkt(least_popular_value) / num_rows
and hence, for non-existent values:
E[card] = (0.5 * bkt(least_popular_value) / num_rows) * num_rows
= 0.5 * bkt(least_popular_value)
For our test case, the least_popular_value is 8 and bkt(8) = 8, hence E[card] = 0.5 * 8 = 4 thanks to NewDensity being equal to 0.5 * 8 / 216 = 0.018518519. In fact, we can verify in the 10053 traces (in 10.2.0.4, 11.1.0.7, 11.2.0.1) for our statement, that asks for a not-existent value (64.5), that E[card] and NewDensity are set as above:
NewDensity:0.018519, OldDensity:0.002315 BktCnt:216, PopBktCnt:216, PopValCnt:4, NDV:4
Using density: 0.018519 of col #1 as selectivity of unpopular value pred
Table: T Alias: NOT_EXISTENT
Card: Original: 216.000000 Rounded: 4 Computed: 4.00 Non Adjusted: 4.00
As another check, let's see what happens if bkt(least_popular_value) = 1, that is, if there is (at least) one value that occurred exactly one time (a singleton value) at statistics collection time. Adding such a row to our test case is trivial (just uncomment the first insert row in the script); in this scenario, our formula above predicts E[card] = 0.5 with NewDensity = 0.5 / 217 = .002304147, and in fact (check the *_least_is_one.trc traces):
NewDensity:0.002304, OldDensity:0.002304 BktCnt:217, PopBktCnt:216, PopValCnt:4, NDV:5
Using density: 0.002304 of col #1 as selectivity of unpopular value pred
Table: T Alias: NOT_EXISTENT
Card: Original: 217.000000 Rounded: 1 Computed: 0.50 Non Adjusted: 0.50
note that E[card] gets rounded up from 0.5 to 1 (as usual).
What is the rationale behind this rule? Thanks to Randolf Geist (see the comment in Jonathan's blog entry above), we know that it was introduced as a patch to solve one particular scenario (see bug 5483301) and then included in the main release, for some reason. Luckily, the rule can be disabled and the old sane behaviour can be restored.
NewDensity with the “half the least popular" rule disabled
To disable the new rule, just switch off the patch 5483301:
alter session set "_fix_control"='5483301:off';
(or alter system if you want to make it permanent)
with this setting, NewDensity becomes simply
NewDensity = 0.5 / num_rows
and hence, again for non-existent values:
E[card] = 0.5
which is exactly what we got in pre-10.2.0.4, where density was used (and density was, and is still, set to 0.5 / num_rows by dbms_stats). So the cardinality estimate is 0.5 (rounded up to 1).
For our test case, we predict NewDensity = 0.5 / 216 = 0.002314815. In fact our 10053 traces tell us:
NewDensity:0.002315, OldDensity:0.002315 BktCnt:216, PopBktCnt:216, PopValCnt:4, NDV:4
Table: T Alias: NOT_EXISTENT_OFF
Card: Original: 216.000000 Rounded: 1 Computed: 0.50 Non Adjusted: 0.50
The rationale for this behaviour is sound; the CBO knows that no row with the requested value existed at statistics collection time, hence it returns the minimal cardinality estimate compatible with the non empty result set assumption (check this post for the importance of this assumption). If the statistics are reasonably fresh, this is the only sane estimate that can be made.
Playing with density - a warning
If you set your own column stats using dbms_stats.set_column_stats, the behaviour is different; I haven't made any extensive investigations but as far as I can tell, the value you provide for density is used instead of NewDensity. User-provided column statistics are flagged with dba_tab_cols.user_stats = 'YES'. You can disguise your user statistics as non-user by setting the flags parameter of dbms_stats.set_column_stats to 2 - but since the latter parameter is labeled as "for Oracle internal use only", I would do it only for investigations purposes - that is, never in production.
---
Bottom note about singleton values: actually in pre-10.2.0.4 versions, if the value was present in the Frequency histogram but covering a single bucket (hence it was present in the table exactly one time at statistic collection time), it used to be classified as "unpopular" and hence used to get the same treatment as a value not in the histogram - the end result being that the cardinality was estimated as 0.5 rounded up to 1; now it is 1 before rounding as one would intuitively expects. I hope to be able to investigate whether this change fixes the issues about join cardinality estimation I investigated - see "the mystery of halving" in this investigation of mine if interested.
Other posts belonging to this series:
densities part I
densities part II
densities part III
- Friday, October 23, 2009 CBO
- 5 Comments »