Third International NoCOUG SQL & NoSQL Challenge!

For anyone who enjoys using their SQL skills creatively and wants to escape the boring daily SQL-coding routine ... here is a puzzle that is both entertaining and challenging, with a real prize for the winner!

Official abstract:
"In this challenge (see page 25), the Wicked Witch of the West needs help in creating a magic spell to ensure that the Third Annual Witching & Wizarding Ball is a grand success. The winner will receive the August Order of the Wooden Pretzel in keeping with the Steven Feuerstein’s observation that "some people can perform seeming miracles with straight SQL, but the statements end up looking like pretzels created by somebody who is experimenting with hallucinogens." There are currently four knights of the August Order of the Wooden Pretzel: Alberto Dell’Era (Italy) who won the first challenge in 2009 and Andre Araujo (Australia), Rob van Wijk (Netherlands), and Ilya Chuhnakov (Russia) who won the second challenge in 2011."

I remember having a lot of fun when I took part in the first edition of this challenge - I hope the same for you :)

Tom Kyte in Italy – April 5, 2011

[Note: this post was originally written in Italian, since it covers a local event]

This year too, Thomas "Tom" Kyte, the "Tom behind asktom.oracle.com" and author of several books, came back to Italy to deliver one of his most popular recurring seminars (here are the slides) - the one about the most significant features of the current version of Oracle (hence 11gR2 at the moment).

I had never seen Tom on stage even though we have known each other for a long time, so I immediately took advantage of the kind offer by the Oracle Press Office to meet him after the conference - a meeting that turned into a long, informal two-hour chat (together with Christian Antognini) that started at lunch and continued in the taxi. Besides the pleasure of talking with Tom (always very helpful and friendly), I came away with several pieces of information and impressions, which I present in no particular order.

The conference in general. The trait that best describes Tom Kyte, as anyone who has read one of his books can guess, is that he is a perfect (and extremely rare) synthesis of an excellent technologist and an effective communicator: indeed, I came away from the conference with a much sharper and clearer understanding of the features presented, even though I already knew them. I therefore regretted not having attended the conferences of past years - it would have been an excellent investment of my time ...

Total Recall (Flashback Data Archive). In my opinion this is the killer feature of 11g (it was already present in 11gR1, but the 11gR2 improvements regarding support for many types of DDL on tracked tables make it markedly more usable): being able to run a query in the past (even years back) by simply adding the "AS OF TIMESTAMP" clause can truly be a life changer, for developers and DBAs alike. Just think of investigating problems reported today that occurred days ago, of restoring data after errors, etc. One of the uses I want to explore is extracting data from operational systems into data warehouses; at first sight it is more efficient (and certainly infinitely more maintainable and generalizable) than the classic techniques. It is also very important to know that the history tables of the tracked tables are updated by reading the information from the undo segments, hence with no impact on the execution time of the DML statements operating on the tracked tables. Total Recall is also one of the cheapest extra-cost options.
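A minimal sketch of the idea (the table, archive name and timestamp are hypothetical; tracking a table requires a flashback archive to already exist):

-- sketch: track the table in an existing flashback archive (here named fda1)
alter table orders flashback archive fda1;

-- later (even years later): query the table as it was at a point in the past
select *
  from orders
       as of timestamp to_timestamp('2010-04-05 09:00:00',
                                    'yyyy-mm-dd hh24:mi:ss')
 where customer_id = 42;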

Smart Flash Cache. The crucial piece of information is that the solid-state disk ("SSD" or "Flash") of the Flash Cache is used only for clean blocks; dirty ones are written only to the datafiles. Hence, only systems whose bottleneck is disk reads can benefit from it.

My impression is that its most interesting use is to make fast memory available for the new "In-Memory Parallel Execution" feature (which allows parallel execution to read blocks from the buffer cache instead of only into the UGA of the process); it would be interesting to verify this.
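For reference, a sketch of how the Flash Cache is enabled (the device path and size are hypothetical; the feature is available only on selected platforms, such as Oracle Linux and Solaris):

-- sketch: point the database at the flash device and size the cache
-- (spfile scope - takes effect at the next restart)
alter system set db_flash_cache_file = '/dev/flash01' scope=spfile;
alter system set db_flash_cache_size = 32G scope=spfile;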

Edition-based Redefinition. It is certainly possible to use this feature to install new versions of both the data (tables with new columns, etc.) and the software (packages, stored procedures), but while the former case is relatively complex to manage, the latter is very simple - there is a chasm between the two cases in terms of difficulty of use. Hence I believe that, except for very special cases (systems with very high availability requirements, managed by highly skilled staff, and with large budgets), this feature will find widespread use "only" in the latter case. Perhaps it was deliberate that the demo at the conference focused precisely on that case ...
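A minimal sketch of the simple case (all names are mine; the schema must have been enabled for editions beforehand):

-- sketch: install a new version of a procedure in a new edition
create edition release_2;
alter session set edition = release_2;

create or replace procedure my_proc
is
begin
  null; -- the new implementation, visible from release_2 onwards
end;
/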

Kernel Programmers. This was not part of the conference, but Tom kindly satisfied some of my curiosity about the Oracle kernel. I thus learned that very strict coding conventions are followed; they are a bit hard to grasp at first, but they allow any member of the team to easily read and modify the kernel code (written in C, of course). Moreover, to join the team (or rather, one of the teams) working on the kernel, you either are a brilliant graduate from a top university, or you come from inside the company and hence are already known as an excellent professional. And of course (but this is well known), before a new feature is introduced it is described in detail in dedicated documents and analyzed for relevance and feasibility through a precise decision workflow. In short, a very rigorous and extremely professional working environment - a kind of environment that is rare nowadays, but certainly at the heart of Oracle's success.

Conclusion. These were my main takeaways, which obviously reflect my interests and professional needs. In the future I plan to attend the conferences held by Oracle Italia more often - they have always been useful and of high quality, with an "American" approach: plenty of facts, little chit-chat, and geared towards the needs of the audience.

Xtrace: an Oracle session trace browser – exec flow

Tracing a session is extremely useful when you need to investigate how a client interacts with the database - the client could be an application of yours, a third-party application, or an Oracle module such as dbms_stats or dbms_mview. To get the perfect picture of the client-server dialogue, you "simply" need to consider all EXEC lines in the trace file and associate each line with the executed statement and the bind variable values; a very tedious and error-prone task when done manually, which Xtrace can do for you (and for free).

Let's see the tool in action. Consider tracing a call to this stored procedure, which executes a recursive SQL statement:

create or replace procedure test_proc( p_x int )
is
begin
  for i in 1..p_x loop
    for k in (select count(*) from t where x > i) loop
      null;
    end loop;
  end loop;
end;
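
For reference, the trace used in this example can be produced with the standard extended SQL trace event; a sketch (level 12 records both wait events and bind variable values):

alter session set events '10046 trace name context forever, level 12';

exec test_proc(2)

alter session set events '10046 trace name context off';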

Here is the output of Xtrace:

Reading it bottom-up, you can see that the client called the SP, which in turn recursively executed the SQL statement twice (note the indentation).

You can also ask Xtrace to display the bind variable values used for each execution:

So - the client passed the value "2" for :p_x to the SP, which in turn executed the SQL statement passing first "1" and then "2" for :B1.

Interested? Try it live (requires Java Web Start):


When Xtrace opens up, press the "options" button and then the "EXEC FLOW analysis" button. Enable/disable the bind variable values using the "display BINDS under EXEC" checkbox; color the statements as you like.

We introduced Xtrace in this post; the Xtrace home page contains the tool (which can be used online or downloaded) and a manual for advanced uses.

An improved OakTable web site

Today the new OakTable web site, www.oaktable.net, was published: many thanks to Kurt Van Meerbeeck (who, I'm told, did most of the work on the site), James Morle and Marco Gralike!

I really like (besides the light and modern look) the aggregator of the OakTable members' blogs - a window onto high-quality news and investigations about Oracle ...

Xtrace: an Oracle session trace browser (introduction)

Xtrace is a graphical tool that can navigate Oracle trace files, manipulate them, and optionally get them back as a text file. It actually does (much) more, but in this first post we are going to focus on its basic browsing capabilities.

Let’s see the tool in action on the trace file produced by this simple PL/SQL block:

begin
  for r in (select * from t) loop
    null;
  end loop;
end;

The resulting trace file is

WAIT #2: nam='SQL*Net message from client' ela= 61126 driver id=1413697536 #bytes=1 p3=0 obj#=76357 tim=5789636384898
=====================
PARSING IN CURSOR #26 len=66 dep=0 uid=73 oct=47 lid=73 tim=5789636385129 hv=3421439103 ad='aeb809c8'
begin
  for r in (select * from t) loop
    null;
  end loop;
end;
END OF STMT
PARSE #26:c=0,e=153,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=5789636385122
BINDS #26:
=====================
PARSING IN CURSOR #28 len=15 dep=1 uid=73 oct=3 lid=73 tim=5789636386184 hv=1406298530 ad='a0503300'
SELECT * FROM T
END OF STMT
PARSE #28:c=0,e=804,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,tim=5789636386181
BINDS #28:
EXEC #28:c=0,e=64,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=5789636386284
WAIT #28: nam='db file sequential read' ela= 19 file#=4 block#=59 blocks=1 obj#=76357 tim=5789636386383
WAIT #28: nam='db file sequential read' ela= 11 file#=4 block#=60 blocks=1 obj#=76357 tim=5789636386457
FETCH #28:c=0,e=243,p=2,cr=3,cu=0,mis=0,r=100,dep=1,og=1,tim=5789636386566
FETCH #28:c=0,e=54,p=0,cr=1,cu=0,mis=0,r=100,dep=1,og=1,tim=5789636386663
FETCH #28:c=0,e=3,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=5789636386693
EXEC #26:c=0,e=1543,p=2,cr=4,cu=0,mis=0,r=1,dep=0,og=1,tim=5789636386746
WAIT #26: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=76357 tim=5789636387057
WAIT #26: nam='SQL*Net message from client' ela= 42743 driver id=1413697536 #bytes=1 p3=0 obj#=76357 tim=5789636429824
STAT #28 id=1 cnt=200 pid=0 pos=1 obj=76357 op='TABLE ACCESS FULL T (cr=4 pr=2 pw=0 time=363 us)'

Even for this artificially simple trace file, it takes a lot of effort to read and understand it; for example, it takes a while to associate the recursive SQL lines with the execution of the PL/SQL block (the "EXEC #26" line).

With Xtrace, the trace reading experience is remarkably better:



Note the indentation by recursive level (which is provided out-of-the-box) and the coloring of the lines by statement (which takes perhaps a minute to set up).
You can try this example live by pressing the “Launch” button above if you are interested; in particular, try the “Options” button of the middle pane, and the “set color” popup menus of the top pane.

Suggestion: you might also check the hyperlinks that link the lines together; for example, the xct pointer that links the recursive SQL calls to the parent "EXEC #26" (check the interactive manual for more information).

You can also get the trace back as a text file, if so desired:

000 line zero
001 xtrace: log file 'E:\localCVS30\TrilogyLectures\MioSitoWeb\xtrace\dist\xtrace.log'
002 VIRTUAL CALL #-4: 'null call - ignore this'
003 VIRTUAL CALL #-4: 'null call - ignore this'
004 +WAIT #2: nam='SQL*Net message from client' xe=ela=61126 p1='driver id'=1413697536 p2='#bytes'=1 p3=''=0 xphy=0 obj#=76357 tim=5789636384898
005 VIRTUAL CALL #-8: 'wait-for-client'
006 VIRTUAL CALL #-5: 'client-message-received'
007 ---------------------PARSING IN CURSOR #26: len=66 dep=0 uid=73 oct=47 lid=73 tim=5789636385129 hv=3421439103 ad='0eb809c8'
    begin
      for r in (select * from t) loop
        null;
      end loop;
    end;
    END OF STMT
008 PARSE #26: mis=0 r=0 dep=0 og=1 tim=5789636385122 e=153 c=0 p=0 cr=0 cu=0
009 BINDS #26:
010   ---------------------PARSING IN CURSOR #28: len=15 dep=1 uid=73 oct=3 lid=73 tim=5789636386184 hv=1406298530 ad='00503300'
      SELECT * FROM T
      END OF STMT
011   PARSE #28: mis=1 r=0 dep=1 og=1 tim=5789636386181 e=804 c=0 p=0 cr=0 cu=0
012   BINDS #28:
013   EXEC  #28: mis=0 r=0 dep=1 og=1 tim=5789636386284 e=64 c=0 p=0 cr=0 cu=0
014   +WAIT #28: nam='db file sequential read' xe=ela=19 p1='file#'=4 p2='block#'=59 p3='blocks'=1 xphy=1 obj#=76357 tim=5789636386383
015   +WAIT #28: nam='db file sequential read' xe=ela=11 p1='file#'=4 p2='block#'=60 p3='blocks'=1 xphy=1 obj#=76357 tim=5789636386457
016   FETCH #28: mis=0 r=100 dep=1 og=1 tim=5789636386566 e=243 c=0 p=2 cr=3 cu=0
017   FETCH #28: mis=0 r=100 dep=1 og=1 tim=5789636386663 e=54 c=0 p=0 cr=1 cu=0
018   FETCH #28: mis=0 r=0 dep=1 og=1 tim=5789636386693 e=3 c=0 p=0 cr=0 cu=0
019 EXEC  #26: mis=0 r=1 dep=0 og=1 tim=5789636386746 e=1543 c=0 p=2 cr=4 cu=0
020 -WAIT #26: nam='SQL*Net message to client' xe=ela=2 p1='driver id'=1413697536 p2='#bytes'=1 p3=''=0 xphy=0 obj#=76357 tim=5789636387057
021 +WAIT #26: nam='SQL*Net message from client' xe=ela=42743 p1='driver id'=1413697536 p2='#bytes'=1 p3=''=0 xphy=0 obj#=76357 tim=5789636429824
022 VIRTUAL CALL #-8: 'wait-for-client'
023 VIRTUAL CALL #-5: 'client-message-received'
024   STAT  #28: id=1 pid=0 pos=1 obj=76357 op='TABLE ACCESS FULL T' cnt=200 avg(cnt)=200.0 card=n/a cr=4 avg(cr)=4.0 cost=n/a pr=2 pw=0 time=363 size=n/a xnexecs=1 xstatn=0 xplannum=0
025
026 VIRTUAL CALL #-4: 'null call - ignore this'

This can be obtained using the "save as text" popup menu of the middle pane.

We are going to keep exploring Xtrace in the upcoming posts.

fast refresh of join-only MVs: _mv_refresh_use_stats and locking log stats

A devastating performance degradation of materialized view fast refreshes can happen in versions after 9i - and can be healed rather easily by setting the hidden parameter _mv_refresh_use_stats or, a bit surprisingly, by locking statistics on the logs. The problem can manifest at least in the currently-latest patchsets of 10g, 11gR1 and 11gR2 (10.2.0.4, 11.1.0.7 and 11.2.0.1), seems to hit a lot of people, and its root cause is the use of incorrect hints by the Oracle refresh engine.

We will investigate the join-only MV case only, since this is the case I looked into after a question by Christo Kutrovsky, factoring in some observations by Taral Desai and some Support notes; I have some clues that something similar may happen for other types of MVs.

The test case sets up this very common scenario for fast refreshes:

1 - two big base tables joined together by the MV;
2 - only a small fraction of rows modified (actually one deleted, two updated, one inserted);
3 - all tables and indexes with fresh statistics collected;
4 - MV logs with no statistics collected AND with statistics not locked;
5 - indexes present on the joined columns;
6 - indexes present on the rowid columns of the MV.

Points 1 and 2 make for the ideal scenario for incremental ("fast") refreshes to be effective; point 3 is very common as well, since you normally have many other statements issued on the tables; the relevance of point 4 will become clear later, but it happens very often in real life: people might perhaps consider collecting stats on the logs, but locking their statistics is usually not done, at least in my experience.

To understand the importance of points 5 and 6, please check this post of mine; note how those indexes (sketched below) are a necessary prerequisite for the sanity of the DEL and INS steps of the refresh process. Without them, the refresh cannot be incremental, since there is no physical way to read and propagate only the modified rows and those related to them; most of the base tables and of the MV must be (uselessly) scanned instead. But in order for the refresh to be incremental ("fast"), those indexes have to be actually used...
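For concreteness, a sketch of the two indexes, with names matching the test case plans shown below (the column lists are inferred from the refresh statements):

-- point 6: rowid-based index on the MV, used by the DEL step
create index test_mv_test_t1_rowid on test_mv (test_t1_rowid);

-- point 5: index on the joined column of test_t2, used by the INS step
create index test_t2_j2_1_idx on test_t2 (j2_1);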

the issue

Let's illustrate the issue focusing on the DEL step (the easier one to discuss). In the above-mentioned post, we saw that the DEL step uses a single SQL statement whose text, leaving out minor technical details and hints, is:

/* MV_REFRESH (DEL) */
delete from test_mv
 where test_t1_rowid in
       (
select * from
       (
select chartorowid (m_row$$)
  from mlog$_test_t1
 where snaptime$$ > :1
       ) as of snapshot (:2)
       )

In 9.2.0.8, we get this very healthy plan:

-------------------------------------------------
|Id|Operation             |Name                 |
-------------------------------------------------
| 0|DELETE STATEMENT      |                     |
| 1| DELETE               |                     |
| 2|  NESTED LOOPS        |                     |
| 3|   VIEW               |                     |
| 4|    SORT UNIQUE       |                     |
| 5|     TABLE ACCESS FULL|MLOG$_TEST_T1        |
| 6|   INDEX RANGE SCAN   |TEST_MV_TEST_T1_ROWID|
-------------------------------------------------

That is: get the rowids of all modified rows from the log, and use the rowid-based index to delete their "old image" from the MV (inserting their "new image" is the job of the INS step). This is truly incremental, since resource usage and elapsed time are proportional to the number of rows logged in the MV log, not to the size of the tables.

In 10.2.0.4, 11.1.0.7 and 11.2.0.1 the plan becomes:

------------------------------------------
|Id|Operation              |Name         |
------------------------------------------
| 0|DELETE STATEMENT       |             |
| 1| DELETE                |TEST_MV      |
| 2|  HASH JOIN RIGHT SEMI |             |
| 3|   TABLE ACCESS FULL   |MLOG$_TEST_T1|
| 4|   MAT_VIEW ACCESS FULL|TEST_MV      |
------------------------------------------

Oops, the indexes are not used ... hence the DEL step overhead is proportional to the size of the MV, which can be definitely unacceptable.

That is due to the engine injecting a HASH_SJ hint into the outermost nested subquery:

... WHERE "TEST_T1_ROWID" IN (SELECT /*+ NO_MERGE  HASH_SJ  */ ...

This is recognized as a bug in many scenarios (start from Oracle Support note 578720.1 and follow the references to explore some of them), even though I have not found a clear and exhaustive note that documents the behaviour.

remedy one: set "_mv_refresh_use_stats"

To get back to the healthy plan, simply set "_mv_refresh_use_stats" to "true" (asking Oracle Support for permission first, of course); this produces a set of hints much more suitable for a fast refresh:

... WHERE "TEST_T1_ROWID" IN (SELECT /*+ NO_MERGE  NO_SEMIJOIN  */ ...
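A sketch of applying this remedy (it is a hidden parameter, hence the double quotes and the call to Support first; the system-wide variant assumes an spfile is in use):

-- session level:
alter session set "_mv_refresh_use_stats" = true;
-- or instance-wide:
alter system set "_mv_refresh_use_stats" = true scope=both;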

Note: the root cause of this bug is probably a change hinted at in note 875532.1 - in 10.2.0.3 the meaning of _mv_refresh_use_stats was reversed, but its default value was not, hence (by mistake?) activating a different piece of the engine code.

The very same problem happens for the INS step; I won't go into much detail here (please check the test case spools provided above if interested), but in 9.2.0.8 the modified rows of the base table are directly fetched using the rowids contained in the log:

-----------------------------------------------------
|Id|Operation                      |Name            |
-----------------------------------------------------
| 0|INSERT STATEMENT               |                |
| 1| TABLE ACCESS BY INDEX ROWID   |TEST_T2         |
| 2|  NESTED LOOPS                 |                |
| 3|   VIEW                        |                |
| 4|    NESTED LOOPS               |                |
| 5|     VIEW                      |                |
| 6|      SORT UNIQUE              |                |
| 7|       TABLE ACCESS FULL       |MLOG$_TEST_T1   |
| 8|     TABLE ACCESS BY USER ROWID|TEST_T1         |
| 9|   INDEX RANGE SCAN            |TEST_T2_J2_1_IDX|
-----------------------------------------------------

Instead, in 10.2.0.4, 11.1.0.7 and 11.2.0.1 we get the following plan:

--------------------------------------------------
|Id|Operation                   |Name            |
--------------------------------------------------
| 0|INSERT STATEMENT            |                |
| 1| TABLE ACCESS BY INDEX ROWID|TEST_T2         |
| 2|  NESTED LOOPS              |                |
| 3|   VIEW                     |                |
| 4|    HASH JOIN RIGHT SEMI    |                |
| 5|     TABLE ACCESS FULL      |MLOG$_TEST_T1   |
| 6|     TABLE ACCESS FULL      |TEST_T1         |
| 7|   INDEX RANGE SCAN         |TEST_T2_J2_1_IDX|
--------------------------------------------------

Its resource consumption is, of course, proportional to the size of the base table.

Even in this case, this is due to the nasty HASH_SJ hint:

... FROM "TEST_T1" "MAS$" WHERE ROWID IN (SELECT  /*+ HASH_SJ */  ...

If you set _mv_refresh_use_stats, you get back the 9.2.0.8 plan - and thus you are back to incremental refreshes for both the DEL and INS steps. As a side note, a CARDINALITY hint is used as well, with the cardinality set to the correct value (6 in my test case):

... FROM "TEST_T1" "MAS$" WHERE ROWID IN (SELECT  /*+ CARDINALITY(MAS$ 6) NO_SEMIJOIN ...

 
remedy two: collect and lock statistics on the logs

Very interestingly, instead of setting the hidden parameter, there is another way to get back to the healthy plan: gather statistics on the MV logs when they are empty AND lock them (as suggested in note 578720.1, albeit for a different scenario; note that setting the parameter is not necessary in this case; thanks to Taral Desai for pointing me to the note). With locked statistics in place, no hint at all is injected besides a NO_MERGE for the DEL step:

... WHERE "TEST_T1_ROWID" IN (SELECT /*+ NO_MERGE  */ ...
... FROM "TEST_T1" "MAS$" WHERE ROWID IN (SELECT  ...

So, the engine trusts the CBO to come up with a good plan, and does not inject any "intelligent" hint. Possibly, and intriguingly, this is because by locking the statistics I am assuring the engine that they are representative of the data at any point in time. So, locking the statistics is not meant only as a way to prevent dbms_stats from changing them ... it is deeper than that. At least in this case, you are taking responsibility for them, and Oracle takes that into consideration.
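A sketch of this second remedy on the test case log (to be run when the log is empty, as noted above):

begin
  -- gather statistics on the (empty) MV log ...
  dbms_stats.gather_table_stats(user, 'MLOG$_TEST_T1');
  -- ... and lock them, so they stay representative of an empty log
  dbms_stats.lock_table_stats(user, 'MLOG$_TEST_T1');
end;
/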

xplan: dbms_metadata.get_ddl for tables referenced by the plan

As a minor but useful new feature, xplan is now able to integrate into its report the DDL of tables (and indexes) referenced by the plan, calling dbms_metadata.get_ddl transparently.

This is mostly useful to get more details about the referenced tables' constraints and partition definitions - complementing the CBO-related statistics that xplan already reports.

This feature can be activated by specifying dbms_metadata=y or dbms_metadata=all (check the header of xplan.sql for more information).
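Purely as an illustration (the statement pattern below is hypothetical, and the exact invocation syntax is documented in the xplan.sql header):

-- sketch: ask xplan to include the DDL of all referenced tables and indexes
@xplan.sql "select%from t%" "dbms_metadata=all"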

We spoke about xplan in general here.

tweet …

For those who have asked - I'm not blogging simply because I'm feverishly working on a tool of mine that I'm very fond of :)

Guess what it does from the next two screenshots ...

[screenshot: tool_lines2]

[screenshot: tool_prof1]

11gR2: new algorithm for fast refresh of on-commit materialized views

This post investigates the improvements made in 11gR2 to the fast refresh engine for materialized views (MVs) that are set to be automatically refreshed at commit time. We cover join-only materialized views only in this post, as always with the help of a test case.

As noted in my post "11gR2: materialized view logs changes", in 11gR2 a new column, xid$$, is now part of materialized view logs; this column records the id of the transaction that logged the changes to the base table on which the log is defined. It is important to stress that this column is added regardless of the type of the MV log, that is, to both the brand-new "commit SCN-based" logs and the old-fashioned "timestamp-based" ones. That means that both types of MV logs can take advantage of the new improvements - although I haven't tested whether MVs (logs) migrated from a previous version are automatically upgraded by the migration scripts and get the new xid$$ column added.

algorithm before 11gR2

In versions before 11gR2, the refresh algorithm for on-commit MVs was the same as the one for on-demand ones, with only minor variants. That is, the algorithm was almost completely the same, just triggered by the commit event instead of by the user.

For an in-depth analysis of the algorithm, I refer the reader to the discussion of the on-demand algorithm in the post "fast refresh of join-only materialized views - algorithm summary"; in passing, the test case for this post is in fact the very same three-table join MV, just redefined as "on commit" instead of "on demand". To recap, the "old" algorithm (up to and including 11.1.0.7) was:

1) new log rows are inserted with snaptime$$ = 4000 A.D.;
2) at refresh time (commit time), a snapshot of the new rows is taken; that is, all new rows are marked with snaptime$$ = "commit time", using the statement:

update MLOG$_TEST_T1
   set snaptime$$ = :1
 where snaptime$$ > to_date('2100-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS')

3) all modifications whose snaptime$$ is between the date of the last refresh (excluded) and the commit date (included) are propagated to the MV. The propagation consists of two steps.
First a DEL step:

/* MV_REFRESH (DEL) */
delete from test_mv
 where test_t1_rowid in
       (
select * from
       (
select chartorowid (m_row$$)
  from mlog$_test_t1
 where snaptime$$ > :1
       ) -- no "as of snapshot (:2)" clause
       )

Then an INS one:

/* MV_REFRESH (INS) */
insert into test_mv
select jv.j1_2, jv.x1, jv.pk1, jv.rid$,
       mas2.j2_1, mas2.j2_3, mas2.x2, mas2.pk2, mas2.rowid,
       mas3.j3_2, mas3.x3, mas3.pk3, mas3.rowid
  from (
select log.rowid rid$, log.*
  from test_t1 log
 where rowid in
       (
select chartorowid(log.m_row$$)
  from mlog$_test_t1
 where snaptime$$ > :1
       )
       ) jv, -- no "as of snapshot (:2) jv" clause
       test_t2 as of snapshot (:2)  mas2,
       test_t3 as of snapshot (:2)  mas3
 where   jv.j1_2 = mas2.j2_1
   and mas2.j2_3 = mas3.j3_2

Note that the only small difference from the on-demand case is the absence of the "as of snapshot" clause; the statements are otherwise identical. Note also that in both statements the rows in the MV log are identified by snaptime, using the subquery:

select chartorowid(log.m_row$$)
  from mlog$_test_t1
 where snaptime$$ > :1

4) all obsolete log rows are deleted; that is, all rows whose snaptime$$ is less than or equal to the lowest of all refresh times are removed from the log, using the statement:

delete from mlog$_test_t1
 where snaptime$$ <= :1

algorithm starting from 11gR2

In 11gR2, the on-commit algorithm is still almost the same as the on-demand one; the "only" change is how the modified rows to be propagated are identified and, more generally, how the logs are managed. Not surprisingly, log rows are now directly identified by the transaction id, which is logged in xid$$. In detail:

1) new log rows are inserted with xid$$ = transaction id;
2) at refresh time (commit time), no snapshot is taken, that is, the MV log is not updated at all;
3) all modifications made by the committing transaction are propagated to the MV, still using the same two steps.
The DEL step is now:

/* MV_REFRESH (DEL) */
delete from test_mv
 where test_t1_rowid in
       (
select * from
       (
select chartorowid (m_row$$)
  from mlog$_test_t1
 where xid$$ = :1
       )
       )

The INS one is:

/* MV_REFRESH (INS) */
insert into test_mv
select jv.j1_2, jv.x1, jv.pk1, jv.rid$,
       mas2.j2_1, mas2.j2_3, mas2.x2, mas2.pk2, mas2.rowid,
       mas3.j3_2, mas3.x3, mas3.pk3, mas3.rowid
  from (
select log.rowid rid$, log.*
  from test_t1 log
 where rowid in
       (
select chartorowid(log.m_row$$)
  from mlog$_test_t1
 where xid$$ = :1
       )
       ) jv, -- no "as of snapshot (:2) jv" clause
       test_t2 as of snapshot (:2)  mas2,
       test_t3 as of snapshot (:2)  mas3
 where   jv.j1_2 = mas2.j2_1
   and mas2.j2_3 = mas3.j3_2

Hence, the big difference from previous versions is that the rows in the MV log are identified very simply by the transaction that logged them (the committing transaction, of course), via the subquery:

select chartorowid(log.m_row$$)
  from mlog$_test_t1
 where xid$$ = :1

4) all obsolete log rows are deleted; that is, the rows logged by the committing transaction are removed, using the statement:

delete from mlog$_test_t1
 where xid$$ = :1

The new algorithm is certainly much simpler and more elegant. Performance is improved since the snapshot step has been removed, while the other steps are more or less as expensive as before.

practical implications: an example

I strongly believe that studying the internals is the best way to learn how to make the best use of any feature. Let's see an example of how the few bits of "internal knowledge" shared here can be used in practice - that is, how a small investment in investigation makes for huge savings in effort afterwards, and huge gains in the effectiveness of your work as well.

It is well known that in pre-11gR2 it can sometimes be beneficial to place an index on the log (indexing the log is even suggested by support note 258252 "MATERIALIZED VIEW REFRESH: Locking, Performance, Monitoring"). The scenario that benefits the most from such an index is when the log is composed of mostly-empty blocks, so that an index access is preferable to a full table (log) scan; you get mostly-empty blocks, for example, when there are peaks of activity on the master tables that keep the log High Water Mark very high.

From the above discussion, it is obvious that in pre-11gR2 the best index for join-only MVs was on (snaptime$$, m_row$$) - not on snaptime$$ alone, as is sometimes suggested - to make the refresh operation an index-only one.

Starting from 11gR2, the best index is now on (xid$$, m_row$$). Not only that: having no snapshot step, and hence no update of the index, makes the indexing option even more attractive.
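In sketch form, on the test case log (the index names are mine):

-- pre-11gR2: timestamp-based identification of the modified rows
create index mlog$_test_t1_ts_idx on mlog$_test_t1 (snaptime$$, m_row$$);

-- 11gR2 on-commit: xid-based identification
create index mlog$_test_t1_xid_idx on mlog$_test_t1 (xid$$, m_row$$);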

Could you have seen these implications so easily, without knowing the internals? I couldn't have.

11gR2: materialized view logs changes

In this post we are going to discuss some 11gR2 changes to materialized view logs that are aimed at increasing the performance of the fast-refresh engine of materialized views (MVs), especially the on-commit variant.

MV logs, in 11gR2, now come in two flavours: the traditional (and still the default) timestamp-based one, and the brand-new commit SCN-based one; you choose the latter type by specifying the "WITH COMMIT SCN" clause at MV log creation time. Interestingly, the "old" timestamp-based implementation has been changed as well. Let's examine both with the help, as usual, of a test case.

Timestamp-based MV logs (the "old" type)

The test case configures an MV log as "log everything", that is, it activates all the logging options:

create materialized view log on test_t1
with sequence, rowid, primary key (x1)
including new values;

In pre-11gR2 (e.g. in 11.1.0.7, 10.2.0.4), the MV log columns were:

pk1             number(22)
x1              number(22)
m_row$$         varchar2(255)
sequence$$      number(22)
snaptime$$      date(7)
dmltype$$       varchar2(1)
old_new$$       varchar2(1)
change_vector$$ raw(255)

Now, in 11gR2 (11.2.0.1), they are:

pk1             number(22)
x1              number(22)
m_row$$         varchar2(255)
sequence$$      number(22)
snaptime$$      date(7)
dmltype$$       varchar2(1)
old_new$$       varchar2(1)
change_vector$$ raw(255)
xid$$           number(22)

The only difference is the new column xid$$ (transaction id), which uniquely identifies the transaction that made the changes to the row. For the curious, the number is a combination of the elements of the triplet (undo segment number, undo slot, undo sequence); it is simply the binary concatenation of the three numbers shifted left by 48, 32 and 0 bits respectively (as verified in the script).
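Given that layout, the triplet can be recovered from xid$$ with simple arithmetic - a quick sketch of mine (not part of the original script):

select xid$$,
       trunc(xid$$ / power(2, 48))                    as undo_segment#,
       trunc(mod(xid$$, power(2, 48)) / power(2, 32)) as undo_slot#,
       mod(xid$$, power(2, 32))                       as undo_sequence#
  from mlog$_test_t1;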

The xid$$ column is used by the 11gR2 on-commit fast refresh engine, which can now easily retrieve the changes made by the just-committed transaction via its xid; in contrast, the on-demand fast refresh engine keeps using snaptime$$ as it did in previous versions. I will cover this in more detail in an upcoming post.

Commit SCN-based MV logs (the "new" type in 11gR2)

Let's recreate the same MV log, this time adding the COMMIT SCN clause (new in 11gR2):

create materialized view log on test_t1
with sequence, rowid, primary key (x1), COMMIT SCN
including new values;

The columns of the MV log are:

pk1             number(22)
x1              number(22)
m_row$$         varchar2(255)
sequence$$      number(22)
dmltype$$       varchar2(1)
old_new$$       varchar2(1)
change_vector$$ raw(255)
xid$$           number(22)

So, the only difference from the 11gR2 timestamp-based case is that snaptime$$ is no longer a column of the MV log; the only difference from pre-11gR2 is that snaptime$$ has been replaced by xid$$.

For this log flavour only, the mapping between the xid that modified the table and its commit-time SCN is now tracked in a new view, all_summap (probably named after "SUMmary MAP", "summary" being yet another synonym for "MV"), which is (as of 11.2.0.1) a straight "select *" from the dictionary table sys.snap_xcmt$. To illustrate, the script makes one insert, one update and one delete on the base table, which translates into 4 rows inside the MV log, all with the same xid:

SQL> select distinct xid$$ from mlog$_test_t1;

                XID$$
---------------------
     1126024460895690

After the commit, we get:

SQL> select * from all_summap where xid in (select xid$$ from mlog$_test_t1);

                  XID COMMIT_SCN
--------------------- ----------
     1126024460895690    2885433

Hence, it is now possible to know the infinite-precision time (the SCN) at which every modification became visible to an external observer (the commit SCN), by simply joining the MV log and all_summap (or sys.snap_xcmt$). Note that the commit SCN is not propagated to the MV log at all.
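For instance, a sketch of that join on the test case objects:

select l.m_row$$, l.dmltype$$, s.commit_scn
  from mlog$_test_t1 l, all_summap s
 where s.xid = l.xid$$;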

commit SCN-based MV logs for on-demand fast refresh

This new xid$$ column and the commit-SCN mapping table are leveraged by the fast refresh of on-demand MVs as follows (on-commit ones do not need the SCN, since they know exactly the xid of the committed transaction; again, we will see that in an upcoming post).

With "old style" timestamp-based MV logs, the refresh is performed using a "mark-and-propagate" algorithm, which is essentially (check this post for some additional details):
1) new log rows are inserted with snaptime$$ = 4000 A.D.;
2) at refresh time, a snapshot of the new rows is taken; that is, all new rows are marked with snaptime$$ = sysdate;
3) all modifications whose snaptime$$ is between the date of the last refresh (excluded) and sysdate (included) are propagated to the MV;
4) all obsolete log rows are deleted; that is, all rows whose snaptime$$ is less than or equal to the lowest of all refresh times are removed from the log.

With "new style" commit SCN-based MV logs, the algorithm is instead:
1) new log rows are inserted with xid$$ = transaction id of the modifying transaction;
2) at refresh time, the current SCN is retrieved (no snapshot is performed);
3) all modifications whose xid maps to a row in all_summap whose commit_scn is between the SCN of the last refresh (excluded) and the retrieved current SCN (included) are propagated to the MV;
4) obsolete rows are removed from the log as before, this time using the SCN instead of snaptime$$.

The main advantage is that no snapshot is performed, which removes the redo and undo generated by the update, and obviously the visit of the log (usually a full table scan) as well - at the cost of an additional join with all_summap (or sys.snap_xcmt$) later; if the join is computed efficiently, this is very likely advantageous "in general" (but, as always, it depends on your scenario).

It might be (rarely) beneficial to index xid$$, as it is (rarely) beneficial to index snaptime$$. In that case, having no snapshot performed reduces both the undo and redo generated for the index maintenance.

As a side and "philosophical" note, the new logging mechanism records more information - we now know which transactions modified the table and the infinite-precision time (the SCN) of the modifications, which says much more about the history of the logged table than the mostly meaningless refresh time contained in snaptime$$. This is definitely a better utilization of storage.

I plan to blog about how the new MV logs impact fast refreshes in 11gR2 in the near future, focusing on join-only MVs; so stay tuned if you're interested.

