This documents describes the Xtrace tool, available here.

Introduction

The Xtrace tool navigates and analyzes an Oracle session trace file using a Graphical User Interface composed of three panes:





The top pane lists the SQL statements, the middle one echoes the (augmented) trace lines, and the bottom one aggregates the trace content (currently, it contains various resource profiles).

All the panes contain hyperlinks (or "pointers") that link objects (say, a statement, a line, a profile element) to other related objects (for example, a db-call line to its statement, a WAIT line to the db-call that waited on the event, and vice-versa; or a line to its profile, plan, etc etc).

All the panes' content is configurable; both in the sense that you can choose what to see (for example, you might display only EXEC lines and their elapsed time - or only lines belonging to a certain statement) and how to see it (for example, you can color lines differently depending on the statement text).

Tip: you can configure the tool to see the "EXEC flow", that is, the sequence of EXEC db-calls augmented with the actual statement text and the actual bind values. This is actually one of my favourite use for this tool.

The trace lines echoed in the middle pane are augmented by calculated figures whose names start with the letter "x" (for "eXtended"); for example "xela" which is the sum of the "ela" figures of all WAIT lines attributed to a db call. Other x-figures contains the above mentioned pointers.

All of the panes can be dumped as text-only files, or even images.

We are going to illustrate the tool output by example in the following. Please note that all examples are fully functional, i.e., you can interact with them and experiment by clicking on the pointers, resizing, painting etc etc; simply click on the "Launch" button placed under each screenshot.

We are not going to describe the trace file format, that is wonderfully illustrated by [Millsap], together with many concepts and techniques referenced in this manual.

A walkthrough of the three panes

The tool GUI is composed of three panes: the statements (top), lines (middle) and reports (bottom) ones.

The statements pane





This pane contains the SQL and PL/SQL statements found in the trace (right tab), and their corresponding calculated bound statement (left tab, shown above).

You can navigate from each statement to its bound statement by following the "xbstmt" pointer. Both the statements and their bound statement are identified by their xsqlid and statement progressive number.

Every bound statement is linked to the trace lines (contained in the middle pane) that reference it by the xb pointer tree. Clicking on the xb[f] pointer will highlight the first line that reference the statement; the same goes for the xb[l] pointer that points to the last line.
Likewise, every statement is linked to the referencing trace lines by the xs pointer tree.

The xversions figure is the number of associated statements for every bound statement. All other figures (whose name does not begin with "x") come verbatim from the trace file (such as sqlid, hv, etc).

You can change the color of the lines referencing a (bound or not) statement, and the statement itself, by using the "set color" popup menu; you can choose to change the color of the statement, of the other statements, or all statements whose text matches a LIKE-expression.
You can also choose to completely hide the lines/statment by acting on the "make invisible" popup menu, that works similarly.

This possibility to color/hide is extremely powerful, since you can, for example, color differently all lines that reference (or not) a certain table/view in which you are interested, thus drastically filtering the overwhelming amount of information contained in trace files.

The lines pane





This pane (the main pane) contains the (augmented) lines of the original trace file, plus others that are added by the tool (see the Virtual Calls section).

options button

At the top of the pane, the "options" button open the panel that controls the visibility of the elements of the pane. The panel allows you to hide or display any trace figure (e,c,dep,etc), any x-figure (xe,xela,etc) or any line type (PARSE,EXEC,WAIT, etc), thus tailoring the pane output to your needs. Note that all elements are grouped by type, each group sporting a pair of "set all"/"unset all" buttons to make hidden/visible all the groups in one shot.
Some special elements are available as well, for example to display under each EXEC line the BINDS value used for its execution, or the used sql statement.

Note also the "templates" buttons at the bottom, that sets the visibility flags to pre-determined values (such as default ones); note expecially the "Exec Flow Analysis", that enables Exec Flow.

color button

At the top of the pane, the "color" button allows to choose the object type that dictates the color of the lines and the actual visibility (echoed by the text label at the right of the button).
The object type can be the bound statement(the default), the statement or the profile. The latter option can be used, for example, to color in red all the lines that contributes to the top consumers listed in a resource profile.

pane content

Each line is labeled by the tool with a progressive line number (001, 002, etc in the example).

To the left of the line number, the "navbar" (navigation bar) duplicates and colocates together the most important Pointer Trees to allow for quick navigation, using a pretty obvious notation detailed here. The navbar duplicates also the xat pointer (the "S" hyperlink) and the xprof pointer (the "P" hyperlink) at its right edge.

The actual augmented trace line is located to the right of the line number; the format is nearly the same as the original trace line, with the x-figures added. First, the line is indented with spaces by its recursive level (the xdep figure), then the original line type (PARSE, EXEC, etc) and cursor number are printed, followed by the very useful statement and bound statement pointers.

What is printed next depends on the type of the line; the goal has been to print the most useful informations first, while at the same time trying to print the original figures (copied verbatim) in the same order as the original trace file. The x-figures calculated by the tool are:
line typelist of x-figures
PARSING IN CURSORxsqlid
PARSE xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xat, xpic, xprof
BINDS xdep
EXEC xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xbinds, xat, xpic, xprof
FETCH xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xat, xpic, xprof
WAIT xe, xdep, xphy, xw[], xat, xprof
STAT xnexecs, xb[] xs[], xstatn, xplannum, xprof
CLOSE xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xat, xprof
VIRTUAL CALL xdep, xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xprof
XCTEND none
PARSE ERROR xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xprof
ERROR xdep, xe[], xat
RPC CALL xsqlid, xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xprof
RPC BINDS none
RPC EXEC xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xbinds, xrpccall, xprof
UNMAP xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xat, xpic, xprof
SORT UNMAP xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xat, xpic, xprof

The reports pane





This pane contains reports about the trace file, that is, the output of specialized programs that extract information from the trace file and elaborate or aggregate it.
Currently, only the hierarchical profiles have been implemented.

The programs feeds directly from the database version of the trace file, which the other two panes can be considered a straight dump of. Whenever possible, the reports pane is cross-linked and integrated with the pristine data contained in the other two panes, examples being the xprof pointer in the lines pane that points to the profile that the line contributes to, and the line-pointing pointers in the resource histograms of the hierarchical profiles.

Concepts and definitions

Bound Statements

The bound representation of a SQL (or PL/SQL) statement is obtained by replacing all literals with bind variables, and then normalizing (redundant whitespace removed, case changed to lowercase, etc). For example, the two statements
SELECT X FROM T WHERE X = 0
SELECT X FROM T WHERE X = 1
share the same bound statement
select x from t where x=:n
Bound statements have basically the same set of figures of "real" statements, for example a (manufactured) xsqlid.

The rationale behind bound statements

The bound statements are one of the most important features of the tool. In fact, most of the tool focus on the calculated bound statements instead of the "real" statements found in the trace files; for example, the hierarchical profiles aggregate figures by bound statement. This is because a bound statement is the same as a method in a common language, the bind variables values (or literal values of course) being the same as the parameters to the method itself; and it rarely (if ever) makes sense to consider the parameter values when comparing and contrasting, or aggregating, different sections of a methods trace, whatever the traced application language is.

In other words, aggregating by bound statement collects and consolidates the information that might otherwise get dispersed and thus get unnoticed. As a relevant example, consider the common case of an application that incorrectly does not use bind variables and hence issues 1,000 different statements with 1,000 different literal values, each one consuming 0.07% of some resource - if we would not group by bound statement in the profiles we would miss 70% of the resource consumption, since a lot of small consumptions of 0.07% gets easily unnoticed, while a big one of 70% definitely attracts immediately the analyst attention.
Last but not least, grouping allows us to see that we are actually seeing the same method, just called 1,000 times, not 1,000 different methods, hence greatly reducing the analysis effort.

Bound statements as methods

Consider this statement:
select * from customers where name = :n; 
The bind variable :n is the input parameter to the SQL code, code that specifies the database to "retrieve the record whose name is equal to the input parameter and pass it back as an (implicit) output parameter". In this sense, the SQL statement is a method, as it provides a perfectly defined set of functionality to the client; the fact that is not compiled ahead of time but interpreted at run-time (compiled on the fly into an execution plan and then run in the SQL Engine virtual machine) is inessential.

Consider now a client that uses (correctly or incorrectly) literals instead of bind variables:
select * from customers where name = 'JOHN'; 
select * from customers where name = 'MARY';
what is asking is exactly the same in both cases, it has simply provided the input parameters "in-line"; the essential of what the client is specifying to be done is captured by the bound statement
select * from customers where name = :s; 
The bind variable value of course may give more information, but in almost every case this is not the case and would only mask the relevant information by providing unnecessary details (e.g. check this example).

The same discussion, of course, goes for PL/SQL statements (anonymous blocks, that is the only type of PL/SQL statements a client can issue):
begin :z := 42 + my_stored_procedure (:x); end
in this case, the similarity with a method is even more obvious.

xsqlid

The xsqlid is very similar to the sqlid figure introduced in Oracle 10g (in fact the xsqlid actually contains the sqlid if available): it is a statistically-unique string identifier that maps to a statement (or bound statement).

For statements, if the sqlid is available in the trace file, the xsqlid is simply the sqlid prefixed by the first three letters of the sql statement, in uppercase. For example, for a select statement with the sqlid='gsya1fnahrtpk', the xsqlid is 'SELgsya1fnahrtpk'. Thus, it is basically the sqlid with the statement type made more obvious by the prefix.

If the sqlid is not available (e.g. for 9i traces), the hv (hash value) figure (to be found in the PARSING IN CURSOR section) is used instead of the sqlid. For example, for an insert statement with hv=2059022544, the xsqlid is 'INS2059022544'.

In the very rare occurrence in which neither the sqlid nor the hv is available, the statement text is fed to an internal MD5-based hash function, its output converted to a string with the same base characters as Oracle's sqlid but in uppercase. For example, for a begin-end block, it might be 'BEGU1PKBFB13JT2'.

For bound statements, the same internal MD5-based hash function is used on the bound statement text, and then the output is prefixed by ":" followed by the usual three letter, but in lowercase. An example for a select statement might be ':sel2U49229L3WZXD'.

In addition, for both statements and bound statement, if a collision is detected (two or more different statements that share the same xsqlid are found), a progressive postfix is added to the colliding xsqlid (starting with the second); hence you might see (very rarely, and probably almost never for 10g+ traces) xsqlids such as 'SELgsya1fnahrtpk', 'SELgsya1fnahrtpk_1', 'SELgsya1fnahrtpk_2', etc. Thus, note that the xsqlid perfectly uniquely identifies a statement inside a trace file, and the part before the postfix only statistically uniquely identify a statement across trace files (as the sqlid does).

statement progressive number

The statement progressive number is simply a number linked to (different) statements, incremented by 1 whenever a new statement is found in the trace file. For bound statements, the number is printed prefixed with ":".
This progressive number is obviously inferior to the xsqlid, since the latter (statistically) uniquely identify the statement across trace files, while the former is meaningful only inside a particular trace file. The progressive number is used only because it is much shorter and hence consume less precious screen real estate, in places (e.g. "pointers") where using the xsqlid would be useless or redundant.

Pointer trees

One of the tool main features is the ability to automatically associate (or "attribute") lines to either other lines or statements. For example, WAIT lines are attributed to the (parent) db-call that caused them; recursive db-calls are associated to their (parent) db call that issued them; almost every line is associated to both its statement and bound statement, etc.

The association (attribution) is implemented with a "parent pointer", that is, an x-figure that contains either the parent line number or the parent (bound) statement number (aka statement progressive number). For example, the parent of a WAIT line is contained in the "xwt" (X Wait parenT) figure; the parent db-call in the "xct" (X Call parenT) figure, etc.

The set of lines that are associated with the same parent are, quite obviously, named either the "sibling chain" or the "children". The previous and next sibling pointers are contained in other two x-figures. For example WAIT lines have the "xwp" (X Wait Previous) and "xwn" (X Wait Next) x-figures; db-calls have the "xcp" and "xcn" x-figures; etc.

Parents have also two additional figures that points to the first and last children. For example, db-calls have the "xwf" (X Wait First) and "xwl" (X Wait Last) x-figures; db-calls have also the "xcf" and "xcl"; etc.

Note: null pointers are represented by the number zero, since all line numbers and statement progressive numbers are strictly greater than zero.

So to recap, for every association type a set of five pointers (a pointer tree) is defined: the parent, the previous sibling, the next sibling, the first child and the last child. The currently defined pointer trees are:
association typeparentprevious siblingnext siblingfirst childlast child
bound statement associationxbstmtxbpxbnxbfxbl
statement associationxstmtxspxsnxsfxsl
recursive callsxctxcpxcnxcfxcl
WAIT lines attributionsxwtxwpxwnxwfxwl
ERROR lines attributionsxetxepxenxefxel
Note the naming convention, which is usually consistent: the only exceptions are "xbstmt" and "xstmt" (which should be really named "xbt" and "xst").

The pointers are always displayed grouped together by square brackets, for obvious visual convenience; null pointers are not displayed at all. So for example, if xct=0,xcp=11,xcn=22,xcf=0 and xcl=0, on the screen you will see xc[p=11 n=22].
The "xbstmt" and "xstmt" pointers are an exception; they are so important that they get special treatment, that is, they are printed always near the beginning of the line, separated by a slash. So if xbstmt=11 and xstmt=99, on the screen you will see xbstmt=:11/99 (note the usual colon near the bound statement progressive number).

Note also that all major pointers are grouped together in the "navbar" at the left of every line as well, to make it easier to navigate the pointer trees. The navbar is divided into four sections corresponding to the four most important pointer trees (associations) types (b,s,c,w). In each section, each pointer is implemented by a single-letter hyperlink with its name (t,p,n,f,l). So for example, the xwt pointer is is to be found in section "w" and it is named "t".
Null (zero) pointers are not shown and are left as blank, in order to drastically reduce the visual clutter.

The Xb (bound statements) Pointer tree

This pointer tree links all lines with their bound statement. So, you can navigate from a line (in the middle pane) to its bound statement (in the top pane) by following the xbstmt pointer, or navigate to the next/previous line by following the xbp/xbn pointers; from the bound statement, you can navigate to the first/last line in the trace file that uses the bound statement by following the xbf/xbl pointers. Hence, you can visit all the points of the trace that are associated with the bound statement you are interested into.

To build the tree, the following (conceptual) algorithm is used. Each line has a cursor number (e.g. FETCH #3), and the statement text is contained into the first previous PARSING IN CURSOR line with the same cursor number (e.g. PARSING IN CURSOR #3); hence it is rather easy to calculate the bound statement text from the PARSING IN CURSOR line, and then associate it with the line. Note: if the statement text is not known (usually because the trace file was truncated), the line is associated with the "statement unknown" bound statement.

The Xs (statements) Pointer tree

This pointer tree is structured and built like the previous one, obviously considering the original statement text, not the calculated bound statement text.

The Xc (calls) Pointer tree

This pointer tree links all db-call lines (PARSE, EXEC, FETCH, CLOSE, etc.) with their recursive db-calls. If the db-call was issued directly by the client, the parent (xct) pointer is set to null (zero); otherwise, it is set to the line number of the (parent) call. So, you could, for example, navigate all calls made by the client by following the xcn (or xcp) pointer, ignoring all recursive calls; or, you can drill down to the recursive calls of any call of interest by following the xcf/xcl pointers.

For example, in the main example, the PL/SQL anonymous block with xsqlid=':dec1COLA8D4K9RPW' is submitted directly by the client (hence the xct pointers of its PARSE,EXEC,FETCH etc are all zero).
The EXEC db-call of this block causes a series of recursive db-calls on the two statements "SELECT X FROM T WHERE X = 0" and "SELECT X FROM T WHERE X = 1"; hence, all the db-calls of these two statements have an xct that points to the EXEC line of the anonymous block.

The algorithm used to build the tree is (conceptually) very simple: for every possible value of the figure "dep", a double-linked list of db-calls is maintained (say, list[dep]). When a new db-call is found while parsing forward the trace file, say with a dep=d, the db-call is linked as the next sibling of list[d], and list[d+1] is moved as the child chain of the current db-call.
Note: if a non-empty list[x], with x > d+1, is found, an 'phantom call' Virtual Call is injected (with xdep=d+1), since this means that an uninstrumented db-call has occurred. Moreover, if the process starts to wait for the client (on the events 'SQL*Net message from client', 'SQL*Net message from dblink', 'PX Idle Wait', etc - see also the discussion about the 'wait-for-client' Virtual Call), and the list[1] is not empty, a 'phantom call (dep=0)' Virtual Call is injected (with xdep=0), since that means that an uninstrumented dep=0 call was issued (this happens, for example, in 11g when some database triggers are fired). Waiting for the client closes all active list[x] as well, by injecting as many 'phantom call' Virtual Calls as necessary: by definition, if the process is waiting for the client to be informed about the next thing to do, all previous calls must have finished.

The Xw (WAIT lines attributions) Pointer tree

This pointer tree links all WAIT lines with the db-call line that caused (is responsible for) the wait event, thus linking the user actions (the db-calls) with their resource-consuming consequences (the wait events). This is obviously of paramount importance, and in fact this pointer tree is one of the most useful ones.

WAIT lines are almost always attributed forward (that is, to a db-call that occurs after the WAIT line in the trace file), but in some infrequent case they can be attributed backward. Forward attribution is signaled with "+WAIT", backward attribution with "-WAIT". For more details, see rationale for backward attribution section.

So you can see the db-call the WAIT line has been attributed to by following the xwt (wait parent) pointer, or, from the db-call, you can get all the attributed WAIT events by following the xwf(xwl) pointer, and then iterate forward(backward) on the child chain by following the xwn(xwp) pointer.
For example, note how the WAIT lines for the 'db file scattered read' event are attributed to the FETCH line for the statement "select x from t" in this little classic example:



Note also that WAITs for the client (such as "SQL*Net message from client" or "SQL*Net message from dblink") are always attributed to the 'wait-for-client' Virtual Call.

The wait attribution algorithm in detail is (conceptually) as follows:
  1. check the cursor number of the WAIT line, and identify the previous and next db-call with the same cursor number, without crossing the border of the current documented client request - that is, looking only inside the line set delimited by the previous and next wait for the client (those linked with a 'wait-for-client' Virtual Call).
  2. attribute the wait (backward) to the previous db-call if one of these apply: If no previous db-call exists, the WAIT line is attributed to the latest 'unattributed WAITs' Virtual Call.
  3. otherwise, attribute the wait (forward) to the next db-call.

The Xe (ERROR lines attributions) Pointer tree

This pointer tree links all ERROR lines to the db-call that raised the error; so you can find the line that caused the ERROR by following the xet pointer, and the first/last ERROR lines associated with the db-call by following the xef/xel pointers; obviously, when no error is associated with the db-call, both xef and xel are set to null (zero).
Note: multiple ERROR lines for the same db-call are supported by xtrace, although I haven't found yet a real trace with more than one ERROR line for the same db-call.

In the following example, note how the EXEC of the insert statement caused is associated with the ERROR line, the figure "err" being equal to 12899 (ORA-12899 is "value too large for column"):




The algorithm for building this pointer tree is trivial: the ERROR line is associated with the nearest previous db-call with the same cursor number.

Rationale for backward attribution

Consider the following trace fragment:
1: EXEC #11:c=30043,e=55177,p=0,cr=0,cu=2,mis=1,r=0,dep=0,og=1,tim=21395238121
2: ERROR #11:err=12899 tim=2139307
3: WAIT #11: nam='SQL*Net break/reset to client' ela= 6 driver id=1413697536 break?=1 p3=0 obj#=49815 tim=21395247806
4: WAIT #11: nam='SQL*Net break/reset to client' ela= 1610 driver id=1413697536 break?=0 p3=0 obj#=49815 tim=21395253026
5: WAIT #11: nam='SQL*Net message to client' ela= 7 driver id=1413697536 #bytes=1 p3=0 obj#=49815 tim=21395257121
6: WAIT #11: nam='SQL*Net message from client' ela= 3328 driver id=1413697536 #bytes=1 p3=0 obj#=49815 tim=21395286485
It is impossible to forward-attribute the first three WAIT lines, for the simple reason that there is no db-call after the three WAIT lines. The fourth line is, in fact, a wait on the client connection, that signals that the process is waiting for something to do, after having finished everything that the client had previously asked for.
Moreover, it is obvious that the three WAIT lines are caused by the EXEC failure, that has raised the ERROR line; the two 'SQL*Net break/reset to client' events tell the client that the EXEC db-call has been interrupted and the process state reset, and the 'SQL*Net message to client' is the standard wait that happens when the connection is "slow" and the write on the connection (socket) by the process takes a measurable amount of time.
Hence, it is logical to attribute the WAIT lines to the failed EXEC, so that the resources consumed to service the EXEC failure will be debited to the EXEC itself - even if this means to attribute backward a WAIT that occurred after the EXEC db-call ended, that is, a WAIT that was not directly waited on by the EXEC db-call.

The current version of wait attribution algorithm is able to backward-attribute the three WAITs to the failed EXEC:




Note that the wait attribution algorithm is heuristic, and hence not perfect, albeit it handles gracefully the vast majority of interesting scenarios I've seen in practice.

Side note: back to the example above - the kernel db-call servicing code structure is probably, in pseudocode, the following:
  dbcall_network() {
    try {
      dbcall() -> emits EXEC #11
    } catch (...) {
      send_break_reset() -> emits ERROR #11, WAIT #11: nam='SQL*Net break/reset to client'
    } 
    write_to_socket() -> emits WAIT #11: nam='SQL*Net message to client'
  }
That is, the kernel-traced db-call function is not the one that is called "from the network" (dbcall_network) but the internal one (dbcall); hence the network-supporting code (that includes the sending back of the result on the connection) produces lines that appear after the db-call end line.

Virtual Calls

"Virtual Calls" are output lines that are created by the tool and that do not exist in the original trace file (hence the adjective "virtual"). Their main purpose is to make the file structure regular, by, for example, allowing to build Pointer Trees with no exception case even if the original trace file is damaged or inconsistent (e.g. due to uninstrumented code in the kernel). For example, any recursive db-call in the output has always a parent, which is normally a db-call found in the original trace file or, lacking that, a virtual call artificially (virtually) inserted by the tool. In a sense, virtual calls are injected by the tool to repair every missing branch (usually because of an uninstrumented section in the kernel code) in the pointer trees, while at the same time marking the existence of such a missing branch.

Having a regular pointer trees structure is very important, since the tool architecture is based on a first step that "parses" the trace file and populates a database with the extracted infos, and then a second step where a set of modules navigates the database and produce the output reports. For example, the statements and lines panes contain a simple straight "dump" of the database; the reports pane contains, instead, rather complex aggregations/transformations of the database information.
Needless to say, a regular pointer trees structure makes writing the reporting modules much easier, less error-prone and much more maintainable.

Of course, many virtual calls also organize, and make more readable, the tool output.

Virtual Calls mainly related to the db-call pointer tree

The following virtual calls are created mainly to repair the Xc (calls) pointer tree: Further details about both types of virtual call can be found in the Xc tree algorithm explanation.

The following example illustrates the 'phantom call' virtual call. It has been obtained by commenting out a dep=1 EXEC db-call; the tools steps in and repair the Xc pointer tree by inserting an xdep=1 'phantom call' virtual call.




Virtual Calls that link exception lines

When a line cannot be successfully linked in some pointer tree, or is damaged for whatever reason, it gets linked to one of the following virtual calls. Hence, by starting from one of these special virtual calls (or from the statement or bound statement that links every kind of these lines) and following the usual pointers, you can visit and examine all these (hopefully few) exception lines; at the same time, these potentially alarming lines are automatically factored in the reports output and easily recognized as linked to these easily distinguishable special virtual calls.

The wait-for-client Virtual Call

A WAIT on the event 'SQL*Net message from client' (and the equivalent 'SQL*Net message from dblink') means that the server process is idle, waiting for the next client call (actually it waits on the client-process connection, until any byte arrives through the connection coming from the client). Very frequently a whole set of adjacent lines of this kind of event is found, since long waitings are usually interrupted (e.g. by timeouts) to allow the server process to perform various types of periodic housekeeping or checks, after which the process resumes waiting.

The tool creates a 'wait-for-client' virtual call after every such set of wait events, and attributes all the WAIT lines of the set to this virtual call (you can see this happening in all the examples provided in this manual). We are going to discuss the advantages below, but you can picture this call as a silent(implicit, virtual) db-call issued by the client ordering the server to wait for the next db-call, to which the server responded by actually WAITing.

The tool does the same for other similar events; for example, the event 'PX Idle Wait' for parallel processes, that signals that the parallel process is idle waiting to be associated with a Query Coordinator (i.e. the next client) to run the next parallel statement. Another example is the event 'rdbms ipc message', on which background processes (notably LGWR) wait while not servicing requests. An internal configuration file dictates which events to handle this way.

Thus, the 'wait-for-client' virtual call: Note that the "idle" time (time elapsed not working for the client, which is the client think/idle time plus the client-to-server transmission overhead) is accounted automatically as the duration of the 'wait-for-client' profile consumer group.

It is also worth noting that the tool parser uses this kind of events to improve the tool analysis, by "resetting" the parser state whenever a wait-for-client event (Virtual Call) is detected. For example, it is obviously impossible that anything (a WAIT line, a db-call, etcetera) can be attributed "across" this kind of events, and the parser has been coded accordingly to leverage this knowledge in its internal algorithms (see the descriptions of the various Pointer Trees building algorithms).

The client-message-received Virtual Call

The 'client-message-received' virtual call is inserted as the next line of each 'wait-for-client' virtual call, to stress further the point in time when the traced process started working on a client message after a period of inactivity. It can be turned into an 'unattributed WAITs' virtual call if necessary, thus signaling the more important presence of unattributed waits in the tracing of the current client message servicing code.
This message is the norm in many background processes (e.g. LGWR) traces, since background processes are not called by any client and thus get all their non-waiting-for-client WAIT lines attributed to this virtual call.

The null Virtual Call

The 'null call' virtual call has to be ignored. It is sometimes inserted to simplify the tool algorithms, and it has no meaning at all.

Measures

For every db-call, the five native measures performed by the Oracle kernel and found in the trace file are:
measuremeaning
eelapsed time
cCPU time (time spent using the processor)
pphysical blocks (number of blocks physically accessed on disk)
crconsistent reads (number of blocks accessed in the library cache in consistent mode)
cucurrent reads (number of blocks accessed in the library cache in current mode)

For every WAIT event, the kernel measures the wait duration (the value of the figure "ela").
For some WAIT events (such as 'db file sequential read', 'db file scattered read', etc), the number of physical blocks accessed are recorded by the kernel and reported in one of the parameters (p1,p2,p3) of the event; the tool extracts and makes more evident this measure in the figure "xphy" of the WAIT line.

These seven native measures are used by the tool to derive the x-figures detailed below.

xe,xc,xp,xcr,xcu

For each of the five db-call native measures (e,c,p,cr,cu), the tool defines a corresponding x-figure (xe,xc,xp,xcr,xcu) that replaces the native measure in all the tool calculations. These figures are a slight generalization of the native ones and are needed in order to
  1. provide an estimation of the measures for line types that have no native measures (PARSE ERROR, VIRTUAL CALL, RPC CALL) in the current Oracle versions;
  2. support the WAIT lines backward attribution.
The first purpose is very simple; line types with no native measure CAN have child lines associated. For example, a PARSE ERROR line can have (and usually has) associated all recursive db-calls issued up to the point of the parse failure; and a VIRTUAL CALL, by definition, can have associated almost everything.
For these lines, (xe,xc,xp,xcr,xcu) are defined as being the sum of the corresponding (xe,xc,xp,xcr,xcu) of all associated child lines - this provides an approximation (a lower bound) of what the actual figures would be if the kernel had provided the native measures.

The second purpose is a bit more complex. Backward attribution is performed, as explained better in the rationale for backward attribution discussion, in order to attribute some WAIT lines that occurred after a db-call ended but that were caused by the db-call; in essence, the tools "extends" the db-call to embrace the backward-attributed WAIT line. To factor in the WAIT line, since obviously the "e" and "p" figures of the db-call do not factor in the WAIT line, the WAIT "ela" is added to the db-call "xe", and the WAIT "xphy" to the db-call "xp".

In pseudocode, the calculation algorithm is Note that if no backward attribution is performed (as it is almost always the case), (xe,xc,xp,xcr,xcu) are simply equal to (e,c,p,cr,cu) for lines that have native measures, and are the sum of the (e,c,p,cr,cu) of the recursive children for all the others.

recursive (and self) xe,xc,xp,xcr,xcu

For each line that has a set of measures (xe,xc,xp,xcr,xcu), the tool automatically maintains a set of "recursive measures" (xre,xrc,xrp,xrcr,xrcu) defined as being the sum of (xe,xc,xp,xcr,xcu) of the recursive children (first level only). The difference between the two sets, namely
self(xe,xc,xp,xcr,xcu) = (xe-xre, xc-xrc, xp-xrp, xcr-xrcr, xcu-xrcu)
is defined as being the "self" measure, a definition invented by [Millsap].

This set of related figures is printed by the tool using a concise notation, to save space on the screen; for example
xe=100/10/self=90
means that xe is 100, xre is 10, and self is 90=100-10.

All the three figures are explicitly printed because all of them can be very useful (or not) depending on the purpose of the analysis.
For example, consider an EXEC of a PL/SQL block; if the analyst is examining the trace file with the overall goal of improving response time to the user, and she is interested only in evaluating the advantage of avoiding altogether the call, she will be interested only in the overall elapsed time (xe); if she wants to optimize the call, she will of course be interested in knowing whether the most elapsed time is due to the SQL contained in the block (xre) or the execution inside the PL/SQL Engine self(xe) = (xe - xre). Of course the same holds if the overall goal is to reduce the resource consumption of the CPU (xe, xrc), of the disk (xp, xrp), or of the library cache accesses (xcr, xrcr and xcu, xrcu).

xela, xelab, xphy and xphyb

Whenever a WAIT line is attributed to a db-call, the wait elapsed time contained in the "ela" native measure is added to either the xela figure of the db-call, if the WAIT was forward-attributed, or to the xelab (xela backward) if the WAIT was backward-attributed.

Similarly, the xphy measure of the WAIT line (the number of disk blocks accessed by the wait operation, contained in one of the p1,p2,p3 event parameters) is added to either the xphy or xphyb (xphy backward) figures of the db-call.

Hence, xela+xelab contains the time spent waiting on all the wait event caused by the db-call, and xphy+xphyb the disk blocks accessed while waiting. Xela (and xphy) factors in the forward-attributed waits only.

Needless to say, knowing the wait time of a db-call is of paramount importance in most of the trace files analysis.

Unaccounted-for measures: xeu and xpu

The unaccounted-for elapsed-time (xeu) for a db-call is the fraction of elapsed time that is not accounted as time spent performing recursive work, using the CPU or waiting for some event. Its mathematical definition is
xeu = xe - xre - (xc - xrc) - (xela + xelab) 
    = self(xe) - self(xc)   - (xela + xelab)
As explained in [Millsap], xeu is the sum of the time spent waiting for the CPU (i.e. on the OS scheduler "ready" queue) plus all the measurement errors of the factors xe,xre,xc,xrc,xela,xelab plus missing time due to uninstrumented piece of code. That is, beside measurement errors and uninstrumented kernel code, xeu is non-zero only if the CPU load was so high that the Oracle process had to wait its turn to use the CPU.

The unaccounted-for number of physical blocks accessed (xpu) is the fraction of disk blocks that was recorded as been accessed (by the measure xp) but that weren't accounted in the attributed wait events (by the measure xphy from p1,p2,p3). Its mathematical definition is
xpu = xp - xrp - (xphy + xphyb)
    = self(xp) - (xphy + xphyb)
A non-zero value means that some disk-accessing code was uninstrumented or that the number of blocks accessed was not propagated in one of the p1,p2,p3 wait event parameters.

Xprof

The xprof identifier uniquely identifies the nested profiles contained in the reports pane; it is a meaningless positive integer (equal to zero for the root profile). It is shown in curly brackets to differentiate it better from the other pointers, and thus to allow easier searching in the plain text version of the xtrace tool output.

Every trace line that contributes to a nested profile (for example, a WAIT line that contributes with its "ela" figure to a "xe" profile) has an "xprof" pointer that points to the contributed profile section (obviously located in the reports pane). This way it is possible to immediately check, for example, how much the current line resource consumption impacts the overall profile. This pointer is also replicated in the "navbar".

Xat

The xat (Stat Parent) pointer points to the STAT line group that is associated with the trace line (if available); this way, the plan (and associated statistics) is immediately available for every line. The STAT association is made very conservatively; in order to be associated with a line, all the trace lines in-between must have not recorded a miss in the library cache (mis=0).

Xdep

Every trace line has associated the xdep figure, which is the recursive level (starting with zero) that the line was emitted under. This figure is the same as "dep" if the latter is available for the line, otherwise it is calculated by replicating the value of "dep" of the first previous line associated to the same cursor.

Xpic

The xpic pointer points to the line number of the PARSING IN CURSOR line associated with the current line. It is currently defined for PARSE and EXEC lines.

Xrpccall

The xrpccall pointer points to the line number of the RPC CALL line associated with the current line. It is currently defined for RPC EXEC lines.

Xbinds

The xbinds pointer points to the BINDS line associated with the EXEC line (or zero if no BINDS section is associated), that is, it points to the section containing the bind variables values used by the execution of the cursor.

Xplannum

The xplannum is a progressive id associated by the tool to every plan; every time a different plan is found (in the STAT lines), a new xplannum is generated. Two plans are the same if they have exactly the same row source operations (op) and access the same objects (obj).

Xnexecs

This figure belongs to STAT lines and it records the number of EXEC calls that are associated with the STAT group, that is, the number of times that the STAT plan has been (re)executed since it was associated with the cursor. For example :
PARSE #2 mis=whatever
 EXEC #2 mis=0
 EXEC #2 mis=0
 STAT #2 xnexecs = 2
This figure is used to compute the average of the various statistics associated with the row source operation:
avg(statistic) = statistic / xnexecs  (e.g. avg (cnt) = cnt / xnexecs)
since of course, this kind of average can be compared directly with the CBO estimations.
Indeed, avg(cnt) correlates directly with card, and avg(cr) correlates, albeit much more weakly in recent Oracle versions, and for queries only, with the cost.

Xstatn

The xstatn pointer points to the next STAT line of the group of (usually) consecutive STAT lines that record the cursor plan and related statistics; it is null(zero) for the last STAT line. Note that STAT lines are not always consecutive; sometimes they are intermixed with recursive calls performed to complete the to-be-printed informations (such as to retrieve the object names not available in the cursor plan memory structures).

The hierarchical profiles

By definition, a (resource) profile is the list of the consumers of a resource, each one listed with the amount of resource consumed. The list is always printed ordered by decreasing amount (strongest consumers first), since when investigating the resource consumption, the analyst obviously concentrates on the biggest contributors first ("low hanging fruits").

The tool offers, in the reports pane, profiles for five types of resources: duration (xe), CPU (xc), physical block reads (xp), consistent block reads (xcr), current block reads (xcu). Each one has a dedicated tab inside the pane.
Here is an example of an xe profile:




Just after the hierarchical profile, the corresponding flat profile is printed (not shown in the picture above).

consumer groups

The actual resource consumers are the (non-comment) trace lines; for example, db-calls can consume any type of resource, WAIT lines can consume duration and, in some cases, physical reads. A profile at this low granularity level is usually not very useful, hence the tool (as all profilers do) aggregates the actual consumers by some common characteristics into consumer groups, and reports, for each consumer group, the total amount of resource consumed and other sometimes useful informations such as the number of lines belonging to the consumer group (count), the average resource consumed (avg) and the resource histogram.

nested profiles

The report is composed of a hierarchy of nested profiles, each one identified by the xprof identifier. Clicking on a consumer group (more precisely, clicking the xprof link column) opens another profile that further subdivides (drills-down) the consumer group resource amount into (sub)consumers groups, and so on, recursively, until a consumer group is no longer further divisible.
Appended to each profile you can also find other informations about the profiled consumer group; currently, you can find the execution plans (if available and applicable) and the list of associated ERROR lines.
The idea of using nested profiles was taken from [TVD$XTAT], also described in [Antognini].

root profile

The root profile (the first printed at the top, with xprof equal to zero) aggregates all lines that are directly attributable to the client, that is, whose xdep is equal to zero; the remaining lines are aggregated in the nested profiles. Hence the root profile consumer groups are aggregations of: Hence, by examining the root profile you can immediately see, for example, which db-call was the most expensive, and then drill-down on it; or how much resource was consumed waiting for the client versus working for the client; or simply see the application signature (the set of db-calls issued directly by the client).

Db-calls are always grouped by bound statement and call type (PARSE, EXEC, etc). The reason why we group by bound statement is discussed in general and in particular for profiles in the importance of bound statements section.
Grouping by call type (PARSE, EXEC, FETCH, etc) is performed because different call types are very different operations, and issues are better diagnosed by examining them separately. For example, knowing that a statement is "slow" does not tell us a lot; it's only when we see that its FETCHes are slow that we start investigating about a bad plan or slow disks, at which point all the other contributors (PARSE, EXEC, etc) are only noise and are better factored-out.

To see an example of a root profile, notice in the example how the topmost consumer group is "exec (:dec1COLA8D4K9RPW)", that is, the EXECutions (actually a single EXECUTION as the "count" column shows) of the bound statement with xsqlid=":dec1COLA8D4K9RPW" (which is, by the way, an anonymous block); this group accounts for about 85% of the trace file duration.
The second consumer group is "virtual call (wait-for-client)", that is, the time spent waiting for the client (in this example, WAITing for a "SQL*Net message from client"); it accounts for about 17% of the trace file duration and it aggregates two ("count") lines.

nested profiles of the root profile

Clicking on a root consumer group (i.e. a consumer group listed in the root profile) opens the nested resource profile for the clicked consumer group. This nested profile, in order to produce the (sub)consumer groups that show the distribution of the resource amount of the root consumer group, aggregates measures coming from both the child lines associated to the root lines (i.e. the lines aggregated in the root consumer group) and the root lines themselves.

In detail, the child lines are: And the measures coming from the root lines are: As an example of computation for a self measure, consider the processing of the self(xc). To recap, all db-calls reports the amount of CPU they used, and the tool is able to calculate the "self CPU" self(xc), that is, the amount of CPU the db-call used to execute its own kernel code (factoring out the CPU used to execute the recursive db-calls). Each root line whose self(xc) is not zero adds its self(xc) to the consumer group named "self cpu != 0" in the nested profile - this tells how much of the root consumer group duration was spent on the CPU. The same is performed, of course, for the other relevant self measures self(xcr) and self(xcu).

Similarly, the unaccounted-for measures xeu (unaccounted-for duration) and xpu (unaccounted-for physical block reads) of the root lines whose values are not zero are added to the nested profile consumer group named "unaccounted duration != 0" and "unaccounted physical blocks != 0".

To see an example of a nested profile, try going to the root profile in the example and clicking the topmost consumer group "exec (:dec1COLA8D4K9RPW)" we talked about previously. You can see that about 50% of the duration of that root consumer group is unaccounted, about 33% comes from the PARSing of the bound statement with xsqlid=":sel1K6Y4JRMZEHWN", and about 17% comes from the FETCHes of the same select.

next-level nested profiles

Of course, a nested profile of the root profile can have nested profiles itself, that you can explore as well. How they are calculated is obvious.

profile resource histogram

The resource histogram, located at the right end of each consumer group, shows the distribution of the amount of resource consumed by the lines that are aggregated in each consumer group. Each bin of the histogram contains a counter that displays the number of lines whose amount belongs to the bin interval (whose upper extreme is indicated in the bin column header; the interval includes the upper extreme and excludes the lower extreme, unless otherwise indicated).

Moreover, for each bin, the line that contributed the most to the bin is recorded; you can navigate to that line by clicking on the counter. This way you can, for example, investigate an outlier in the amount distribution.

profile associated error lines

If the profiled consumer group has some contributors (db-calls) with associated ERROR lines, the error codes (ORA-XXXXX), the number of lines carrying each error code, and the first and last ERROR line pointers are printed.

profile execution plans

If the profiled consumer group aggregates db-calls, all available plans associated (by the xat pointer) to the aggregated lines in the trace file are printed inside the "plan(s)" tab. Of course, there might be more than one plan because db-calls are aggregated by bound statement, and the same bound statement might have different plan associated: the associated statements can have different literal values that might lead to different plans, and in any case, it might always happen that the execution plan gets flushed from the library cache and then differently re-optimized while the process is being traced.

All the available row source operation figures (including the new Oracle 11g "cost","card" and "size") are reported. For the measures "cnt", "cr", "pr", "pw" and "time", the average over all the associated executions (the xnexecs x-figure) is reported as well.

Note that the column order of the plan is designed for optimal visual effectiveness; for example, "avg(cnt)" and "cbo:card" are printed next to each other to help the analyst check whether the CBO estimation was adequate, a check of paramount importance (that has been popularized under the name of "Tuning by Cardinality Feedback" by [Breitling]).

The flat profiles

The flat profile of a hierarchical profile is built by retrieving and consolidating all the "wait", "self" and "unaccounted-for" consumer groups from all the profiles that are nested under the current one. It is printed by the tool under each profile.

Flat profiles are mostly useful to highlight any issue of the "system as a whole" that impacts the current profile. For example, if the flat profile shows a lot of time elapsed while waiting on "db file scattered read", with an average wait of 100ms and an histogram showing a lot of spikes, it is probably worth investigating the sanity of the storage subsystem first, instead of concentrating on the nested profiles.

Exec Flow

The "Exec Flow" is a report that replicates the series of EXEC db-calls as they are found in the trace file, decorated with the SQL (or PL/SQL) text and, possibly, the bind variable values from the BINDS section associated with the exec line. For example:

EXEC : xbsqlid=:upd0WJS0BMCM1WZZ/UPD813658919 dep=1
 update "DELLERA"."MLOG$_TEST_T2" set snaptime$$ = :1  where snaptime$$ > to_date('2100-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS')
  Bind#0
   kxsbbbfp=05076c13  bln=07  avl=07  flg=09
   value="8/4/2009 16:6:45"
   
EXEC : xbsqlid=:sel0EW7NRBW2B1QT/SEL3133130461 dep=1
 select dmltype$$, max(snaptime$$) from "DELLERA"."MLOG$_TEST_T3"  where snaptime$$ <= :1  group by dmltype$$
  Bind#0
   kxsbbbfp=08d1c7fc  bln=07  avl=07  flg=05
   value="8/4/2009 16:6:45"

The lines are also reported indented by recursive level (xdep), to strees their hierarchy.

This kind of report is extremely useful when investigating how an application interacts with the database (for example, the fragment above is obtained from the trace of a materialized view fast refresh).

You can generate an EXEC flow by clicking the "options" button of the lines pane and then the "Exec Flow analysis" template button. Of course, you can then further customize the output by switching the flags (for example, hide the bind values, or make other figures visible). You can of course change the colors of the lines or hide them (maybe by sql text), etc etc.

References

[Millsap] "Optimizing Oracle Performance", by Cary Millsap with Jeff Holt, 2003, ISBN 978-0596005276
[Antognini] "Troubleshooting Oracle Performance", by Christian Antognini, 2008, ISBN 978-1590599174
[TVD$XTAT] "Trivadis Extended Tracefile Analysis Tool (TVD$XTAT)", by Christian Antognini
[Breitling] "Tuning by Cardinality Feedback", by Wolfgang Breitling


Copyright © 2008 - 2010 Alberto Dell'Era. All rights reserved.