This documents describes the Xtrace tool, available here.
Introduction
The Xtrace tool navigates and analyzes an Oracle session trace file using a Graphical User Interface
composed of three panes:
The top pane lists the SQL statements, the middle one echoes the (augmented) trace lines, and the bottom one aggregates
the trace content (currently, it contains various resource profiles).
All the panes contain hyperlinks (or "pointers") that link objects (say, a statement, a line, a profile element) to other related
objects (for example, a db-call line to its statement, a WAIT line to the db-call that waited on the event, and vice-versa;
or a line to its profile, plan, etc etc).
All the panes' content is configurable; both in the sense that you can choose what to see (for example, you might display
only EXEC lines and their elapsed time - or only lines belonging to a certain statement) and how to see it (for example,
you can color lines differently depending on the statement text).
Tip: you can configure the tool to see the "EXEC flow", that is, the sequence of EXEC db-calls augmented with the actual statement text
and the actual bind values. This is actually one of my favourite use for this tool.
The trace lines echoed in the middle pane are augmented by calculated figures whose names start with the letter "x" (for "eXtended");
for example "xela" which is the sum of the "ela" figures of all WAIT lines attributed to a db call. Other x-figures contains the
above mentioned pointers.
All of the panes can be dumped as text-only files, or even images.
We are going to illustrate the tool output by example in the following. Please note that all examples are fully functional, i.e., you
can interact with them and experiment by clicking on the pointers, resizing, painting etc etc; simply click on the "Launch" button
placed under each screenshot.
We are not going to describe the trace file format, that is wonderfully illustrated by [Millsap], together
with many concepts and techniques referenced in this manual.
A walkthrough of the three panes
The tool GUI is composed of three panes: the statements (top), lines (middle) and reports (bottom) ones.
The statements pane
This pane contains the SQL and PL/SQL statements found in the trace (right tab), and their corresponding calculated
bound statement (left tab, shown above).
You can navigate from each statement to its bound statement by following the "xbstmt" pointer.
Both the statements and their bound statement are identified by their xsqlid and statement progressive number.
Every bound statement is linked to the trace lines (contained in the middle pane) that reference it by the xb pointer tree. Clicking on the
xb[f] pointer will highlight the first line that reference the statement; the same goes for the xb[l] pointer that points to the last line.
Likewise, every statement is linked to the referencing trace lines by the xs pointer tree.
The xversions figure is the number of associated statements for every bound statement.
All other figures (whose name does not begin with "x") come verbatim from the trace file (such as sqlid, hv, etc).
You can change the color of the lines referencing a (bound or not) statement, and the statement itself, by using the "set color"
popup menu; you can choose to change the color of the statement, of the other statements, or all statements whose text
matches a LIKE-expression.
You can also choose to completely hide the lines/statment by acting on the "make invisible" popup menu, that works similarly.
This possibility to color/hide is extremely powerful, since you can, for example, color differently all lines that reference (or not)
a certain table/view in which you are interested, thus drastically filtering the overwhelming amount of information
contained in trace files.
The lines pane
This pane (the main pane) contains the (augmented) lines of the original trace file, plus others that are added
by the tool (see the Virtual Calls section).
options button
At the top of the pane, the "options" button open the panel that controls the visibility of the elements of the
pane. The panel allows you to hide or display any trace figure (e,c,dep,etc), any x-figure (xe,xela,etc) or any line type
(PARSE,EXEC,WAIT, etc), thus tailoring the pane output to your needs. Note that all elements are grouped by type,
each group sporting a pair of "set all"/"unset all" buttons to make hidden/visible all the groups in one shot.
Some special elements are available as well, for example to display under each EXEC line the BINDS value used for
its execution, or the used sql statement.
Note also the "templates" buttons at the bottom, that sets the visibility flags to pre-determined values (such as
default ones); note expecially the "Exec Flow Analysis", that enables Exec Flow.
color button
At the top of the pane, the "color" button allows to choose the object type that dictates the color of the lines
and the actual visibility (echoed by the text label at the right of the button).
The object type can be the bound statement(the default), the statement or the profile. The latter option can be used,
for example, to color in red all the lines that contributes to the top consumers listed in a resource profile.
pane content
Each line is labeled by the tool with a progressive line number (001, 002, etc in the example).
To the left of the line number, the "navbar" (navigation bar) duplicates and colocates together the most important
Pointer Trees to allow for quick navigation, using a pretty obvious notation detailed
here. The navbar duplicates also the xat pointer (the "S" hyperlink) and
the xprof pointer (the "P" hyperlink) at its right edge.
The actual augmented trace line is located to the right of the line number; the format is nearly the same as the original trace
line, with the x-figures added. First, the line is indented with spaces by its recursive level (the xdep figure),
then the original line type (PARSE, EXEC, etc) and cursor number are printed, followed by the very useful
statement and bound statement pointers.
What is printed next depends on the type of the line; the goal has been to print the most useful informations first,
while at the same time trying to print the original figures (copied verbatim) in the same order as the original trace file.
The x-figures calculated by the tool are:
| line type | list of x-figures |
| PARSING IN CURSOR | xsqlid |
| PARSE | xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xat, xpic, xprof |
| BINDS | xdep |
| EXEC | xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xbinds, xat, xpic, xprof |
| FETCH | xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xat, xpic, xprof |
| WAIT | xe, xdep, xphy, xw[], xat, xprof |
| STAT | xnexecs, xb[] xs[], xstatn, xplannum, xprof |
| CLOSE | xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xat, xprof |
| VIRTUAL CALL | xdep, xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xprof |
| XCTEND | none |
| PARSE ERROR | xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xprof |
| ERROR | xdep, xe[], xat |
| RPC CALL | xsqlid, xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xprof |
| RPC BINDS | none |
| RPC EXEC | xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xbinds, xrpccall, xprof |
| UNMAP | xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xat, xpic, xprof |
| SORT UNMAP | xe xc xela xelab xeu, xp xphy xphyb xpu, xcr xcu, xc[] xw[] xe[] xb[] xs[], xat, xpic, xprof |
The reports pane
This pane contains reports about the trace file, that is, the output of specialized programs that extract information
from the trace file and elaborate or aggregate it.
Currently, only the hierarchical profiles have been implemented.
The programs feeds directly from the database version of the trace file, which the other two panes can be considered a
straight dump of. Whenever possible, the reports pane is cross-linked and integrated with the pristine data contained in
the other two panes, examples being the xprof pointer in the lines pane that points to the profile
that the line contributes to, and the line-pointing pointers in the resource histograms
of the hierarchical profiles.
Concepts and definitions
Bound Statements
The bound representation of a SQL (or PL/SQL) statement is obtained by replacing all literals with bind variables, and then normalizing
(redundant whitespace removed, case changed to lowercase, etc). For example, the two statements
SELECT X FROM T WHERE X = 0
SELECT X FROM T WHERE X = 1
share the same bound statement
select x from t where x=:n
Bound statements have basically the same set of figures of "real" statements, for example a (manufactured)
xsqlid.
The rationale behind bound statements
The bound statements are one of the most important features of the tool. In fact, most of the tool focus on the calculated
bound statements instead of the "real" statements found in the trace files; for example, the
hierarchical profiles aggregate figures by bound statement.
This is because a bound statement is the same as a method in a common language, the bind
variables values (or literal values of course) being the same as the parameters to the method itself; and it rarely (if ever)
makes sense to consider the parameter values when comparing and contrasting, or aggregating, different sections of a methods
trace, whatever the traced application language is.
In other words, aggregating by bound statement collects and consolidates the information that might otherwise get dispersed
and thus get unnoticed. As a relevant example, consider the common case of an application that incorrectly does not use bind
variables and hence issues 1,000 different statements with 1,000 different literal values, each one consuming 0.07% of some resource -
if we would not group by bound statement in the profiles we would miss 70% of the resource consumption, since a lot of small
consumptions of 0.07% gets easily unnoticed, while a big one of 70% definitely attracts immediately the analyst attention.
Last but not least, grouping allows us to see that we are actually seeing the same method, just called 1,000 times,
not 1,000 different methods, hence greatly reducing the analysis effort.
Bound statements as methods
Consider this statement:
select * from customers where name = :n;
The bind variable :n is the input parameter to the SQL code, code that specifies the database to "retrieve the record
whose name is equal to the input parameter and pass it back as an (implicit) output parameter".
In this sense, the SQL statement is a method, as it provides a perfectly defined set of functionality to the client;
the fact that is not compiled ahead of time but interpreted at run-time (compiled on the fly into an execution plan and
then run in the SQL Engine virtual machine) is inessential.
Consider now a client that uses (correctly or incorrectly) literals instead of bind variables:
select * from customers where name = 'JOHN';
select * from customers where name = 'MARY';
what is asking is exactly the same in both cases, it has simply provided the input parameters "in-line"; the essential
of what the client is specifying to be done is captured by the bound statement
select * from customers where name = :s;
The bind variable value of course may give more information, but in almost every case this is not the case and would only
mask the relevant information by providing unnecessary details (e.g. check this example).
The same discussion, of course, goes for PL/SQL statements (anonymous blocks, that is the only type of PL/SQL statements
a client can issue):
begin :z := 42 + my_stored_procedure (:x); end
in this case, the similarity with a method is even more obvious.
xsqlid
The xsqlid is very similar to the sqlid figure introduced in Oracle 10g (in fact the xsqlid actually contains the sqlid if available):
it is a statistically-unique string identifier that maps to a statement (or bound statement).
For statements, if the sqlid is available in the trace file, the xsqlid is simply the sqlid prefixed by the first
three letters of the sql statement, in uppercase. For example, for a select statement with the sqlid='gsya1fnahrtpk',
the xsqlid is 'SELgsya1fnahrtpk'. Thus, it is basically the sqlid with the statement type made more obvious by the
prefix.
If the sqlid is not available (e.g. for 9i traces), the hv (hash value) figure (to be found in the PARSING IN CURSOR section) is used
instead of the sqlid. For example, for an insert statement with hv=2059022544, the xsqlid is 'INS2059022544'.
In the very rare occurrence in which neither the sqlid nor the hv is available, the statement text is fed to an internal MD5-based hash function, its output converted
to a string with the same base characters as Oracle's sqlid but in uppercase. For example, for a begin-end block,
it might be 'BEGU1PKBFB13JT2'.
For bound statements, the same internal MD5-based hash function is used on the bound statement text, and then the output is prefixed
by ":" followed by the usual three letter, but in lowercase. An example for a select statement might be ':sel2U49229L3WZXD'.
In addition, for both statements and bound statement, if a collision is detected (two or more different statements that
share the same xsqlid are found), a progressive postfix is added to the colliding xsqlid (starting with the second); hence you might see
(very rarely, and probably almost never for 10g+ traces) xsqlids such as 'SELgsya1fnahrtpk', 'SELgsya1fnahrtpk_1', 'SELgsya1fnahrtpk_2', etc.
Thus, note that the xsqlid perfectly uniquely identifies a statement inside a trace file, and the part before the postfix only statistically
uniquely identify a statement across trace files (as the sqlid does).
statement progressive number
The statement progressive number is simply a number linked to (different) statements, incremented by 1 whenever a new statement
is found in the trace file. For bound statements, the number is printed prefixed with ":".
This progressive number is obviously inferior to the xsqlid, since the latter (statistically) uniquely identify the statement
across trace files, while the former is meaningful only inside a particular trace file. The progressive number is used only because
it is much shorter and hence consume less precious screen real estate, in places (e.g. "pointers") where using the xsqlid would
be useless or redundant.
Pointer trees
One of the tool main features is the ability to automatically associate (or "attribute") lines to either other lines or statements.
For example, WAIT lines are attributed to the (parent) db-call that caused them; recursive db-calls are associated to their
(parent) db call that issued them; almost every line is associated to both its statement and bound statement, etc.
The association (attribution) is implemented with a "parent pointer", that is, an x-figure that contains either the parent line
number or the parent (bound) statement number (aka statement progressive number). For example, the parent of a WAIT line is contained in the "xwt" (X Wait parenT) figure;
the parent db-call in the "xct" (X Call parenT) figure, etc.
The set of lines that are associated with the same parent are, quite obviously, named either the "sibling chain" or the "children".
The previous and next sibling pointers are contained in other two x-figures. For example WAIT lines have the "xwp" (X Wait Previous)
and "xwn" (X Wait Next) x-figures; db-calls have the "xcp" and "xcn" x-figures; etc.
Parents have also two additional figures that points to the first and last children. For example, db-calls have the
"xwf" (X Wait First) and "xwl" (X Wait Last) x-figures; db-calls have also the "xcf" and "xcl"; etc.
Note: null pointers are represented by the number zero, since all line numbers and statement progressive numbers are strictly greater than zero.
So to recap, for every association type a set of five pointers (a pointer tree) is defined: the parent, the previous sibling,
the next sibling, the first child and the last child. The currently defined pointer trees are:
| association type | parent | previous sibling | next sibling | first child | last child |
| bound statement association | xbstmt | xbp | xbn | xbf | xbl |
| statement association | xstmt | xsp | xsn | xsf | xsl |
| recursive calls | xct | xcp | xcn | xcf | xcl |
| WAIT lines attributions | xwt | xwp | xwn | xwf | xwl |
| ERROR lines attributions | xet | xep | xen | xef | xel |
Note the naming convention, which is usually consistent: the only exceptions are "xbstmt" and "xstmt" (which should be really
named "xbt" and "xst").
The pointers are always displayed grouped together by square brackets, for obvious visual convenience; null pointers are not displayed
at all. So for example, if xct=0,xcp=11,xcn=22,xcf=0 and xcl=0, on the screen you will see xc[p=11 n=22].
The "xbstmt" and "xstmt" pointers are an exception; they are so important that they get special treatment, that is, they are printed
always near the beginning of the line, separated by a slash. So if xbstmt=11 and xstmt=99, on the screen you will see xbstmt=:11/99 (note
the usual colon near the bound statement progressive number).
Note also that all major pointers are grouped together in the "navbar" at the left of every line
as well, to make it easier to navigate the pointer trees. The navbar is divided into four sections corresponding to the four most
important pointer trees (associations) types (b,s,c,w). In each section, each pointer is implemented by a single-letter hyperlink
with its name (t,p,n,f,l). So for example, the xwt pointer is is to be found in section "w" and it is named "t".
Null (zero) pointers are not shown and are left as blank, in order to drastically reduce the visual clutter.
The Xb (bound statements) Pointer tree
This pointer tree links all lines with their bound statement. So, you can navigate from a line (in the middle pane) to its
bound statement (in the top pane) by following the xbstmt pointer, or navigate to the next/previous line by following
the xbp/xbn pointers; from the bound statement, you can navigate to the first/last line in the trace file that uses
the bound statement by following the xbf/xbl pointers. Hence, you can visit all the points of the trace that
are associated with the bound statement you are interested into.
To build the tree, the following (conceptual) algorithm is used. Each line has a cursor number (e.g. FETCH #3), and the statement text
is contained into the first previous PARSING IN CURSOR line with the same cursor number (e.g. PARSING IN CURSOR #3); hence
it is rather easy to calculate the bound statement text from the PARSING IN CURSOR line, and then associate it
with the line. Note: if the statement text is not known (usually because the trace file was truncated), the line
is associated with the "statement unknown" bound statement.
The Xs (statements) Pointer tree
This pointer tree is structured and built like the previous one, obviously considering the original statement
text, not the calculated bound statement text.
The Xc (calls) Pointer tree
This pointer tree links all db-call lines (PARSE, EXEC, FETCH, CLOSE, etc.) with their recursive db-calls. If the
db-call was issued directly by the client, the parent (xct) pointer is set to null (zero); otherwise, it is set to
the line number of the (parent) call. So, you could, for example, navigate all calls made by the client by
following the xcn (or xcp) pointer, ignoring all recursive calls; or, you can drill down to the recursive calls of any
call of interest by following the xcf/xcl pointers.
For example, in the main example, the PL/SQL anonymous block with xsqlid=':dec1COLA8D4K9RPW'
is submitted directly by the client (hence the xct pointers of its PARSE,EXEC,FETCH etc are all zero).
The EXEC db-call of this block causes a series of recursive db-calls on the two statements
"SELECT X FROM T WHERE X = 0" and "SELECT X FROM T WHERE X = 1"; hence, all the db-calls of these two statements
have an xct that points to the EXEC line of the anonymous block.
The algorithm used to build the tree is (conceptually) very simple: for every possible value of the figure "dep",
a double-linked list of db-calls is maintained (say, list[dep]). When a new db-call is found while parsing forward the trace file,
say with a dep=d, the db-call is linked as the next sibling of list[d], and list[d+1] is moved
as the child chain of the current db-call.
Note: if a non-empty list[x], with x > d+1, is found, an 'phantom call' Virtual Call is injected (with xdep=d+1),
since this means that an uninstrumented db-call has occurred. Moreover, if the process starts to wait for the client (on the
events 'SQL*Net message from client', 'SQL*Net message from dblink', 'PX Idle Wait', etc - see also the discussion about the
'wait-for-client' Virtual Call), and the list[1] is not empty, a
'phantom call (dep=0)' Virtual Call is injected (with xdep=0), since that means that an
uninstrumented dep=0 call was issued (this happens, for example, in 11g when some database triggers are fired).
Waiting for the client closes all active list[x] as well, by injecting as many 'phantom call' Virtual Calls as necessary:
by definition, if the process is waiting for the client to be informed about the next thing to do, all previous calls
must have finished.
The Xw (WAIT lines attributions) Pointer tree
This pointer tree links all WAIT lines with the db-call line that caused (is responsible for) the wait event, thus linking
the user actions (the db-calls) with their resource-consuming consequences (the wait events). This is obviously of paramount
importance, and in fact this pointer tree is one of the most useful ones.
WAIT lines are almost always attributed forward (that is, to a db-call that occurs after the WAIT line in the trace file), but
in some infrequent case they can be attributed backward. Forward attribution is signaled with "+WAIT",
backward attribution with "-WAIT". For more details, see rationale for backward
attribution section.
So you can see the db-call the WAIT line has been attributed to by following the xwt (wait parent) pointer, or, from the db-call,
you can get all the attributed WAIT events by following the xwf(xwl) pointer, and then iterate forward(backward) on the child chain
by following the xwn(xwp) pointer.
For example, note how the WAIT lines for the 'db file scattered read' event are attributed to the FETCH line for
the statement "select x from t" in this little classic example:
Note also that WAITs for the client (such as "SQL*Net message from client" or "SQL*Net message from dblink") are
always attributed to the 'wait-for-client' Virtual Call.
The wait attribution algorithm in detail is (conceptually) as follows:
- check the cursor number of the WAIT line, and identify the previous and next db-call with the same cursor number, without crossing
the border of the current documented client request - that is, looking only inside the line set delimited by the previous and next wait
for the client (those linked with a 'wait-for-client' Virtual Call).
- attribute the wait (backward) to the previous db-call if one of these apply:
- no next db-call exists;
- the event name is flagged internally as "to be always attributed backward" (currently, only 'SQL*Net break/reset to client'
and 'SQL*Net break/reset to dblink' are flagged as such).
If no previous db-call exists, the WAIT line is attributed to the latest
'unattributed WAITs' Virtual Call.
- otherwise, attribute the wait (forward) to the next db-call.
The Xe (ERROR lines attributions) Pointer tree
This pointer tree links all ERROR lines to the db-call that raised the error; so you can find the
line that caused the ERROR by following the xet pointer, and the first/last ERROR lines associated
with the db-call by following the xef/xel pointers; obviously, when no error is associated with the
db-call, both xef and xel are set to null (zero).
Note: multiple ERROR lines for the same db-call are supported by xtrace, although I haven't found yet a real trace
with more than one ERROR line for the same db-call.
In the following example, note how the EXEC of the insert statement caused is associated with the ERROR line, the
figure "err" being equal to 12899 (ORA-12899 is "value too large for column"):
The algorithm for building this pointer tree is trivial: the ERROR line is associated with the
nearest previous db-call with the same cursor number.
Rationale for backward attribution
Consider the following trace fragment:
1: EXEC #11:c=30043,e=55177,p=0,cr=0,cu=2,mis=1,r=0,dep=0,og=1,tim=21395238121
2: ERROR #11:err=12899 tim=2139307
3: WAIT #11: nam='SQL*Net break/reset to client' ela= 6 driver id=1413697536 break?=1 p3=0 obj#=49815 tim=21395247806
4: WAIT #11: nam='SQL*Net break/reset to client' ela= 1610 driver id=1413697536 break?=0 p3=0 obj#=49815 tim=21395253026
5: WAIT #11: nam='SQL*Net message to client' ela= 7 driver id=1413697536 #bytes=1 p3=0 obj#=49815 tim=21395257121
6: WAIT #11: nam='SQL*Net message from client' ela= 3328 driver id=1413697536 #bytes=1 p3=0 obj#=49815 tim=21395286485
It is impossible to forward-attribute the first three WAIT lines, for the simple reason that there is
no db-call after the three WAIT lines. The fourth line is, in fact, a wait on the client connection, that signals
that the process is waiting for something to do, after having finished everything that the client had previously
asked for.
Moreover, it is obvious that the three WAIT lines are caused by the EXEC failure, that has raised
the ERROR line; the two 'SQL*Net break/reset to client' events tell the client that the EXEC db-call has been
interrupted and the process state reset, and the 'SQL*Net message to client' is the standard wait
that happens when the connection is "slow" and the write on the connection (socket) by the process takes a measurable
amount of time.
Hence, it is logical to attribute the WAIT lines to the failed EXEC, so that the resources consumed to service
the EXEC failure will be debited to the EXEC itself - even if this means to attribute backward a WAIT that occurred
after the EXEC db-call ended, that is, a WAIT that was not directly waited on by the EXEC db-call.
The current version of wait attribution algorithm is able to backward-attribute the three
WAITs to the failed EXEC:
Note that the wait attribution algorithm is heuristic, and hence not perfect, albeit it handles
gracefully the vast majority of interesting scenarios I've seen in practice.
Side note: back to the example above - the kernel db-call servicing code structure is probably, in pseudocode, the following:
dbcall_network() {
try {
dbcall() -> emits EXEC #11
} catch (...) {
send_break_reset() -> emits ERROR #11, WAIT #11: nam='SQL*Net break/reset to client'
}
write_to_socket() -> emits WAIT #11: nam='SQL*Net message to client'
}
That is, the kernel-traced db-call function is not the one that is called "from the network" (dbcall_network) but
the internal one (dbcall); hence the network-supporting code (that includes the sending back of the result on the
connection) produces lines that appear after the db-call end line.
Virtual Calls
"Virtual Calls" are output lines that are created by the tool and that do not exist in the original trace file
(hence the adjective "virtual"). Their main purpose is to make the file structure regular, by, for example,
allowing to build Pointer Trees with no exception case even if the original trace file
is damaged or inconsistent (e.g. due to uninstrumented code in the kernel). For example, any recursive db-call in the
output has always a parent, which is normally a db-call found in the original trace file or, lacking that, a virtual call
artificially (virtually) inserted by the tool. In a sense, virtual calls are injected by the tool to repair
every missing branch (usually because of an uninstrumented section in the kernel code) in the pointer trees,
while at the same time marking the existence of such a missing branch.
Having a regular pointer trees structure is very important, since the tool architecture is based on a first step
that "parses" the trace file and populates a database with the extracted infos, and then a second step
where a set of modules navigates the database and produce the output reports. For example,
the statements and lines panes contain a simple straight
"dump" of the database; the reports pane contains, instead, rather complex
aggregations/transformations of the database information.
Needless to say, a regular pointer trees structure makes writing the reporting modules much easier,
less error-prone and much more maintainable.
Of course, many virtual calls also organize, and make more readable, the tool output.
Virtual Calls mainly related to the db-call pointer tree
The following virtual calls are created mainly to repair the Xc (calls) pointer tree:
-
'phantom call' : this virtual call is injected when a missing parent link is found. E.g. if a dep=3 call is found whose next call
is a dep=1 call, an instance of this virtual call is created (with xdep=2) to serve as the intermediate link between the
two calls (as child of the dep=1 call and parent of the dep=3 call). Missing db-calls usually happen because the kernel code that
implements the intermediate (dep=2 in this example) call is uninstrumented.
-
'phantom call (dep=0)' : this virtual call is exactly the same as the previous one, but for the dep=0 level; it is created
when a dep=1 call has no corresponding dep=0 parent call. It is further qualified with the "(dep=0)" suffix to highlight it,
since the dep=0 calls are the ones directly issued from the client application, and hence have a distinguished meaning in
almost every investigation or tuning exercise.
Phantom dep=0 calls are actually the most frequent kind of phantom calls. They happen for example in some 11.1.0.6 traces
when some database-level triggers are fired (probably because the firing kernel code is uninstrumented).
Further details about both types of virtual call can be found in the Xc tree algorithm explanation.
The following example illustrates the 'phantom call' virtual call. It has been obtained
by commenting out a dep=1 EXEC db-call; the tools steps in and repair the Xc pointer tree by inserting
an xdep=1 'phantom call' virtual call.
Virtual Calls that link exception lines
When a line cannot be successfully linked in some pointer tree, or is damaged for whatever reason,
it gets linked to one of the following virtual calls. Hence, by starting from one of these special virtual
calls (or from the statement or bound statement that links every kind of these lines) and following
the usual pointers, you can visit and examine all these (hopefully few) exception lines; at the same time,
these potentially alarming lines are automatically factored in the reports output and
easily recognized as linked to these easily distinguishable special virtual calls.
- 'bad lines' : this singleton virtual call is inserted at the trace top, right before the first non-comment line,
when at least one "bad line" (a line that was not successfully processed by the tool) is found in the trace file.
Its children are all the "bad" db-calls and WAIT lines. It is rarely found in trace files, and when found it usually links
only the very first few lines of the trace file, since trace files are usually not opened for writing at a
"clean point" (e.g. when the process is idle) but when the process is working, thus "truncating" the top context
of the trace file.
- 'unattributed WAITs' : these virtual calls link as children (obviously in their
Xw (WAIT lines attributions) pointer tree) any WAIT line that was not attributed to any db-call,
either because there was not enough information in the trace files (due to uninstrumented kernel code) or, of course,
because of some limitation of the tool algorithms. They can be found immediately after a
'wait-for-client' virtual call, grouping all the unattributed wait events
found before the next wait-for-client virtual call, and replacing the usual
'client-message-received' virtual call. There can be also a singleton call at the
very beginning of the trace file, next to the 'bad lines' singleton call, if a WAIT line was found before the very first
wait-for-client virtual call.
- 'unexpected-end-of-file' : this singleton virtual call is inserted at the trace bottom when lines that can be only
attributed forward have no corresponding forward lines, thus signaling that the tracing was stopped before the forward
lines generation. It usually links the last few lines, since the action of stopping the tracing activity usually corrupt
the bottom context of the trace file (classic example: a dep=1 db-call whose dep=0 db-call was in-flight when the tracing
was stopped externally).
The wait-for-client Virtual Call
A WAIT on the event 'SQL*Net message from client' (and the equivalent 'SQL*Net message from dblink') means that the
server process is idle, waiting for the next client call (actually it waits on the client-process connection, until any byte arrives
through the connection coming from the client). Very frequently a whole set of adjacent lines of this kind of event is found,
since long waitings are usually interrupted (e.g. by timeouts) to allow the server process to perform various types of periodic
housekeeping or checks, after which the process resumes waiting.
The tool creates a 'wait-for-client' virtual call after every such set of wait events, and attributes all the WAIT lines of the set to
this virtual call (you can see this happening in all the examples provided in this manual). We are going to discuss the advantages
below, but you can picture this call as a silent(implicit, virtual) db-call issued by the client ordering the server to wait
for the next db-call, to which the server responded by actually WAITing.
The tool does the same for other similar events; for example, the event 'PX Idle Wait' for parallel processes, that signals
that the parallel process is idle waiting to be associated with a Query Coordinator (i.e. the next client) to run the next
parallel statement. Another example is the event 'rdbms ipc message', on which background processes (notably LGWR) wait while
not servicing requests. An internal configuration file dictates which events to handle this way.
Thus, the 'wait-for-client' virtual call:
- highlights this kind of events by attributing them to an easily-distinguishable, dedicated virtual call, hence breaking
visually the client-server conversation into a succession of smaller talks (groups of db-calls), thus obviously
helping the investigator into understanding the trace file structure.
Note: to stress further the talks borders, in the lines pane an horizontal line is printed after the virtual call
and before the first WAIT lines associated to it.
- makes the pointer trees structure more regular and navigable (for example, this virtual call is linked into the Xc pointer tree as the
sibling of the previous and next dep=0 db-call; and in general, also thanks to the introduction of this virtual call,
all WAIT lines are attributed to a call, virtual of not).
Note that the "idle" time (time elapsed not working for the client, which is the client think/idle time plus
the client-to-server transmission overhead) is accounted automatically as the duration of the 'wait-for-client'
profile consumer group.
It is also worth noting that the tool parser uses this kind of events to improve the tool analysis, by "resetting" the parser
state whenever a wait-for-client event (Virtual Call) is detected. For example, it is obviously impossible that anything (a
WAIT line, a db-call, etcetera) can be attributed "across" this kind of events, and the parser has been coded accordingly
to leverage this knowledge in its internal algorithms (see the descriptions of the various Pointer Trees
building algorithms).
The client-message-received Virtual Call
The 'client-message-received' virtual call is inserted as the next line of each
'wait-for-client' virtual call, to stress further the point in time when the traced process
started working on a client message after a period of inactivity. It can be turned into an
'unattributed WAITs' virtual call if necessary, thus signaling the more important
presence of unattributed waits in the tracing of the current client message servicing code.
This message is the norm in many background processes (e.g. LGWR) traces, since background processes are not called by any client
and thus get all their non-waiting-for-client WAIT lines attributed to this virtual call.
The null Virtual Call
The 'null call' virtual call has to be ignored. It is sometimes inserted to simplify the tool algorithms, and it has no meaning at all.
Measures
For every db-call, the five native measures performed by the Oracle kernel and found in the trace file are:
| measure | meaning |
| e | elapsed time |
| c | CPU time (time spent using the processor) |
| p | physical blocks (number of blocks physically accessed on disk) |
| cr | consistent reads (number of blocks accessed in the library cache in consistent mode) |
| cu | current reads (number of blocks accessed in the library cache in current mode) |
For every WAIT event, the kernel measures the wait duration (the value of the figure "ela").
For some WAIT events (such as 'db file sequential read', 'db file scattered read', etc), the number of physical blocks accessed
are recorded by the kernel and reported in one of the parameters (p1,p2,p3) of the event; the tool extracts and
makes more evident this measure in the figure "xphy" of the WAIT line.
These seven native measures are used by the tool to derive the x-figures detailed below.
xe,xc,xp,xcr,xcu
For each of the five db-call native measures (e,c,p,cr,cu), the tool defines a corresponding x-figure (xe,xc,xp,xcr,xcu) that
replaces the native measure in all the tool calculations. These figures are a slight generalization of the native ones and are needed
in order to
- provide an estimation of the measures for line types that have no native measures (PARSE ERROR, VIRTUAL CALL, RPC CALL) in the current
Oracle versions;
- support the WAIT lines backward attribution.
The first purpose is very simple; line types with no native measure CAN have child lines associated. For example, a PARSE ERROR
line can have (and usually has) associated all recursive db-calls issued up to the point of the parse failure; and a VIRTUAL CALL,
by definition, can have associated almost everything.
For these lines, (xe,xc,xp,xcr,xcu) are defined as being the sum of the corresponding (xe,xc,xp,xcr,xcu) of all
associated child lines - this provides an approximation (a lower bound) of what the actual figures would be if the kernel had
provided the native measures.
The second purpose is a bit more complex. Backward attribution is performed, as explained better in the
rationale for backward attribution discussion, in order to
attribute some WAIT lines that occurred after a db-call ended but that
were caused by the db-call; in essence, the tools "extends" the db-call to embrace the backward-attributed WAIT line.
To factor in the WAIT line, since obviously the "e" and "p" figures of the db-call do not factor in the WAIT line, the WAIT "ela"
is added to the db-call "xe", and the WAIT "xphy" to the db-call "xp".
In pseudocode, the calculation algorithm is
- Initialize (xe,xc,xp,xcr,xcu) equal to (e,c,p,cr,cu) for lines that records the native measures, to zero for others;
- Whenever a child db-call is associated to a line with no native measure, add its (xe,xc,xp,xcr,xcu) to its parent ones;
- whenever a WAIT line is attributed backward, sum its "ela" to the parent "xe" and its "xphy" to the parent "xp".
Note that if no backward attribution is performed (as it is almost always the case), (xe,xc,xp,xcr,xcu) are simply equal
to (e,c,p,cr,cu) for lines that have native measures, and are the sum of the (e,c,p,cr,cu) of the recursive children for
all the others.
recursive (and self) xe,xc,xp,xcr,xcu
For each line that has a set of measures (xe,xc,xp,xcr,xcu), the tool automatically maintains a set of "recursive measures"
(xre,xrc,xrp,xrcr,xrcu) defined as being the sum of (xe,xc,xp,xcr,xcu) of the recursive children (first level only).
The difference between the two sets, namely
self(xe,xc,xp,xcr,xcu) = (xe-xre, xc-xrc, xp-xrp, xcr-xrcr, xcu-xrcu)
is defined as being the "self" measure, a definition invented by [Millsap].
This set of related figures is printed by the tool using a concise notation, to save space on the screen; for example
xe=100/10/self=90
means that xe is 100, xre is 10, and self is 90=100-10.
All the three figures are explicitly printed because all of them can be very useful (or not) depending on the purpose of the analysis.
For example, consider an EXEC of a PL/SQL block; if the analyst
is examining the trace file with the overall goal of improving response time to the user, and she is interested only in evaluating
the advantage of avoiding altogether the call, she will be interested only in the overall elapsed time (xe); if she wants to optimize
the call, she will of course be interested in knowing whether the most elapsed time is due to the SQL contained in the block (xre)
or the execution inside the PL/SQL Engine self(xe) = (xe - xre). Of course the same holds if the overall goal is to reduce the resource
consumption of the CPU (xe, xrc), of the disk (xp, xrp), or of the library cache accesses (xcr, xrcr and xcu, xrcu).
xela, xelab, xphy and xphyb
Whenever a WAIT line is attributed to a db-call, the wait elapsed time contained in the "ela" native measure is added
to either the xela figure of the db-call, if the WAIT was forward-attributed, or to the xelab (xela backward)
if the WAIT was backward-attributed.
Similarly, the xphy measure of the WAIT line (the number of disk blocks accessed by the wait operation, contained in
one of the p1,p2,p3 event parameters) is added to either the xphy or xphyb (xphy backward) figures of the db-call.
Hence, xela+xelab contains the time spent waiting on all the wait event caused by the db-call, and xphy+xphyb the
disk blocks accessed while waiting. Xela (and xphy) factors in the forward-attributed waits only.
Needless to say, knowing the wait time of a db-call is of paramount importance in most of the trace files analysis.
Unaccounted-for measures: xeu and xpu
The unaccounted-for elapsed-time (xeu) for a db-call is the fraction of elapsed time that is not accounted as time spent
performing recursive work, using the CPU or waiting for some event. Its mathematical definition is
xeu = xe - xre - (xc - xrc) - (xela + xelab)
= self(xe) - self(xc) - (xela + xelab)
As explained in [Millsap], xeu is the sum of the time spent waiting for the CPU (i.e. on the OS scheduler "ready"
queue) plus all the measurement errors of the factors xe,xre,xc,xrc,xela,xelab plus missing time due to uninstrumented piece of code.
That is, beside measurement errors and uninstrumented kernel code, xeu is non-zero only if the CPU load was so high that the Oracle
process had to wait its turn to use the CPU.
The unaccounted-for number of physical blocks accessed (xpu) is the fraction of disk blocks that was recorded as been accessed
(by the measure xp) but that weren't accounted in the attributed wait events (by the measure xphy from p1,p2,p3).
Its mathematical definition is
xpu = xp - xrp - (xphy + xphyb)
= self(xp) - (xphy + xphyb)
A non-zero value means that some disk-accessing code was uninstrumented or that the number of blocks accessed was not
propagated in one of the p1,p2,p3 wait event parameters.
Xprof
The xprof identifier uniquely identifies the nested profiles contained in the reports pane; it is
a meaningless positive integer (equal to zero for the root profile). It is shown in curly brackets to differentiate it better
from the other pointers, and thus to allow easier searching in the plain text version of the xtrace tool output.
Every trace line that contributes to a nested profile (for example, a WAIT line that contributes with
its "ela" figure to a "xe" profile) has an "xprof" pointer that points to the contributed profile section (obviously located
in the reports pane). This way it is possible to immediately check, for example, how much the current line
resource consumption impacts the overall profile. This pointer is also replicated in the "navbar".
Xat
The xat (Stat Parent) pointer points to the STAT line group that is associated with the trace line (if available); this way,
the plan (and associated statistics) is immediately available for every line.
The STAT association is made very conservatively; in order to be associated with a line, all the trace lines in-between
must have not recorded a miss in the library cache (mis=0).
Xdep
Every trace line has associated the xdep figure, which is the recursive level (starting with zero) that the line was emitted under.
This figure is the same as "dep" if the latter is available for the line, otherwise it is calculated by replicating the
value of "dep" of the first previous line associated to the same cursor.
Xpic
The xpic pointer points to the line number of the PARSING IN CURSOR line associated with the current line. It is currently
defined for PARSE and EXEC lines.
Xrpccall
The xrpccall pointer points to the line number of the RPC CALL line associated with the current line. It is currently
defined for RPC EXEC lines.
Xbinds
The xbinds pointer points to the BINDS line associated with the EXEC line (or zero if no BINDS section is associated), that is,
it points to the section containing the bind variables values used by the execution of the cursor.
Xplannum
The xplannum is a progressive id associated by the tool to every plan; every time a different plan is found (in the STAT lines),
a new xplannum is generated. Two plans are the same if they have exactly the same row source operations (op) and access the same
objects (obj).
Xnexecs
This figure belongs to STAT lines and it records the number of EXEC calls that are associated with the STAT group, that is,
the number of times that the STAT plan has been (re)executed since it was associated with the cursor. For example :
PARSE #2 mis=whatever
EXEC #2 mis=0
EXEC #2 mis=0
STAT #2 xnexecs = 2
This figure is used to compute the average of the various statistics associated with the row source operation:
avg(statistic) = statistic / xnexecs (e.g. avg (cnt) = cnt / xnexecs)
since of course, this kind of average can be compared directly with the CBO estimations.
Indeed, avg(cnt) correlates directly with card, and avg(cr) correlates, albeit much more weakly in recent Oracle
versions, and for queries only, with the cost.
Xstatn
The xstatn pointer points to the next STAT line of the group of (usually) consecutive STAT lines that record the cursor plan
and related statistics; it is null(zero) for the last STAT line. Note that STAT lines are not always consecutive; sometimes
they are intermixed with recursive calls performed to complete the to-be-printed informations (such as to retrieve the object names
not available in the cursor plan memory structures).
The hierarchical profiles
By definition, a (resource) profile is the list of the consumers of a resource, each one listed with the amount of resource
consumed. The list is always printed ordered by decreasing amount (strongest consumers first), since when investigating
the resource consumption, the analyst obviously concentrates on the biggest contributors first ("low hanging fruits").
The tool offers, in the reports pane, profiles for five types of resources:
duration (xe), CPU (xc), physical block reads (xp),
consistent block reads (xcr), current block reads (xcu). Each one has a
dedicated tab inside the pane.
Here is an example of an xe profile:
Just after the hierarchical profile, the corresponding flat profile is printed (not shown in the picture above).
consumer groups
The actual resource consumers are the (non-comment) trace lines; for example, db-calls can consume any type of resource,
WAIT lines can consume duration and, in some cases, physical reads. A profile at this low granularity level is usually not
very useful, hence the tool (as all profilers do) aggregates the actual consumers by some common characteristics into
consumer groups, and reports, for each consumer group, the total amount of resource consumed and other sometimes
useful informations such as the number of lines belonging to the consumer group (count), the average resource consumed (avg)
and the resource histogram.
nested profiles
The report is composed of a hierarchy of nested profiles, each one identified by the xprof identifier.
Clicking on a consumer group (more precisely, clicking the xprof link column) opens another profile that further subdivides
(drills-down) the consumer group resource amount into (sub)consumers groups, and so on, recursively, until a consumer group
is no longer further divisible.
Appended to each profile you can also find other informations about the profiled consumer group; currently, you can find
the execution plans
(if available and applicable) and the list of associated ERROR lines.
The idea of using nested profiles was taken from [TVD$XTAT], also described in
[Antognini].
root profile
The root profile (the first printed at the top, with xprof equal to zero) aggregates all lines that are
directly attributable to the client, that is, whose xdep is equal to zero; the remaining lines are
aggregated in the nested profiles.
Hence the root profile consumer groups are aggregations of:
- the dep=0 db-calls (the db-calls issued directly by the calling client);
- the 'wait-for-client' virtual call lines, that is, the virtual calls
that consumed resources while waiting for orders from the client;
- other exception virtual calls, that can be usually ignored.
Hence, by examining the root profile you can immediately see, for example, which db-call was the most expensive,
and then drill-down on it; or how much resource was consumed waiting for the client versus working for the client;
or simply see the application signature (the set of db-calls issued directly by the client).
Db-calls are always grouped by bound statement and call type (PARSE, EXEC, etc). The reason
why we group by bound statement is discussed in general and in particular for profiles in the
importance of bound statements section.
Grouping by call type (PARSE, EXEC, FETCH, etc) is performed because different call types are very different operations, and issues
are better diagnosed by examining them separately. For example, knowing that a statement is "slow" does not tell us a lot; it's only
when we see that its FETCHes are slow that we start investigating about a bad plan or slow disks, at which point all the other
contributors (PARSE, EXEC, etc) are only noise and are better factored-out.
To see an example of a root profile, notice in the example how the topmost consumer group is
"exec (:dec1COLA8D4K9RPW)", that is, the EXECutions (actually a single EXECUTION as the "count" column shows) of the bound statement
with xsqlid=":dec1COLA8D4K9RPW" (which is, by the way, an anonymous block); this group accounts for about 85%
of the trace file duration.
The second consumer group is "virtual call (wait-for-client)", that is, the time spent waiting for the client
(in this example, WAITing for a "SQL*Net message from client"); it accounts for about 17% of the trace file duration and it
aggregates two ("count") lines.
nested profiles of the root profile
Clicking on a root consumer group (i.e. a consumer group listed in the root profile) opens the nested resource profile for the
clicked consumer group. This nested profile, in order to produce the (sub)consumer groups that show the distribution of
the resource amount of the root consumer group, aggregates measures coming from both the child lines associated to the
root lines (i.e. the lines aggregated in the root consumer group) and the root lines themselves.
In detail, the child lines are:
And the measures coming from the root lines are:
As an example of computation for a self measure, consider the processing of the self(xc).
To recap, all db-calls reports the amount of CPU they used, and the tool is able to calculate the "self CPU"
self(xc), that is, the amount of CPU the db-call used to execute its own kernel code (factoring out the CPU used
to execute the recursive db-calls). Each root line whose self(xc) is not zero adds its self(xc)
to the consumer group named "self cpu != 0" in the nested profile - this tells how much of the root consumer group duration
was spent on the CPU.
The same is performed, of course, for the other relevant self measures self(xcr) and
self(xcu).
Similarly, the unaccounted-for measures xeu (unaccounted-for duration) and
xpu (unaccounted-for physical block reads) of the root lines whose values are not zero are added to the
nested profile consumer group named "unaccounted duration != 0" and "unaccounted physical blocks != 0".
To see an example of a nested profile, try going to the root profile in the example and clicking
the topmost consumer group "exec (:dec1COLA8D4K9RPW)" we talked about previously. You can see that about 50% of the duration of
that root consumer group is unaccounted, about 33% comes from the PARSing of the bound statement with
xsqlid=":sel1K6Y4JRMZEHWN", and about 17% comes from the FETCHes of the same select.
next-level nested profiles
Of course, a nested profile of the root profile can have nested profiles itself, that you can explore as well. How they are calculated
is obvious.
profile resource histogram
The resource histogram, located at the right end of each consumer group, shows the distribution of the amount of
resource consumed by the lines that are aggregated in each consumer group. Each bin of the histogram contains a counter
that displays the number of lines whose amount belongs to the bin interval (whose upper extreme is indicated in the bin
column header; the interval includes the upper extreme and excludes the lower extreme, unless otherwise indicated).
Moreover, for each bin, the line that contributed the most to the bin is recorded; you can navigate to that line by clicking
on the counter. This way you can, for example, investigate an outlier in the amount distribution.
profile associated error lines
If the profiled consumer group has some contributors (db-calls) with associated ERROR lines, the error codes (ORA-XXXXX), the number
of lines carrying each error code, and the first and last ERROR line pointers are printed.
profile execution plans
If the profiled consumer group aggregates db-calls, all available plans associated (by the xat pointer) to
the aggregated lines in the trace file are printed inside the "plan(s)" tab. Of course, there might be more than one plan
because db-calls are aggregated by bound statement, and the same bound statement might have
different plan associated:
the associated statements can have different literal values that might lead to different plans, and in any case, it might
always happen that the execution plan gets flushed from the library cache and then differently re-optimized while the process
is being traced.
All the available row source operation figures (including the new Oracle 11g "cost","card" and "size") are reported.
For the measures "cnt", "cr", "pr", "pw" and "time", the average over all the associated executions
(the xnexecs x-figure) is reported as well.
Note that the column order of the plan is designed for optimal visual effectiveness; for example, "avg(cnt)" and "cbo:card"
are printed next to each other to help the analyst check whether the CBO estimation was adequate, a check of paramount importance
(that has been popularized under the name of "Tuning by Cardinality Feedback" by [Breitling]).
The flat profiles
The flat profile of a hierarchical profile is built by retrieving and consolidating all the "wait",
"self" and "unaccounted-for" consumer groups from all the profiles that are nested under the current one.
It is printed by the tool under each profile.
Flat profiles are mostly useful to highlight any issue of the "system as a whole" that impacts the current profile. For example, if the flat
profile shows a lot of time elapsed while waiting on "db file scattered read", with an average wait of 100ms and an histogram showing a lot of
spikes, it is probably worth investigating the sanity of the storage subsystem first, instead of concentrating on the nested profiles.
Exec Flow
The "Exec Flow" is a report that replicates the series of EXEC db-calls as they are found in the trace file, decorated
with the SQL (or PL/SQL) text and, possibly, the bind variable values from the BINDS section associated with the exec line.
For example:
EXEC : xbsqlid=:upd0WJS0BMCM1WZZ/UPD813658919 dep=1
update "DELLERA"."MLOG$_TEST_T2" set snaptime$$ = :1 where snaptime$$ > to_date('2100-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS')
Bind#0
kxsbbbfp=05076c13 bln=07 avl=07 flg=09
value="8/4/2009 16:6:45"
EXEC : xbsqlid=:sel0EW7NRBW2B1QT/SEL3133130461 dep=1
select dmltype$$, max(snaptime$$) from "DELLERA"."MLOG$_TEST_T3" where snaptime$$ <= :1 group by dmltype$$
Bind#0
kxsbbbfp=08d1c7fc bln=07 avl=07 flg=05
value="8/4/2009 16:6:45"
The lines are also reported indented by recursive level (xdep), to strees their hierarchy.
This kind of report is extremely useful when investigating how an application interacts with the database (for example, the fragment above
is obtained from the trace of a materialized view fast refresh).
You can generate an EXEC flow by clicking the "options" button of the lines pane and then the
"Exec Flow analysis" template button. Of course, you can then further customize the output by switching the flags (for example,
hide the bind values, or make other figures visible). You can of course change the colors of the lines or hide them (maybe
by sql text), etc etc.
References
Copyright © 2008 - 2010 Alberto Dell'Era. All rights reserved.