CBO: “NewDensity” replaces “density” in 11g, 10.2.0.4 (densities part III)

In this post we are going to explore and explain the rationale for the formula used by the CBO to compute the “NewDensity” figure that replaces, from 10.2.0.4 onwards, the “density” column statistic in the cardinality estimation formulae for columns with height-balanced (HB) histograms defined.

In a previous post, we already discussed the pre-10.2.0.4 scenario: we saw how and when the “density” column statistic is used in the cardinality formula for equality filter predicates, we explained its statistical rationale and defining formula, introduced the concept of the NPS (Not Popular Subtable), and built a test case. Now we are going to use the very same test case and explain the differences in the most recent versions (the previous post zip file contains logs for them also).

To summarize the test case – we have a table T with a single column VALUE, exponentially distributed, and with a SIZE 5 Height-Balanced histogram collected on. The histogram is:

SQL> select ep, value, popularity from formatted_hist;

EP VALUE POPULARITY

---------- ---------- ----------

0 1 0

1 16 0

5 64 1

Thus, we have a single popular value, 64; all the others are unpopular.

In this “densities” series of post, we focus on a SQL statement that contains only an equality filter predicate on table T:

select ...

from t

where value = 2.4;

the literal value is not a popular value (but inside the 1-64 interval) and hence, in pre-10.2.0.4, the formula used for the expected cardinality calculation is equal to:

E[card] = density * num_rows;

We discussed, in the previous post, how density is carefully calculated by dbms_stats to get back the expected cardinality of the family (class) of all possible equality filter predicate statements that hit the NPS, under the usual “non-empty result set assumption” and the further (strange and strong) assumption that the more a value is represented in the NPS, the higher the probability that the value is used as the literal of the equality predicate (an assumption that mathematically translates into the formula “w(:x) = count(:x) / num_rows_nps”).

Switching to 10.2.0.4 – the formula for E[card] is still the same, but with “density” replaced by “NewDensity” (as hinted by the fact that “density” is reported as “OldDensity” in the 10053 trace files, as we are going to see in a moment):

E[card] = NewDensity * num_rows;

NewDensity is not stored anywhere in the data dictionary, but it is computed at query optimization time by the CBO (note that density is still computed by dbms_stats using the old formula, but then it is ignored by the CBO). The NewDensity formula is based mainly on some histogram-derived figures; using the same names found in 10053 traces:

NewDensity = [(BktCnt - PopBktCnt) / BktCnt] / (NDV - PopValCnt)

Where BktCnt (“Bucket Count”) is the number of buckets (the “N” in the “SIZE N” clause);

PopBktCnt (“Popular Bucket Count”) the number of buckets covered by the popular values;

PopValCnt (“Popular Value Count”) is the number of popular values; NDV (“Number of Distinct Values”) is the traditional name used by CBO developers for the num_distinct column statistic. With the exception of NDV, all these values are derived from the histogram.

Side note: if the numerator is equal to zero, NewDensity is set to 0.5 / num_rows, thus giving an E[card] = 0.5, as far as I have seen (not exaustively) in a few test cases; it looks like a lower-bound “sanity check”. The denominator cannot be zero for HB histograms.

To illustrate the formula: the histogram of our test case has 5 buckets, hence BktCnt=5; 64 is the only popular value, hence PopValCnt =1; this popular value covers 4 buckets (since its EP is 5 and the previous EP is 1), hence PopBktCnt=4; we know that the column has num_distinct=6, hence NDV=6. This is in fact what we see in the 10053 trace file (in 11.1.0.7 and 11.2.0.1):

SINGLE TABLE ACCESS PATH

Single Table Cardinality Estimation for T[T]

Column (#1):

NewDensity:0.040000, OldDensity:0.115789 BktCnt:5, PopBktCnt:4, PopValCnt:1, NDV:6

Using density: 0.040000 of col #1 as selectivity of unpopular value pred

Table: T Alias: T

Card: Original: 95.000000 Rounded: 4 Computed: 3.80 Non Adjusted: 3.80

So NewDensity = [(5-4)/5] / (6-1) = 1/25 = 0.04 and E[card]=0.04*95=3.8, which is exactly what we see in the above trace fragment.

Page 1 of 2 | Next page