CBO: NewDensity for Frequency Histograms,11g- (densities part IV)

As we have seen in the previous posts of this series, in 11g a new figure named “NewDensity” has been introduced as a replacement for the “density” column statistic for columns whose histogram has been collected; this change has been backported in also.

In the previous post we discussed how NewDensity influences the CBO cardinality estimate for Height-Balanced histograms; in this one we are going to investigate the same for Frequency Histograms. We will see that the most important change is the introduction of the “half the least popular” rule (see the “Histogram change” post by Jonathan Lewis, which distills the findings of Randolf Geist and Riyaj Shamsudeen) – a surprising rule that might easily cause trouble (in fact as Jonathan reports in the comments – bug 6082745 was opened against this rule).

The test case (script density_post_freq.sql) considers the same test statement we focused on in the previous post (a single equality filter predicate which asks for a value inside the min-max range of the column):

select ...

from t

where value = 64.5;

Of course we compute a Frequency instead of an Height-Balanced histogram, and use a slightly different value distribution in order to highlight the new rule:

SQL> select value, count(*)

2 from t

3 group by value

4 order by value;


---------- ----------

8 8

16 16

64 64

128 128

The histogram generated by the test case is (from DBA_HISTOGRAMS):


---------- ---------- ----------

8 8 8

16 24 16

64 88 64

128 216 128

VALUE is an abbreviation for ENDPOINT_VALUE, EP for ENDPOINT_NUMBER.

BKT is the number of buckets covered by the value (i.e.: EP minus the previous EP), that is, the number of rows whose column value was equal to VALUE at statistics collection time.

When the filter predicate selects a value contained in the histogram, the new releases behave the same as the old ones (but check the “bottom note about singleton values” at the bottom for a a minor but interesting detail): neither density nor NewDensity is used, and the cardinality estimate is the usual intuitive one. In the complementary case of a value not contained in the histogram (but still inside the min-max interval), the cardinality used to be calculated as density*num_rows and it is now NewDensity*num_rows. Note the simmetry with the Height-Balanced case: the formula is the same, with NewDensity simply replacing density.

NewDensity with the “half the least popular” rule active

By default the rule is active, and in this case, NewDensity is set to

NewDensity = 0.5 * bkt(least_popular_value) / num_rows

and hence, for non-existent values:

E[card] = (0.5 * bkt(least_popular_value) / num_rows) * num_rows

= 0.5 * bkt(least_popular_value)

For our test case, the least_popular_value is 8 and bkt(8) = 8, hence E[card] = 0.5 * 8 = 4 thanks to NewDensity being equal to 0.5 * 8 / 216 = 0.018518519. In fact, we can verify in the 10053 traces (in,, for our statement, that asks for a not-existent value (64.5), that E[card] and NewDensity are set as above:

NewDensity:0.018519, OldDensity:0.002315 BktCnt:216, PopBktCnt:216, PopValCnt:4, NDV:4

Using density: 0.018519 of col #1 as selectivity of unpopular value pred

Table: T Alias: NOT_EXISTENT

Card: Original: 216.000000 Rounded: 4 Computed: 4.00 Non Adjusted: 4.00

As another check, let’s see what happens if bkt(least_popular_value) = 1, that is, if there is (at least) one value that occurred exactly one time (a singleton value) at statistics collection time. Adding such a row to our test case is trivial (just uncomment the first insert row in the script); in this scenario, our formula above predicts E[card] = 0.5 with NewDensity = 0.5 / 217 = .002304147, and in fact (check the *_least_is_one.trc traces):

NewDensity:0.002304, OldDensity:0.002304 BktCnt:217, PopBktCnt:216, PopValCnt:4, NDV:5

Using density: 0.002304 of col #1 as selectivity of unpopular value pred

Table: T Alias: NOT_EXISTENT

Card: Original: 217.000000 Rounded: 1 Computed: 0.50 Non Adjusted: 0.50

note that E[card] gets rounded up from 0.5 to 1 (as usual).

What is the rationale behind this rule? Thanks to Randolf Geist (see the comment in Jonathan’s blog entry above), we know that it was introduced as a patch to solve one particular scenario (see bug 5483301) and then included in the main release, for some reason. Luckily, the rule can be disabled and the old sane behaviour can be restored.

Page 1 of 2 | Next page