CBO: the formula for the “density” column statistic (densities part II)

The meaning of the above particular shape of w(:x) is that the probability that the client submits a certain value for :x is proportional to the number of rows (in the NPS) that has that value; more precisely, that if X% of rows has a certain common value, X% of user-submitted statements that “hit” the NPS will ask for that value. Under this assumption, dbms_stats precomputes “density” to give back the above E[card] when the literal is known to be not popular, hence hitting the NPS – remember that the CBO operates under the “non-empty result set assumption“, hence if the literal does not hit a popular value, it must hit a value of the NPS.

The above assumption for w(:x) is quite a strange assumption – and in fact, we will see in the next post that in 11g (and 10.2.0.4), this assumption has been dropped and replaced with a more standard one. The “density” column statistics is in fact ignored in 10.2.0.4+ and a value computed at run-time, named “newDensity” in 10053 trace files, is used instead.

Other posts belonging to this series:

densities part I

densities part III

densities part IV

Page 2 of 2 | Previous page

6 comments on this post.
  1. Rudy:

    I checked Lewis’ book, but he doesn’t explain how he worked out the density formula – he will publish it in the upcoming book, so we have to wait (how long?!?).

    Your explanation about w(:x) is very interesting, although the post is hard for me to read with your notation. I must say it’s convincing, why is it strange for you?

    No doubt it’s an easy task for dbms_stats to compute the estimated value for count(:x), or E[count(:x)]… it’s count(:x) indeed, no need to estimate :-)))

  2. Alberto Dell'Era:

    @rudy

    In Lewis’ book the formula was introduced using words instead of math symbols but still understandable; Wolfgang uses a more formal mathematical definition.

    What I think is “strange” is the assumption about the shape of w(:x); why a customer that has ordered 100 books should ask about the order book list 100 times as frequently as another customer that has ordered only 1 book ? 11g (and 10.2.0.4) corrects this with “NewDensity”, as we will see in the next post.

    About E[count(:x)] = “observed count”: that is not true in general, since one might use a statistical model that infer E[count(:x)] from the observed count, for example taking into account the way the table is modified over time (e.g: that an insert_date column is constantly increasing its max value). That would be, of course, very complicated in practice, but statistically sound and possible.

  3. Alberto Dell’Era’s Oracle blog » CBO: “NewDensity” replaces “density” in 11g, 10.2.0.4 (densities part III):

    [...] a previous post, we already discussed the pre-10.2.0.4 scenario: we saw how and when the “density” column statistic [...]

  4. Blogroll Report 09/10/2009-16/10/2009 « Coskan’s Approach to Oracle:

    [...] Alberto Dell’Era-CBO: the formula for the “density” column statistic (densities part II) [...]

  5. Alberto Dell’Era’s Oracle blog » CBO: about the statistical definition of “cardinality” (densities part I):

    [...] posts belonging to this series: densities part II densities part III densities part [...]

  6. density estimate:

    [...] bone reduces resulting in osteoporosis with porous bone fragility and high risk of bone fracture,Alberto Dell’Era’s Oracle blog CBO: the formula for the …In this post we are going to explore and explain the rationale for the formula used by dbms_stats to [...]

Leave a comment