# CBO: NewDensity for Frequency Histograms,11g-10.2.0.4 (densities part IV)

**NewDensity with the “half the least popular” rule disabled**

To disable the new rule, just switch off the patch 5483301:

alter session set “_fix_control”=’5483301:off’;

(or alter system if you want to make it permanent)

with this setting, NewDensity becomes simply

NewDensity = 0.5 / num_rows

and hence, again for non-existent values:

E[card] = 0.5

which is exactly what we got in pre-10.2.0.4, where density was used (and density was, and is still, set to 0.5 / num_rows by dbms_stats). So the cardinality estimate is 0.5 (rounded up to 1).

For our test case, we predict NewDensity = 0.5 / 216 = 0.002314815. In fact our 10053 traces tell us:

NewDensity:0.002315, OldDensity:0.002315 BktCnt:216, PopBktCnt:216, PopValCnt:4, NDV:4

Table: T Alias: NOT_EXISTENT_OFF

Card: Original: 216.000000 Rounded: 1 Computed: 0.50 Non Adjusted: 0.50

The rationale for this behaviour is sound; the CBO knows that no row with the requested value existed at statistics collection time, hence it returns the minimal cardinality estimate compatible with the non empty result set assumption (check this post for the importance of this assumption). If the statistics are reasonably fresh, this is the only sane estimate that can be made.

**Playing with density – a warning**

If you set your own column stats using dbms_stats.set_column_stats, the behaviour is different; I haven’t made any extensive investigations but as far as I can tell, the value you provide for density is used instead of NewDensity. User-provided column statistics are flagged with dba_tab_cols.user_stats = ‘YES’. You can disguise your user statistics as non-user by setting the flags parameter of dbms_stats.set_column_stats to 2 – but since the latter parameter is labeled as “for Oracle internal use only”, I would do it only for investigations purposes – that is, never in production.

—

*Bottom note about singleton values*: actually in pre-10.2.0.4 versions, if the value was present in the Frequency histogram but covering a single bucket (hence it was present in the table exactly one time at statistic collection time), it used to be classified as “unpopular” and hence used to get the same treatment as a value not in the histogram – the end result being that the cardinality was estimated as 0.5 rounded up to 1; now it is 1 before rounding as one would intuitively expects. I hope to be able to investigate whether this change fixes the issues about join cardinality estimation I investigated – see “the mystery of halving” in this investigation of mine if interested.

Other posts belonging to this series:

Page 2 of 2 | Previous page

Blogroll Report 16/10/2009-23/10/2009 « Coskan’s Approach to Oracle:

November 3rd, 2009 at 14:48

[...] Alberto Dell’Era-CBO: NewDensity for Frequency Histograms,11g-10.2.0.4 (densities part IV) [...]

Randolf Geist:

December 3rd, 2009 at 16:38

Alberto,

regarding your comment “Playing with density”: I’ve suggested in a recent discussion to refer to your excellent blog series on Oracle-l (and try to set the density manually to get the “old” density behaviour back) – yesterday Wolfgang Breitling mentioned when we met at the UKOUG that manually setting the density actually removes the histogram, and generating a histogram manually will still use the NewDensity but not the density provided as part of the user-generated statistics.

I haven’t had time yet to test this personally, but now that Wolfgang has mentioned it I’m quite sure I remember that he’s right regarding setting the density manually – it removes the complete histogram immediately.

Can you comment on what you meant by above: “If you set your own column stats using dbms_stats.set_column_stats…”?

Have you tried to modify the density only or generating your own histogram?

Randolf

Alberto Dell'Era:

December 3rd, 2009 at 17:59

Hi Randolf,

thank you very much for your positive feedback, I really appreciate it – as you “may” know, it takes a lot of time to write about this kind of things trying to be clear and exhaustive …

I use the following tiny SP to set the density while preserving the histogram:

I’ve tested it right now again, albeit on 11.1.0.7 only, and it actually preserves the histogram. It is a script I’ve used for some years now.

Notice how it works : it fetches the histogram from the data dictionary, and rewrites it back verbatim with a new density, possibly changing the “flags” as well …

Randolf Geist:

December 3rd, 2009 at 18:44

Alberto,

thanks for the clarification.

So you mean to say: Using your SP you’ve manually modified the density while preserving the histogram – and the CBO uses then the manually modified density from the dictionary for calculation but not the NewDensity any longer?

I thought that Wolfgang meant to say that he has generated a histogram manually along with a density, but the CBO sill applied the NewDensity algorithm to his made-up histogram.

At least two things come into my mind that might matter:

- I don’t know the version he has used to test

- He seems to have setup a made-up histogram rather than preserving an existing one, however I don’t see where this exactly should make any difference

Anyway, thanks for the feedback. If I find some time I’ll test it myself resp. ask Wolfgang for clarification what he did and what version he used to test.

Randolf

Alberto Dell'Era:

December 3rd, 2009 at 20:18

Randolf,

>So you mean to say: Using your SP you’ve manually modified the density

>while preserving the histogram – and the CBO uses then the manually

>modified density from the dictionary for calculation but not the NewDensity any longer?

Yes if you use the default and only supported value (null) for the “flags” parameter of dbms_stats.set_column_stats – as far as I understand, in this case the statistics get labeled as “user generated stats” (dba_tab_cols.user_stats = ‘YES’). If you set “flags” to 2, you are pretending to be dbms_stats or a system routine (dba_tab_cols.user_stats = ‘NO’) and hence NewDensity gets used.

It makes sense – if the user has provided her own value for “density”, use it instead of NewDensity.

NB: I have exchanged in the past many, many emails with Wolfgang and I’ve noticed that He commonly uses flags=2 in his scripts … that should explain everything nicely :)

Xiang Rao:

August 22nd, 2012 at 20:59

Hi Alberto,

I am wondering if the num_rows inside the “half the least popular” formula NewDensity = 0.5 * bkt(least_popular_value)/num_rows should be the last endponiter_number? In your case, both of your row count and the buckets are the same: 216.

I have a table with 99,826,738 rows, the concerned column has 60 distinct values ranged between 3 and 451, with frequncy histogram of 17 buckets, and density 5.17e-9. The last endponiter_number is 5,327 and the least popular value 303 has only 1 bucket. For value 4 (not any of the endpoint_value, but beteween first 2 values), Oracle gives cardinality estimate as 9,370 and newDensity as 0.000094, which is 0.5*bkt(303)/last_endpoint_number = 0.5*1/5,327. I was looking for the theory behind the newDensity value 0.000094, and google brought me to your article again.

Thanks,

Xiang

Alberto Dell'Era:

August 25th, 2012 at 18:56

Hi Xiang,

yes, you’re right – in all my examples I have always used no sampling when gathering statistics (to keep the discussion simple and the examples deterministic), and when no sampling, the last endpoint_number is always exactly the same as num_rows. But it is the last endpoint_number that is to be used in the formula, as you kindly point out – which makes a difference to you since you are sampling for sure.

Just to be complete, here’s a demo that shows the effect of sampling on endpoint_number for a table with num_rows=100000:

Thanks for commenting!

Rickey:

January 14th, 2013 at 05:34

Definitely believe that that you stated. Your

favorite reason seemed to be on the net the easiest thing to

take into accout of. I say to you, I definitely get annoyed even as folks think

about concerns that they just do not know about.

You managed to hit the nail upon the top as well as defined out

the whole thing with no need side-effects , other people could take a signal.

Will likely be again to get more. Thanks

Sayan Malakshinov:

December 3rd, 2013 at 21:43

Alberto,

could you tell me please: is this my view dba_newdensity written correctly?

Sayan Malakshinov:

December 3rd, 2013 at 21:49

without unnecessary:

Alberto Dell'Era:

January 25th, 2014 at 16:26

Sayan,

sorry for the long delay, my notification module has gone berserk again ;(

There was a NewDensity calculation function inside the test case, have you checked that ? As far as I know my function is correct; if that’s not the case, I would like to know the reason