Conclusion

Starting from Jonathan Lewis' test case "Discrete Dangers" on page 126 of his book "Cost Based Oracle",
we have investigated (and illustrated with graphs) the CBO cardinality estimation algorithm for range-based predicates:
```  select x from t where x >  low_x and x <  high_x  (open  , open  )
select x from t where x >= low_x and x <= high_x  (closed, closed)
select x from t where x >  low_x and x <= high_x  (open,   closed)
select x from t where x >= low_x and x <  high_x  (closed, open)

template:
select x from t where x #low low_x and x #high high_x
#low  in (">", ">=")
#high in ("<", "<=")
```
when column x has no associated histogram.

Letting
```  min_x = min (x) over all rows
max_x = max (x) over all rows
B = (max_x - min_x) / num_distinct (Band width)
```
and defining
```  left band      : min_x     < x < min_x + B
central region : min_x + B < x < max_x - B
right band     : max_x - B < x < max_x
```
we have seen that
• For ranges completely contained in one of the two bands
```     min_x     <= low_x < high_x <= min_x + B
or max_x - B <= low_x < high_x <= max_x
```
the cardinality can depend on both low_x and high_x, only low_x, only high_x, or none (= constant), depending on the value of #low and #high, in a somewhat counterintuitive way. We have found the relevant formulae.
• In all the other cases, a slightly modified (modifications underlined) version of Jonathan Lewis' "standard formula" applies:
```  cardinality = num_rows * (high_x_effective - low_x_effective) / (max_x - min_x)
+ num_rows * correction_for_or_equal_operators
- num_rows * correction_for_special_case

if #low = ">=" and low_x inside the left band then
low_x_effective = min_x + B
else
low_x_effective = low_x
end

if #high = "<=" and high_x inside the right band
high_x_effective = max_x - B
else
high_x_effective = high_x
end

correction_for_or_equal_operators = 0
+ 1/num_distinct if #low  = ">="
+ 1/num_distinct if #high = "<="

correction_for_special_case = 0
+ 1/num_distinct if low_x  = min_x and #low  = ">"
+ 1/num_distinct if high_x = max_x and #high = "<"
```
This fitness test verifies the formula against the CBO, using its pl/sql implementation.
We have also graphically illustrated, hopefully for a better intuitive understanding, the case of selection over both an infinitesimal and a finite range for the (open,open) and (closed, closed) cases.

The discussion starts here.

And here's my speculation about the rationale behind this strange (but quite sound in hindsight) behaviour.

For corrections / feedback:
alberto.dellera@gmail.com