﻿ Alberto Dell’Era’s Oracle blog » overlapping ranges with priority

### overlapping ranges with priority

A customer of ours (a leading Italian consumer goods retailer) has asked us to solve the following problem, that occurs quite frequently and that is not trivial to solve efficiently - and that is very interesting to design and fun to blog about!

the problem
Prices of skus (i.e. goods) have validity ranges (time intervals) and can overlap; on an overlapping range, the strongest priority (lower number) wins. In pictures:

```b---(0,\$200)---d
c---(1,\$300)---e
```

the expected output is, since priority 0 is stronger then 1:

```b----(\$200)----d---(\$300)---e
```

I'm going to illustrate in detail the pure-SQL algorithm I have designed, and then discuss about its performance and efficiency. As usual, everything I discuss is supported by an actual demo. Please note that the algorithm uses analytics functions very, very heavily.

pure SQL solution
The input table:

```create table ranges (
sku   varchar2(10) not null,
a     int not null,
b     int not null,
prio  int not null,
price int not null
);
alter table ranges add constraint ranges_pk primary key (sku, a, b);
```

Let's provide the opening example as input:

```insert into ranges(sku, a, b, prio, price) values ('sku1', ascii('b'), ascii('d'), 0, 200);
insert into ranges(sku, a, b, prio, price) values ('sku1', ascii('c'), ascii('e'), 1, 300);
```

The algorith is implemented as a single view; let's comment each step and show its output over the example:

```create or replace view ranges_output_view
as
```

The instants in time where the ranges start or begin:

```with instants as (
select sku, a as i from ranges
union
select sku, b as i from ranges
),
```

Output: b,c,d,e.

The base ranges, i.e. the consecutive ranges that connect all the instants:

```base_ranges as (
select *
from (
select sku,
i as ba,
lead(i) over (partition by sku order by i) as bb
from instants
)
where bb is not null
),
```
` b------c------d------e `

The original ranges factored over the base ranges; in other words, "cut" by the instants:

```factored_ranges as (
select i.sku, bi.ba, bi.bb, i.a, i.b, i.prio, i.price
from ranges i, base_ranges bi
where i.sku = bi.sku
and (i.a <= bi.ba and bi.ba < i.b)
),
```
```b---(0,\$200)---c---(0,\$200)---d
c---(1,\$300)---d---(1,\$300)---e
```

Then, let's filter out the factored ranges with weaker priority (that have a stronger priority range with the same extremes "covering" them):

```strongest_factored_ranges as (
select sku, ba, bb, prio, price
from (
select sku, ba, bb, prio, price,
dense_rank () over (partition by sku, ba, bb order by prio) as rnk
from factored_ranges
)
where rnk = 1
),
```
```b---(0,\$200)---c---(0,\$200)---d---(1,\$300)---e
```

The problem could be now considered solved, if you could live with consecutive intervals showing the same price (such as b--c and c--d above). If you can't for whatever reason (I couldn't), we can join them using analytics again in a way similar to this asktom technique (look at the bottom for "Analytics to the Rescue (Again)").
First, we calculate "step", a nonnegative number that will be zero if a range can be joined to the previous one, since:
a) they are consecutive (no gap between them)
b) they have the same price:

```ranges_with_step as (
select sku, ba, bb, prio, price,
decode ( price, lag(price) over (partition by sku order by ba),  ba - lag(bb) over (partition by sku order by ba), 1000 ) step
from strongest_factored_ranges
),
```
```RANGE_CODED                    STEP
------------------------ ----------
b---(0,\$200)---c               1000
c---(0,\$200)---d                  0
d---(1,\$300)---e               1000
```

Then we compute the integral of step over the ranges; joinable ranges will hence have the same value for "interval" since step is zero:

```ranges_with_step_integral as (
select sku, ba, bb, prio, price, step,
sum(step) over (partition by sku order by ba rows between unbounded preceding and current row) as integral
from ranges_with_step
),
```
```RANGE_CODED                INTEGRAL
------------------------ ----------
b---(0,\$200)---c               1000
c---(0,\$200)---d               1000
d---(1,\$300)---e               2000
```

The joined joinable ranges :

```ranges_joined as (
select *
from (
select sku, ba, bb, prio, price, step, integral,
min(ba) over (partition by sku, integral) as a,
max(bb) over (partition by sku, integral) as b
from ranges_with_step_integral
)
where step > 0
)
select sku, a, b, price from ranges_joined;
```
```b---(0,\$200)---c---(1,\$300)---e
```

predicate-"pushability"
The first desirable property of this view is that a predicate (such as an equality predicate, but it works even for the "between" operator, less-than, etc) on sku can be pushed down the view to the base tables. For:

```select * from ranges_output_view where sku = 'k100';
```

the plan is:

```----------------------------------------------------------
| Id  | Operation                            | Name      |
----------------------------------------------------------
...
|  10 |           TABLE ACCESS BY INDEX ROWID| RANGES    |
|* 11 |            INDEX RANGE SCAN          | RANGES_PK |
...
|* 17 |                INDEX RANGE SCAN      | RANGES_PK |
|* 18 |                INDEX RANGE SCAN      | RANGES_PK |
----------------------------------------------
---
11 - access("I"."SKU"='k100')
---
17 - access("SKU"='k100')
18 - access("SKU"='k100')
```

That means that only the required SKU(s) are fed to the view, and proper indexes (such as RANGES_PK in this case) can be used. So, if you need to refresh only a few skus the response time is going to be almost istantaneous - provided that you have only sane (a few) ranges per sku. Hence you can use the same view for both calculating prices of all skus (say, in a nightly batch) and calculating a small subset of skus (say, online), and that is a great help for maintenance and testing.

running in parallel
Another desirable property is that the view can operate efficiently in parallel, at least in 11.2.0.3 (I have not tested other versions):

```-------------------------------------------------------------------
| Operation                                   |IN-OUT| PQ Distrib |
-------------------------------------------------------------------
| SELECT STATEMENT                            |      |            |
|  PX COORDINATOR                             |      |            |
|   PX SEND QC (RANDOM)                       | P->S | QC (RAND)  |
|    VIEW                                     | PCWP |            |
|     WINDOW SORT                             | PCWP |            |
|      VIEW                                   | PCWP |            |
|       WINDOW SORT                           | PCWP |            |
|        VIEW                                 | PCWP |            |
|         WINDOW BUFFER                       | PCWP |            |
|          VIEW                               | PCWP |            |
|           WINDOW SORT PUSHED RANK           | PCWP |            |
|            HASH JOIN                        | PCWP |            |
|             PX RECEIVE                      | PCWP |            |
|              PX SEND HASH                   | P->P | HASH       |
|               PX BLOCK ITERATOR             | PCWC |            |
|                TABLE ACCESS FULL            | PCWP |            |
|             PX RECEIVE                      | PCWP |            |
|              PX SEND HASH                   | P->P | HASH       |
|               VIEW                          | PCWP |            |
|                WINDOW SORT                  | PCWP |            |
|                 PX RECEIVE                  | PCWP |            |
|                  PX SEND HASH               | P->P | HASH       |
|                   VIEW                      | PCWP |            |
|                    SORT UNIQUE              | PCWP |            |
|                     PX RECEIVE              | PCWP |            |
|                      PX SEND HASH           | P->P | HASH       |
|                       UNION-ALL             | PCWP |            |
|                        PX BLOCK ITERATOR    | PCWC |            |
|                         INDEX FAST FULL SCAN| PCWP |            |
|                        PX BLOCK ITERATOR    | PCWC |            |
|                         INDEX FAST FULL SCAN| PCWP |            |
-------------------------------------------------------------------
```

There's no point of serialization (all servers communicate parallel-to-parallel), the rows are distributed evenly using an hash distribution function (probably over the sku) and all operations are parallel.

sku subsetting and partitioning
It is well known that analytics functions use sort operations heavily, and that means (whether or not you are running in parallel) that the temporary tablespace may be used a lot, possibly too much - as it actually happened to me , leading to (in my case) unacceptable performance.

Side note: as I'm writing this, I realize now that I had probably been hit by the bug illustrated by Jonathan Lewis in Analytic Agony, but of course, overuse of temp could happen, for large datasets, without the bug kicking in.

A possible solution is to process only a sub-batch of the skus at a time, to keep the sorts running in memory (or with one-pass to temp), leveraging the predicate-pushability of the view. In my case, I have made one step further: I have partitioned the table "ranges" by "sku_group", replaced in the view every occurrence of "sku" with the pair "sku_group, sku", and then run something like:

``` for s in (select sku_group from "list of sku_group") loop
select .. from ranges_output_view where sku_group = s.sku_group;
end loop;
```

The predicate gets pushed down to the table, hence partition elimination kicks in and I can visit the input table one time only, one partition at a time, using a fraction of the resources at a time, and hence vastly improving performance.

And that naturally leads to "do-it-yourself parallelism": running a job for every partition in parallel. I'm going to implement it since the customer is salivating about it ... even if it is probably over-engineering :D .

1. #### Overlapping ranges with priority | An Oracle Programmer

Saturday, March 22, 2014

[...] few years ago, Alberto Dell’Era blogged about product prices with overlapping date ranges; “on an overlapping range, the strongest priority (lower number) wins.” His analysis, [...]

Thursday, August 14, 2014

Hence, a reasonable service warranty gets counted as an critical trait and top
quality of a fantastic plumber. The best place to look will
have to be on the internet.

3. #### city2345.com

Friday, August 15, 2014

Those who are interested in nature trails and campsites, the key location are in Mt.
New York City has two main Airports, La Guardia and JFK Airport.
Some people take a couple weeks every summer, or whenever they have the most time off, and target one of these fantastic tourist destinations.

4. #### anxiety treatment online

Sunday, August 17, 2014

My brother recommended I might like this website. He was entirely
right. This post truly made my day. You cann’t imagine simply how much time I had
spent for this info! Thanks!

5. #### css slider code

Monday, August 18, 2014

Hi, I think your blog might be having browser
compatibility issues. When I look at your blog in Opera, it looks fine but when opening in Internet
Explorer, it has some overlapping. I just wanted to give you
a quick heads up! Other then that, wonderful blog!

Please enclose SQL code inside the tag pair [sql] ... [/sql]

Subscribe without commenting