overlapping ranges with priority

A customer of ours (a leading Italian consumer goods retailer) has asked us to solve the following problem, that occurs quite frequently and that is not trivial to solve efficiently – and that is very interesting to design and fun to blog about!

the problem

Prices of skus (i.e. goods) have validity ranges (time intervals) and can overlap; on an overlapping range, the strongest priority (lower number) wins. In pictures:

b---(0,$200)---d

c---(1,$300)---e

the expected output is, since priority 0 is stronger then 1:

b----($200)----d---($300)---e

I’m going to illustrate in detail the pure-SQL algorithm I have designed, and then discuss about its performance and efficiency. As usual, everything I discuss is supported by an actual demo. Please note that the algorithm uses analytics functions very, very heavily.

pure SQL solution

The input table:

create table ranges (

sku varchar2(10) not null,

a int not null,

b int not null,

prio int not null,

price int not null

);

alter table ranges add constraint ranges_pk primary key (sku, a, b);

Let’s provide the opening example as input:

insert into ranges(sku, a, b, prio, price) values ('sku1', ascii('b'), ascii('d'), 0, 200);

insert into ranges(sku, a, b, prio, price) values ('sku1', ascii('c'), ascii('e'), 1, 300);

The algorith is implemented as a single view; let’s comment each step and show its output over the example:

create or replace view ranges_output_view

as

The instants in time where the ranges start or begin:

with instants as (

select sku, a as i from ranges

union

select sku, b as i from ranges

),

Output: b,c,d,e.

The base ranges, i.e. the consecutive ranges that connect all the instants:

base_ranges as (

select *

from (

select sku,

i as ba,

lead(i) over (partition by sku order by i) as bb

from instants

)

where bb is not null

),

b------c------d------e

The original ranges factored over the base ranges; in other words, “cut” by the instants:

factored_ranges as (

select i.sku, bi.ba, bi.bb, i.a, i.b, i.prio, i.price

from ranges i, base_ranges bi

where i.sku = bi.sku

and (i.a <= bi.ba and bi.ba < i.b)

),

b---(0,$200)---c---(0,$200)---d

c---(1,$300)---d---(1,$300)---e

Then, let’s filter out the factored ranges with weaker priority (that have a stronger priority range with the same extremes “covering” them):

strongest_factored_ranges as (

select sku, ba, bb, prio, price

from (

select sku, ba, bb, prio, price,

dense_rank () over (partition by sku, ba, bb order by prio) as rnk

from factored_ranges

)

where rnk = 1

),

b---(0,$200)---c---(0,$200)---d---(1,$300)---e

The problem could be now considered solved, if you could live with consecutive intervals showing the same price (such as b–c and c–d above). If you can’t for whatever reason (I couldn’t), we can join them using analytics again in a way similar to this asktom technique (look at the bottom for “Analytics to the Rescue (Again)”).

First, we calculate “step”, a nonnegative number that will be zero if a range can be joined to the previous one, since:

a) they are consecutive (no gap between them)

b) they have the same price:

ranges_with_step as (

select sku, ba, bb, prio, price,

decode ( price, lag(price) over (partition by sku order by ba), ba - lag(bb) over (partition by sku order by ba), 1000 ) step

from strongest_factored_ranges

),

RANGE_CODED STEP

------------------------ ----------

b---(0,$200)---c 1000

c---(0,$200)---d 0

d---(1,$300)---e 1000

Then we compute the integral of step over the ranges; joinable ranges will hence have the same value for “interval” since step is zero:

ranges_with_step_integral as (

select sku, ba, bb, prio, price, step,

sum(step) over (partition by sku order by ba rows between unbounded preceding and current row) as integral

from ranges_with_step

),

RANGE_CODED INTEGRAL

------------------------ ----------

b---(0,$200)---c 1000

c---(0,$200)---d 1000

d---(1,$300)---e 2000

The joined joinable ranges :

ranges_joined as (

select *

from (

select sku, ba, bb, prio, price, step, integral,

min(ba) over (partition by sku, integral) as a,

max(bb) over (partition by sku, integral) as b

from ranges_with_step_integral

)

where step > 0

)

select sku, a, b, price from ranges_joined;

b---(0,$200)---c---(1,$300)---e

predicate-”pushability”

The first desirable property of this view is that a predicate (such as an equality predicate, but it works even for the “between” operator, less-than, etc) on sku can be pushed down the view to the base tables. For:

select * from ranges_output_view where sku = 'k100';

the plan is:

Page 1 of 3 | Next page