OLTP compression: migrated rows are compressed

In his articles Compression in Oracle – Part 2: Read-Only Data and Compression in Oracle – Part 3: OLTP Compression, Jonathan Lewis has shown that block (re)compression is never attempted on updates – it is attempted only on inserts (and, of course, only if the used space crosses the PCTFREE threshold).

Now, since a row migration can be considered a row re-insertion, does it trigger a compression attempt on the block that the row is migrated into? The answer is yes, as I will show in a moment.

It is worth remembering that row migration can be quite dramatic in an OLTP-compressed table because, as shown by Jonathan in the articles above, when a tokenized column of a compressed row is updated to a new value, the whole token is expanded, modified and left uncompressed, even if the new value could be represented by a token already present in the block. Hence any kind of update, even the tiniest, has the potential to inflate the row considerably, and of course, if the row can no longer fit into the block's free space, it has to be migrated to a new block.
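By the way, a quick way to observe this kind of migration in practice (a sketch of mine, not part of the test case below) is the classic chained-rows report, which records both chained and migrated rows:

-- create the CHAINED_ROWS table with the standard script (SQL*Plus syntax)
@?/rdbms/admin/utlchain.sql

-- record the head rowids of chained/migrated rows of T
analyze table t list chained rows into chained_rows;

select count(*) as chained_or_migrated
  from chained_rows
 where table_name = 'T';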

Compressing migrated rows has the benefit that at least the overall size of the table is probably not going to grow much, assuming a reference scenario where rows were inserted randomly in the first place (i.e. without cleverly colocating similar rows in the same block, e.g. by inserting them in bulk using a well-designed order by), and assuming that the updating process that causes migration is random as well (which is probably almost always true). The "only" overhead is the additional block get necessary to follow the rowid that acts as the "forwarding address" from the original row position (where only the row header is stored) to the new one (where the row piece is now stored).

Side note: it's interesting that this overhead is not present when doing a full table scan, since full scans simply do not follow the "forwarding address": they are going to find the row piece anyway (a well-known fact that is also checked in my test case for completeness). Since, as reasoned above for our reference scenario, the table size probably does not change much, a frequently-full-scanning DWH is probably going to enjoy similar performance even after updates (which is a bit ironic for a feature named "OLTP compression"); not so for OLTP systems, or DWHs that use single-row access a lot (e.g. by following rowids from bitmap indexes), which have to pay the migration penalty.
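If you want to measure this overhead yourself, a possible sketch (mine, assuming a session with access to v$mystat and v$statname) is to watch the 'table fetch continued row' statistic, which counts the extra hops to migrated row pieces: it increases when migrated rows are reached by rowid, but should stay flat across a full scan:

-- current value of the statistic for this session
select sn.name, ms.value
  from v$mystat ms, v$statname sn
 where ms.statistic# = sn.statistic#
   and sn.name = 'table fetch continued row';

-- a full scan should leave the statistic unchanged,
-- while rowid-based access to migrated rows increments it
select /*+ full(t) */ count(*) from t;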

But let's leave speculation aside, and get back to hacking …

test case: migrated rows are compressed

Note: the test case (main.sql), its spool (main.lst) and block dumps (main.trc) are available in oltp_compress_migrated_rows.zip, alongside the other test cases mentioned below.

For simplicity, we are going to use a tablespace with segment space managed manually, i.e. where free blocks are linked together by one freelist, and with a block size of 4K; Oracle version is 11.2.0.3.
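For reference, such a tablespace could be created with something along these lines (a sketch of mine; the tablespace name, datafile path and size are placeholders, and if 4K is not the database default block size, db_4k_cache_size must be configured beforehand):

-- placeholder name, path and size; requires db_4k_cache_size to be
-- set if the database default block size is not 4K
create tablespace tbs_4k_mssm
  datafile '/u01/oradata/tbs_4k_mssm.dbf' size 100m
  blocksize 4k
  extent management local
  segment space management manual;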

The test table is (note the PCTFREE set to 50%):

create table t (x varchar2(400 char))
pctfree 50
compress for oltp
;

The initial row set is generated by:

insert /*+ append */ into t
select rpad( 'A', 400, 'A') from dual connect by level <= 10000;

That is, all identical rows, all chars set to 'A' (coded as 0x41 in the WE8ISO8859P1 single-byte charset of my test database). Note that each row is sized 400 bytes and hence, when not compressed, only 10 rows can fit in a 4K block, and only 5 rows would be inserted given the PCTFREE=50 setting.
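To check how the rows actually landed on disk, and to produce the block dumps discussed below, one can use something like the following sketch (mine; the datafile and block numbers in the dump command are placeholders to be replaced with the values returned by the query):

-- rows per block: with OLTP compression we expect far more than the
-- 5 rows per block that PCTFREE 50 would allow for uncompressed rows
select dbms_rowid.rowid_relative_fno(rowid) as file#,
       dbms_rowid.rowid_block_number(rowid) as block#,
       count(*) as cnt
  from t
 group by dbms_rowid.rowid_relative_fno(rowid),
          dbms_rowid.rowid_block_number(rowid)
 order by file#, block#;

-- dump one of those blocks to the session trace file
-- (7 and 130 are placeholders)
alter system dump datafile 7 block 130;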

The block dump of the first block after the segment header shows that we have 2017 bytes free (tosp=0x7e1), as expected. In the row dump we find, in order, first the (lonely) token:

tab 0, row 0, @0xdf3
tl: 405 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [400] {repeat 41, 400 times}
bindmp: 00 8a fa 01 90 {repeat 41, 400 times}

and then 137 highly-compressed rows, all with the same structure, referencing the token:

tab 1, row 0, @0xdee
tl: 5 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [400] {repeat 41, 400 times}
bindmp: 2c 00 01 01 00

Note: check the above-mentioned articles by Jonathan for a detailed explanation of the compression format. The bindmp line shows what's actually stored in the block: "2c 00 01" are the flag byte, the lock byte, and the column count; "01" means "1 token follows" and "00" is the pointer to token zero.

Let's update every row to lowercase, i.e. to a string of 400 'a' (0x61), thus causing massive row migration:

update t set x = lower(x);
