Ticket #312 (new enhancement)

Opened 19 months ago

RowLog global queue rowkeys could be shorter

Reported by: bruno Owned by: evert
Priority: major Milestone:
Component: RowLog Version:
Keywords: Cc:

Description

Each rowkey in the global rowlog queue table contains the full subscription name. For the indexer, these names have the form "IndexUpdater_{indexname}", which is typically 20-40 bytes.

In addition, the key contains two longs (timestamp and sequence number, 8 bytes each) and the original row key (17 bytes for a plain UUID).

If we could map the subscription name onto a short number (2 or 4 bytes), and perhaps reduce the sequence number to 4 bytes (or use a variable-length encoding), we could save a considerable number of bytes per key, possibly reducing the key size by half or more.
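To make the potential saving concrete, here is a rough sketch of the arithmetic. The field order, the big-endian packing and the function names are assumptions for illustration only; they are not taken from the actual RowLog code.

```python
import struct

def current_key(subscription: str, timestamp: int, seq: int, row_key: bytes) -> bytes:
    # Assumed current layout: full subscription name, two 8-byte longs, original row key.
    return subscription.encode("utf-8") + struct.pack(">qq", timestamp, seq) + row_key

def compact_key(subscription_id: int, timestamp: int, seq: int, row_key: bytes) -> bytes:
    # Proposed layout: 2-byte subscription id, 8-byte timestamp, 4-byte sequence number.
    return struct.pack(">HqI", subscription_id, timestamp, seq) + row_key

# A 17-byte placeholder standing in for a plain UUID row key.
row_key = bytes(17)

before = current_key("IndexUpdater_myindex", 1300000000000, 42, row_key)
after = compact_key(7, 1300000000000, 42, row_key)
print(len(before), len(after))
```

With a 20-byte subscription name this gives 53 bytes versus 31; with a 40-byte name the current layout grows to 73 bytes while the compact one stays at 31.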

The rowkey is also repeated in all the delete tombstones, so keeping it short will save space there too.

The importance of short keys should probably not be overestimated either, since a lot depends on the record IDs a user chooses. Users often use their own unique IDs, leading to keys like 'USER.1bef2ccd-24c8-3293-95b3-2d8bd2eb556b' which uses 38 bytes rather than 17. Still, it would be good to optimize what we can.

Important: the current approach, which stores the full subscription name, has the "advantage" that one can delete and recreate an index and the existing messages in the queue will still be processed. Users have actually relied on this to deal with "stuck messages" caused by rowlog bugs. This behavior could be kept if we never delete entries from the mapping and reuse the existing mappings. Obviously, this behavior can also have undesirable results, e.g. very old messages being re-executed when adding an index with the same name as one that existed long ago. Therefore we should have a cleanup tool, but that's another issue.
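The "never delete, always reuse" mapping could look roughly like the sketch below. The class name, the in-memory dict and the 2-byte limit are all hypothetical; a real implementation would have to persist the mapping (for example in HBase or ZooKeeper) and handle concurrent writers.

```python
class SubscriptionIdRegistry:
    """Append-only mapping from subscription name to a short numeric id (hypothetical)."""

    MAX_ID = 0xFFFF  # assumes ids must fit in 2 bytes

    def __init__(self):
        self._ids = {}

    def get_or_create(self, name: str) -> int:
        # Entries are never removed, so a recreated index gets its old id back
        # and messages still in the queue remain addressable.
        if name not in self._ids:
            next_id = len(self._ids)
            if next_id > self.MAX_ID:
                raise OverflowError("no free subscription ids left")
            self._ids[name] = next_id
        return self._ids[name]

registry = SubscriptionIdRegistry()
first = registry.get_or_create("IndexUpdater_myindex")
# ... index deleted and later recreated under the same name ...
again = registry.get_or_create("IndexUpdater_myindex")
print(first == again)
```

Because ids are only ever added, the same name always resolves to the same id, which preserves the current behavior for queued messages but also the downside of old messages being picked up by a new index with a reused name.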
