Ticket #224 (closed enhancement: fixed)

Opened 2 years ago

Last modified 2 years ago

rowlog MQ optimization in message processing

Reported by: bruno Owned by: evert
Priority: major Milestone: 1.0
Component: RowLog Version:
Keywords: Cc:

Description

This is an idea to reduce the number of HBase operations done in the MQ-variant of the rowlog.

Currently, the processor reads a batch of records (default maximum 100) and processes them. As part of processing each individual message, a delete is performed on the MQ table once the message has been processed. Rather than deleting each message individually, we could delete per batch of processed messages using a multi-delete. Assuming full batches of 100 records are processed, this would mean a 100-to-1 reduction in the number of HBase delete operations (though that one operation will be somewhat heavier).
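To make the proposed saving concrete, here is a minimal sketch that counts calls against an in-memory stand-in for the MQ table. It does not use the HBase client; `FakeMqTable`, `delete`, `delete_many` and the two `process_*` functions are hypothetical names chosen for illustration only.

```python
class FakeMqTable:
    """In-memory stand-in for the MQ table that counts calls (hypothetical)."""

    def __init__(self, row_keys):
        self.rows = set(row_keys)
        self.operations = 0  # number of simulated table operations

    def delete(self, row_key):
        # One call per row, as in the per-message approach.
        self.operations += 1
        self.rows.discard(row_key)

    def delete_many(self, row_keys):
        # One call for a whole list of rows, standing in for a multi-delete.
        self.operations += 1
        self.rows.difference_update(row_keys)


def process_per_message(table, batch, handle):
    """Process each message and delete its MQ row immediately."""
    for row_key in batch:
        handle(row_key)
        table.delete(row_key)


def process_with_batched_delete(table, batch, handle):
    """Process all messages first, then delete their MQ rows in one call."""
    processed = []
    for row_key in batch:
        handle(row_key)
        processed.append(row_key)
    if processed:
        table.delete_many(processed)


batch = [f"msg-{i:03d}" for i in range(100)]

per_message = FakeMqTable(batch)
process_per_message(per_message, batch, handle=lambda key: None)

batched = FakeMqTable(batch)
process_with_batched_delete(batched, batch, handle=lambda key: None)

print(per_message.operations, batched.operations)  # prints "100 1"
```

Both variants leave the table empty; only the number of delete calls differs.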

There is no danger in delaying these deletes: if a message is not removed from the MQ table, it will be picked up for processing again, but by then it will already be marked as done in (or removed from) the row-local queue.
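The safety argument can be sketched as follows. This assumes the processor consults the row-local state before doing any work, which the ticket implies but does not spell out; `row_local_done`, `handled` and `process` are hypothetical names.

```python
# Stand-in for the "done" state kept in the row-local queue (assumption).
row_local_done = set()
# Records which messages were actually handled, to show nothing runs twice.
handled = []


def process(message_id):
    """Handle a message unless the row-local state says it is already done."""
    if message_id in row_local_done:
        return False  # stale MQ entry: skip, nothing is redone
    handled.append(message_id)
    row_local_done.add(message_id)
    return True


mq_table = ["m1", "m2"]

first_pass = [process(m) for m in mq_table]
# The MQ deletes are delayed, so the same entries are read a second time.
second_pass = [process(m) for m in mq_table]

print(first_pass, second_pass, handled)
```

Under that assumption, a stale MQ entry costs one extra lookup but never a second execution of the message.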

Change History

comment:1 Changed 2 years ago by evert

  • Status changed from new to closed
  • Resolution set to fixed

In [4760]:

Reducing the number of delete operations on the RowLogShard HBase table by batching the messages to be removed.
A batch of messages to be removed is deleted with a single delete operation when the batchSize is reached, when the last message removal was 5 minutes ago, or when the next batch of messages is requested from the shard.
Fixes #224
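The three flush triggers described in [4760] can be sketched roughly as below. This is not the code from the changeset: `BatchedRemover` and its parameters are hypothetical, the 5-minute rule is interpreted here as "checked whenever a message is removed", and the sketch is single-threaded.

```python
import time


class BatchedRemover:
    """Collects row keys to delete and flushes them in one call (hypothetical)."""

    def __init__(self, delete_many, batch_size=100,
                 max_delay_seconds=300.0, clock=time.monotonic):
        self._delete_many = delete_many  # callable taking a list of row keys
        self._batch_size = batch_size
        self._max_delay = max_delay_seconds
        self._clock = clock
        self._pending = []
        self._last_removal = clock()

    def remove(self, row_key):
        now = self._clock()
        # Time trigger: the previous removal happened too long ago.
        idle_too_long = now - self._last_removal >= self._max_delay
        self._pending.append(row_key)
        self._last_removal = now
        # Size trigger: the batch is full.
        if len(self._pending) >= self._batch_size or idle_too_long:
            self.flush()

    def next_batch(self, fetch):
        # Flush first so messages that are already processed are not read again.
        self.flush()
        return fetch()

    def flush(self):
        if self._pending:
            rows, self._pending = self._pending, []
            self._delete_many(rows)


calls = []   # each entry is the list of rows sent in one delete call
now = [0.0]  # fake clock so the time trigger can be shown without waiting
remover = BatchedRemover(calls.append, batch_size=3, clock=lambda: now[0])

for key in ("a", "b", "c"):
    remover.remove(key)          # third removal fills the batch
now[0] = 10.0
remover.remove("d")              # stays pending
now[0] = 400.0
remover.remove("e")              # 390 s since the last removal: flush
remover.remove("f")              # stays pending
remover.next_batch(lambda: [])   # requesting the next batch flushes

print(calls)  # [['a', 'b', 'c'], ['d', 'e'], ['f']]
```

Passing the clock in as a parameter is only a convenience for demonstrating the time-based trigger deterministically.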

comment:2 Changed 2 years ago by evert

In [4761]:

Caching the payload inside the RowLogMessage so that it doesn't have to be retrieved multiple times when the message is being processed for multiple subscriptions (fixes #265)

Also adding some extra synchronization and checks to avoid conflicts and redundant work when removing messages in batch from the RowLogShard (see #224)
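The payload caching mentioned in [4761] amounts to loading the payload lazily and keeping it on the message object. A minimal sketch, assuming a loader callback; `CachedPayloadMessage`, `load_from_store` and the subscription names are hypothetical and not taken from the changeset:

```python
class CachedPayloadMessage:
    """Message that loads its payload at most once (hypothetical stand-in)."""

    def __init__(self, row_key, seq_nr, load_payload):
        self.row_key = row_key
        self.seq_nr = seq_nr
        self._load_payload = load_payload
        self._payload = None
        self._loaded = False  # separate flag so a None payload is cached too

    @property
    def payload(self):
        if not self._loaded:
            self._payload = self._load_payload(self.row_key, self.seq_nr)
            self._loaded = True
        return self._payload


reads = []  # records every trip to the backing store


def load_from_store(row_key, seq_nr):
    reads.append((row_key, seq_nr))
    return b"payload-bytes"


message = CachedPayloadMessage("row-1", 7, load_from_store)
payloads = [message.payload for _ in ("subscription-a", "subscription-b", "subscription-c")]

print(len(reads))  # prints 1
```

Three subscriptions read the payload, but the backing store is consulted only once.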
