Ticket #89 (new task)

Opened 3 years ago

Blob extracted content cache

Reported by: bruno
Owned by: bruno
Priority: major
Milestone:
Component: Indexer
Version:
Keywords:
Cc: lily-developers@…

Description

When a record with blob fields needs to be reindexed, the content of those blobs is re-extracted (via Tika) every time, even if the blobs themselves have not changed.

To avoid re-extracting the content from blobs when only non-blob fields have changed, we could keep a 'blob extracted content cache' (e.g. in the form of an HBase table).
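As a rough illustration of the idea, here is a minimal sketch in Python. It is not the indexer's actual code: the class `BlobExtractCache`, the `extract` callable (standing in for the Tika call), the `load_blob` callable, and the in-memory dict (standing in for the HBase table) are all hypothetical. It also assumes each blob has a stable key that changes whenever the blob content changes.

```python
from typing import Callable, Dict


class BlobExtractCache:
    """Hypothetical cache of extracted blob text, keyed by blob key.

    The dict is an in-memory stand-in for the cache table proposed
    in the ticket (e.g. an HBase table).
    """

    def __init__(self, extract: Callable[[bytes], str]) -> None:
        self._extract = extract           # assumed wrapper around the Tika extraction
        self._store: Dict[str, str] = {}  # blob key -> extracted text
        self.extractions = 0              # number of real extractions, for illustration

    def get_text(self, blob_key: str, load_blob: Callable[[], bytes]) -> str:
        """Return the extracted text for a blob, extracting only on a cache miss."""
        cached = self._store.get(blob_key)
        if cached is not None:
            # Cache hit: neither the blob nor the extractor is touched.
            return cached
        # Cache miss: load the blob, extract once, and remember the result.
        text = self._extract(load_blob())
        self.extractions += 1
        self._store[blob_key] = text
        return text
```

With this shape, reindexing a record whose non-blob fields changed would call `get_text` with the same blob key as before and get the stored text back without invoking the extractor. Eviction of entries for deleted or replaced blobs is left out of the sketch.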
