Ticket #841 (closed Bug: fixed)
Fulltext Search extremely slow with large result sets (>1k docs)
|Reported by:||Matthias Bauer <matthias.bauer.drs@…>||Owned by:||somebody|
|Component:||Repository - querying and indexing||Version:||2.4|
Fulltext search was extremely slow with large result sets. Example: a search for a fairly common term returns some 12k documents. This search took 45 secs for the initial query and 15 secs for successive queries. Executing the same query on the same index data with Luke, Lucene's index analysis tool (https://code.google.com/p/luke/), takes only about 20 ms.
Profiling showed that retrieving the data set in DocumentHitCollector.java:71 causes this slowdown. At this point the code retrieved the complete data set for each hit, including the full text extract (the "content" field), although only the DocID, BranchID and LanguageID fields are needed there. The fix (see attached patch) instructs Lucene to retrieve only the fields we need from the index.
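To illustrate why restricting the loaded fields helps, here is a minimal stand-alone sketch. It is not the attached patch and does not use Lucene's API: `StoredFields`, `sampleIndex` and `collectHits` are hypothetical names made up for this example, and only the field names (DocID, BranchID, LanguageID, "content") come from the description above. The toy store counts how many characters it reads, so the cost of pulling in the "content" field for every hit becomes visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Stand-alone model of field-restricted document loading; not Lucene code. */
public class FieldSelectionSketch {

    /** The only fields the hit collector needs, as named in this ticket. */
    public static final Set<String> ID_FIELDS =
            new HashSet<>(Arrays.asList("DocID", "BranchID", "LanguageID"));

    /** Hypothetical stand-in for an index's stored fields; counts characters read. */
    public static class StoredFields {
        private final List<Map<String, String>> docs = new ArrayList<>();
        public long charsRead = 0;

        public void add(Map<String, String> doc) { docs.add(doc); }

        public int size() { return docs.size(); }

        /** Loads every stored field, including the large "content" field. */
        public Map<String, String> load(int n) {
            return load(n, docs.get(n).keySet());
        }

        /** Loads only the requested fields and skips everything else. */
        public Map<String, String> load(int n, Set<String> wanted) {
            Map<String, String> out = new HashMap<>();
            for (Map.Entry<String, String> field : docs.get(n).entrySet()) {
                if (wanted.contains(field.getKey())) {
                    charsRead += field.getValue().length();
                    out.put(field.getKey(), field.getValue());
                }
            }
            return out;
        }
    }

    /** Builds a toy index whose documents each carry a content extract of the given length. */
    public static StoredFields sampleIndex(int docCount, int contentLength) {
        String content = new String(new char[contentLength]).replace('\0', 'x');
        StoredFields index = new StoredFields();
        for (int n = 0; n < docCount; n++) {
            Map<String, String> doc = new HashMap<>();
            doc.put("DocID", String.valueOf(n));
            doc.put("BranchID", "1");
            doc.put("LanguageID", "1");
            doc.put("content", content);
            index.add(doc);
        }
        return index;
    }

    /** Collects one key per hit, loading either all fields or only the ID fields. */
    public static List<String> collectHits(StoredFields index, boolean idFieldsOnly) {
        List<String> hits = new ArrayList<>();
        for (int n = 0; n < index.size(); n++) {
            Map<String, String> doc =
                    idFieldsOnly ? index.load(n, ID_FIELDS) : index.load(n);
            hits.add(doc.get("DocID") + "@" + doc.get("BranchID") + ":" + doc.get("LanguageID"));
        }
        return hits;
    }
}
```

Both variants of `collectHits` return the same hit list; the difference in `charsRead` between them is exactly the number of documents times the length of the content extract, which is the work the fix avoids.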
This improves the search time for the example considerably: the initial search drops from 45 secs to 17 secs, and successive searches (when everything is cached) drop from 15 secs to 280 ms.