This repository has been archived by the owner on Jun 22, 2020. It is now read-only.

think about pagination #15

Open
jeremybmerrill opened this issue Apr 5, 2016 · 0 comments

There are some cases where it's unclear what to treat as a "document", i.e. the fundamental unit of search indexing in ElasticSearch.

Two prototypical cases:

  • Really long documents, like hundred-some page reports. These are hard because they often cover multiple topics, and it's hard to get ElasticSearch to tell us where in that sort of document a hit occurs. The temptation is to split them into chapters or individual pages for indexing. But then you may still want to continue reading the whole document, which the split makes harder.
  • Smushed-together documents. Sometimes FOIAs show up as one (or a few) PDFs with multiple responsive documents all squished together in one PDF. These documents are sometimes multiple pages long. Indexing, say, 5 very long PDFs as single documents is not a good idea, since the documents inside each one don't have anything in common. But splitting on pages, again, separates the pages of multi-page documents.

Possible solutions:

  • add an additional field in ElasticSearch and a button in the interface to go to the next/prev page in a multi-page document (regardless of type).
  • continue to tweak ElasticSearch to store locations so we can scroll you to the location of your hits in the detail view.
  • other ideas???
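
A rough sketch of the first option, in case it helps the discussion: index one record per page, and have each record carry enough information to find its neighbours. Everything here is an assumption for illustration, not the project's actual mapping: the function name `build_page_docs` and the field names (`source_id`, `page`, `page_count`, `prev_page_id`, `next_page_id`) are hypothetical, and the code only builds plain dicts rather than talking to ElasticSearch.

```python
from typing import Dict, List


def build_page_docs(source_id: str, pages: List[str]) -> List[Dict[str, object]]:
    """Turn one multi-page document into one indexable record per page.

    `source_id` is a hypothetical identifier for the logical document the
    pages belong to; `pages` is the extracted text of each page, in order.
    """
    docs = []
    last = len(pages)
    for number, text in enumerate(pages, start=1):
        docs.append({
            # Hypothetical per-page id scheme: "<source_id>-p<page number>".
            "_id": f"{source_id}-p{number}",
            "source_id": source_id,
            "page": number,
            "page_count": last,
            # Neighbour ids let the interface render prev/next buttons
            # without a second query; None marks the first/last page.
            "prev_page_id": f"{source_id}-p{number - 1}" if number > 1 else None,
            "next_page_id": f"{source_id}-p{number + 1}" if number < last else None,
            "text": text,
        })
    return docs


if __name__ == "__main__":
    for doc in build_page_docs("report", ["intro", "findings", "appendix"]):
        print(doc["_id"], doc["prev_page_id"], doc["next_page_id"])
```

For the smushed-together case, `source_id` would presumably identify each responsive document inside the PDF rather than the PDF itself, so prev/next would stop at that document's boundaries. That still requires knowing where those boundaries are, which this sketch does not address.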