
[Task]: Setup scripts to populate the search index from the DB #2092

Closed
acouch opened this issue Sep 17, 2024 · 0 comments
acouch commented Sep 17, 2024


Migrated from navapbc#10
Originally created by @chouinar on Wed, 15 May 2024 17:26:42 GMT


Summary

  • Should be usable in a non-script way (i.e. functions we can call with specific opportunity records - we’ll use it for tests as well)
  • Should only load records that aren’t drafts and have an opportunity status
  • Should make a new index with configurable values (number of shards)
  • Should set up an alias
  • Should use bulk uploads for performance
  • https://opensearch.org/docs/latest/im-plugin/index-templates/
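The filtering and bulk-upload requirements above can be sketched as a helper that turns opportunity records into OpenSearch bulk actions. This is an illustrative sketch, not the project's actual code: the function name, record fields (`is_draft`, `opportunity_status`, `opportunity_id`), and index name are assumptions.

```python
# Sketch: build OpenSearch bulk actions from opportunity records,
# skipping drafts and records without an opportunity status.
# Field and function names are illustrative assumptions.

def build_bulk_actions(records: list[dict], index_name: str) -> list[dict]:
    actions = []
    for record in records:
        if record.get("is_draft"):
            continue  # drafts are never indexed
        if not record.get("opportunity_status"):
            continue  # only records with a status are searchable
        actions.append(
            {
                "_index": index_name,
                "_id": record["opportunity_id"],
                "_source": record,
            }
        )
    return actions
```

A list like this could then be handed to `opensearchpy.helpers.bulk` for the actual upload, which batches the requests for performance.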

Acceptance criteria

No response

acouch closed this as completed Sep 17, 2024
acouch pushed a commit to navapbc/simpler-grants-gov that referenced this issue Sep 18, 2024
#47)

Fixes HHS#2092

Set up a script to populate the search index by loading opportunities
from the DB, serializing them to JSON, loading them into a new index,
and then aliasing that index.

Several utilities were created to simplify working with the OpenSearch
client (a wrapper for setting up configuration and common patterns).

Iterating over the opportunities and doing something with them is a
common pattern in several of our scripts, so nothing is really different
there.

The meaningful implementation is how we handle creating and aliasing the
index. In OpenSearch you can give any index an alias (including putting
multiple indexes behind the same alias). The approach is pretty simple:
* Create an index
* Load opportunities into the index
* Atomically swap the index backing the `opportunity-index-alias`
* Delete the old indexes if any exist
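The first step is where creation-time settings such as the shard count get applied. A minimal sketch of building that request body, under the assumption that it would be passed to `client.indices.create(index=..., body=...)` in `opensearch-py` (the helper name and defaults are illustrative, not the project's code):

```python
# Sketch: index settings applied at creation time.
# Helper name and default values are illustrative assumptions.

def build_index_settings(number_of_shards: int = 1,
                         number_of_replicas: int = 1) -> dict:
    # Shard count is fixed when the index is created, which is why
    # re-sharding means building a fresh index and re-aliasing it.
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }
```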

This approach means that our search endpoint just needs to query the
alias, and we can keep making new indexes and swapping them out behind
the scenes. Because we could remake the index every few minutes, if we
ever need to re-configure things like the number of shards, or any other
index-creation configuration, we just update that in this script and
wait for it to run again.
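The atomic swap works because OpenSearch's `_aliases` endpoint applies a list of actions in a single request. A sketch of building that request body (the helper name is an assumption; the `_aliases` actions format is the real OpenSearch API):

```python
# Sketch: build a single _aliases request that removes the alias from
# the old indexes and adds it to the new one in one atomic step, so
# the alias never points at zero indexes mid-swap.
# The helper name is an illustrative assumption.

def build_alias_swap(alias: str, new_index: str,
                     old_indexes: list[str]) -> dict:
    actions = [
        {"remove": {"index": old, "alias": alias}} for old in old_indexes
    ]
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}
```

A body like this could be sent with `client.indices.update_aliases(body=...)`; deleting the old indexes happens afterward as a separate step.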

I ran this locally after loading `83250` records, and it took about 61s.

You can run this locally yourself by doing:
```sh
make init
make db-seed-local
poetry run flask load-search-data load-opportunity-data
```

If you'd like to see the data, you can test it out on
http://localhost:5601/app/dev_tools#/console - here is an example query
that searches for the word `research` across a few fields and filters to
just forecasted/posted opportunities.

```json
GET opportunity-index-alias/_search
{
  "size": 25,
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "query": "research",
            "default_operator": "AND",
            "fields": ["agency.keyword^16", "opportunity_title^2", "opportunity_number^12", "summary.summary_description", "opportunity_assistance_listings.assistance_listing_number^10", "opportunity_assistance_listings.program_title^4"]
          }
        }
      ],
      "filter": [
        {
          "terms": {
            "opportunity_status": [
              "forecasted",
              "posted"
            ]
          }
        }
      ]
    }
  }
}

```
acouch pushed a commit to navapbc/simpler-grants-gov that referenced this issue Sep 18, 2024
acouch pushed a commit that referenced this issue Sep 18, 2024