
feat: carry curation rules and synonyms to new collection after scraper runs #66

Merged

tharropoulos
Contributor

Change Summary

Updates the scraper workflow so that user-defined synonyms and curation rules are preserved across runs. Specifically, it copies these rules from the old collection to the new collection before the old collection is deleted. Without this, every scraper run would silently drop the synonyms and curation rules users had configured on the previous collection.

Changes include:

  • Copying synonyms and curation rules from the old collection to the new collection.
  • Tests covering collections both with and without curation rules and synonyms.
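A minimal sketch of the copy step, assuming a client shaped like the typesense-python `Client` (in Typesense, curation rules are stored as *overrides*). Rules are read with `retrieve()` from the old collection and written with `upsert()` to the new one. The function name `copy_rules` is hypothetical, and this is not the PR's actual code.

```python
from typing import Any, Dict


def copy_rules(client: Any, old_collection: str, new_collection: str) -> Dict[str, int]:
    """Copy synonyms and curation rules (overrides) between two collections.

    Assumes `client` exposes client.collections[name].synonyms and
    client.collections[name].overrides, each with retrieve() and
    upsert(id, body), as in typesense-python.
    """
    copied = {"synonyms": 0, "overrides": 0}
    source = client.collections[old_collection]
    target = client.collections[new_collection]

    for kind in ("synonyms", "overrides"):
        # retrieve() returns e.g. {"synonyms": [...]} or {"overrides": [...]}
        rules = getattr(source, kind).retrieve().get(kind, [])
        for rule in rules:
            rule_id = rule["id"]
            # The id travels as a separate argument, so drop it from the body.
            body = {key: value for key, value in rule.items() if key != "id"}
            getattr(target, kind).upsert(rule_id, body)
            copied[kind] += 1
    return copied
```

Against a live server this would be called as `copy_rules(typesense_client, old_name, new_name)`; an old collection with no rules simply yields zero copies, which covers the "without curation rules and synonyms" case.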

PR Checklist

tharropoulos marked this pull request as draft July 17, 2024 08:20
tharropoulos marked this pull request as ready for review July 17, 2024 08:28
tharropoulos force-pushed the feature/curation-synonym-carrying branch from 6e024ff to 6c1719e July 17, 2024 08:39
tharropoulos force-pushed the feature/curation-synonym-carrying branch from b8d139c to f86b210 August 1, 2024 07:24
tharropoulos and others added 5 commits August 1, 2024 10:28
- Add default Typesense server configuration variables to a .env file
during the pipeline run.
- Change the volume mapping for the Typesense Docker container in the
GitHub Actions workflow. The data directory is now correctly mapped to
`/tmp/typesense`.
- Sleep for 10 seconds before sending the request to the Typesense
server.
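The commit waits a fixed 10 seconds for the server to come up. As an alternative (a swapped-in technique, not what the commit does), the pipeline could poll Typesense's `/health` endpoint until it answers `{"ok": true}`. `wait_until_ready` and `typesense_health_probe` are hypothetical helpers, and `http://localhost:8108` assumes Typesense's default port on the runner.

```python
import json
import time
import urllib.request
from typing import Callable


def typesense_health_probe(url: str = "http://localhost:8108/health") -> bool:
    """Return True once the server reports {"ok": true} (assumed local URL)."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.load(resp).get("ok") is True


def wait_until_ready(probe: Callable[[], bool], timeout: float = 10.0,
                     interval: float = 0.5,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except (OSError, ValueError):
            # Connection refused / not yet serving valid JSON: keep waiting.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

Usage in the pipeline would be `wait_until_ready(typesense_health_probe, timeout=30)`; the injectable `clock` and `sleep` exist only so the loop can be tested without real waiting.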
pass

self.typesense_client.aliases.upsert(self.alias_name, {'collection_name': self.collection_name_tmp})
self.typesense_client.aliases.upsert(
Member

@tharropoulos Could we update the alias to point to the new collection only after all the synonyms and curation rules have been transferred?

As it stands, there is a brief window in which searches are routed to the new collection before its synonyms and curation rules have been copied over.

Contributor Author

Indeed, this creates side effects; resolved in 41aa511. Deleting the old collection is still the last step in the workflow.
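The ordering the reviewer asked for can be sketched as follows: copy the rules while the alias still points at the old collection, repoint the alias, and only then delete the old collection. `promote_collection` is a hypothetical name, `copy_rules` is any callable that performs the copy, and the client calls mirror typesense-python (`aliases.upsert`, `collections[name].delete()`). This is a sketch, not the merged implementation.

```python
from typing import Any, Callable, Optional


def promote_collection(client: Any, alias_name: str,
                       old_collection: Optional[str], new_collection: str,
                       copy_rules: Callable[[Any, str, str], Any]) -> None:
    """Switch `alias_name` to `new_collection` without exposing it half-configured."""
    # 1. Copy synonyms and curation rules while searches still hit the old collection.
    if old_collection is not None:
        copy_rules(client, old_collection, new_collection)
    # 2. Only now route searches to the fully configured new collection.
    client.aliases.upsert(alias_name, {"collection_name": new_collection})
    # 3. Delete the old collection last, once nothing points at it.
    if old_collection is not None:
        client.collections[old_collection].delete()
```

With this order, a search through the alias always reaches a collection that already carries its synonyms and curation rules; on a first run (no old collection) only the alias is created.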

- Move transfer operations after alias upsert to ensure smooth
transition.
- Delete old collection after successful transfer.
- Replace static ChromeDriver path with WebDriver Manager for automatic
driver installation.
- Add webdriver-manager to Pipfile dependencies.
- Simplify the `itertext` method by consolidating the yield statements
  (addresses a pylint warning).
@tharropoulos
Contributor Author

tharropoulos commented Aug 22, 2024

I just noticed that pylint now fails the check because of duplicate code in the test files. This happened because the lock file was out of sync with the Pipfile, so we had been running an older version of pylint. Should we factor the shared setup out into fixtures to avoid the duplication, or add a rule to the pylint config file to ignore the duplicate-code message?

- After updating, pylint reported duplicate-code errors in the tests;
  marking that message as ignored resolves the issue.
jasonbosco merged commit 2c505db into typesense:master Aug 26, 2024
1 check passed
@jasonbosco
Member

This is now available in v0.10.0


2 participants