Store content object collections in search index [FC-0062] #35469

Merged
merged 48 commits into from
Sep 13, 2024

@pomegranited pomegranited commented Sep 11, 2024

Description

This PR ensures that when a content object is added/removed from a Collection, its search index document is updated to reflect this change. The collection.key values for each collection are stored in a new collections list.

To support this, we added a new event called CONTENT_OBJECT_ASSOCIATIONS_CHANGED, and trigger it when an object's collections change. This event is also now triggered when an object's tags change. The previous CONTENT_OBJECT_TAGS_CHANGED event is still emitted for tag changes, but is now deprecated.

Useful information to include:

  • Which edX user roles will this change impact? Course Authors who have enabled Content Libraries V2.

Supporting information

Related Tickets:

Depends on / blocked by:

Testing instructions

  • Set up Tutor with this PR and the https://github.com/open-craft/tutor-contrib-meilisearch plugin, and make sure you are running the latest versions of openedx-learning and openedx-events.
  • Run migrations: tutor dev run cms python manage.py migrate
  • In the content-authoring MFE, create a library, e.g. lib:SampleTaxonomyOrg1:AL1
  • Add some blocks to the library, and note their block keys.
    Hint: you can retrieve the block keys using the REST API, e.g. http://studio.local.edly.io:8001/api/libraries/v2/lib:SampleTaxonomyOrg1:AL1/blocks/
  • Run tutor dev run cms ./manage.py cms reindex_studio --experimental to add the new collections field.
  • Run the snippet below to add some collections; reindexing should happen automatically.
# tutor dev run cms python manage.py cms shell
from opaque_keys.edx.keys import UsageKey
from opaque_keys.edx.locator import LibraryLocatorV2
from openedx.core.djangoapps.content_libraries import api

lib_key_str = "lib:<your lib key>"
block_key_strs = [
    "lb:<your block key>",
    "lb:<your block key>",
    # ....
]

library_key = LibraryLocatorV2.from_string(lib_key_str)
block_keys = [
    UsageKey.from_string(key_str) for key_str in block_key_strs
]

api.create_library_collection(library_key, "FAL-3787", title="Collection FAL-3787")
api.update_library_collection_components(library_key, "FAL-3787", usage_keys=block_keys)
# <Collection> (lp:1 FAL-3787:Collection FAL-3787) 
  • Open http://meilisearch.local.edly.io:7700/
    Your API key can be found with tutor config printvalue MEILISEARCH_API_KEY
  • Search for FAL-3787 and verify that each of your blocks has FAL-3787 showing in its collections list.
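
The last check can also be done on exported search documents instead of by eye. This is a minimal sketch, assuming each document is a plain dict with an "id" field and a "collections" field; the helper names are hypothetical, and both the flat list described above and the nested shape adopted later in review are accepted.

```python
def collection_keys(doc: dict) -> list[str]:
    """Return the collection keys of an index document, whichever shape it uses."""
    value = doc.get("collections", [])
    if isinstance(value, dict):
        # Nested shape: {"display_name": [...], "key": [...]}
        return list(value.get("key", []))
    # Flat shape: ["KEY_A", "KEY_B"]
    return list(value)


def blocks_missing_collection(docs: list[dict], key: str) -> list[str]:
    """List the ids of documents that do not reference the given collection key."""
    return [doc["id"] for doc in docs if key not in collection_keys(doc)]


# Example documents (made up for illustration).
docs = [
    {"id": "block-1", "collections": ["FAL-3787"]},
    {"id": "block-2", "collections": {"display_name": ["Collection FAL-3787"], "key": ["FAL-3787"]}},
    {"id": "block-3"},
]
missing = blocks_missing_collection(docs, "FAL-3787")
```

An empty result means every block you added shows the collection.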

Deadline

Before Sumac gets cut in October.

yusuf-musleh and others added 30 commits August 30, 2024 09:28
…#674)

* chore: uses openedx-learning==0.11.3
* feat: add/remove components to/from a collection
  Sends a CONTENT_OBJECT_TAGS_CHANGED for each component added/removed.
* docs: Add warning about unstable REST APIs
* refactor: use oel_collections.get_collections as oel_collections.get_learning_package_collections has been removed.
* test: fixes flaky collection search test
* refactor: simplify the REST API params and validation
For the Collection "create" and "update" views, we call an authoring_api
method and emit an event on success. This change simplifies the view
code by moving these calls to new content_libraries.api methods.
and refactors collection views to use DRF conventions.
They expect a UsageKey object, not the string object_id.
and ensures this new field is searchable and filterable.

Serializes the object's collections to a list of collection.key values.
which adds CONTENT_OBJECT_ASSOCIATIONS_CHANGED
whenever a content object's tags or collections have changed,
and handle that event in content/search.

The deprecated CONTENT_OBJECT_TAGS_CHANGED event is still emitted when
tags change; to be removed after Sumac.
…_TAGS_CHANGED

in content_tagging app, while CONTENT_OBJECT_TAGS_CHANGED is being deprecated.
Collection.key is stored here, not ID
and re-raise as api.LibraryCollectionAlreadyExists
and fixes event types documented in the hooks.
@openedx-webhooks

Thanks for the pull request, @pomegranited!

What's next?

Please work through the following steps to get your changes ready for engineering review:

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.

🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads

🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

🔘 Let us know that your PR is ready for review:

Who will review my changes?

This repository is currently maintained by @openedx/wg-maintenance-edx-platform. Tag them in a comment and let them know that your changes are ready for review.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Sep 11, 2024

@rpenido rpenido left a comment

Hi, @pomegranited!
Great work here! I added some comments, and as soon as they are addressed, this will be ready for CC review!

  • I tested this using the instructions from the PR
  • I read through the code
  • I checked for accessibility issues
  • Includes documentation


e.g. for something in Collections "COL_A" and "COL_B", this would return:
{
"collections": ["COL_A", "COL_B"],

@rpenido rpenido Sep 11, 2024

@pomegranited Sorry for the late request (I overlooked this before), but what about changing the structure here?

Suggested change
- "collections": ["COL_A", "COL_B"],
+ "collections": [
+     { "display_name": "Collection A", "key": "COL_A" },
+     { "display_name": "Collection B", "key": "COL_B" },
+ ],

That way, we can use the display_name as a searchable attribute and the slug/key for a dev action (like a redirect).

Contributor Author

Sure @rpenido, but I decided to store the collections data the same way we store the tags data, e.g.

   "collections": {
      "display_name": ["Collection One", "Another Collection"],
      "key": ["collection-one", "another-key"],
   },

cf a81ea9a
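
As a rough illustration of that shape (not the code in the referenced commit), the sketch below builds the two parallel lists from a list of collections. The `Collection` dataclass and the `collections_field` helper are hypothetical stand-ins; the only assumption taken from the discussion is that each collection has a key and a display title.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    """Hypothetical stand-in for a collection with a key and a title."""
    key: str
    title: str


def collections_field(collections: list[Collection]) -> dict:
    """Build parallel lists of display names and keys, mirroring the tags layout."""
    return {
        "collections": {
            "display_name": [c.title for c in collections],
            "key": [c.key for c in collections],
        }
    }


doc = collections_field([
    Collection(key="collection-one", title="Collection One"),
    Collection(key="another-key", title="Another Collection"),
])
```

Keeping the names and keys in separate lists lets the names be searched as text while the keys stay available for exact filtering.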

Comment on lines 249 to 254
    ).values_list("key", flat=True)
except ObjectDoesNotExist:
    log.warning(f"No component found for {object_id}")

if collections:
    result[Fields.collections] = list(collections)
Contributor

If the component is not in any collections, we must return an empty list to ensure the index is updated with this information.

I created a small PR here: open-craft#685
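
To illustrate why the empty list matters: Meilisearch's update-documents operation merges the supplied fields into the stored document, so a field that is simply left out keeps its previous value. The sketch below imitates that merge with a plain dict; `FakeIndex` is a hypothetical stand-in for illustration, not the Meilisearch client.

```python
class FakeIndex:
    """Hypothetical in-memory index that merges partial documents by id."""

    def __init__(self):
        self.docs = {}

    def update_documents(self, docs):
        for doc in docs:
            # Partial update: only the supplied fields are overwritten.
            self.docs.setdefault(doc["id"], {}).update(doc)


index = FakeIndex()
index.update_documents([{"id": "block-1", "collections": ["COL_A"]}])

# The block leaves every collection, but the field is omitted:
# the stale value survives the update.
index.update_documents([{"id": "block-1"}])
stale = index.docs["block-1"]["collections"]

# Sending an explicit empty list clears it.
index.update_documents([{"id": "block-1", "collections": []}])
cleared = index.docs["block-1"]["collections"]
```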

Comment on lines 453 to 458
mock_meilisearch.return_value.index.return_value.update_documents.assert_has_calls(
    [
        call([doc_problem_with_collection2]),
        call([doc_problem_with_collection1]),
    ],
    any_order=True,
Contributor

I think we can have a more deterministic approach here to assert we don't have extra calls

Suggested change
- mock_meilisearch.return_value.index.return_value.update_documents.assert_has_calls(
-     [
-         call([doc_problem_with_collection2]),
-         call([doc_problem_with_collection1]),
-     ],
-     any_order=True,
+ self.assertEqual(
+     mock_meilisearch.return_value.index.return_value.update_documents.call_count,
+     2,
+ )
+ mock_meilisearch.return_value.index.return_value.update_documents.assert_has_calls(
+     [
+         call([doc_problem_with_collection2]),
+         call([doc_problem_with_collection1]),
+     ],

Contributor Author

Sure -- we still can't rely on the order these calls are made in, but I've added asserts for the call_count, e.g. a81ea9a#diff-f6238e12f564d80e4a90b6f385c31c1396de25784be34992c1f17d1c9ca557a1R353
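
A small standalone sketch of that combination, using only unittest.mock: `call_count` rules out extra calls, while `any_order=True` tolerates either ordering. The mock and the document payloads are made up for illustration and are not taken from the PR's tests.

```python
from unittest.mock import Mock, call

update_documents = Mock()

doc_in_collection1 = {"id": "problem", "collections": ["COL_1"]}
doc_in_collection2 = {"id": "problem", "collections": ["COL_2"]}

# Simulate two index updates arriving in an order the test cannot predict.
update_documents([doc_in_collection2])
update_documents([doc_in_collection1])

# Exactly two calls were made, so nothing extra slipped through ...
assert update_documents.call_count == 2
# ... and both expected calls happened, in whichever order.
update_documents.assert_has_calls(
    [
        call([doc_in_collection1]),
        call([doc_in_collection2]),
    ],
    any_order=True,
)
```

Without `any_order=True`, `assert_has_calls` requires the expected calls to appear as a consecutive sequence in the recorded order, which would make this test depend on call ordering.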

Comment on lines 174 to 177
if "tags" in content_object.changes:
    upsert_block_tags_index_docs(usage_key)
elif "collections" in content_object.changes:
    upsert_block_collections_index_docs(usage_key)

@rpenido rpenido Sep 11, 2024

Nit: If changes is empty, we should assume everything has changed.
Also, we cannot use elif here, or we may skip the collections update when both tags and collections have changed.

Suggested change
- if "tags" in content_object.changes:
-     upsert_block_tags_index_docs(usage_key)
- elif "collections" in content_object.changes:
-     upsert_block_collections_index_docs(usage_key)
+ if not content_object.changes or "tags" in content_object.changes:
+     upsert_block_tags_index_docs(usage_key)
+ if not content_object.changes or "collections" in content_object.changes:
+     upsert_block_collections_index_docs(usage_key)

Contributor Author

See 360ec35
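
The dispatch rule discussed in this thread can be sketched on its own as follows. The `ContentObjectChanged` dataclass and `index_updates_for` function are hypothetical stand-ins, not the event payload class or the handler from the PR; the sketch only assumes a payload carrying an object id and a list of change names.

```python
from dataclasses import dataclass, field


@dataclass
class ContentObjectChanged:
    """Hypothetical stand-in for the event payload: an id plus what changed."""
    object_id: str
    changes: list[str] = field(default_factory=list)


def index_updates_for(event: ContentObjectChanged) -> list[str]:
    """Return which index updates to run; an empty `changes` means refresh everything."""
    updates = []
    # Two independent checks (not if/elif), so both can fire for one event.
    if not event.changes or "tags" in event.changes:
        updates.append("tags")
    if not event.changes or "collections" in event.changes:
        updates.append("collections")
    return updates


both = index_updates_for(ContentObjectChanged("lb:example", ["tags", "collections"]))
everything = index_updates_for(ContentObjectChanged("lb:example"))
only_collections = index_updates_for(ContentObjectChanged("lb:example", ["collections"]))
```

With an if/elif chain, the first event would only refresh tags and leave the collections data stale.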

@@ -288,7 +334,7 @@ def searchable_doc_for_collection(collection) -> dict:
     found using faceted search.
     """
     doc = {
-        Fields.id: collection.id,
+        Fields.id: collection.key,

@rpenido rpenido Sep 11, 2024

I think we are better off with the id here. I fixed it in the other PR and added comments here: 1738447

Contributor

#35321 is merged, so we can update this

Contributor Author

We need the collection.key in order to call the REST APIs though -- I've re-added it in the internal PR: open-craft@5bdcc9e

Contributor

Thank you for catching this @pomegranited!

@ChrisChV ChrisChV left a comment

Looks good. Rebase this onto the changes from #35321 and I will do another quick review.

@pomegranited
Contributor Author

@ChrisChV I think it's worth merging open-craft#684 into here first before we merge this change -- it contains a number of necessary fixes and additions.

pomegranited and others added 3 commits September 12, 2024 15:37
to update the index before the frontend re-fetches
Store Collection metadata + component count in meilisearch
@ChrisChV
Contributor

@pomegranited I will merge this tomorrow morning

@ChrisChV ChrisChV merged commit c41fe89 into openedx:master Sep 13, 2024
49 checks passed
@ChrisChV ChrisChV deleted the jill/collection-components-search branch September 13, 2024 13:20
@edx-pipeline-bot
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Status: Done
6 participants