docs: ADR for an edx-platform representation of Catalog Courses. #34156

feanil · 2024-01-31T19:47:29Z

Summary; We should have a way to model the associations between multiple
course runs that are all for the same logical course. This model should
be a part of the core edx-platform models but should integrate
seamlessly with course-discovery where it makes sense.

See the ADR for more details.

Summary; We should have a way to model the associations between multiple course runs that are all for the same logical course. This model should be a part of the core edx-platform models but should integrate seamlessly with course-discovery where it makes sense. See the ADR for more details.

robrap · 2024-02-01T21:55:00Z

openedx/core/djangoapps/catalog/docs/decisions/002-course-models.rst

+
+* A data migration to make all current implicit associations explicit.
+
+* Eventing hooks to auto-associate new runs to their implicit Catalog Course


I thought you had mentioned a toggle around whether Discovery is being relied upon? In that case, is the expectation that events would instead be sent from Discovery to keep this data up-to-date?

I'm not entirely sure. Course discovery is not currently required or enabled by default in Tutor. My expectation would be that this would remain true in the future but that the new models would get auto populated when course runs are created.

It's a good question. I don't have a strong opinion on whether course-discovery should push, pull, or both. Mainly, I want to make sure that we treat course-discovery like any other external catalog system. So:

If course-discovery wants to push data to edx-platform, it should use the CRUD API mentioned below.

If course-discovery wants to pull data from edx-platform, then it should hook into these events.

So, either way, we'll want both the API and the event hooks.

arbrandes

It makes total sense to me. (Thanks for the walk-through of why and how this was done in discovery in the first place, too.)

kdmccormick

Some wording feedback, but the decisions LGTM

openedx/core/djangoapps/catalog/docs/decisions/002-course-models.rst

kdmccormick · 2024-02-07T18:54:03Z

openedx/core/djangoapps/catalog/docs/decisions/002-course-models.rst

+Historically this concept has been inferred from the names of courses with the
+runs removed (eg. "AximX/Eng101" could be a catalog course with multiple runs
+(2024H1, 2024H2, etc)). Having this explicitly defined as a new model has
+multiple benefits.


we can explain this well using real course keys, so let's do that

Suggested change

Historically this concept has been inferred from the names of courses with the

runs removed (eg. "AximX/Eng101" could be a catalog course with multiple runs

(2024H1, 2024H2, etc)). Having this explicitly defined as a new model has

multiple benefits.

Historically this concept has been inferred from the names of courses with the

runs removed (eg. `course-v1:AximX/Eng101` could be a catalog course with multiple runs

(`course-v1:AximX/Eng101:2024H1`, `course-v1:AximX/Eng1012024H2`, etc)). Having the catalog course explicitly defined as a new model has multiple benefits.

openedx/core/djangoapps/catalog/docs/decisions/002-course-models.rst

robrap · 2024-02-07T20:39:38Z

openedx/core/djangoapps/catalog/docs/decisions/002-course-models.rst

+The course-discovery service already has some models to represent this and we
+could rely on it directly but it also contains a lot of representations that
+are specific to the edx.org process and it's not valuable to spin up the
+course-discovery service just to get this information for most operators.


Is there is an existing toggle, or a new toggle, which will be used to choose between different behaviors?

Will the implementation be fully backward-compatible? Will the implementation fully support both using and not using course-discovery? For example, if discovery code is needed to call the new APIs, is that intended to be part of this implementation?

Is there any longer-term deprecation of discovery in mind that this is a part of? Or possibly just making the external catalog more pluggable?

Separate, but related, there are a number of places where the LMS makes synchronous calls to Discovery, which causes reliability issues for the LMS when Discovery is having issues. Do you know if a benefit of this design would be to replace any of these calls, even in cases where Discovery is being used as a separate service?

@robrap Can you elaborate on 4?

I know of the the programs cache, which this will definitely help.

My search just now also introduced me to the learner_skills_levels api, which this will not directly help since it relies on course-discovery's taxonomy API. In the long-term, though, taking course-discovery-querying utils out of edx-platform will have the benefit of discouraging new instances of the edx-platform-queries-course-discovery anti-pattern.

Do you know of other instances?

Regarding 4, I haven't actually looked into the problem specifically. I've just seen its affect when we've had temporary issues on discovery that back up into the LMS. I probably need to look more closely when we have an actual incident. Thanks.

FYI, I'm mostly not going to address this until we've arrived at a general agreement on what idea it would be useful to implement which we're still kind of working through. Just making a note that I haven't missed it but seems premature.

DawoudSheraz · 2024-02-09T07:15:24Z

openedx/core/djangoapps/catalog/docs/decisions/002-course-models.rst

+As long as any local data can be updated from course-discovery if
+course-discovery is setup in an environment, we should not have any conflicts
+between the two locations.


Can we specify what will be the source of truth for both cases?

@DawoudSheraz my thought would be that this would be up to the operator. If the operator wants to use an external service as the source of truth, they would hook into the API and push changes into models in the LMS and could setup permissions so that data can't change from the LMS side. If the operator wants the LMS to be the source of truth, they can setup events and listen for changes to push data into their other systems. You could also do both to allow editing on either side and syncing in both directions.

You could also do both to allow editing on either side and syncing in both directions.

A similar type of model currently exists between Discovery-Studio. It has caused sync issues on edX side in the past. As for the source of truth, if it is up to the operator, we should be mentioning that explicitly in ADR.

kdmccormick

Hey @ormsbee @feanil @robrap, I'm interested in reviving this. Please take my comment below not as a reason to abandon this effort, but rather me making sure that I understand why this is the right approach.

kdmccormick · 2024-07-18T17:56:17Z

openedx/core/djangoapps/catalog/docs/decisions/002-course-models.rst

+Course Models
+#############


@ormsbee @feanil @robrap I've been mulling this ADR over for a while and I find myself wondering about the base assumption that edx-platform should have a Catalog Course model or that the LMS should care about Catalog Courses at all.

Hypothetically, if all LMS knew about were Course Runs, and we forbade it from ever looking how different runs could be grouped into Courses and Programs, what features would we lose? Would they be Core Product features (like credentialing and roles) or would they be things we've been discussing jettisoning out of the Core anyway (like masters degrees and upsell messaging)?

I know we want to enable seamless integration between Open edX and student-information/ecommerce/enterprise/marketing systems. But does having CatalogCourse and Program models in LMS help that, or might it actually make it more complex, since those systems might disagree about what a CatalogCourse means? Could it be better to let the external systems decide what "Course" and "Program" mean to them, and then use APIs to push metadata/enrollments/roles/whatever into edx-platform Course Runs as they need?

Getting more concrete: I'm not against having a CatalogCourse model, but before approving this ADR, I'd like to see a list of specific LMS/CMS features that would actually use such a model, and a bit in the Rejected Alternatives on why it's not sufficient for LMS/CMS to just think in Course Runs.

A quick note to say I like your line of thought and questioning. The rest of this is very high-level thoughts.

I feel like the transition from edX to 2U has aided my transition from a worldview of the LMS being the center of the solar system, to its role as a planet orbiting with other planets and making up a different solar system for each operating organization.

At 2U, our catalog is much larger than the set of LMS course runs. What does that mean at the level of an LMS, and not necessarily the LMS, and how does this relate to this ADR? I know this may be more of a 2U issue, but it might inform how we think about the different pieces.

That said, if the Open edX platform always tries to provide an all-inclusive experience with as few parts as possible to make it simple for operators, we may always end up with very fuzzy boundaries. Or, if someone is able to dig into Kyle's request for specifics, maybe we can get to a cleaner solution that works for all.

I think @pdpinch hit the nail on the head with:

I think this could be really valuable, because it materializes something that users assume to be true.

Course runs are already connected to each other in a bunch of ways. The Studio course listing page is eventually going to have the runs grouped by course. They often re-use 90+% of the same content, which is part of why I'd like to eventually see multiple runs of the same course stored in the same LearningPackage. That relationship is real, even if it's implicit today because it's only really held together by the CourseKey.

I think it's also valuable to be able to group Courses together into larger collections. There, we have use cases both for role/permissions (e.g. "this staff has broad admin or analytics reporting access to this set of courses"), as well as for catalog customization.

The global source of truth for this data doesn't need to be in edx-platform. For a large company, that source of truth might be Salesforce or any number of other things that eventually push to these primitives. But I think it's still valuable to have the base primitives defining these relationships in the platform.

I feel like the balancing act will be seeing just how thin we can make these core concepts, and how disciplined we can be about keeping other kinds of data separated (e.g. pricing).

To echo what Dave said, I think we have had some implicit version of this for a long time. We're gonna start using it more in the presentation. Right now this concept lives in course-discovery but that comes with a lot of baggage that we don't want. Being able to house a core concept in edx-platform and then push a lot of the rest into things that either plugin or otherwise interface with the core bits here seems like it would make the platform more generically usable.

Another thought, would we be better served by having a concept of a CourseRun that is independent of course content? Like have a course run model that we can populate without having to create any course content. Then anyone could make plugins that allow end-users to group the runs however they want with whatever metadata they want for advertising/marketing/selling. Can we achieve this by inverting the relationship between courseoverviews and modulestore?

Another thought, would we be better served by having a concept of a CourseRun that is independent of course content? Like have a course run model that we can populate without having to create any course content. Then anyone could make plugins that allow end-users to group the runs however they want with whatever metadata they want for advertising/marketing/selling. Can we achieve this by inverting the relationship between courseoverviews and modulestore?

I don't think CourseOveviews is necessarily the right mechanism to do this, but I do broadly support the idea that provisioning the course/run and roles associated with them is upstream of content creation. I sketched out a primitive proposal for that here a while back.

The tricky part in that scenario would be figuring out the migration path and what happens to import/export of those fields that already exist in Modulestore. I think it's totally doable, but there are a lot of details to work out.

regarding defining a new CourseRun vs using the existing CourseOverview model...

Pros of using CourseOverview:

A lot of models already hang off of it.

A lot of views/serializers/etc are already written to expect CourseOverview objects.

It's already there. We'd have one model for this concept rather than two.

Cons of using CourseOverview:

The .id field (the PK) is the course key string, which is weird and probably not performant (?)

Has a huge mish mash of fields -- they were selected "by need" rather than "by design".

Bunch of weird footgun properties which invoke modulestore.get_course() anyway.

Data flow -- currently, the model is a persisted cache of modulestore, populated asynchronously upon course publish. We'd have to flip that without breaking anyone.

A lot of edx-platform duck-typedly tolerates both CourseBlock and CourseOverview instances, which can be confusing. It could be valuable to be forced to rewrite some of this code to explicitly handle only CourseRun objects.

If we make a new CourseRun field, it could be valuable to backfill course_run = OneToOneField(CourseRun) onto CourseOverview.

After listing that all out, I think I've come around to the side of defining a new CourseRun model, assuming we are very protective about what goes into it.

pdpinch · 2024-07-19T20:34:18Z

I think this could be really valuable, because it materializes something that users assume to be true.

We don't use course-discovery, so we've developed other systems to manage the relationships between courseruns and the metadata for "Catalog Courses".

At the level of abstraction described here, I think we could transition to this model and not lose anything. However, I would be wary of adding too much additional metadata to these CatalogCourses. Can we draw a bright line around what this is meant and not meant to do? Some things that would be red flags for me: ecommerce (including "entitlements"), certificates, programs (probably).

Co-authored-by: Kyle McCormick <kyle@axim.org>

robrap reviewed Feb 1, 2024

View reviewed changes

arbrandes approved these changes Feb 2, 2024

View reviewed changes

kdmccormick reviewed Feb 7, 2024

View reviewed changes

robrap reviewed Feb 7, 2024

View reviewed changes

DawoudSheraz reviewed Feb 9, 2024

View reviewed changes

kdmccormick requested changes Jul 18, 2024

View reviewed changes

kdmccormick mentioned this pull request Jul 19, 2024

[Draft] edx-platform Architectural Roadmap #35144

Open

docs: Apply suggestions from code review

41b4b07

Co-authored-by: Kyle McCormick <kyle@axim.org>

bradenmacdonald mentioned this pull request Sep 23, 2024

[DEPR]: Course Discovery Service openedx/course-discovery#4449

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: ADR for an edx-platform representation of Catalog Courses. #34156

docs: ADR for an edx-platform representation of Catalog Courses. #34156

feanil commented Jan 31, 2024

robrap Feb 1, 2024

feanil Feb 7, 2024

kdmccormick Feb 7, 2024

arbrandes left a comment

kdmccormick left a comment

kdmccormick Feb 7, 2024

robrap Feb 7, 2024

kdmccormick Feb 7, 2024 •

edited

Loading

robrap Feb 7, 2024 •

edited

Loading

feanil Aug 6, 2024

DawoudSheraz Feb 9, 2024

feanil Aug 6, 2024

DawoudSheraz Aug 7, 2024

kdmccormick left a comment

kdmccormick Jul 18, 2024 •

edited

Loading

robrap Jul 18, 2024

ormsbee Aug 4, 2024

feanil Aug 6, 2024

ormsbee Aug 6, 2024

kdmccormick Aug 7, 2024

pdpinch commented Jul 19, 2024


		* A data migration to make all current implicit associations explicit.

		* Eventing hooks to auto-associate new runs to their implicit Catalog Course

docs: ADR for an edx-platform representation of Catalog Courses. #34156

Are you sure you want to change the base?

docs: ADR for an edx-platform representation of Catalog Courses. #34156

Conversation

feanil commented Jan 31, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arbrandes left a comment

Choose a reason for hiding this comment

kdmccormick left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kdmccormick Feb 7, 2024 • edited Loading

Choose a reason for hiding this comment

robrap Feb 7, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kdmccormick left a comment

Choose a reason for hiding this comment

kdmccormick Jul 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pdpinch commented Jul 19, 2024

kdmccormick Feb 7, 2024 •

edited

Loading

robrap Feb 7, 2024 •

edited

Loading

kdmccormick Jul 18, 2024 •

edited

Loading