[MetaSchedule] Introduce `ScheduleFnDatabase` #12626

junrushao · 2022-08-27T23:19:20Z

Following #12520, this PR introduces ScheduleFnDatabase, a mocked
database to allow injecting handcrafted schedules provided by a schedule
function.

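The schedule function has the following signature:

```python
from tvm import tir


def schedule_fn(sch: tir.Schedule) -> bool:
    task_name = sch.mod.attrs["task_name"]
    # ^^^ provides an optional name of the task queried
    ...
    # Return True after scheduling `sch`; return False if no schedule is provided for this task
    return False
```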
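To make the query flow concrete, here is a toy, pure-Python model of how a database backed by a schedule function might answer a query. Everything in it (`ToyModule`, `ToySchedule`, `ToyScheduleFnDatabase`, `query_schedule`, the `applied` list) is hypothetical and is not TVM's API. It only assumes what the signature above suggests: the queried workload name reaches the function as `mod.attrs["task_name"]`, and returning `False` means no schedule is provided.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToyModule:
    # Hypothetical stand-in for an IRModule: only carries an attribute dict.
    attrs: Dict[str, str] = field(default_factory=dict)


@dataclass
class ToySchedule:
    # Hypothetical stand-in for tir.Schedule: records the applied steps.
    mod: ToyModule
    applied: List[str] = field(default_factory=list)


class ToyScheduleFnDatabase:
    """Hypothetical mock database backed by a user schedule function (not TVM's API)."""

    def __init__(self, schedule_fn: Callable[[ToySchedule], bool]) -> None:
        self.schedule_fn = schedule_fn

    def query_schedule(self, mod: ToyModule, workload_name: str) -> Optional[ToySchedule]:
        # Expose the queried workload name to the schedule function as "task_name".
        attrs = dict(mod.attrs, task_name=workload_name)
        sch = ToySchedule(mod=ToyModule(attrs=attrs))
        # A record exists only if the schedule function reports that it scheduled `sch`.
        return sch if self.schedule_fn(sch) else None


def schedule_fn(sch: ToySchedule) -> bool:
    if "nn_dense" in sch.mod.attrs["task_name"]:
        sch.applied.append("split")  # placeholder for real scheduling primitives
        return True
    return False


db = ToyScheduleFnDatabase(schedule_fn)
hit = db.query_schedule(ToyModule(), "fused_nn_dense_add")
miss = db.query_schedule(ToyModule(), "fused_nn_softmax")
print(hit.applied if hit else None)  # ['split']
print(miss)  # None
```

In this toy model nothing is stored ahead of time: a record is built lazily for each query, which fits the later review remark that a separate task-extraction pass is no longer needed.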

This mocked database helps incorporate the existing testing utility
apply_fixed_schedule more formally into the MetaSchedule-Relay build
pipeline, and allows further extension to Relax with the same interface.

As another follow-up, we will next introduce `ConcatDatabase`, which allows
mixing multiple databases, including the mocked one and ones loaded from
JSON files.

cc @Hzfengsy @junrushao1994


junrushao · 2022-08-27T23:19:40Z

This PR is ready for review. CC: @masahi @sunggg @YuchenJin

Following up apache#12520 and apache#12626, this PR introduces `MergedDatabase`, which allows users to compose multiple databases so that the high-level IR can select the best tuning records among them. `MergedDatabase` also comes with an extra field, `preferred`, that allows users to override tuning records from the other databases. A classic use case of the `preferred` parameter is handcrafted schedule functions:

```python
from tvm import meta_schedule as ms
from tvm import relay, tir


def schedule_fn(sch: tir.Schedule) -> bool:
    if "nn_conv2d" in sch.mod.attrs["task_name"]:
        handcrafted_scheduling(sch)
        return True
    return False


with ms.database.MergedDatabase(
    databases=[database],
    preferred=ms.database.ScheduleFnDatabase(schedule_fn),
):
    lib = relay.build(...)
```
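A minimal sketch of the merge semantics described above, under stated assumptions: each toy database is a plain dict from workload name to a measured run time in seconds, "best" means the lowest run time, and any hit in `preferred` short-circuits the comparison. `merged_query` and the dict representation are hypothetical illustrations, not TVM's API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy database: workload name -> measured run time (seconds).
ToyDB = Dict[str, float]


def merged_query(
    workload: str, databases: List[ToyDB], preferred: Optional[ToyDB] = None
) -> Optional[Tuple[str, float]]:
    """Hypothetical model of a merged query; not TVM's implementation."""
    # A record from `preferred` overrides every other database.
    if preferred is not None and workload in preferred:
        return ("preferred", preferred[workload])
    # Otherwise pick the fastest record among the remaining databases.
    candidates = [
        (f"db{i}", db[workload]) for i, db in enumerate(databases) if workload in db
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda rec: rec[1])


tuned_a: ToyDB = {"nn_conv2d": 2.0e-3, "nn_dense": 1.0e-3}
tuned_b: ToyDB = {"nn_conv2d": 1.5e-3}
handcrafted: ToyDB = {"nn_conv2d": 9.9e-3}  # slower, but explicitly preferred

# The handcrafted conv2d record wins even though it is slower.
print(merged_query("nn_conv2d", [tuned_a, tuned_b], preferred=handcrafted))
# No preferred record for dense, so the fastest tuned record is used.
print(merged_query("nn_dense", [tuned_a, tuned_b], preferred=handcrafted))
```

The design point being illustrated is that `preferred` is an override rather than one more candidate in the comparison.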

MasterJH5574

LGTM. Thanks @junrushao! Since ScheduleFnDatabase is not very intuitive, let’s get the PR and code better documented!

junrushao · 2022-08-29T06:44:41Z

This will be quite self-explanatory with the follow-up MergedDatabase PR.

masahi

If I understand this correctly, this removes the need for the task extraction that was done by `apply_fixed_schedules`. This is very nice!

masahi · 2022-08-29T06:51:13Z

tests/python/unittest/test_link_params.py

-            config={"relay.backend.use_meta_schedule": True},
+            config={
+                "relay.backend.use_meta_schedule": True,
+                "relay.FuseOps.link_params": link_params,


I think we can remove `"relay.FuseOps.link_params": link_params` now, since we are not running separate task extraction for fake database creation. This config was only for making sure that task extraction and `relay.build` use the same `link_params` value.

Got it. Will do in the next PR!

sunggg

Thank you for the PR!
LGTM; I have some comments below.

=====================================================================
Oops, I didn't know it had already been merged. I will just keep these comments for future reference.

sunggg · 2022-08-29T14:39:36Z

tests/python/unittest/test_link_params.py

@@ -407,21 +406,21 @@ def schedule_dense(sch):
    target = "llvm"
    params = {"weight": weight_np}

-    def schedule_fn(task, sch):
-        if "nn_dense" in task.task_name:
+    def schedule_fn(sch):


I'm okay with this for now and agree that it delivers the functionality we need. But, as we discussed offline, I have some concerns about this approach and hope to discuss them in future PRs.

If developers start writing more target-specific and layout-specific manual schedules, we may end up with complicated if-statement logic that is hard to maintain.

The conditional statements are based on the task name, so they won't work when a task name differs from what we expect. When the task naming rule changes, developers need to update these conditions accordingly, which is not intuitive imo.

I agree with @MasterJH5574; I'm not sure this is an intuitive interface for developers. You need to modify the schedule and then return True. I wonder if we can skip manually returning True by automatically checking whether the schedule has changed.
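One way to read the last suggestion, as a hypothetical sketch only: wrap a schedule function that returns nothing, and infer the boolean by comparing the schedule's recorded steps before and after the call. `ToySchedule`, its `trace` list, and `auto_bool` are illustrative stand-ins, not TVM's API; with a real `tir.Schedule`, some equivalent notion of "a primitive was applied" would be needed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToySchedule:
    # Hypothetical stand-in: `trace` records the scheduling primitives applied so far.
    task_name: str
    trace: List[str] = field(default_factory=list)


def auto_bool(fn: Callable[[ToySchedule], None]) -> Callable[[ToySchedule], bool]:
    """Hypothetical adapter: turn a no-return schedule function into a bool-returning one."""

    def wrapped(sch: ToySchedule) -> bool:
        before = len(sch.trace)
        fn(sch)
        # Assumes steps are only ever appended: report a hit iff new steps appeared.
        return len(sch.trace) > before

    return wrapped


@auto_bool
def schedule_fn(sch: ToySchedule) -> None:
    if "nn_dense" in sch.task_name:
        sch.trace.append("split")
        sch.trace.append("reorder")


print(schedule_fn(ToySchedule("fused_nn_dense")))  # True
print(schedule_fn(ToySchedule("fused_nn_softmax")))  # False
```

The trade-off is that "changed" becomes the signal, so a schedule function that deliberately leaves a workload untouched would be indistinguishable from one that does not handle it.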

sunggg · 2022-08-29T14:47:15Z

src/meta_schedule/database/database.cc

@@ -156,7 +156,8 @@ TuningRecord TuningRecord::FromJSON(const ObjectRef& json_obj, const Workload& w

 /******** Database ********/

-Optional<TuningRecord> DatabaseNode::QueryTuningRecord(IRModule mod, Target target) {
+Optional<TuningRecord> DatabaseNode::QueryTuningRecord(const IRModule& mod, const Target& target,
+                                                       const String& workload_name) {


Why do we pass `workload_name`? These three functions do not seem to use it: `QueryTuningRecord`, `QuerySchedule`, `QueryIRModule`.


Following up apache#12520 and apache#12626, this PR introduces `MergedDatabase`, which allows users to compose multiple databases so that the high-level IR can select the best tuning records among them. `MergedDatabase` also comes with an extra field, `preferred`, that allows users to override tuning records from the other databases. A classic use case of the `preferred` parameter is handcrafted schedule functions:

```python
from tvm import meta_schedule as ms
from tvm import relay, tir


def schedule_fn(sch: tir.Schedule) -> bool:
    if "nn_conv2d" in sch.mod.attrs["task_name"]:
        handcrafted_scheduling(sch)
        return True
    return False


with ms.database.MergedDatabase(
    preferred=ms.database.ScheduleFnDatabase(schedule_fn),
    # ^^^^ override scheduling decisions
    databases=[database],
    fallback=libtorch_database,
    # ^^^^ fallback to libtorch
):
    lib = relay.build(...)
```

Following up apache#12520 and apache#12626, this PR introduces two database classes, `UnionDatabase` and `OrderedUnionDatabase`, both of which allow users to organically compose multiple databases, so that the high-level IR (Relay, Relax) can select the best tuning records according to run time or to a preferred order given by users. For each query, `UnionDatabase` returns the best record among all the given databases; `OrderedUnionDatabase`, instead, returns the record from the first database that responds to the query. Used together, they let users specify complicated dispatching patterns like the one below:

```python
from tvm import meta_schedule as ms
from tvm import relay, tir


def schedule_fn(sch: tir.Schedule) -> bool:
    if "nn_conv2d" in sch.mod.attrs["task_name"]:
        if some_other_tir_conditions(sch.mod):
            handcrafted_scheduling(sch)
            return True
    return False


with ms.database.OrderedUnionDatabase(
    ms.database.ScheduleFnDatabase(schedule_fn),  # hand-override some scheduling
    ms.database.UnionDatabase(  # existing databases
        db_for_matmul,
        db_for_conv2d,
        db_for_softmax,
    ),
    libtorch_database,  # fallback to libtorch
):
    lib = relay.build(...)
```


Following up apache#12520 and apache#12626, this PR introduces two database classes, `UnionDatabase` and `OrderedUnionDatabase`, both of which allow users to organically compose multiple databases, so that the high-level IR (Relay, Relax) can select the best tuning records according to run time or to a preferred order given by users. For each query, `UnionDatabase` returns the best record among all the given databases; `OrderedUnionDatabase`, instead, returns the record from the first database that responds to the query. Used together, they let users specify complicated dispatching patterns.

The examples below demonstrate the use cases of, and the difference between, `UnionDatabase` and `OrderedUnionDatabase`.

Assumptions:
* `db1` and `db2` do not have tuning records for the target workload.
* Each of `db3`, `db4`, and `db5` has a tuning record (`r3`, `r4`, and `r5`, respectively) for the target workload.

```python
from tvm import meta_schedule as ms

### Case 1. `UnionDatabase`
merged_db = ms.database.UnionDatabase(
    db1,  # no record
    db2,  # no record
    db3,  # has r3
    db4,  # has r4
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)

### Case 2. `OrderedUnionDatabase`
merged_db = ms.database.OrderedUnionDatabase(
    db1,  # no record
    db2,  # no record
    db3,  # has r3
    db4,  # has r4
)
# returns r3
merged_db.query_tuning_record(..., target_workload)

### Case 3. Mix-use scenario
merged_db = ms.database.UnionDatabase(
    db1,  # no record
    db2,  # no record
    db3,  # has r3
    ms.database.OrderedUnionDatabase(  # returns r4
        db4,  # has r4
        db5,  # has r5
    ),
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)

### Case 4. Another mix-use scenario
merged_db = ms.database.UnionDatabase(
    db1,  # no record
    db2,  # no record
    db3,  # has r3
    ms.database.UnionDatabase(  # returns the better one between r4 and r5
        db4,  # has r4
        db5,  # has r5
    ),
)
# returns the best one among r3, r4, and r5
merged_db.query_tuning_record(..., target_workload)

### Case 5. Yet another mix-use scenario
merged_db = ms.database.OrderedUnionDatabase(
    db1,  # no record
    db2,  # no record
    ms.database.UnionDatabase(  # returns the better one between r3 and r4
        db3,  # has r3
        db4,  # has r4
    ),
    db5,  # has r5
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)
```

Co-authored-by: sunggg <49998730+sunggg@users.noreply.github.com>
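The five cases can be checked with a toy model. It assumes each record is just a name plus a run time in seconds, that "better" means a lower run time, and that a query returns either a record or `None`. `ToyLeaf`, `ToyUnion`, `ToyOrderedUnion`, and their `query` method are hypothetical stand-ins, not TVM's classes; nesting works because every class exposes the same `query` method.

```python
from typing import Dict, Optional, Tuple

# Toy record: (record name, run time in seconds); lower run time is better.
Record = Tuple[str, float]


class ToyLeaf:
    """Hypothetical database holding at most one record per workload."""

    def __init__(self, records: Dict[str, Record]) -> None:
        self.records = records

    def query(self, workload: str) -> Optional[Record]:
        return self.records.get(workload)


class ToyUnion:
    """Hypothetical model: best (fastest) record among all child databases."""

    def __init__(self, *databases) -> None:
        self.databases = databases

    def query(self, workload: str) -> Optional[Record]:
        hits = [db.query(workload) for db in self.databases]
        hits = [r for r in hits if r is not None]
        return min(hits, key=lambda r: r[1]) if hits else None


class ToyOrderedUnion:
    """Hypothetical model: record from the first child database that responds."""

    def __init__(self, *databases) -> None:
        self.databases = databases

    def query(self, workload: str) -> Optional[Record]:
        for db in self.databases:
            record = db.query(workload)
            if record is not None:
                return record
        return None


W = "target_workload"
db1, db2 = ToyLeaf({}), ToyLeaf({})  # no record
db3 = ToyLeaf({W: ("r3", 3.0)})
db4 = ToyLeaf({W: ("r4", 2.0)})
db5 = ToyLeaf({W: ("r5", 1.0)})

case1 = ToyUnion(db1, db2, db3, db4).query(W)  # ('r4', 2.0)
case2 = ToyOrderedUnion(db1, db2, db3, db4).query(W)  # ('r3', 3.0)
case3 = ToyUnion(db1, db2, db3, ToyOrderedUnion(db4, db5)).query(W)  # ('r4', 2.0)
case4 = ToyUnion(db1, db2, db3, ToyUnion(db4, db5)).query(W)  # ('r5', 1.0)
case5 = ToyOrderedUnion(db1, db2, ToyUnion(db3, db4), db5).query(W)  # ('r4', 2.0)
```

With these made-up run times, Case 5 returns `r4` even though `r5` is faster, because the ordered union stops at the first child that responds.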


junrushao marked this pull request as ready for review August 27, 2022 23:19

junrushao mentioned this pull request Aug 28, 2022

[MetaSchedule] Introduce Union and OrderedUnion in Database #12628

Merged

MasterJH5574 approved these changes Aug 29, 2022


masahi approved these changes Aug 29, 2022


masahi merged commit 648a29a into apache:main Aug 29, 2022

sunggg reviewed Aug 29, 2022


AndrewZhaoLuo mentioned this pull request Oct 4, 2022

TVM v0.10.0.rc0 Release Candidate Notes #12979

Closed
