[MetaSchedule] Introduce ScheduleFnDatabase #12626

Merged

Member

@junrushao junrushao commented Aug 27, 2022

Following #12520, this PR introduces ScheduleFnDatabase, a mocked
database to allow injecting handcrafted schedules provided by a schedule
function.

The schedule function comes with the following signature:

```python
def schedule_fn(
  sch: tir.Schedule,
) -> bool:
  task_name = sch.mod.attrs["task_name"]
  # ^^^ provides an optional name of the task queried
  ...
```

This mocked database helps incorporate the existing testing utility
apply_fixed_schedule more formally into the MetaSchedule-Relay build
pipeline, and allows further extension to Relax with the same interface.

Next, as another follow-up, we will introduce ConcatDatabase, which allows
mixing multiple databases, including the mocked one and those loaded from
JSON files.
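
To make the contract concrete, here is a minimal pure-Python sketch of how a database backed by a schedule function could answer a query. It does not use TVM: `FakeSchedule`, `ToyScheduleFnDatabase`, and `query_schedule` are hypothetical stand-ins. The only behavior taken from the description above is that the function receives a schedule, may read `task_name` from its attributes, and returns a `bool` saying whether it handled the task.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class FakeSchedule:
    """Hypothetical stand-in for tir.Schedule; models only what this sketch needs."""
    attrs: Dict[str, str]
    applied: List[str] = field(default_factory=list)  # primitives applied so far


class ToyScheduleFnDatabase:
    """Toy model: answers a query by running a user-supplied schedule function."""

    def __init__(self, schedule_fn: Callable[[FakeSchedule], bool]) -> None:
        self.schedule_fn = schedule_fn

    def query_schedule(self, task_name: str) -> Optional[FakeSchedule]:
        sch = FakeSchedule(attrs={"task_name": task_name})
        # True means "I scheduled this task"; False means "no record here".
        return sch if self.schedule_fn(sch) else None


def schedule_fn(sch: FakeSchedule) -> bool:
    if "nn_dense" in sch.attrs["task_name"]:
        sch.applied.append("handcrafted_dense")  # placeholder for real primitives
        return True
    return False


db = ToyScheduleFnDatabase(schedule_fn)
hit = db.query_schedule("fused_nn_dense_add")
miss = db.query_schedule("fused_nn_softmax")
print(hit.applied if hit else None)  # ['handcrafted_dense']
print(miss)                          # None
```

Returning `False` is what lets other databases answer for tasks the function does not recognize.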

cc @Hzfengsy @junrushao1994

@junrushao junrushao marked this pull request as ready for review August 27, 2022 23:19
@junrushao
Member Author

This PR is ready for review. CC: @masahi @sunggg @YuchenJin

junrushao added a commit to junrushao/tvm that referenced this pull request Aug 28, 2022
Following up apache#12520 and apache#12626, this PR introduces `MergedDatabase`,
which allows users to compose multiple databases so that the high-level
IR can select the best tuning records among them.

The `MergedDatabase` also comes with an extra field `preferred` that lets
users override tuning records from other databases. A classic use case
of the `preferred` parameter is handcrafted schedule functions:

```python
def schedule_fn(sch: tir.Schedule) -> bool:
  if "nn_conv2d" in sch.mod.attrs["task_name"]:
    handcrafted_scheduling(sch)
    return True
  return False

with ms.database.MergedDatabase(
  databases=[database],
  preferred=ms.database.ScheduleFnDatabase(schedule_fn),
):
  lib = relay.build(...)
```
Contributor

@MasterJH5574 MasterJH5574 left a comment

LGTM. Thanks @junrushao! Since ScheduleFnDatabase is not very intuitive, let’s get the PR and code better documented!

@junrushao
Member Author

This will be quite self-explanatory with the following MergedDatabase PR

Member

@masahi masahi left a comment

If I understand this correctly, this removes the need for task extraction which was done by apply_fixed_schedules. This is very nice!

config={"relay.backend.use_meta_schedule": True},
config={
"relay.backend.use_meta_schedule": True,
"relay.FuseOps.link_params": link_params,
Member

@masahi masahi Aug 29, 2022

I think we can remove "relay.FuseOps.link_params": link_params now, since we are not running separate task extraction for fake database creation. This config was only for making sure that task extraction and relay.build use the same link_params value.

Member Author

Got it. Will do in the next PR!

@masahi masahi merged commit 648a29a into apache:main Aug 29, 2022
Contributor

@sunggg sunggg left a comment

Thank you for the PR!
LGTM and have some comments below.

=====================================================================
Oops, I didn't know it had already been merged. I will just keep this for future reference.

@@ -407,21 +406,21 @@ def schedule_dense(sch):
     target = "llvm"
     params = {"weight": weight_np}
 
-    def schedule_fn(task, sch):
-        if "nn_dense" in task.task_name:
+    def schedule_fn(sch):
Contributor

I'm okay with this for now and agree that this delivers the functionality we need. But like we discussed offline, I have some concerns about this approach and hope to have some discussion in the future PRs.

  • If developers start writing more target-specific, layout-specific manual schedules, we may end up with complicated if-statement logic that is hard to maintain.
  • The conditional statements are based on the task name, so they will not work when the task name differs from what we expect. When the task naming rule changes, developers may need to update them accordingly, which is not intuitive imo.
  • I agree with @MasterJH5574: I'm not sure this is an intuitive interface for developers. You need to modify the schedule and return True. I wonder if we can skip manually returning True by automatically checking whether the schedule has changed.
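
One way the first and third concerns could be explored is sketched below, purely as an illustration and not as anything this PR implements. A small registry maps task-name fragments to handcrafted routines, and a wrapper derives the `bool` result by checking whether the schedule changed. `FakeSchedule`, `register`, and `auto_schedule_fn` are hypothetical names, and the change check compares a plain list of recorded steps rather than any real TVM trace.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeSchedule:
    """Hypothetical stand-in for tir.Schedule."""
    task_name: str
    steps: List[str] = field(default_factory=list)  # recorded scheduling steps


# Registry: task-name fragment -> handcrafted scheduling routine.
_REGISTRY: Dict[str, Callable[[FakeSchedule], None]] = {}


def register(fragment: str):
    """Register a routine for tasks whose name contains `fragment`."""
    def wrap(fn: Callable[[FakeSchedule], None]):
        _REGISTRY[fragment] = fn
        return fn
    return wrap


@register("nn_dense")
def schedule_dense(sch: FakeSchedule) -> None:
    sch.steps.append("split+vectorize")


@register("nn_conv2d")
def schedule_conv2d(sch: FakeSchedule) -> None:
    sch.steps.append("tile+unroll")


def auto_schedule_fn(sch: FakeSchedule) -> bool:
    """Run the first matching routine; return True only if the schedule changed."""
    before = len(sch.steps)
    for fragment, routine in _REGISTRY.items():
        if fragment in sch.task_name:
            routine(sch)
            break
    return len(sch.steps) != before


dense = FakeSchedule("fused_nn_dense_add")
softmax = FakeSchedule("fused_nn_softmax")
dense_changed = auto_schedule_fn(dense)
softmax_changed = auto_schedule_fn(softmax)
print(dense_changed, dense.steps)      # True ['split+vectorize']
print(softmax_changed, softmax.steps)  # False []
```

The sketch still matches on task names, so the second concern about naming-rule changes is not addressed by it.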

@@ -156,7 +156,8 @@ TuningRecord TuningRecord::FromJSON(const ObjectRef& json_obj, const Workload& w

/******** Database ********/

-Optional<TuningRecord> DatabaseNode::QueryTuningRecord(IRModule mod, Target target) {
+Optional<TuningRecord> DatabaseNode::QueryTuningRecord(const IRModule& mod, const Target& target,
+                                                       const String& workload_name) {
Contributor

Why do we pass workload_name? These three functions do not seem to use it: QueryTuningRecord, QuerySchedule, QueryIRModule.

junrushao added a commit to junrushao/tvm that referenced this pull request Aug 29, 2022
junrushao added a commit to junrushao/tvm that referenced this pull request Aug 29, 2022
Following up apache#12520 and apache#12626, this PR introduces `MergedDatabase`,
which allows users to compose multiple databases so that the high-level
IR can select the best tuning records among them.

The `MergedDatabase` also comes with an extra field `preferred` that lets
users override tuning records from other databases. A classic use case
of the `preferred` parameter is handcrafted schedule functions:

```python
def schedule_fn(sch: tir.Schedule) -> bool:
  if "nn_conv2d" in sch.mod.attrs["task_name"]:
    handcrafted_scheduling(sch)
    return True
  return False

with ms.database.MergedDatabase(
  preferred=ms.database.ScheduleFnDatabase(schedule_fn),
  # ^^^^ override scheduling decisions
  databases=[database],
  fallback=libtorch_database,
  # ^^^^ fallback to libtorch
):
  lib = relay.build(...)
```
junrushao added a commit to junrushao/tvm that referenced this pull request Aug 29, 2022
junrushao added a commit to junrushao/tvm that referenced this pull request Aug 30, 2022
junrushao added a commit to junrushao/tvm that referenced this pull request Aug 30, 2022
junrushao added a commit to junrushao/tvm that referenced this pull request Aug 31, 2022
junrushao added a commit to junrushao/tvm that referenced this pull request Aug 31, 2022
Following up apache#12520 and apache#12626, this PR introduces two database classes:
`UnionDatabase` and `OrderedUnionDatabase`, both of which allow users to
compose multiple databases, so that the high-level
IR (Relay, Relax) can select the best tuning records according to
running time or a preferred order given by users.

For each query, `UnionDatabase` returns the best record among all the
databases given, whereas `OrderedUnionDatabase` returns the record from
the first database that responds to the query.

Used together, they let users specify complicated dispatching patterns
like the one below:

```python
def schedule_fn(sch: tir.Schedule) -> bool:
  if "nn_conv2d" in sch.mod.attrs["task_name"]:
    if some_other_tir_conditions(sch.mod):
      handcrafted_scheduling(sch)
      return True
  return False

with ms.database.OrderedUnionDatabase(
  ms.database.ScheduleFnDatabase(schedule_fn),  # hand-override some scheduling
  ms.database.UnionDatabase(                    # existing databases
    db_for_matmul,
    db_for_conv2d,
    db_for_softmax,
  ),
  libtorch_database,                            # fallback to libtorch
):
  lib = relay.build(...)
```
junrushao added a commit to junrushao/tvm that referenced this pull request Aug 31, 2022
junrushao added a commit to junrushao/tvm that referenced this pull request Aug 31, 2022
junrushao added a commit to junrushao/tvm that referenced this pull request Aug 31, 2022
junrushao added a commit to junrushao/tvm that referenced this pull request Sep 1, 2022
Following up apache#12520 and apache#12626, this PR introduces two database classes:
`UnionDatabase` and `OrderedUnionDatabase`, both of which allow users to
compose multiple databases, so that the high-level
IR (Relay, Relax) can select the best tuning records according to
running time or a preferred order given by users.

For each query, `UnionDatabase` returns the best record among all the
databases given, whereas `OrderedUnionDatabase` returns the record from
the first database that responds to the query.

Used together, they let users specify complicated dispatching patterns.
The examples below demonstrate the use cases of, and the difference
between, `UnionDatabase` and `OrderedUnionDatabase`.

Assumptions:
* db1 and db2 do not have tuning records for the target workload.
* db3, db4, and db5 have tuning records r3, r4, and r5 for the target
workload, respectively.

```python
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    db4  # has r4
)
merged_db.query_tuning_record(..., target_workload)

merged_db = ms.database.OrderedUnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    db4  # has r4
)
merged_db.query_tuning_record(..., target_workload)

merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    ms.database.OrderedUnionDatabase( # returns r4
        db4,  # has r4
        db5,  # has r5
    )
)
merged_db.query_tuning_record(..., target_workload)

merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    ms.database.UnionDatabase( # returns best one between r4 and r5
        db4,  # has r4
        db5,  # has r5
    )
)
merged_db.query_tuning_record(..., target_workload)

merged_db = ms.database.OrderedUnionDatabase(
    db1, # no record
    db2, # no record
    ms.database.UnionDatabase( # returns best one between r3 and r4
        db3, # has r3
        db4,  # has r4
    ),
    db5,  # has r5
)
merged_db.query_tuning_record(..., target_workload)
```

Co-authored-by: sunggg <49998730+sunggg@users.noreply.github.com>
junrushao added a commit to junrushao/tvm that referenced this pull request Sep 1, 2022
junrushao added a commit to junrushao/tvm that referenced this pull request Sep 1, 2022
Following up apache#12520 and apache#12626, this PR introduces two database classes:
`UnionDatabase` and `OrderedUnionDatabase`, both of which allow users to
compose multiple databases, so that the high-level
IR (Relay, Relax) can select the best tuning records according to
running time or a preferred order given by users.

For each query, `UnionDatabase` returns the best record among all the
databases given, whereas `OrderedUnionDatabase` returns the record from
the first database that responds to the query.

Used together, they let users specify complicated dispatching patterns.
The examples below demonstrate the use cases of, and the difference
between, `UnionDatabase` and `OrderedUnionDatabase`.

Assumptions:
* db1 and db2 do not have tuning records for the target workload.
* db3, db4, and db5 have tuning records r3, r4, and r5 for the target
workload, respectively.

```python
### Case 1. `UnionDatabase`
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    db4  # has r4
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)

### Case 2. `OrderedUnionDatabase`
merged_db = ms.database.OrderedUnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    db4  # has r4
)
# returns r3
merged_db.query_tuning_record(..., target_workload)

### Case 3. Mix-use scenario
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    ms.database.OrderedUnionDatabase( # returns r4
        db4,  # has r4
        db5,  # has r5
    )
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)

### Case 4. Another mix-use scenario
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    ms.database.UnionDatabase( # returns the better one between r4 and r5
        db4,  # has r4
        db5,  # has r5
    )
)
# returns the best one among r3, r4 and r5
merged_db.query_tuning_record(..., target_workload)

### Case 5. Yet another mix-use scenario
merged_db = ms.database.OrderedUnionDatabase(
    db1, # no record
    db2, # no record
    ms.database.UnionDatabase( # returns the better one between r3 and r4
        db3, # has r3
        db4, # has r4
    ),
    db5,  # has r5
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)
```

Co-authored-by: sunggg <49998730+sunggg@users.noreply.github.com>
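
The five cases can be checked against a small pure-Python model. This is a sketch under stated assumptions, not TVM code: `ToyDB`, `ToyUnion`, and `ToyOrderedUnion` are hypothetical classes, a record is reduced to a `(name, run_time_ms)` pair, "best" is taken to mean the lowest run time, and the timings are made up so that r4 is faster than r3 and r5 is the fastest.

```python
from typing import Optional, Sequence, Tuple

Record = Tuple[str, float]  # (record name, run time in ms); lower is better


class ToyDB:
    """Holds at most one record for the single workload being queried."""

    def __init__(self, record: Optional[Record] = None) -> None:
        self.record = record

    def query(self) -> Optional[Record]:
        return self.record


class ToyUnion(ToyDB):
    """Returns the fastest record among all child databases."""

    def __init__(self, *dbs: ToyDB) -> None:
        self.dbs: Sequence[ToyDB] = dbs

    def query(self) -> Optional[Record]:
        found = [r for r in (db.query() for db in self.dbs) if r is not None]
        return min(found, key=lambda r: r[1]) if found else None


class ToyOrderedUnion(ToyDB):
    """Returns the record of the first child database that has one."""

    def __init__(self, *dbs: ToyDB) -> None:
        self.dbs: Sequence[ToyDB] = dbs

    def query(self) -> Optional[Record]:
        for db in self.dbs:
            record = db.query()
            if record is not None:
                return record
        return None


db1, db2 = ToyDB(), ToyDB()  # no record
db3, db4, db5 = ToyDB(("r3", 3.0)), ToyDB(("r4", 2.0)), ToyDB(("r5", 1.0))

case1 = ToyUnion(db1, db2, db3, db4).query()
case2 = ToyOrderedUnion(db1, db2, db3, db4).query()
case3 = ToyUnion(db1, db2, db3, ToyOrderedUnion(db4, db5)).query()
case4 = ToyUnion(db1, db2, db3, ToyUnion(db4, db5)).query()
case5 = ToyOrderedUnion(db1, db2, ToyUnion(db3, db4), db5).query()
names = [c[0] for c in (case1, case2, case3, case4, case5)]
print(names)  # ['r4', 'r3', 'r4', 'r5', 'r4']
```

Case 5 shows the interaction most clearly: db5 holds the fastest record, but the ordered union never reaches it because the nested union already answered.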

# Please enter the commit message for your changes. Lines starting
# with '#' will be kept; you may remove them yourself if you want to.
# An empty message aborts the commit.
#
# Date:      Sat Aug 27 17:36:54 2022 -0700
#
# On branch feature/2022-08-27/merged-database
# Your branch and 'origin/feature/2022-08-27/merged-database' have diverged,
# and have 1 and 1 different commits each, respectively.
#   (use "git pull" to merge the remote branch into yours)
#
# Changes to be committed:
#	modified:   include/tvm/meta_schedule/database.h
#	modified:   python/tvm/meta_schedule/database/__init__.py
#	new file:   python/tvm/meta_schedule/database/ordered_union_database.py
#	new file:   python/tvm/meta_schedule/database/union_database.py
#	modified:   src/meta_schedule/database/json_database.cc
#	new file:   src/meta_schedule/database/ordered_union_database.cc
#	new file:   src/meta_schedule/database/union_database.cc
#	modified:   src/meta_schedule/utils.h
#	modified:   tests/python/unittest/test_link_params.py
#	modified:   tests/python/unittest/test_meta_schedule_database.py
#

junrushao added a commit to junrushao/tvm that referenced this pull request Sep 1, 2022
junrushao added a commit that referenced this pull request Sep 1, 2022
Following up #12520 and #12626, this PR introduces two database classes:
`UnionDatabase` and `OrderedUnionDatabase`, both of which allow users to
organically compose multiple databases together, so that the high-level
IR (Relay, Relax) could select the best tuning records according to
running time or a preferred order given by users.

To each query, `UnionDatabase` returns the best record among all the
databases given; Instead, `OrderedUnionDatabase` returns he record from
the first database that responds to the query.

Used together, users may specify complicated dispatching patterns like
below:

Examples below demonstrate the usecases of and difference between
UnionDatabase and OrderDatabase.

Assumption:
* db1, db2 do not have tuning records for the target workload.
* Each of db3, db4, db5 has tuning records r3, r4, r5 for target
workload respectively.

```python
#### Case 1. `UnionDatabase`:
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    db4  # has r4
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)

### Case 2. `OrderedUnionDatabase`
merged_db = ms.database.OrderedUnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    db4  # has r4
)
# returns r3
merged_db.query_tuning_record(..., target_workload)

### Case 3. Mix-use scenario
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    ms.database.OrderedUnionDatabase( # returns r4
        db4,  # has r4
        db5,  # has r5
    )
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)

### Case 4. Another mix-use scenario
merged_db = ms.database.UnionDatabase(
    db1, # no record
    db2, # no record
    db3, # has r3
    ms.database.UnionDatabase( # returns the better one between r4 and r5
        db4,  # has r4
        db5,  # has r5
    )
)
# returns the best one among r3, r4 and r5
merged_db.query_tuning_record(..., target_workload)

### Case 5. Yet another mix-use scenario
merged_db = ms.database.OrderedUnionDatabase(
    db1, # no record
    db2, # no record
    ms.database.UnionDatabase( # returns the better one between r3 and r4
        db3, # has r3
        db4, # has r4
    ),
    db5,  # has r5
)
# returns the better one between r3 and r4
merged_db.query_tuning_record(..., target_workload)
```
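
The five cases above can be checked against a small toy model of the two selection rules. This is only a sketch, not TVM's implementation: `Record`, `ToyDatabase`, `ToyUnion` and `ToyOrderedUnion` are hypothetical stand-ins, the run times are made up, and "best" is assumed to mean the lowest run time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a tuning record: a name and a run time."""
    name: str
    run_ms: float


class ToyDatabase:
    """Hypothetical leaf database with at most one record for the workload."""

    def __init__(self, record: Optional[Record] = None):
        self.record = record

    def query(self) -> Optional[Record]:
        return self.record


class ToyUnion:
    """Returns the fastest record among all children that respond."""

    def __init__(self, *dbs):
        self.dbs = dbs

    def query(self) -> Optional[Record]:
        found = [r for r in (db.query() for db in self.dbs) if r is not None]
        return min(found, key=lambda r: r.run_ms) if found else None


class ToyOrderedUnion:
    """Returns the record from the first child that responds."""

    def __init__(self, *dbs):
        self.dbs = dbs

    def query(self) -> Optional[Record]:
        for db in self.dbs:
            record = db.query()
            if record is not None:
                return record
        return None


# Made-up run times (assumption): r5 is fastest, then r4, then r3.
r3, r4, r5 = Record("r3", 3.0), Record("r4", 2.0), Record("r5", 1.0)
db1, db2 = ToyDatabase(), ToyDatabase()
db3, db4, db5 = ToyDatabase(r3), ToyDatabase(r4), ToyDatabase(r5)

case1 = ToyUnion(db1, db2, db3, db4).query()
case2 = ToyOrderedUnion(db1, db2, db3, db4).query()
case3 = ToyUnion(db1, db2, db3, ToyOrderedUnion(db4, db5)).query()
case4 = ToyUnion(db1, db2, db3, ToyUnion(db4, db5)).query()
case5 = ToyOrderedUnion(db1, db2, ToyUnion(db3, db4), db5).query()
print([c.name for c in (case1, case2, case3, case4, case5)])
```

With these made-up timings, Case 3 never considers r5 even though it is the fastest record, because the inner ordered union stops at db4. Case 4 does pick r5, since both levels compare by run time.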

Co-authored-by: sunggg <49998730+sunggg@users.noreply.github.com>
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…he#12628)
