[BenchGC] add tuner tools for benchgc #358

Open
wants to merge 17 commits into
base: main

xurui1995
Contributor

Add tuner tools to benchgc to support auto-tuning.

Comment on lines 84 to 106
default_blocks = [16, 32, 64, 128, 256, 512]
default_innermost_blocks = [16, 32]
self.field_candidates["M_threads"] = find_factors(self.num_threads)
self.field_candidates["K_threads"] = find_factors(self.num_threads)
self.field_candidates["N_threads"] = find_factors(self.num_threads)
self.field_candidates["M_block"] = [
    block for block in default_blocks if self.M >= block
]
self.field_candidates["K_block"] = [
    block for block in default_blocks if self.K >= block
]
self.field_candidates["N_block"] = [
    block for block in default_blocks if self.N >= block
]
self.field_candidates["innermostM_block"] = [
    block for block in default_innermost_blocks if self.M >= block
]
self.field_candidates["innermostK_block"] = [
    block for block in default_innermost_blocks if self.K >= block
]
self.field_candidates["innermostN_block"] = [
    block for block in default_innermost_blocks if self.N >= block
]
Contributor

It would be better to expose the grid options on the command line, so that developers can control the search space.
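
One way to expose the grid on the command line is sketched below with `argparse`. This is only an illustration: the flag names (`--blocks`, `--innermost-blocks`) and the helpers `parse_int_list` and `build_block_candidates` are hypothetical and not part of benchgc. The defaults mirror the hard-coded lists in the diff above.

```python
import argparse


def parse_int_list(text: str) -> list[int]:
    """Parse a comma-separated string such as "16,32,64" into a list of ints."""
    return [int(item) for item in text.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="hypothetical tuner grid options")
    # Flag names are assumptions for illustration only.
    parser.add_argument(
        "--blocks", type=parse_int_list, default=[16, 32, 64, 128, 256, 512]
    )
    parser.add_argument("--innermost-blocks", type=parse_int_list, default=[16, 32])
    return parser


def build_block_candidates(dim: int, blocks: list[int]) -> list[int]:
    """Keep only the block sizes that fit in the dimension, as the diff does."""
    return [block for block in blocks if dim >= block]


# Example: restrict the outer blocks from the command line, keep the inner defaults.
args = build_parser().parse_args(["--blocks", "32,64,128"])
m_block_candidates = build_block_candidates(64, args.blocks)
innermost_m_candidates = build_block_candidates(64, args.innermost_blocks)
```

With a dimension of 64, the block size 128 is filtered out, so the search space shrinks to whatever the developer passed in and still fits.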

xurui1995 linked an issue on Sep 24, 2024 that may be closed by this pull request
Comment on lines +323 to +360
def save_status(self):
    save_dict = {
        "iter": self.iter,
        "last_update_iter": self.last_update_iter,
        "best": self.best,
        "best_cost": self.best_cost,
        "current_idx": self.current_idx,
        "skipped_num": self.skipped_num,
    }
    with open(self.checkpoint, "w") as file:
        json.dump(save_dict, file, indent=4)

def load_status(self):
    print("continue tuning from checkpoint...")
    with open(
        self.checkpoint,
        "r",
    ) as file:
        try:
            data = json.load(file)
            assert set(
                [
                    "iter",
                    "last_update_iter",
                    "best",
                    "best_cost",
                    "current_idx",
                    "skipped_num",
                ]
            ) == set(data.keys())
            self.iter = data["iter"]
            self.last_update_iter = data["last_update_iter"]
            self.best = data["best"]
            self.best_cost = data["best_cost"]
            self.current_idx = data["current_idx"]
            self.skipped_num = data["skipped_num"]
        except Exception as e:
            print("load checkpoint failed", e)
Contributor

Do we really need this feature? Is tuning a time-consuming job?
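
If checkpointing is kept, the save/load round trip could be sketched as below. This is a simplified stand-in, not the PR's code. The assumptions here are the free-function form of `save_status`/`load_status`, the write to a temporary file followed by `os.replace` (so an interrupted run does not leave a half-written checkpoint), and the explicit key check in place of `assert`, which is skipped when Python runs with `-O`.

```python
import json
import os
import tempfile

REQUIRED_KEYS = {
    "iter",
    "last_update_iter",
    "best",
    "best_cost",
    "current_idx",
    "skipped_num",
}


def save_status(path, status):
    """Write the status to a temp file, then atomically replace the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as file:
        json.dump(status, file, indent=4)
    os.replace(tmp_path, path)


def load_status(path):
    """Return the saved status, or None if the file is missing or malformed."""
    try:
        with open(path, "r") as file:
            data = json.load(file)
    except (OSError, json.JSONDecodeError) as error:
        print("load checkpoint failed:", error)
        return None
    if not isinstance(data, dict) or set(data.keys()) != REQUIRED_KEYS:
        print("load checkpoint failed: unexpected keys")
        return None
    return data


# Round trip in a throwaway directory; the values are made up for the example.
with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "tuner_checkpoint.json")
    status = {
        "iter": 12,
        "last_update_iter": 9,
        "best": [2, 1, 4],
        "best_cost": 0.5,
        "current_idx": 12,
        "skipped_num": 3,
    }
    save_status(checkpoint, status)
    restored = load_status(checkpoint)
    missing = load_status(os.path.join(workdir, "absent.json"))
```

Returning `None` on failure lets the caller decide whether to start from scratch, rather than continuing with partially restored state.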

xurui1995 requested a review from zhczhong on September 24, 2024 07:37
@WangJialei-A
Contributor

@xurui1995
I would suggest using a mature tuning framework such as Optuna to handle this problem.
Please see the example here:
https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.GridSampler.html

@ciyongch
Contributor

@xurui1995 I would suggest using a mature tuning framework such as Optuna to handle this problem. Please see the example here: https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.GridSampler.html

It seems like a good idea to use an existing auto-tuning framework. Let's evaluate whether it meets our requirements for the tuning features, for example arbitrary tuning spaces, checkpoint save and restore, early stopping, and distributed tuning.
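
As a baseline for that evaluation, two of the listed features (an arbitrary tuning space and early stopping) can be sketched in plain Python. Everything here is an assumption for illustration: `grid_search`, its `early_stop` parameter, and the toy cost function stand in for the tuner and for a real benchmark measurement.

```python
import itertools


def grid_search(space, cost_fn, early_stop=3):
    """Walk the full grid in order; stop after `early_stop` iterations
    without an improvement of the best cost."""
    names = list(space)
    best, best_cost, last_update_iter = None, float("inf"), 0
    grid = itertools.product(*(space[name] for name in names))
    for iteration, values in enumerate(grid, start=1):
        config = dict(zip(names, values))
        cost = cost_fn(config)
        if cost < best_cost:
            best, best_cost, last_update_iter = config, cost, iteration
        elif iteration - last_update_iter >= early_stop:
            break
    return best, best_cost


# Any mapping of field name to candidate list works as the search space.
space = {"M_block": [16, 32, 64], "N_block": [16, 32]}


def toy_cost(config):
    # Stand-in for a benchmark run: cheapest at M_block=32, N_block=16.
    return abs(config["M_block"] - 32) + abs(config["N_block"] - 16)


best, best_cost = grid_search(space, toy_cost)
```

A framework would need to offer at least this much, plus the checkpoint and distributed features, to replace a hand-written loop.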


def attach_to_ir(self, op: OpView):
    attr_to_field = {
        "Mthreads": self.M_threads,
Contributor

Currently MatmulConfigAnalysis.cpp reads the named attribute MThreads, not Mthreads. Please align the naming convention here (and likewise for Kthreads and Nthreads).
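
A small check along these lines could catch such case mismatches before the attributes are attached. It is a sketch under assumptions: only `MThreads` is confirmed by this comment, while `KThreads` and `NThreads` are inferred from it, and `find_misnamed_attrs` is a hypothetical helper.

```python
# Attribute names the C++ side is assumed to read. Only "MThreads" is confirmed
# by the review comment; "KThreads" and "NThreads" are inferred from it.
EXPECTED_ATTRS = {"MThreads", "KThreads", "NThreads"}


def find_misnamed_attrs(attr_to_field):
    """Map each key that differs from an expected name only by letter case
    to the expected spelling."""
    by_lower = {name.lower(): name for name in EXPECTED_ATTRS}
    return {
        key: by_lower[key.lower()]
        for key in attr_to_field
        if key not in EXPECTED_ATTRS and key.lower() in by_lower
    }


misnamed = find_misnamed_attrs({"Mthreads": 2, "KThreads": 1, "Nthreads": 4})
```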

"MBlock": 128,
"KBlock": 64,
"NBlock": 16,
"innerMostMBlock": 32,
Contributor

Typo: this should be innermost, with a lowercase m, to match the matmul config.

self.innermost_k_block,
self.innermost_n_block,
],
[self.m, self.k, self.n],
Contributor

The order here should be m/n/k.

## Options
Since bench is also required within the tuner, the tuner also supports benchmarking options.
Unlike bench mode, tuner mode generates a batch of modules each time, and the default values for warm-up and repeat have been adjusted accordingly.
* --bench_kind [py, grid]
Contributor

Should this be py and wrapper?

self.tunning_space.initial_ir,
)

def run(self, max_iter: int = DEFAULT_MAX_ITERS, timeout: int = DEFAULT_TIMEOUT):
Contributor

Can we support constructing the modules in parallel and then executing them one by one in sequence, to reduce the compilation time?
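
The idea could be sketched with `concurrent.futures` as below. `build_module` and `run_module` are hypothetical stand-ins for the tuner's module construction and benchmarking steps, and `tune_batch` is an assumed name. Threads only help if the real build step releases the GIL; otherwise a `ProcessPoolExecutor` would be the alternative, provided the built modules can be pickled.

```python
from concurrent.futures import ThreadPoolExecutor


def build_module(config):
    """Hypothetical stand-in for constructing and compiling one module."""
    return {"config": config, "compiled": True}


def run_module(module):
    """Hypothetical stand-in for benchmarking one module; returns a cost."""
    return float(module["config"]["M_block"])


def tune_batch(configs, max_workers=4):
    # Build all modules in parallel; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        modules = list(pool.map(build_module, configs))
    # Execute one at a time so the measurements do not compete for cores.
    return [run_module(module) for module in modules]


costs = tune_batch([{"M_block": 64}, {"M_block": 16}, {"M_block": 32}])
```

Keeping execution sequential matters because concurrent benchmark runs would share cores and distort the measured costs.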


Successfully merging this pull request may close these issues.

No verbose on correctness check script
add tuner mode for BenchGC to support auto-tuning
4 participants