Misc Improvement of Evolutionary Search #549

junrushao
Member

No description provided.

@junrushao junrushao changed the title Fix Segfault in Evo Search Misc Improvement of Evolutionary Search Dec 11, 2021
@junrushao
Member Author

CC @zxybazh: I finally managed to get rid of CachedTrace :-)

@junrushao junrushao force-pushed the misc/2021-12-10/fix-feature-extraction branch 2 times, most recently from 526b92d to d753e87 Compare December 12, 2021 10:29
@@ -146,8 +146,8 @@ bool RewriteCooperativeFetchNode::Apply(const tir::Schedule& sch) {
sch->Vectorize(split[3]);
sch->Bind(split[2], "threadIdx.x");
sch->Bind(split[1], "threadIdx.y");
sch->StorageAlign(block, 0, -2, 32, 8);
Member Author

@spectrometerHBH Let me know if my fix makes sense to you

@@ -129,7 +131,7 @@ def main(var_A: T.handle, var_B: T.handle, var_C: T.handle) -> None:
v1 = T.axis.spatial(512, i2_0_0 * 128 + (ax0_ax1_fused_0 * 256 + ax0_ax1_fused_1 * 32 + ax0_ax1_fused_2) % 128)
T.reads([A[v0, v1]])
T.writes([A_shared[v0, v1]])
- T.block_attr({"meta_schedule.cooperative_fetch":1})
+ T.block_attr({"meta_schedule.cooperative_fetch":1, "buffer_dim_align": [[0, 0, 32, 8]]})
Member Author

@spectrometerHBH Let me know if the fix makes sense to you
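
For readers following the two hunks above, here is a minimal sketch of the constraint that `sch->StorageAlign(block, 0, -2, 32, 8)` appears to request. It assumes the arguments are (buffer index, axis, factor, offset), that a negative axis counts from the last dimension, and that the stride of the chosen axis must satisfy `stride % factor == offset`. Under those assumptions, axis `-2` of a 2-D buffer normalizes to `0`, which matches the `[0, 0, 32, 8]` entry recorded in `buffer_dim_align`. The helper names and the 128-element row length are hypothetical and are not taken from the PR.

```python
def normalize_axis(axis: int, ndim: int) -> int:
    """Map a possibly negative axis to its non-negative index."""
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of range for {ndim} dimensions")
    return axis % ndim


def aligned_stride(stride: int, factor: int, offset: int) -> int:
    """Return the smallest s >= stride such that s % factor == offset."""
    if factor <= 0 or not 0 <= offset < factor:
        raise ValueError("expected factor > 0 and 0 <= offset < factor")
    # Python's % is non-negative for a positive modulus, so this is the
    # number of padding elements needed to reach the requested remainder.
    padding = (offset - stride % factor) % factor
    return stride + padding


# Hypothetical 2-D shared buffer with 128 elements per row.
ndim, row_length = 2, 128
axis = normalize_axis(-2, ndim)
padded = aligned_stride(row_length, factor=32, offset=8)
print(f"axis={axis}, row stride {row_length} -> {padded}")
```

Padding the row stride this way is a common technique for reducing shared-memory bank conflicts on GPUs; whether that is the motivation for this particular change is not stated in the conversation.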

@@ -283,6 +285,7 @@ def main(var_A: T.handle, var_B: T.handle, var_C: T.handle) -> None:
jo = T.axis.spatial(32, i0_0_1_i1_0_1_fused % 2 * 16 + i0_0_2_i1_0_2_fused * 2 + i1_0_4_init)
T.reads([])
T.writes([C_local_wmma_accumulator[io * 16 : io * 16 + 16, jo * 16 : jo * 16 + 16]])
T.block_attr({"meta_schedule.auto_tensorize":"wmma_fill"})
Member Author

@spectrometerHBH Let me know if the fix makes sense to you

@@ -58,8 +59,15 @@ class ThreadBindingUnifier : public StmtExprMutator {
if (op->kind != ForKind::kThreadBinding) {
return StmtExprMutator::VisitStmt_(op);
}
return UnifyThreadBindingImpl(op, op->loop_var, op->thread_binding.value(),
Range::FromMinExtent(op->min, op->extent));
Map<String, ObjectRef> annotations = op->annotations;
Member Author

@junrushao junrushao Dec 12, 2021

@vinx13 @jinhongyii Let me know if the quick hack makes sense to you

@junrushao junrushao force-pushed the misc/2021-12-10/fix-feature-extraction branch from 1fef7f2 to 8a99e3f Compare December 12, 2021 10:42
@junrushao junrushao merged commit 2fca198 into tlc-pack:meta-schedule-refactor Dec 12, 2021
spectrometerHBH added a commit that referenced this pull request Dec 30, 2021
[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (#485)

[Meta Schedule][M3c] PostOrderApply (#486)

Fix Post Order Apply (#490)

[MetaSchedule] Relay Integration (#489)

[M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (#492)

Fix replay trace. (#493)

[M3c][Meta Schedule] Implement the Replay Func class. (#495)

[PR] Test script for meta-schedule task extraction. Interface to load… (#494)

[Meta Schedule Refactor] Get child blocks (#500)

Read-at && Write-at (#497)

[M3c][Meta Schedule] Measure Callbacks (#498)

[Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (#496)

[MetaSchedule] Sample-Perfect-Tile (#501)

[MetaSchedule] TE Workloads (#502)

[TensorIR] GetProducer, GetConsumer (#506)

[MetaScheduleRefactor] Annotate&Unannotate (#505)

[MetaSchedule] Multi-Level-Tiling & Auto-Inline (#503)

[Tests] Add unittests for auto-inline and multi-level-tiling (#508)

[Meta Schedule] Minor Fixes (#507)

[MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (#509)

[MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (#499)

[Meta Schedule] Add Helper Function & Minor Modification (#512)

[MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (#513)

[Meta Schedule] Feature Extractor & Cost Model (#510)

Blockize & Tensorize (#514)

Layout Rewriting: Suggest-Index-Map (#520)

[MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (#516)

[Meta Schedule] Per-Store-Feature (#521)

Add traced schedule for blockize & tensorize (#526)

[Meta Schedule] Add XGBoost Model & Random Model (#519)

User-Interface: Tune-TIR (#525)

User-Interface: Tune-TE (#527)

[Minor] More logging on python (#528)

Get CUDA tuning working (#529)

[MetaSchedule] TensorRT BYOC (#518)

[BugFix] LocalBuilder API (#531)

[Meta Schedule] Add Cost Model Update Measure Callback (#530)

[Bugfix] BuilderInput with default params (#532)

[MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (#534)

[Meta Schedule] Evolutionary Search (#522)

[BugFix] Remove duplicated definition of MakeMultinomialSampler (#535)

[Meta Schedule] Fix some bugs (#537)

Initiate Experiments for CPU Performance Alignment with Ansor (#538)

[Meta Schedule] Tweak experiment scripts (#539)

[Meta Schedule] Initiate experiments on CUDA (#540)

[TIR][Schedule] Buffer transform (#523)

Auto Tensor Core (#524)

Working on Evo Search (#542)

[Meta Schedule] Add Replay Tuning Interface (#543)

Evolutionary Search on CPU (#544)

Misc improvement over the error message (#545)

[TIR][Schedule] Software pipelining (#533)

[Meta Schedule Refactor] fixing unit tests (#547)

[MetaSchedule] Mutator-Compute-Location (#548)

Misc Improvement of Evolutionary Search (#549)

Hotfix for software pipeline (#552)

Misc Improvement (#550)

[Cherry-Pick][TensorIR] Primitive "SetScope" (#9738) (#555)

Rule RFactor (#551)

[MemHammer] Rewrite Rules (#554)

[MetaSchedule] Schedule Rule: Cross-Thread Reduction (#556)

[MetaSchedule] Performance Alignment - NRM and SFM (CUDA) (#559)

[MetaSchedule] Perf Alignment - NRM on CUDA (#560)

[TIR] Reorder the block iters of the blocks generated by RFactor (#561)
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Sunghyun Park <49998730+sunggg@users.noreply.github.com>
Co-authored-by: Xiyou Zhou <xiyou@octoml.ai>