Clarify Function Names for Logging #12

Satrat · 2024-07-02T14:36:01Z

SUMMARY:

Each module compressor (SparseGPT, WANDA, GPTQ) implemented a fasterprune method, but that name is misleading for GPTQ, which only does quantization. This PR renames fasterprune to compress.
For the same reason, it renames LayerCompressor.prune() to LayerCompressor.compress_module().

TEST PLAN:
Manually tested a w4a16 quantization example to confirm the logging change:

Before

2024-07-02T14:15:00.233030+0000 | prune | INFO - Compressing model.layers.11.self_attn.q_proj.model.layers.11.self_attn.q_proj...
2024-07-02T14:15:01.061103+0000 | fasterprune | INFO - time 0.83
2024-07-02T14:15:01.061573+0000 | fasterprune | INFO - error 6155.81

After

2024-07-02T14:31:16.224969+0000 | compress_module | INFO - Compressing model.layers.20.model.layers.20.self_attn.q_proj...
2024-07-02T14:31:17.100250+0000 | compress | INFO - time 0.87
2024-07-02T14:31:17.100596+0000 | compress | INFO - error 4623.34
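
The middle field in the log lines above appears to be the name of the function that emitted the message, which would explain why a pure rename changes the log output. The following is a minimal sketch of that effect using Python's standard logging module and its %(funcName)s field. It is an illustration only, not the project's logging setup: the logger name, the format string, and the OldCompressor/NewCompressor classes are hypothetical stand-ins.

```python
import io
import logging


def make_logger(stream):
    """Build a logger whose format includes the calling function's name."""
    logger = logging.getLogger("rename_demo")  # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    # Assumed format: function name, level, message (timestamp omitted
    # so the output is deterministic).
    handler.setFormatter(
        logging.Formatter("%(funcName)s | %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    return logger


class OldCompressor:
    """Hypothetical stand-in for a compressor before the rename."""

    def __init__(self, logger):
        self.logger = logger

    def fasterprune(self):
        # The log record is tagged with this method's name.
        self.logger.info("time 0.83")


class NewCompressor:
    """Hypothetical stand-in for a compressor after the rename."""

    def __init__(self, logger):
        self.logger = logger

    def compress(self):
        # Same message, but the tag now reads "compress".
        self.logger.info("time 0.87")


def demo():
    """Run both variants and return the captured log lines."""
    stream = io.StringIO()
    logger = make_logger(stream)
    OldCompressor(logger).fasterprune()
    NewCompressor(logger).compress()
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Because the tag is derived from the function name at call time, no log message text needs to change for the output to stop saying "prune" during a quantization-only run.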

rename prune/fasterprune

69a554d

bfineran approved these changes Jul 2, 2024


Satrat merged commit 39cbf17 into main Jul 2, 2024
8 of 12 checks passed

markmc pushed a commit to markmc/llm-compressor that referenced this pull request Nov 13, 2024

make sure scale/zp on correct device (vllm-project#12)

514e4db
