Adjust QWEN2 VL Loss `rtol` #412

austin362667 · 2024-11-28T09:44:07Z

Summary

Closes #411

The convergence tests all passed on the latest commit at the time (PR #407), and its CI ran cleanly: https://github.com/linkedin/Liger-Kernel/actions/runs/11983838113/job/33413899589?pr=407#step:5:984
With no code changes inside Liger, the convergence tests now fail for the QWEN2VL cases, as reported in "QWEN2 VL doesn't converge" #411.

The root cause is a new HuggingFace transformers release that modified QWEN2VL. Because the discrepancy does not originate from a bug in the Liger QWEN2VL implementation, it is acceptable to slightly relax the relative tolerances (rtols) to accommodate these changes.
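To make the effect of a larger `rtol` concrete, here is a minimal sketch of the closeness check commonly used for such comparisons (the same inequality that `torch.allclose` and `numpy.isclose` document). The helper name `is_close` and the loss values are hypothetical, and whether the convergence tests use exactly this inequality is an assumption; the tolerances are chosen to resemble the values visible in the test IDs.

```python
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Return True when |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Hypothetical loss values, for illustration only (not taken from a real run).
reference_loss = 2.000
liger_loss = 2.050  # 2.5% away from the reference

# A tight relative tolerance rejects the 2.5% gap.
tight = is_close(liger_loss, reference_loss, rtol=1e-5, atol=1e-8)

# A relaxed relative tolerance accepts it.
relaxed = is_close(liger_loss, reference_loss, rtol=0.04, atol=8e-6)

print(tight, relaxed)
```

Because `rtol` scales with the magnitude of the expected value, relaxing it widens the allowed gap proportionally, while `atol` only sets a fixed floor.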

A comment already in the test suite gives related context:

Liger-Kernel/test/convergence/test_mini_models.py, line 530 in 0137757:

    pytest.param(  # qwen2_vl requires slightly larger tolerances to pass this test after bug fix to qwen2_vl in transformers v4.47.0

Testing Done

Yes. The full log is below:

test/convergence/test_mini_models.py::test_mini_model[mini_llama3-32-0.0001-dtype0-1e-08-2e-05-0.0001-1e-05-0.005-1e-05] PASSED [  5%]
test/convergence/test_mini_models.py::test_mini_model[mini_llama3-32-0.0001-dtype1-0.001-0.01-0.1-0.01-0.01-0.01] PASSED [ 11%]
test/convergence/test_mini_models.py::test_mini_model[mini_mllama-32-0.0001-dtype2-1e-08-1e-05-0.005-1e-05-0.005-1e-05] PASSED [ 17%]
test/convergence/test_mini_models.py::test_mini_model[mini_mllama-32-0.0001-dtype3-0.001-0.01-0.1-0.01-0.01-0.01] PASSED [ 23%]
test/convergence/test_mini_models.py::test_mini_model[mini_qwen2-32-0.0001-dtype4-1e-08-1e-05-0.005-1e-05-0.005-1e-05] PASSED [ 29%]
test/convergence/test_mini_models.py::test_mini_model[mini_qwen2-32-0.0001-dtype5-0.001-0.01-0.1-0.01-0.01-0.01] PASSED [ 35%]
test/convergence/test_mini_models.py::test_mini_model[mini_qwen2_vl-32-0.0001-dtype6-8e-06-0.04-0.005-1e-05-0.005-1e-05] PASSED [ 41%]
test/convergence/test_mini_models.py::test_mini_model[mini_qwen2_vl-32-0.0001-dtype7-0.001-0.05-0.1-0.01-0.01-0.01] PASSED [ 47%]
test/convergence/test_mini_models.py::test_mini_model[mini_phi3-32-0.0001-dtype8-1e-08-1e-05-0.005-1e-05-0.005-1e-05] PASSED [ 52%]
test/convergence/test_mini_models.py::test_mini_model[mini_phi3-32-0.0001-dtype9-0.001-0.01-0.1-0.01-0.01-0.01] PASSED [ 58%]
test/convergence/test_mini_models.py::test_mini_model[mini_mistral-32-0.0001-dtype10-1e-08-1e-05-0.005-1e-05-0.005-1e-05] PASSED [ 64%]
test/convergence/test_mini_models.py::test_mini_model[mini_mistral-32-0.0001-dtype11-0.001-0.01-0.1-0.01-0.01-0.01] PASSED [ 70%]
test/convergence/test_mini_models.py::test_mini_model[mini_gemma1-32-0.0001-dtype12-1e-08-0.0001-0.005-1e-05-0.005-1e-05] PASSED [ 76%]
test/convergence/test_mini_models.py::test_mini_model[mini_gemma1-32-0.0001-dtype13-0.001-0.01-0.1-0.01-0.01-0.01] PASSED [ 82%]
test/convergence/test_mini_models.py::test_mini_model[mini_gemma1.1-32-0.0001-dtype14-1e-08-0.0001-0.005-1e-05-0.005-1e-05] PASSED [ 88%]
test/convergence/test_mini_models.py::test_mini_model[mini_gemma1.1-32-0.0001-dtype15-0.001-0.01-0.1-0.01-0.01-0.01] PASSED [ 94%]
test/convergence/test_mini_models.py::test_mini_model[mini_gemma2-32-0.0001-dtype16-1e-08-0.0001-0.005-1e-05-0.005-1e-05] PASSED [100%]

================== 17 passed, 1 warning in 163.58s (0:02:43) ===================
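The bracketed test IDs in the log encode each case's parameters, including the tolerances. Splitting on `-` would break values such as `8e-06`, so the sketch below uses a regular expression instead. The field names and their order (`loss_atol`, `loss_rtol`, `logits_atol`, `logits_rtol`, `param_atol`, `param_rtol`) are an assumption about how the test is parametrized, and `parse_test_id` is a hypothetical helper, not part of Liger-Kernel.

```python
import re

# Matches plain decimals and scientific notation such as 0.04 or 8e-06.
FLOAT = r"\d+(?:\.\d+)?(?:e[-+]?\d+)?"

# Assumed field order; the names are illustrative, not confirmed by the log.
ID_PATTERN = re.compile(
    rf"\[(?P<model>[\w.]+)-(?P<num_steps>\d+)-(?P<lr>{FLOAT})-(?P<dtype>dtype\d+)"
    rf"-(?P<loss_atol>{FLOAT})-(?P<loss_rtol>{FLOAT})"
    rf"-(?P<logits_atol>{FLOAT})-(?P<logits_rtol>{FLOAT})"
    rf"-(?P<param_atol>{FLOAT})-(?P<param_rtol>{FLOAT})\]"
)


def parse_test_id(log_line: str) -> dict:
    """Extract the parametrized fields from one pytest log line."""
    match = ID_PATTERN.search(log_line)
    if match is None:
        raise ValueError(f"no parametrized test ID found in: {log_line!r}")
    fields = match.groupdict()
    fields["num_steps"] = int(fields["num_steps"])
    # Convert every numeric field except the step count to float.
    for key in ("lr", "loss_atol", "loss_rtol", "logits_atol",
                "logits_rtol", "param_atol", "param_rtol"):
        fields[key] = float(fields[key])
    return fields


line = (
    "test/convergence/test_mini_models.py::test_mini_model"
    "[mini_qwen2_vl-32-0.0001-dtype6-8e-06-0.04-0.005-1e-05-0.005-1e-05]"
    " PASSED [ 41%]"
)
parsed = parse_test_id(line)
print(parsed["model"], parsed["loss_atol"], parsed["loss_rtol"])
```

Read this way, the second tolerance of the two `mini_qwen2_vl` cases (0.04 and 0.05) is larger than the corresponding value of the other models (1e-05 to 0.0001, and 0.01).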

- Hardware Type: A10G
- run `make test` to ensure correctness
- run `make checkstyle` to ensure code style
- run `make test-convergence` to ensure convergence

Signed-off-by: Austin Liu <austin362667@gmail.com>

Adjust qwen2vl loss rtol

26ac1d2

ByronHsu approved these changes Nov 30, 2024

ByronHsu merged commit 7e0f459 into linkedin:main Nov 30, 2024
1 of 3 checks passed
