Set drop_last to always True #1761

RdoubleA · 2024-10-07T22:16:29Z

Context

What is the purpose of this PR? Is it to

add a new feature
fix a bug
update tests and/or documentation
other (please add here)

Closes #1754. Instead of specifying drop_last in the dataset config (which breaks when the config is passed into the dataset builder), just hardcode it to True. No configs expose this option. We can make it configurable down the line if users request it.
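For illustration, here is a minimal sketch of one way such a failure can arise. The builder `build_dataset`, its parameters, and the config values are hypothetical stand-ins, not torchtune's actual builders or configs. It only shows the general Python behavior that a function rejects a keyword argument it does not declare.

```python
def build_dataset(source: str, split: str = "train") -> dict:
    """Hypothetical dataset builder; note it has no drop_last parameter."""
    return {"source": source, "split": split}


# A hypothetical dataset config that mixes in a dataloader-level option.
cfg_dataset = {"source": "my_org/my_data", "split": "train", "drop_last": True}

try:
    build_dataset(**cfg_dataset)
    error = None
except TypeError as exc:
    # The builder rejects the keyword it does not know about.
    error = str(exc)

# With the value hardcoded where the dataloader is built, the dataset
# config only needs to carry builder arguments.
builder_kwargs = {k: v for k, v in cfg_dataset.items() if k != "drop_last"}
dataset = build_dataset(**builder_kwargs)
drop_last = True  # hardcoded, as this PR does
```

The point of the sketch is only that every key under the dataset config is forwarded to the builder, so a key the builder does not understand has to live somewhere else.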

Test plan

CI

pytorch-bot · 2024-10-07T22:16:33Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1761

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0b9a3ec with merge base a8a64ec:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ebsmothers

Is this the right fix? Personally I find a top-level config field called drop_last to be pretty unclear. Why not just infer drop_last in the existing if/else for ConcatDataset here? https://github.com/pytorch/torchtune/blob/main/recipes/full_finetune_distributed.py#L487-L496

E.g.

if isinstance(cfg_dataset, ListConfig):
    # same stuff as before
    drop_last = any(single_cfg_dataset.get("drop_last", True) for single_cfg_dataset in cfg_dataset)
else:
    # same stuff as before
    drop_last = cfg_dataset.get("drop_last", True)
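A self-contained version of the same idea, as a rough sketch: plain dicts and lists stand in for the OmegaConf config objects, and the function name `infer_drop_last` is made up for this example rather than taken from the recipe.

```python
from typing import Any, Mapping, Sequence, Union

DatasetCfg = Mapping[str, Any]


def infer_drop_last(cfg_dataset: Union[DatasetCfg, Sequence[DatasetCfg]]) -> bool:
    """Infer drop_last from one dataset config or a list of them."""
    if isinstance(cfg_dataset, Mapping):
        # Single dataset: read the flag, defaulting to True.
        return bool(cfg_dataset.get("drop_last", True))
    # Multiple datasets: True if any entry asks for it or omits the key.
    return any(single.get("drop_last", True) for single in cfg_dataset)
```

One consequence of combining `any` with a default of True is that a list of configs yields False only when every entry explicitly sets drop_last to False.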

RdoubleA · 2024-10-07T23:00:37Z

drop_last being in the dataset config does not make sense, as dataset points to a component and all kwargs should be related to that component/builder. drop_last is not in any of our dataset builders. I think having it top level is less confusing in that sense.

Frankly, we should have a more configurable dataloader or a section in the config for dataloader args, as this is becoming more customized.
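As a rough sketch of what a dedicated section could look like: the `dataloader` key, the `DEFAULT_DATALOADER_ARGS` dict, and `resolve_dataloader_args` are all hypothetical names for this example, and nothing like them exists in the configs today.

```python
from typing import Any, Dict

# Hypothetical defaults applied when the config says nothing.
DEFAULT_DATALOADER_ARGS: Dict[str, Any] = {"drop_last": True}


def resolve_dataloader_args(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Merge an optional 'dataloader' section over the defaults."""
    overrides = cfg.get("dataloader", {})
    return {**DEFAULT_DATALOADER_ARGS, **overrides}


# The dataset section stays limited to builder kwargs; dataloader
# options get their own section.
cfg = {
    "dataset": {"source": "my_org/my_data"},
    "dataloader": {"drop_last": False},
}
dataloader_args = resolve_dataloader_args(cfg)
```

This keeps the dataset section tied to its component while avoiding a growing set of unrelated top-level fields.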

ebsmothers · 2024-10-07T23:11:38Z

OK yeah it is a dataloader config more than a dataset config so I do see your point there. Mainly I just do not want us to wind up with a hodgepodge of a bunch of random top-level configs, and drop_last is too in the weeds for the majority of users to care about (it is pretty much the exact definition of an edge case).

joecummings

Rather than trying to conceive of edge cases that users might not even care about, can we just default drop_last to True, document it, and add a config later if users want this to be configurable?

Either way it's a one line change. For most cases, the last batch should be dropped.
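To make the effect concrete, here is a small pure-Python sketch of the batch sizes produced with and without dropping the last batch. The helper `batch_sizes` is made up for this example; it mirrors the documented meaning of drop_last in torch.utils.data.DataLoader (drop the final incomplete batch when the dataset size is not divisible by the batch size) and ignores samplers and distributed padding.

```python
from typing import List


def batch_sizes(num_samples: int, batch_size: int, drop_last: bool) -> List[int]:
    """Return the size of each batch for a dataset of num_samples items."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        # Keep the trailing incomplete batch only when drop_last is False.
        sizes.append(remainder)
    return sizes


with_drop = batch_sizes(10, 4, drop_last=True)      # two full batches
without_drop = batch_sizes(10, 4, drop_last=False)  # plus one batch of 2
```

With drop_last set to True every batch has the same size, at the cost of skipping up to batch_size - 1 samples per epoch.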

RdoubleA · 2024-10-08T14:09:31Z

Agreed

move to top level config

623619f

facebook-github-bot added the CLA Signed label Oct 7, 2024

acisseJZhong approved these changes Oct 7, 2024

ebsmothers requested changes Oct 7, 2024

joecummings reviewed Oct 8, 2024

default to true

0b9a3ec

RdoubleA changed the title ~~Move drop_last to top-level config~~ Set drop_last to always True Oct 8, 2024

ebsmothers approved these changes Oct 8, 2024

RdoubleA merged commit 27b0fcc into pytorch:main Oct 8, 2024
17 checks passed

RdoubleA deleted the drop_last branch October 8, 2024 20:07

mori360 pushed a commit to mori360/torchtune that referenced this pull request Oct 14, 2024

Set drop_last to always True (pytorch#1761)

a20dd56
