[Data] Empty stats on StreamSplitDataIterators resulting from streaming_split() #36006

Closed
scottjlee opened this issue Jun 2, 2023 · 3 comments · Fixed by #36217 or #36908
Labels: bug, data, P1

Comments

@scottjlee

What happened + What you expected to happen

After execution, stats() returns an empty string for the DatasetIterators returned by streaming_split(); the expected result is the execution stats. See the reproduction script below.

Versions / Dependencies

Ray master, Python 3.9

Reproduction script


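import time

import ray


def pause(x):
    # Simulate a small amount of per-row work.
    time.sleep(0.0001)
    return x


ds = ray.data.range(10000)
ds = ds.map(lambda x: x)
ds = ds.map(pause)


@ray.remote
def consume(p):
    # Drain the split so that execution completes before reading stats.
    for x in p.iter_batches():
        pass

    print("Finish consume")
    stats = p.stats()
    # Bug: stats is an empty string here.
    print(f"Emit DatasetStats: {stats}")
    print(type(stats))


# Split the dataset into two streaming iterators and consume them in parallel.
a, b = ds.streaming_split(2)
ray.get([consume.remote(a), consume.remote(b)])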
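
One possible way to turn the script above into a pass/fail check is a small helper that decides whether a stats summary is empty. This is only a sketch: stats_is_empty is a hypothetical name, not part of Ray, and the strings below are stand-ins for what p.stats() might return, so no Ray cluster is needed to run it.

```python
def stats_is_empty(stats) -> bool:
    """Return True when a stats summary carries no information."""
    # Treat None, "", and whitespace-only strings as empty.
    return stats is None or not str(stats).strip()


# Plain-string stand-ins for p.stats() output (illustrative, not real Ray output).
examples = ["", "   \n", "placeholder stats summary"]
results = [stats_is_empty(s) for s in examples]
print(results)
```

Inside consume(), a line such as `assert not stats_is_empty(p.stats())` after the iteration loop would then fail while the bug is present, assuming the symptom is exactly the empty string described above.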

Issue Severity

None

@scottjlee added the bug, P1, and data labels and removed the triage label on Jun 2, 2023.
@ericl self-assigned this on Jun 3, 2023.
@scottjlee assigned scottjlee and ericl, and unassigned ericl and scottjlee, on Jun 8, 2023.
@ericl

ericl commented Jun 8, 2023

Will take a look.

@scottjlee

Oh yeah, sorry for the notification. I accidentally moved this around in the sprint planning GH project and it reassigned the issue. Thanks!

@ericl

ericl commented Jun 28, 2023

Per user report, this doesn't include iterator stats.
