
Add locket as a dependency instead of vendoring #6166

Merged: 3 commits merged into dask:main on Apr 23, 2022

Conversation

@zanieb (Contributor) commented Apr 20, 2022

locket was vendored to pick up a then-unmerged improvement to Windows handling of file locks; that improvement has since been merged upstream. It has not yet been released, so this pull request may not pass CI until a new locket release is made. The maintainer is active, and a new release will likely follow soon once the dangling-file-descriptor fix discussed in mwilliamson/locket.py#14 is confirmed to be working as intended.

See mwilliamson/locket.py#8 (comment) for discussion about vendoring.
Replaces #6122
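For context, locket provides with-statement file locks via `locket.lock_file(path, timeout=..., retry_period=...)`. The sketch below is a simplified pure-stdlib stand-in that mimics that API shape; the class name and the O_EXCL-based implementation are illustrative assumptions (real locket uses platform byte-range locks via fcntl/msvcrt), but it shows why closing the descriptor on release matters for the dangling-file-descriptor issue discussed above.

```python
import os
import tempfile
import time


class SimpleFileLock:
    """Minimal cross-platform lock-file sketch.

    Illustrative stand-in for the shape of locket's
    lock_file(path, timeout=..., retry_period=...) API; not locket itself.
    """

    def __init__(self, path, timeout=None, retry_period=0.05):
        self.path = path
        self.timeout = timeout
        self.retry_period = retry_period
        self._fd = None

    def __enter__(self):
        deadline = None if self.timeout is None else time.monotonic() + self.timeout
        while True:
            try:
                # O_EXCL makes creation atomic: exactly one process can win.
                self._fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                return self
            except FileExistsError:
                if deadline is not None and time.monotonic() > deadline:
                    raise TimeoutError(f"could not acquire {self.path}")
                time.sleep(self.retry_period)

    def __exit__(self, *exc):
        # Close the descriptor before unlinking so no dangling file
        # descriptor is left behind (the problem tracked in
        # mwilliamson/locket.py#14).
        os.close(self._fd)
        os.unlink(self.path)


lock_path = os.path.join(tempfile.mkdtemp(), "demo.lock")
with SimpleFileLock(lock_path, timeout=1.0):
    held = os.path.exists(lock_path)      # lock file exists while held
released = not os.path.exists(lock_path)  # and is gone after release
```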

@zanieb (Contributor, Author) commented Apr 20, 2022

I do not think these test failures are related to locket. I've confirmed that the resource warnings reported in #2486 are no longer raised in the Prefect test suite, so I believe mwilliamson/locket.py#14 is resolved.

@mwilliamson may want to wait for confirmation from @mrocklin, but it seems safe to cut a release with the Windows/resource fixes; I can update this pull request to require it afterwards. Thanks!

@mrocklin (Member)

Yeah, the test failure that I'm seeing here is unrelated. I've seen this a bit recently. cc'ing @fjetter, who may be able to handle it separately.

    @gen_cluster(client=True)
    async def test_flight_cancelled_error(c, s, a, b):
        """One worker with one thread. We provoke a flight->cancelled transition
        and let the task err."""
        lock = asyncio.Lock()
        await lock.acquire()
    
        async def wait_and_raise(*args, **kwargs):
            async with lock:
                raise RuntimeError()
    
        with mock.patch.object(
            distributed.worker,
            "get_data_from_worker",
            side_effect=wait_and_raise,
        ):
            fut1 = c.submit(inc, 1, workers=[a.address], allow_other_workers=True)
            fut2 = c.submit(inc, fut1, workers=[b.address])
    
            await wait_for_state(fut1.key, "flight", b)
            fut2.release()
            fut1.release()
            await wait_for_state(fut1.key, "cancelled", b)
    
        lock.release()
        # At this point we do not fetch the result of the future since the future
        # itself would raise a cancelled exception. At this point we're concerned
        # about the worker. The task should transition over error to be eventually
        # forgotten since we no longer hold a ref.
        while fut1.key in b.tasks:
            await asyncio.sleep(0.01)
    
        # Everything should still be executing as usual after this
>       assert await c.submit(sum, c.map(inc, range(10))) == sum(map(inc, range(10)))

distributed\tests\test_cancelled_state.py:247: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
distributed\client.py:294: in _result
    raise exc.with_traceback(tb)
C:\Miniconda3\envs\dask-distributed\lib\unittest\mock.py:2164: in _execute_mock_call
    result = await effect(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   raise RuntimeError()
E   RuntimeError

@mrocklin (Member)

> so I believe mwilliamson/locket.py#14 is resolved

Woo!

> @mwilliamson may want to wait for confirmation from @mrocklin, but it seems safe to cut a release with the Windows/resource fixes; I can update this pull request to require it afterwards. Thanks

This is all good to go from my end.

@github-actions (bot) commented Apr 20, 2022

Unit Test Results

16 files ±0, 16 suites ±0, 7h 29m 53s ⏱️ (−2m 44s)
2,727 tests ±0: 2,644 passed ✔️ (−1), 81 skipped 💤 (+1), 2 failed (±0)
21,701 runs ±0: 20,660 passed ✔️ (−3), 1,039 skipped 💤 (+3), 2 failed (±0)

For more details on these failures, see this check.

Results for commit 0d425ce. ± Comparison against base commit 370f456.

♻️ This comment has been updated with the latest results.

@zanieb marked this pull request as ready for review on April 20, 2022 at 22:47
@mrocklin (Member)

It looks like CI is sad because locket isn't up yet. Once it's up, I'm happy to press the green button. Let me know if I should rerun CI.

@mrocklin (Member)

I'm guessing that you're also aware of errors like these, and that they'll be resolved when the new version comes out?

https://github.com/dask/distributed/runs/6104354678?check_suite_focus=true

@zanieb (Contributor, Author) commented Apr 21, 2022

Locket looks to be up at https://pypi.org/project/locket/, but CI isn't detecting it? I'm presuming I've added the package incorrectly. Are there contributing guides for adding a dependency, or another pull request I could refer to? There are a lot of files listing dependencies, haha.

@mrocklin (Member)

See continuous_integration/environment*.yaml
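As a sketch of what that change looks like, here is an illustrative excerpt of one such conda environment file; the exact filename and neighbouring entries are assumptions, and the environment name is taken from the CI traceback above. The `>=1.0` pin matches the requirement mentioned later for the conda package.

```yaml
# continuous_integration/environment-3.9.yaml (illustrative excerpt)
name: dask-distributed
dependencies:
  - locket>=1.0
```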

@mrocklin (Member)

It looks like the conda build CI test failure is mostly saying "this isn't up on conda yet, so I'm hesitant to let you merge".

It looks like the bots may be handling this, though: conda-forge/locket-feedstock#9

@mrocklin (Member)

Hrm, looks like the recipe there needs to be updated a bit. Do you want to take this on, @madkinsz, or bug the locket maintainers? If not, I can ask someone on our end to poke things there. Learning about conda-forge is a good activity, though, if you haven't engaged already.

mrocklin pushed a commit that referenced this pull request Apr 22, 2022
Apparently some of these tests are not rock solid. 

E.g. in #6166

https://github.com/dask/distributed/runs/6101034638?check_suite_focus=true

Based on the traceback, it looks like the mock.patch was not properly removed after leaving the context manager. To rule this out and see if something else is going on, I removed the mocks in favour of some event/lock/subclass magic.

I realise that these tests are not great and are extremely difficult to maintain. We can have a conversation about removing some of them but I would like to not have this conversation right now. I consider this new approach easier to reason about and at least slightly easier to maintain. 

As part of #5896 I very much hope that we can remove some, if not all, of these tests and replace them with much simpler unit tests. I would like to have the conversation about removing code once we are in a position to replace them with something simpler. This is not in scope for this PR.
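As a generic illustration of that event/subclass pattern (this toy `Worker` is a hypothetical example, not the actual distributed code): instead of patching a method with a `side_effect` that may leak past the context manager, a subclass overrides the method and blocks on events the test controls explicitly, so there is no patch to tear down.

```python
import asyncio


class Worker:
    """Hypothetical stand-in for a worker with a data-fetching coroutine."""

    async def get_data(self):
        return "data"


class BlockingWorker(Worker):
    """Subclass gates the behaviour on events instead of mock.patch."""

    def __init__(self):
        self.in_get_data = asyncio.Event()     # set once the call starts
        self.block_get_data = asyncio.Event()  # test sets this to release

    async def get_data(self):
        self.in_get_data.set()
        await self.block_get_data.wait()
        raise RuntimeError("simulated fetch failure")


async def main():
    w = BlockingWorker()
    task = asyncio.create_task(w.get_data())
    await w.in_get_data.wait()  # deterministic: the call is now in flight
    w.block_get_data.set()      # release it; nothing to unpatch afterwards
    try:
        await task
    except RuntimeError:
        return "raised"
    return "did not raise"


result = asyncio.run(main())
```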
@zanieb (Contributor, Author) commented Apr 22, 2022

Ah, I just presumed it didn't have a conda release. I'll take a look at the feedstock; I've got a bit of experience maintaining the Prefect one, and more can't hurt. Once that's released, I'll need to update the CI environments here to pull from conda instead of pip.

@mrocklin (Member) commented Apr 22, 2022 via email

@zanieb (Contributor, Author) commented Apr 23, 2022

@mrocklin it should be available now; perhaps a rerun?

@mrocklin (Member)

Running

@mrocklin merged commit 28fd2f7 into dask:main on Apr 23, 2022
@mrocklin (Member)

This is in. Thanks, @madkinsz!

@jrbourbeau @jsignell whoever handles the next release should probably add locket>=1.0 to the distributed requirements in the conda package. conda-forge/distributed-feedstock#204

fjetter added a commit to fjetter/distributed that referenced this pull request Jun 29, 2022
Merging this pull request may close: Resource Warnings with LocalCluster (#2486)