Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for synchronous mode #1803

Merged
merged 6 commits into from
Jul 5, 2019
Merged

Fixes for synchronous mode #1803

merged 6 commits into from
Jul 5, 2019

Conversation

nsubiron
Copy link
Collaborator

@nsubiron nsubiron commented Jun 25, 2019

TLDR: I made tick and apply_settings synchronize automatically with the server so users don't need to manually wait for tick. Unfortunately with this, old recipes using wait_for_tick will fail.


Description

Requires #1802.

Currently, we recommend using synchronous mode by calling "tick" and "wait_for_tick" on each iteration

while True:
    world.tick()           # Initialize a new "tick" in the simulator.
    world.wait_for_tick()  # Wait until we listen to the new tick.

    # ...

but there is a race condition in here; although unlikely, it can happen that the tick arrives before we start waiting for it and we end up having a dead-lock. It's a difficult problem to solve cause we meed to synchronize two servers sending data asynchronously (rpc and streaming). The "tick" method sends a cue to the simulator via RPC, but "wait_for_tick" listens to the tick event received each update via streaming. See also #1795.

However, we can synchronize the tick method by returning the id of the newly started frame (this way we make sure the tick was applied upon function return, and also we guarantee the id of the frame we're expecting). For convenience, apply_setting can return too the id of the frame when the settings took effect, with this, we know for sure at which frame the synchronous mode started.

EDIT: I also made tick and apply_settings block until the new tick is received, so no need to manually loop waiting for the state to be updated.

I have written a Python example using these two changes to synchronize the output of several sensors. I think we can add something similar to the API, maybe in C++, to make the synchronous mode easier to use

EDIT: Now this example is in synchronous_mode.py.

try:
    import queue
except ImportError:
    import Queue as queue


class CarlaSyncMode(object):
    def __init__(self, world, *sensors):
        self.world = world
        self.sensors = sensors
        self.frame = None
        self._queues = []

    def __enter__(self):
        settings = self.world.get_settings()
        settings.synchronous_mode = True
        self.frame = self.world.apply_settings(settings)

        def make_queue(register_event):
            q = queue.Queue()
            register_event(q.put)
            self._queues.append(q)

        make_queue(self.world.on_tick)
        for sensor in self.sensors:
            make_queue(sensor.listen)
        return self

    def tick(self, timeout):
        self.frame = self.world.tick()
        data = [self._retrieve_data(q, timeout) for q in self._queues]
        assert all(x.frame == self.frame for x in data)
        return data

    def __exit__(self, type, value, traceback):
        settings = self.world.get_settings()
        settings.synchronous_mode = False
        self.world.apply_settings(settings)

    def _retrieve_data(self, queue, timeout):
        while True:
            data = queue.get(timeout=timeout)
            if data.frame == self.frame:
                return data

An example script with this context manager

client = carla.Client('localhost', 2000)
client.set_timeout(2.0)

world = client.get_world()

sensors = []

try:
    sensors.append(world.spawn_actor(
        world.get_blueprint_library().find('sensor.camera.rgb'),
        carla.Transform()))
    sensors.append(world.spawn_actor(
        world.get_blueprint_library().find('sensor.camera.depth'),
        carla.Transform()))
    sensors.append(world.spawn_actor(
        world.get_blueprint_library().find('sensor.camera.semantic_segmentation'),
        carla.Transform()))

    with CarlaSyncMode(world, *sensors) as sync_mode:
        while True:
            data = sync_mode.tick(timeout=1.0)
            snapshot = data[0]
            for n, item in enumerate(data[1:]):
                item.save_to_disk('_out/%01d_%08d' % (n, sync_mode.frame))

finally:
    for sensor in sensors:
        sensor.destroy()

This change is Reviewable

@kraken24
Copy link

@nsubiron I use both world.tick() and world.wait_for_tick() function when running in synchronous mode at 10 fps. But carla stops suddenly and is stuck at a particular instance and I have to restart the whole simulation. I want to run 2000 episodes for reinforcement learning but it stops after 350 episodes at the maximum

do i have to change some settings or initialize carla at different fps rate?

@JunningHuang
Copy link

JunningHuang commented Jun 30, 2019

Hi, @nsubiron
I've used your code provided but it couldn't work. The error is:

RuntimeError: rpc::rpc_error during call in function version

The client is fail to get the world class.

client.get_world()

@nsubiron nsubiron marked this pull request as ready for review July 1, 2019 18:32
@nsubiron nsubiron requested review from fpasch and marcgpuig July 1, 2019 18:33
@nsubiron nsubiron added this to the 0.9.6 milestone Jul 1, 2019
Copy link

@fpasch fpasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 14 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @marcgpuig)

Copy link
Contributor

@marcgpuig marcgpuig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 14 of 14 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@nsubiron nsubiron merged commit e4dd26a into master Jul 5, 2019
@delete-merged-branch delete-merged-branch bot deleted the nsubiron/sync branch July 5, 2019 22:15
@napratin
Copy link

@nsubiron Thanks for making this change - I agree with you that the previous tick()...wait_for_tick() recipe was prone to failure. But we're noticing that in some situations, both client and server are entering an indefinite waiting state. It's particularly happening when we try to run in synchronous mode and no-rendering mode at a very high frame rate (300+ fps). To reproduce, try running a loop that just keeps calling world.tick() (this happens randomly, sometimes within a few thousand frames and sometimes a few hundred thousand frames; so you may have to wait and try a few times).

We haven't been able to pin-point the issue but we're suspecting a packet drop / packet out-of-order or other concurrency problem. Currently, as far as I understand, the tick_cue call is sent from the client to the server via RPC (which immediately returns the next expected frame number), but the actual new frame info / world snapshot is sent back over the streaming connection asynchronously (and the client waits indefinitely till this is received, consequently it never sends the next tick_cue call).

A hacky solution we've found is to replay the last world snapshot from the server if no new tick_cue call has been received in a while (see: #2038). But this ends up flooding the network with repeated world snapshots, sometimes when it is not needed.

Instead, if we could have the entire tick call execute completely on the server side and directly return the world snapshot (not just the expected frame number), that would fantastic! (and truly a synchronous tick)

Does that make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants