Fixes for synchronous mode #1803

nsubiron · 2019-06-25T15:21:10Z

TL;DR: I made tick and apply_settings synchronize automatically with the server, so users no longer need to wait for the tick manually. Unfortunately, this means old recipes that use wait_for_tick will fail.

Description

~~Requires #1802.~~

Currently, we recommend using synchronous mode by calling "tick" and "wait_for_tick" on each iteration:

```python
while True:
    world.tick()           # Initiate a new "tick" in the simulator.
    world.wait_for_tick()  # Wait until we receive the new tick.

    # ...
```

but there is a race condition here: although unlikely, the tick can arrive before we start waiting for it, and we end up in a deadlock. It's a difficult problem to solve because we need to synchronize two servers that send data asynchronously (RPC and streaming). The "tick" method sends a cue to the simulator via RPC, but "wait_for_tick" listens for the tick event that the streaming server sends on each update. See also #1795.
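To make the race concrete, here is a minimal, self-contained sketch that does not use CARLA at all. `StreamListener` is a hypothetical stand-in that loosely models "wait for the *next* tick event after I start listening"; the worst-case ordering (the streamed tick arrives before the client starts waiting) is forced deterministically, so the wait can only time out. In the real loop there is no timeout, which is the deadlock described above.

```python
import threading


class StreamListener(object):
    """Hypothetical model of the streaming side: a tick is only noticed
    if someone is already waiting when it arrives."""

    def __init__(self):
        self._cond = threading.Condition()
        self._waiting = False
        self._got_tick = False

    def on_tick(self):
        # Called by the "streaming thread" when a tick event arrives.
        with self._cond:
            if self._waiting:
                self._got_tick = True
                self._cond.notify_all()
            # If nobody is waiting yet, the event is simply lost.

    def wait_for_next_tick(self, timeout):
        # Start listening now and block until a *new* tick shows up.
        with self._cond:
            self._waiting = True
            self._got_tick = False
            self._cond.wait_for(lambda: self._got_tick, timeout=timeout)
            self._waiting = False
            return self._got_tick


def tick(listener):
    # Worst case: the RPC cue is processed and the streamed tick is
    # delivered before the client reaches wait_for_next_tick().
    listener.on_tick()


listener = StreamListener()
tick(listener)
received = listener.wait_for_next_tick(timeout=0.1)
print(received)  # False: the tick was lost before we started waiting
```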

However, we can synchronize the tick method by having it return the id of the newly started frame. This way we make sure the tick has been applied when the function returns, and we also know exactly which frame id to expect. For convenience, apply_settings can also return the id of the frame in which the settings took effect, so we know for sure at which frame synchronous mode started.

EDIT: I also made tick and apply_settings block until the new tick is received, so there is no need to manually loop waiting for the state to be updated.
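The idea of returning a frame id and then blocking until that frame is received can be sketched in plain Python. This is only an illustration under stated assumptions, not CARLA's C++ implementation: `FakeServer` and `FrameSync` are hypothetical stand-ins for the RPC and streaming sides. Because the client compares frame ids instead of waiting for "the next event", a snapshot that arrives before the client starts waiting still satisfies the wait, which removes the race shown earlier in the description.

```python
import threading


class FrameSync(object):
    """Hypothetical client-side helper: remembers the newest frame seen on
    the stream, so a snapshot that arrives early is never lost."""

    def __init__(self):
        self._cond = threading.Condition()
        self._latest_frame = 0

    def on_snapshot(self, frame):
        # Streaming callback: record the newest frame id and wake waiters.
        with self._cond:
            self._latest_frame = max(self._latest_frame, frame)
            self._cond.notify_all()

    def wait_for_frame(self, frame, timeout):
        # Returns immediately if the frame already arrived, else blocks.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._latest_frame >= frame, timeout=timeout)


class FakeServer(object):
    """Hypothetical simulator: tick_cue() answers the "RPC" with the new
    frame id and pushes the snapshot to the stream from another thread."""

    def __init__(self, sync):
        self._sync = sync
        self._frame = 0

    def tick_cue(self):
        self._frame += 1
        frame = self._frame
        threading.Thread(target=self._sync.on_snapshot, args=(frame,)).start()
        return frame


def tick(server, sync, timeout=2.0):
    # Ask for a new frame, then block until that exact frame has been seen.
    frame = server.tick_cue()
    if not sync.wait_for_frame(frame, timeout):
        raise RuntimeError('timed out waiting for frame %d' % frame)
    return frame


sync = FrameSync()
server = FakeServer(sync)
frames = [tick(server, sync) for _ in range(5)]
print(frames)  # [1, 2, 3, 4, 5]
```

Whether the snapshot thread runs before or after `wait_for_frame` starts, the predicate `latest_frame >= frame` ends up true, so the ordering no longer matters.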

I have written a Python example that uses these two changes to synchronize the output of several sensors. I think we could add something similar to the API, maybe in C++, to make synchronous mode easier to use.

EDIT: This example is now in synchronous_mode.py.

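```python
try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2


class CarlaSyncMode(object):
    """Context manager that enables synchronous mode on entry and, on every
    tick, collects the world snapshot plus one measurement per sensor, all
    belonging to the same frame.
    """

    def __init__(self, world, *sensors):
        self.world = world
        self.sensors = sensors
        self.frame = None
        self._queues = []

    def __enter__(self):
        settings = self.world.get_settings()
        settings.synchronous_mode = True
        # apply_settings returns the frame in which the settings took effect.
        self.frame = self.world.apply_settings(settings)

        def make_queue(register_event):
            q = queue.Queue()
            register_event(q.put)
            self._queues.append(q)

        # The first queue receives world snapshots, the rest sensor data.
        make_queue(self.world.on_tick)
        for sensor in self.sensors:
            make_queue(sensor.listen)
        return self

    def tick(self, timeout):
        # tick() blocks until the new frame is received and returns its id.
        self.frame = self.world.tick()
        data = [self._retrieve_data(q, timeout) for q in self._queues]
        assert all(x.frame == self.frame for x in data)
        return data

    def __exit__(self, exc_type, exc_value, traceback):
        settings = self.world.get_settings()
        settings.synchronous_mode = False
        self.world.apply_settings(settings)

    def _retrieve_data(self, sensor_queue, timeout):
        # Discard data from other frames until the current one shows up;
        # raises queue.Empty if nothing arrives within `timeout` seconds.
        while True:
            data = sensor_queue.get(timeout=timeout)
            if data.frame == self.frame:
                return data
```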
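The frame-matching loop in `_retrieve_data` can be exercised without a simulator. In this sketch, `Measurement` is a hypothetical namedtuple standing in for sensor data (only its `frame` attribute matters), and `retrieve_data` is a standalone copy of the method's logic. Leftover items from earlier frames are consumed and dropped until the requested frame appears.

```python
import queue
from collections import namedtuple

# Hypothetical stand-in for a sensor measurement; only `.frame` matters.
Measurement = namedtuple('Measurement', ['frame', 'payload'])


def retrieve_data(sensor_queue, frame, timeout):
    """Same logic as CarlaSyncMode._retrieve_data: discard items whose
    frame does not match until the requested frame shows up."""
    while True:
        data = sensor_queue.get(timeout=timeout)
        if data.frame == frame:
            return data


q = queue.Queue()
for f in (40, 41, 42):  # frames 40 and 41 are left over from earlier ticks
    q.put(Measurement(f, 'image-%d' % f))

print(retrieve_data(q, 42, timeout=1.0))  # Measurement(frame=42, payload='image-42')
print(q.empty())  # True: the stale items were consumed and discarded
```

If the requested frame never arrives, `queue.get` raises `queue.Empty` after `timeout` seconds, so `sync_mode.tick(timeout=...)` surfaces a stalled sensor as an exception instead of blocking forever.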

An example script using this context manager:

```python
import carla

client = carla.Client('localhost', 2000)
client.set_timeout(2.0)

world = client.get_world()
blueprint_library = world.get_blueprint_library()

sensors = []

try:
    # Spawn three cameras at the origin; each one feeds its own queue.
    for blueprint_id in ('sensor.camera.rgb',
                         'sensor.camera.depth',
                         'sensor.camera.semantic_segmentation'):
        sensors.append(world.spawn_actor(
            blueprint_library.find(blueprint_id),
            carla.Transform()))

    with CarlaSyncMode(world, *sensors) as sync_mode:
        while True:
            data = sync_mode.tick(timeout=1.0)
            snapshot = data[0]  # world snapshot of this frame
            for n, image in enumerate(data[1:]):
                image.save_to_disk('_out/%01d_%08d' % (n, sync_mode.frame))

finally:
    # Always destroy the spawned sensors, even on errors or Ctrl+C.
    for sensor in sensors:
        sensor.destroy()
```


kraken24 · 2019-06-28T12:26:47Z

@nsubiron I use both world.tick() and world.wait_for_tick() when running in synchronous mode at 10 fps, but CARLA suddenly freezes at some point and I have to restart the whole simulation. I want to run 2000 episodes for reinforcement learning, but it stops after 350 episodes at most.

Do I have to change some settings, or initialize CARLA at a different frame rate?

JunningHuang · 2019-06-30T10:16:22Z

Hi, @nsubiron
I've tried the code you provided, but it doesn't work. The error is:

RuntimeError: rpc::rpc_error during call in function version

The client fails to get the world:

client.get_world()


fpasch

Reviewed 14 of 14 files at r1.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @marcgpuig)

marcgpuig

Reviewed 14 of 14 files at r1.
Reviewable status: complete! all files reviewed, all discussions resolved

napratin · 2019-08-30T02:23:34Z

@nsubiron Thanks for making this change. I agree with you that the previous tick()...wait_for_tick() recipe was prone to failure. But we're noticing that in some situations both client and server enter an indefinite waiting state. It happens particularly when we run in synchronous mode and no-rendering mode at a very high frame rate (300+ fps). To reproduce, run a loop that just keeps calling world.tick(). It happens randomly, sometimes within a few thousand frames and sometimes after a few hundred thousand, so you may have to wait and try a few times.

We haven't been able to pinpoint the issue, but we suspect dropped or out-of-order packets or some other concurrency problem. As far as I understand, the tick_cue call is currently sent from the client to the server via RPC (which immediately returns the next expected frame number), but the actual new frame info (the world snapshot) is sent back asynchronously over the streaming connection. The client waits indefinitely until it is received, so if it never arrives, the client never sends the next tick_cue call.

A hacky solution we've found is to replay the last world snapshot from the server if no new tick_cue call has been received in a while (see: #2038). But this ends up flooding the network with repeated world snapshots, even when they are not needed.
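A rough sketch of the replay idea, for illustration only: this is not the code in #2038, and `ReplayingPublisher`, its `send` callback, and the `replay_after` interval are hypothetical names. The server remembers the last snapshot and, when no tick cue has arrived for a while, sends it again. An injectable clock keeps the example deterministic. It also shows the downside mentioned above: the snapshot is re-sent even if the client actually received it and is simply busy.

```python
import time


class ReplayingPublisher(object):
    """Hypothetical server-side sketch: re-send the last snapshot if no
    tick cue has arrived for `replay_after` seconds."""

    def __init__(self, send, replay_after=0.5, clock=time.monotonic):
        self._send = send
        self._replay_after = replay_after
        self._clock = clock
        self._last_snapshot = None
        self._last_activity = clock()

    def on_tick_cue(self, snapshot):
        # Normal path: a new frame was produced, stream it once.
        self._last_activity = self._clock()
        self._last_snapshot = snapshot
        self._send(snapshot)

    def poll(self):
        # Called periodically; replays at most once per `replay_after`.
        idle = self._clock() - self._last_activity
        if self._last_snapshot is not None and idle >= self._replay_after:
            self._send(self._last_snapshot)
            self._last_activity = self._clock()
            return True
        return False


sent = []
now = [0.0]
pub = ReplayingPublisher(sent.append, replay_after=0.5, clock=lambda: now[0])
pub.on_tick_cue('frame-1')
now[0] = 0.2
pub.poll()  # too early: nothing is re-sent
now[0] = 0.7
pub.poll()  # no new cue for 0.7 s: replay the last snapshot
print(sent)  # ['frame-1', 'frame-1']
```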

Instead, if we could have the entire tick call execute completely on the server side and directly return the world snapshot (not just the expected frame number), that would be fantastic! (And truly a synchronous tick.)

Does that make sense?

nsubiron self-assigned this Jun 25, 2019

nsubiron mentioned this pull request Jun 26, 2019

Tick function for Reinforcement Learning #1809

Closed

nsubiron force-pushed the nsubiron/sync branch from b424cdb to 6208a5a July 1, 2019 12:25

nsubiron added 2 commits July 1, 2019 20:19

Make 'tick' and 'apply_settings' wait until the new frame is received…

2c42ce5

… and return the frame id when the changes took effect

Update documentation on synchronous mode

3e9410b

nsubiron force-pushed the nsubiron/sync branch from 6208a5a to 73d49da July 1, 2019 18:32

nsubiron marked this pull request as ready for review July 1, 2019 18:32

nsubiron requested review from fpasch and marcgpuig July 1, 2019 18:33

nsubiron added this to the 0.9.6 milestone Jul 1, 2019

nsubiron force-pushed the nsubiron/sync branch from 73d49da to 0a149f7 July 1, 2019 18:41

Improve synchronous mode example

f9fcdc1

nsubiron force-pushed the nsubiron/sync branch from 0a149f7 to f9fcdc1 July 1, 2019 18:48

nsubiron added 2 commits July 2, 2019 10:58

Do not crash Python script when users cancel

6cb7c8c

Fix assert check in wrong place

c7e9c39

fpasch approved these changes Jul 4, 2019

marcgpuig approved these changes Jul 5, 2019

Merge branch 'master' into nsubiron/sync

e348106

nsubiron merged commit e4dd26a into master Jul 5, 2019

delete-merged-branch bot deleted the nsubiron/sync branch July 5, 2019 22:15

tin1254 mentioned this pull request Nov 2, 2019

Test Implementation of Reading / Writing / Spawning bark-simulator/carla-interface#7

Closed

ZhangYouRong mentioned this pull request Mar 12, 2020

Synchronous_mode gets stuck in RL training #2581

Closed

Vaan5 mentioned this pull request May 4, 2020

Synchronous mode - tick hangs #2809

Closed

Morphlng mentioned this pull request Sep 13, 2022

Add support for Windows platform and some bug fixes praveen-palanisamy/macad-gym#65

Merged
