Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

"wrong type writer" error if topics with same token have different types #234

Closed
dhood opened this issue Jun 29, 2017 · 11 comments
Closed
Assignees
Labels
bug Something isn't working ready Work is about to start (Kanban column)

Comments

@dhood
Copy link
Member

dhood commented Jun 29, 2017

I have a node which subscribes to topic data and publishes reception rate data on topic reception_rate/data. Those topics have different types.

I get a TDataWriter::narrow:ERROR: Bad parameter: wrong type writer error when I try to publish, but only if it's publishing to */data. If it publishes to reception_rate/data_ there's no error. I suspect that the typesupport is being mixed up with the other topic's.

Here's how to reproduce it with two publishers in the same node:

import sys
from time import sleep

import rclpy

from std_msgs.msg import Int64, String


def main(args=None):
    if args is None:
        args = sys.argv

    rclpy.init(args=args)

    node = rclpy.create_node('talker')

    chatter_pub = node.create_publisher(String, 'chatter')
    chatter_pub2 = node.create_publisher(Int64, 'test/chatter')

    msg = String()
    msg2 = Int64()

    i = 1
    while True:
        msg.data = 'Hello World: {0}'.format(i)
        msg2.data = i
        i += 1
        print('Publishing: "{0}"'.format(msg.data))
        chatter_pub.publish(msg)
        chatter_pub2.publish(msg2)
        sleep(1)


if __name__ == '__main__':
    main()

Behaviour with fastrtps (matches expected behaviour):

$ RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 run demo_nodes_py talker
Publishing: "Hello World: 1"
Publishing: "Hello World: 2"
Publishing: "Hello World: 3"
Publishing: "Hello World: 4"

$ ros2 topic list --show-types
/chatter [std_msgs/String]
/test/chatter [std_msgs/Int64]

Behaviour with connext (fresh daemon):

$ RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run demo_nodes_py talker
RTI Data Distribution Service EVAL License issued to OSRF dhood@osrfoundation.org For non-production use only.
Expires on 16-Jul-2017 See www.rti.com for more information.
Publishing: "Hello World: 1"
TDataWriter::narrow:ERROR: Bad parameter: wrong type writer

$ ros2 topic list --show-types
/chatter [std_msgs/String]
/test/chatter [std_msgs/String]

That last output showing /test/chatter having type std_msgs/String is suspicious.

@dhood dhood added the bug Something isn't working label Jun 29, 2017
dhood added a commit to ros2/demos that referenced this issue Jun 29, 2017
The different types cause connext to get mixed up, see ros2/rmw_connext#234
dhood added a commit to ros2/demos that referenced this issue Jun 29, 2017
* Topics returned by get_topic_names_and_types have leading forward slashes now

* get_topic_names_and_types returns a list of types now

* Fix a bug in the reception of best effort messages

* Kinder failing without matplotlib

* Add missing __init__.py files

* Print warning when topics which appropriate names are ignored because of types

* No issues with empty waitsets anymore

(There's always a signal handler)

* Add linters

* Install the whole package rather than using py_modules

* Don't publish on topic with the same name token

The different types cause connext to get mixed up, see ros2/rmw_connext#234

* Flake8 import order wants different things on different machines?

* Add launch as exec_depend
@dirk-thomas
Copy link
Member

I can reproduce the problem. It looks like this is a regression of #227. The topic names are being split into the basename and the partition. Then only the basename is being used to lookup the topic type (

topic_description = participant->lookup_topicdescription(topic_str);
).

I will try to come up with a fix for this. I think it is significant enough to be made for the beta 2 even though we are in code freeze.

@dirk-thomas dirk-thomas self-assigned this Jun 29, 2017
@dirk-thomas
Copy link
Member

dirk-thomas commented Jun 29, 2017

Reading more about partitions and the API in Connext it looks to me that the topic description (which is identifying the type) is being looked up by DDSDomainParticipant::lookup_topicdescription. That function only takes the topic name but not the partition as an arguments. Might it be the case that the same topic in different partitions must have the same type (at least in Connext)?

If that is the case I fear that the concept of partitions can't be used to implement the concept of ROS namespace??? (I hope I am missing something here.) @Karsten1987 @wjwwood Maybe either of you has some more insight on this since you were most involved in this part.

@dhood
Copy link
Member Author

dhood commented Jun 29, 2017

I haven't read the details yet, but if you're correct that the types must match for topics in different partitions: there might still be a way we can use partitions to implement namespaces, if we can somehow "trick" connext into using FQNs for the topic type lookup. It never complained in beta1 when doing psuedo-namespacing by putting "test/chatter" as the topic name before the namespace implementation was introduced (that is, there's no restriction on using slashes in topic names in connext). This isn't a concrete suggestion at the moment, just something to get some ideas rolling while I have time to read up.

@Karsten1987
Copy link
Contributor

So from what I understood the actual problem comes from create_topic which only matches the topic with the type and returns NULL when they don't match. Now, the problem I haven't figured out a solution yet is why this match is done on participant level and not on publisher. That's the reason why the namespaces just work fine between two different applications/nodes, but not within the same node. I may have misunderstood the concept of partitions, but IMO you should be able to create multiple instances of the same topic on different partitions.

I'll have to look at bit more into what options we have.

@dhood
Copy link
Member Author

dhood commented Jun 29, 2017

"Participant level" seems to be the key here: you can run the following with no complaints from connext, since they're different participants:

RMW_IMPLEMENTATION=rmw_connext_cpp ros2 topic pub chatter std_msgs/String
RMW_IMPLEMENTATION=rmw_connext_cpp ros2 topic pub chatter std_msgs/Int32

Then the two [topic, type] pairs work fine independently.

RMW_IMPLEMENTATION=rmw_ext_cpp ros2 topic echo chatter std_msgs/Int32
RMW_IMPLEMENTATION=rmw_connext_cpp ros2 topic echo chatter std_msgs/String

while in a single process/node/participant like the modified talker demo, namespaces or not, there is the "wrong type writer" issue if you try to publish different types to the same topic.

I don't mean to say that you should be allowed to publish different types to the same topic. My point is that we can probably get around our current issue in the same way: by using different participants.

@dhood
Copy link
Member Author

dhood commented Jun 29, 2017

Looks like it's recommended to use as few participants as possible, so my previous suggestion is not likely to be an ideal solution for the namespacing issue: https://community.rti.com/best-practices/create-few-domainparticipants-possible

@dhood
Copy link
Member Author

dhood commented Jun 29, 2017

From reading RTI responses on forum (1, 2, 3) I agree that it looks like connext is not supposed to support different types on the same topic from within a participant, and namespaces aren't taken into consideration with that.

I did find a workaround though that's less dramatic than creating separate participants:

We can get around the '1 type per topic' restriction if we "just" embed the name of the topic type into the ROS -> DDS topic name mangling. Not pretty, but it gets hidden below the RMW layer for the most part.

For interoperability we have to apply the same thing across all DDS rmw implementations, but we already have mangling in our topics names anyway.

With no modification to the above example (but changes to rmw_connext_cpp + rmw_fastrtps_cpp to embed the topic type into the topic name), it now works fine:

$ RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run demo_nodes_py talker
RTI Data Distribution Service EVAL License issued to OSRF dhood@osrfoundation.org For non-production use only.
Expires on 16-Jul-2017 See www.rti.com for more information.
Publishing: "Hello World: 1"
Publishing: "Hello World: 2"
Publishing: "Hello World: 3"
RMW_IMPLEMENTATION=rmw_connext_cpp ros2 topic echo chatter std_msgs/String
RTI Data Distribution Service EVAL License issued to OSRF dhood@osrfoundation.org For non-production use only.
Expires on 16-Jul-2017 See www.rti.com for more information.
data: 'Hello World: 11'

data: 'Hello World: 12'
$ RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 topic echo test/chatter std_msgs/Int64
data: 37

data: 38
$ RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 topic list --show-types
/chatter_std_msgs::msg::dds_::String_ [std_msgs/String]
/test/chatter_std_msgs::msg::dds_::Int64_ [std_msgs/Int64]

That's just a hacky prototype. We'd have to use a version of the type names that doesn't have characters not allowed by the spec (not that connext or fastrtps seem to mind), and this will limit users' topic name length, but otherwise I think the only other thing that has to change is the get_topic_names_and_types de-mangling. Since we can infer the part of the topic name to remove from the topic type, we don't have to limit what topic name options/characters are allowed for users, but 'wasting' characters when we already have a length limit may be an issue.

It's not elegant, but this is one possible workaround. By keeping partitions for namespaces we save the implementation of aliasing and dynamically change namespaces easily, so maybe it's worth the trade-off.

@Karsten1987
Copy link
Contributor

That will certainly work, but at the same time that could also imply that we go back to pack the FQDN "/test/chatter" completely into the topic field with "test__chatter" by applying underscores. If we can't get around modifying the topic string in any case, we may consider keeping it somewhat intuitive so that external DDS components can make sense out of it.

argh, that's a bummer.

Now as for beta2, we could agree on leaving the system as is, stating that creating multiple publisher/subscriber/etc. within the same node, the same topic and using connext will lead to undefined behavior. I do believe it's exactly this combination which happens to fail. Other methods such as loading multiple nodes via composition will work as they have their own participant.

@dirk-thomas dirk-thomas assigned wjwwood and unassigned dirk-thomas Jul 17, 2017
@mikaelarguedas mikaelarguedas added the ready Work is about to start (Kanban column) label Jul 19, 2017
@madMAx43v3r
Copy link

madMAx43v3r commented Feb 12, 2018

Unfortunately choosing DDS was a huge mistake, you chose ~1000 pages of over-engineered specification that was never fixed and which took ~10 years just to implement version 1. As such you should have expected endless issues like this one.
The success of ROS1 is due to its simplicity and efficiency which you both threw out the window with ROS2. Just ask yourself this, will ROS2 ever be able to transmit ~100 MB/s of data without burning up the entire CPU and having to mess around with hundreds of QoS settings just to get it to work in the first place?

@dirk-thomas
Copy link
Member

Unfortunately choosing DDS was a huge mistake, you chose ~1000 pages of over-engineered specification that was never fixed and which took ~10 years just to implement version 1.

Thank you for sharing your opinion. From the feedback we have received so far it seems that many people are disagreeing with that and think choosing DDS as the default option for the RMW implementation makes a lot of sense.

As such you should have expected endless issues like this one.

The problem described in this ticket is actually not really a problem in the DDS spec but due to the way ROS 2 chose to map namespaces to partitions. The next release of ROS 2 will actually use a different approach which resolves this problem.

Just ask yourself this, will ROS2 ever be able to transmit ~100 MB/s of data without burning up the entire CPU and having to mess around with hundreds of QoS settings just to get it to work in the first place?

I am pretty confident that DDS users are already using the software in such scenarios. And since ROS 2 doesn't even expose that many QoS settings the user doesn't have to deal with that "complexity".

@dhood
Copy link
Member Author

dhood commented Jun 9, 2018

We've switched back to not using partitions for the implementation of namespace (ros2/ros2#476). Regression test for this issue in ros2/rcl#257

@dhood dhood closed this as completed Jun 9, 2018
130s pushed a commit to 130s/monitoring_tools that referenced this issue Mar 14, 2022
* Topics returned by get_topic_names_and_types have leading forward slashes now

* get_topic_names_and_types returns a list of types now

* Fix a bug in the reception of best effort messages

* Kinder failing without matplotlib

* Add missing __init__.py files

* Print warning when topics which appropriate names are ignored because of types

* No issues with empty waitsets anymore

(There's always a signal handler)

* Add linters

* Install the whole package rather than using py_modules

* Don't publish on topic with the same name token

The different types cause connext to get mixed up, see ros2/rmw_connext#234

* Flake8 import order wants different things on different machines?

* Add launch as exec_depend
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working ready Work is about to start (Kanban column)
Projects
None yet
Development

No branches or pull requests

6 participants