-
Notifications
You must be signed in to change notification settings - Fork 357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Entity enable #530
base: master
Are you sure you want to change the base?
Entity enable #530
Conversation
@TheFixer I'm sorry it is taking so long to review. It hasn't slipped through the cracks, it is just that it needs and deserves a careful look. |
No problem, @eboasson . I am still struggling a bit with properly cleaning up entities that have not been enabled yet. I suggest I keep struggling for a while while you spent your valuable time on more urgent issues. |
78a41fc
to
0a34a1c
Compare
6a4e77a
to
18f39fc
Compare
@eboasson This PR implements delayed enabling of entities. A more detailed description of what has been done can be found in https://github.com/TheFixer/cyclonedds/commit/18f39fca458f365981b34e5928e6768148e0fe64. Would it be possible to review this PR? |
It has been a few months ago that I spent a considerable amount of time to attempt to implement the dds_enable() and get it running on the (then current) master. So far I have not received any comments. Therefore, I am about to close this pull request. |
@TheFixer I am very sorry that it got ignored for so long. I don't know why I missed this:
but it sure hasn't helped that it got lost somewhere in either my mailbox or my memory. As to the relevance of the PR, it is a rather important bit considering that content filtering is something that should happen at the reader level (rather than via "content filtered topics") and that the "enable" functionality is a key part of making it possible to set a content a filter on a reader before it starts receiving data. (The alternative would be setting it via a QoS, but that has some other downsides, at least at this time.) So allow me to reopen the PR and make a priority of doing the review it deserves. We'll see how large the impact is of the changes that have been merged into master since then, but as they don't involve major changes to the basics of creating or deleting entities, I have good hopes it won't be too bad. And next time, if you don't hear anything and are wondering what's going on, please don't wait so long and just ping me! |
Signed-off-by: TheFixer <thefixer@iteazz.com> This pull request implements delayed enabling of entities, in particular of publishers, subscribers, readers and writers. Delayed enabling allows immutable qos changes to be changed as long as the entity is not enabled, but once the entity is enabled immutable qos changes are not allowed anymore. Delayed enabling is implemented by deferring the creation of ddsi entities as long as the entity is not enabled. The application can call dds_enable() on a disabled entity to enable it. Calling dds_enable() on an already enabled entity is considered a noop for the application. This pull request implements the following behaviour: - Participants are always enabled. - Created entities will be automatically enabled when their parent entity is enabled, and their parent has autoenable set to true. This is the default. - Calling dds_create_writer() on a disabled publisher which has autoenable=true will return a dds writer without a corresponding ddsi writer (and similar for readers) - Disabled entities can be enabled by calling dds_enable() on the entity. When the parent of the entity is not enabled, enabling will fail and return with DDS_RETCODE_PRECONDITION_NOT_MET. When the entity cannot be enabled because of qos mismatches or invalid security credentials, an error code will be returned. The call dds_enable() will return DDS_RETCODE_OK if is has enabled the entity and alls its disabled children that have been created while their parent has autoenable=true. If not all children are successfully enabled then dds_enable() will return with the error code of the first child that that could not be enabled. - Calling dds_enable() on an already enabled entity is a noop. - The m_iid is pre-allocated when a dds_entity is created, and is reused when the corresponding ddsi entity gets enabled. - Immutable QoS settings can be changed on an entity as long as the entity is not enabled. As a consequence the creation of some qos dependent structures (like whc) have been moved from the dds entity to the ddsi entity. - The whc is created when entity is enabled, not prior to that. The same holds for the xpack creation. - When readers that have no predefined rhc, the rhc will be created when the reader gets enabled (i.e., the call to dds_rhc_default_new() is done when the entity is enabled) - Readers that do have a predefined rhc will use this rhc when becoming enabled. - thread_state_awake() and thread_state_sleep() functions have moved to the location where entities are enabled. - The check for the availability of the security credentials has been moved to location where entities are enabled. This implies that a disabled dds writer or reader can be created before checking its security credentials. To not change legacy behaviour the creation of a dds reader or writer fails with DDS_RETCODE_NOT_ALLOWED_BY_SECURITY when the reader or writer is automatically enabled when created. Note that this can only happen for writers whose parent is enabled and have autoenabled set to true. - If an entity (e.g., subscriber) is enabled which causes recursive calls to dds_enable() on multiple children (e.g., readers), and some of these child entities are successfully enabled and others are not (e.g., because the some of the readers have the correct security credentials and others have not) then the dds_enable() on the parent will fail. It is the responsibility of the application to find out which of the entities have been enabled, and which have not. - When a reader is created with an implicit subscriber, then the implicit subscriber is enabled if and only if the autoenable setting of the participant is set to true. Similarly for the creation of a writer with an implicit publisher. - Various tests have been implemented. Notes for the reviewer. - If a participant has autoenable=false and an reader (or writer) with an implicit subscriber (or publisher) is created, I don't think it is possible to enable the implicit subscriber (or publisher)( anymore. I therefore think it would be wise to prevent the creation of a reader (or writer) with an implicit subscriber (or publisher) when the participant has autoenable=false, and return DDS_RETCODE_PRECONDITION_NOT_MET in that case. Do you agree? - Currently, I don't use locking when recursively enabling children. Please check. - I am not sure, but I have the impression that some security tests take much time. Not sure if it is caused by this pull request.
@TheFixer the set of conflicts appeared small enough to make it worthwhile rebasing it before doing the review (the advantage is that I have keep track of fewer versions in my head), and so I did just that. I've made my rebased version available at https://github.com/eboasson/cyclonedds/tree/thefixer-entity-enable. The tests still pass locally. A quick look at the changes I made suggests that it won't affect the review in a major way, so I can give review comments on the current version of this PR without complicating things. However, I think it would make the most sense to update the PR first, provided you agree with the way I resolved the conflicts in the rebased version. |
Thanks @eboasson for taking the effort. I'll have a look at the rebase but don't expect any issues. I let you know when the updated pull request is available. |
18f39fc
to
7d55316
Compare
@eboasson I inspected the changes caused by the rebase and did not see any problems. The PR is now updated. |
@TheFixer thanks, that makes doing the review a bit easier. (And leaves me no excuse for procrastination!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheFixer as was to be expected, the review results in some comments. Some are really minor matters, some are more substantial.
What I haven't checked is whether indeed all functions that should fail if the entity they are invoked isn't enabled yet correctly return that error. I'll do that in a second round, I suppose.
Regarding the tests, I cannot help but wonder whether it would not be better to collect all "enable" tests in a single file and structure them a bit differently because a large overlap in the set of checks for the different entity kinds, and a bit of generic code along with some small functions to construct entities would likely result in significantly less code covering exactly same the ground.
Finally, I think that especially for the reader/writer tests it would make sense to also check that the built-in topics come out as they should (generated at the correct time, correct instance handles, stuff like that) and that remote nodes indeed observe the discovery events as a consequence of enabling the entity. (The latter is pretty obvious looking at the code, but the former not so much.)
* The dds_enable() operation will return DDS_RETCODE_OK if an entity and all its | ||
* autoenabled children have been enabled successfully. In case one or more of | ||
* its autoenabled children could not be enabled, an error code is returned. | ||
* It is the responsibility of the application to find out which of the children |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really a fan of this model — I'd rather see no change at all when the operation fails — but considering that it is a relatively rarely used operation and that I imagine failure will usually be treated as a grounds for program termination, I think this is fine for a first iteration.
@@ -202,6 +202,23 @@ static struct ddsi_tkmap_instance *dds__builtin_get_tkmap_entry (const struct dd | |||
return tk; | |||
} | |||
|
|||
static struct ddsi_tkmap_instance *dds__builtin_get_tkmap_entry_with_iid (const struct ddsi_guid *guid, const uint64_t iid, void *vdomain) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function serves a rather unique purpose: new local entities now need to have an iid
before being enabled, and in the way it is implemented in this PR before the GUID is assigned. In all other cases the iid
gets assigned on mapping the key to an instance id, and for entities that GUID is the key.
Technically, the possibility exists that a GUID gets reused, there are after all but 2^24 entity ids (in Cyclone, arguably different entity kinds could use the same entity id). So a process that creates 2^24 entities (but for some rounding errors), deletes one of them (so there is again an entity id available), and then creates another entity will necessarily reuse the entity id of the entity that was just deleted. However the GUID-to-iid mapping may outlive the entity that was deleted.
Consequently, in the design in this PR, come the call dds__builtin_get_tkmap_entry_with_iid
on the creation of the second entity with GUID guid
, if the old GUID-to-iid mapping for that GUID still exists, you end up in a state where the mapping retains the old iid, whereas the entity object has the new iid. That'll lead to an observable difference (somewhere, I can't be bothered to figure out all the details). I am sure you'll agree this is an edge case, but edge cases are usually more interesting than regular ones 🙂.
Methinks the solution lies in not introducing this new path, but rather moving the GUID allocation (e.g., in new_reader
) up in the stack (e.g., to dds_create_reader
, or perhaps even to dds_entity_init
) so that the GUID-to-iid mapping can always follow the same path and the potential discrepancy avoided.
(This has various effects throughout the PR, I'm not going to mark them, that's not worth the bother.)
e->m_flags |= DDS_ENTITY_ENABLED; | ||
DDS_CERROR (&e->m_domain->gv.logconfig, "Delayed entity enabling is not supported\n"); | ||
} | ||
rc = dds_entity_deriver_enable(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does make some sense (enabling is definitely specific to the entity kind), but it doesn't actually dispatch to an implementation for all entity kinds (e.g., what about topics? — and if those don't need any special handling, dds_entity_deriver_dummy_enable
simply returns "illegal operation", so is a rather incomplete operation).
/* enable all disabled children with the flag DDS_ENTITY_ENABLE_ON_PARENT */ | ||
for (c = ddsrt_avl_iter_first (&dds_entity_children_td, &e->m_children, &it); c != NULL; c = ddsrt_avl_iter_next (&it)) { | ||
if ((!(c->m_flags & DDS_ENTITY_ENABLED)) && (c->m_flags & DDS_ENTITY_ENABLE_ON_PARENT)) { | ||
ret = dds_enable((dds_entity_t)c->m_hdllink.hdl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is invoked by the entity kind-specific function on e
while e
's entity lock is held, but that means it locks the children's entity locks while holding the parent's entity lock, which breaks the rules. Perhaps it would have been better to allow locking a child entity while holding the parent's lock, rather than the other way around, but today you can only easily walk "upward" in the hierarchy.
So here is a risk of deadlock, however small (there are rather few operations that would actually try to lock a disabled entity, but certainly calling dds_enable
on a publisher and on its writers in parallel is one way.)
The trick to do such a walk over entities typically involves unlocking & relocking the parent entity. See, for example, pushdown_pubsub_qos
.
e->m_flags |= DDS_ENTITY_ENABLE_ON_PARENT; | ||
/* do not enable if parent is not enabled or no autoenable */ | ||
if ((!(parent->m_flags & DDS_ENTITY_ENABLED)) || ((e->m_flags & DDS_ENTITY_ENABLE_ON_PARENT) == 0)) | ||
e->m_flags &= ~DDS_ENTITY_ENABLED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rewriting the logic to set ENABLED
once if and only if it really is enabled is much clearer than setting it and clearing it if turns out that it shouldn't be set.
thread_state_asleep (lookup_thread_state ()); | ||
#endif | ||
(void) dds_delete (reader); | ||
goto del_implicit_sub; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not against goto
for error handling, but this moves just a little too far in the direction of "unstructured" programming to my taste!
(Also for the writer.)
* returns PRECONDITION_NOT_MET */ | ||
if ((pub->m_entity.m_parent != NULL) && !dds_entity_is_enabled(pub->m_entity.m_parent)) | ||
return DDS_RETCODE_PRECONDITION_NOT_MET; | ||
pub->m_entity.m_flags |= DDS_ENTITY_ENABLED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests, setting the ENABLED
flag and propagating to children, are a fundamental part of enabling entities. So in my view, these particular parts of enabling an entity should be part of the generic dds_enable
, and not be part of the entity kind-specific functions.
/* FIXME: not freeing WHC here because it is owned by the DDSI entity */ | ||
|
||
/* not freeing WHC here because it is owned by the DDSI entity */ | ||
#if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any value in leaving this #if 0/#endif block in.
@@ -501,6 +501,9 @@ void rd_wr_init_fail( | |||
goto fail; | |||
dds_set_status_mask (*rd, DDS_SUBSCRIPTION_MATCHED_STATUS); | |||
} | |||
dds_delete_qos (qos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These three lines don't add anything of value, it means the same code is executed under all circumstances. I prefer the simpler variant.
@@ -448,7 +448,7 @@ CU_TheoryDataPoints(ddssec_secure_communication, multiple_readers) = { | |||
CU_DataPoints(size_t, 1, 3, 1, 3), /* number of participants per domain */ | |||
CU_DataPoints(size_t, 3, 1, 3, 3), /* number of readers per participant */ | |||
}; | |||
CU_Theory((size_t n_dom, size_t n_pp, size_t n_rd), ddssec_secure_communication, multiple_readers, .timeout = 90, .disabled = false) | |||
CU_Theory((size_t n_dom, size_t n_pp, size_t n_rd), ddssec_secure_communication, multiple_readers, .timeout = 120, .disabled = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why implementing dds_enable
would affect the duration of this test. I'd prefer not to increase the timeout if there is no need, but I'm open to the possibility that 90s is just a bit short.
@eboasson Thanks for the review. I'll rework the PR and let you know when ready. |
This PR implements delayed enabling of participants, publishers, subscribers, readers and writers using
dds_enable()
call. Delayed enabling allows qos changes to entities that before they are enabled, and that would result in DDS_RETCODE_IMMUTABLE_POLICY after the entity has been enabled.In particular, this PR contains the following
entity_factory.autoenable_created_entities
QoS is implemented for participants, publishers and subscribers. This QoS indicates whether child entities for these must be automatically enabled (default) or disabled when created.dds_enable()
to enable entities that have been created disabled. This operation is a noop when applied to an already enabled entityThe following issues are still present: