Proper support for allow_partial #147
I'm currently finalizing this PR and would like to hear your opinions before proceeding. When converting from partial to complete DFAs, it is necessary to add a trap state. So far I have been using None for this purpose, but that doesn't work well with networkx.DiGraph because the library doesn't allow nodes labeled None in the graph. It is not too difficult to work around this problem for the DFA class, but it becomes problematic when calling NFA.to_dfa (or other automata types), as this invariant would need to propagate throughout the code. So I need another approach to generate new trap states. The only solution I can think of that does not require significant refactoring is to assign the highest negative integer that isn't already a state. This solution works well, but sometimes states are strings or frozensets, which leads to a type mismatch that isn't directly caused by the user. To some extent this can be mitigated on our side, but I don't think we can avoid it completely. The more radical solution would be to make all DFAs partial. That would simplify a lot of the code, but would cost some performance for "dense" DFAs.
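To make the negative-integer idea concrete, here is a minimal sketch in plain Python. It assumes transitions are stored as nested dicts (`state -> symbol -> state`); the helper names `next_trap_state` and `complete` are hypothetical and are not part of the library's API or of this PR.

```python
def next_trap_state(states):
    """Return the highest negative integer that is not already a state."""
    candidate = -1
    while candidate in states:
        candidate -= 1
    return candidate


def complete(states, input_symbols, transitions):
    """Return (states, transitions) for an equivalent complete DFA.

    Hypothetical helper: every missing transition is redirected to a
    freshly named trap state that loops back to itself on all symbols.
    """
    has_missing = any(
        symbol not in transitions.get(state, {})
        for state in states
        for symbol in input_symbols
    )
    if not has_missing:
        # Already complete: return copies and do not add a trap state.
        return set(states), {s: dict(t) for s, t in transitions.items()}

    trap = next_trap_state(states)
    new_states = set(states) | {trap}
    new_transitions = {
        state: {
            # The trap state has no entry in `transitions`, so every one
            # of its symbols falls through to the default and self-loops.
            symbol: transitions.get(state, {}).get(symbol, trap)
            for symbol in input_symbols
        }
        for state in new_states
    }
    return new_states, new_transitions
```

Note that the trap state is an int even when every other state is a string or frozenset, which is exactly the type mismatch mentioned above.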
@EduardoGoulart1 Regarding mixed state types, that's a fair question. @eliotwrobson I think we've updated the library to handle, for example, states of mixed types, correct? I believe some existing methods (maybe If this is the case, then I'm fine with the highest negative integer number for the trap state name. |
@caleb531 yes, the library can handle mixed state types, and there are a couple of places this is used I think (see |
Also, adapting the test is fairly easy, but some tests have hard-coded values for the expected result which for me does not make sense and complicates reusing it for testing partial DFAs. For instance the test below ( Is there a specific reason for that? I would have expected the test to verify its algebraic properties. For example something like |
@EduardoGoulart1 the reason for these is to verify the state names and types of the output (which, when names are retained, are part of the API). These test cases should be kept with their hard-coded output. There actually is a test case testing for algebraic properties separately. I'm not sure about the best way to test with allow_partial for these. Is there a way to use parameterized test cases (like in pytest) with nose? The DFAs in those test cases are not partial anyway, so I don't think they can really be meaningfully adapted to test the new behavior in this PR.
@eliotwrobson @EduardoGoulart1 It does look like nose2 supports test parameterization! Although since DFAs can be verbose to construct, I wonder if the decorator signature would get pretty large. But please feel free to play around to find something practical and maintainable: |
@caleb531 @eliotwrobson sure, my initial idea was to extend the tests to use dfa.as_partial(). The only place where this does not work is in such hard-coded tests. The problem is that we do not expand all the states. But I will try my best with these constraints in mind :D
@EduardoGoulart1 just a heads up that, because of the size of the refactor and some weirdness that was uncovered, I want to wait until #129 is merged before merging this. I fixed the last major blocker there, so hopefully that will happen soon.
Even if a state in a partial DFA has no outgoing transitions, it still needs to have a state function (i.e. an empty dict). Also added f-strings to validation messages.
@EduardoGoulart1 @eliotwrobson Just merged #129! I'm working on resolving the merge conflicts now.
The semantics are defined such that if no transition is defined for a certain symbol on a given state, we act as if there were a transition on that symbol leading to a trap state. Reserve None to denote trap states. The code is implemented such that there is little or no extra overhead for complete DFAs. For partial DFAs, the code is implemented to perform well with sparse DFAs. For most functions, the code will also perform well with dense partial DFAs.
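As an illustration of these semantics, the following sketch reads a word on a partial DFA stored as nested dicts and treats a missing transition as a move into the trap state, represented here by None. The function name `accepts` and the dict layout are assumptions made for this example, not the library's actual implementation.

```python
def accepts(transitions, initial_state, final_states, word):
    """Return True if the partial DFA accepts `word`.

    Assumes every state has an entry in `transitions` (possibly an empty
    dict) and that None is reserved for the implicit trap state.
    """
    state = initial_state
    for symbol in word:
        # A missing symbol yields None, i.e. the implicit trap state.
        state = transitions[state].get(symbol)
        if state is None:
            # The trap state is non-accepting and can never be left.
            return False
    return state in final_states
```

For a complete DFA the lookup never returns None, so the only added cost in this sketch is the `is None` check per symbol.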
@EduardoGoulart1 Some merge conflicts came up when #152 was merged. I've resolved those and gotten the lint check passing in the latest push (with some minor style changes and optimizations). Please pull when you get the chance.
@caleb531 sounds good, I'm resolving some now and the remaining threads are fairly short.
@caleb531 all items are resolved and all TODOs have been removed. The only thing left is docs changes, but I'll leave that until after your review. Thanks!
@EduardoGoulart1 @eliotwrobson Left a few requested changes. Nothing major (at least from my perspective 😅).
@caleb531 Resolved all threads from your comments! If those were all the items you had questions about, feel free to merge when ready!
@EduardoGoulart1 @eliotwrobson This looks good to me! Will merge now. Thank you both for all the work on this PR.
@eliotwrobson @caleb531 Please do not review this yet because the code is not ready. This is just a draft to track progress on the allow_partial support.
If you go with the definition that DFAs are sets of words, then all operations are well-defined and relatively easy to implement. Most of the time, the only required change is to replace loops like:
with code like:
I wrote the code so that it adds no overhead for complete DFAs compared to the current implementation. For partial DFAs, if the DFAs are sparse, you will get a large performance boost, while if they are dense you will pay some penalty (but most of the time a negligible one).
I feel like we still need to discuss a few things:
Reserve something (probably None) to represent the trap state. Basically, we add the invariant that if None is part of the set of states, then it must be a trap state. This is already implicitly assumed in _get_next_current_state and greatly simplifies the implementation logic. Because networkx graphs do not allow nodes labeled None, we replaced it with automatically generated integers.
Missing: