Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper support for allow_partial #147

Merged

Conversation

EduardoGoulart1
Copy link
Contributor

@EduardoGoulart1 EduardoGoulart1 commented May 8, 2023

@eliotwrobson @caleb531 Please do not review it because the code is not ready yet. This is just a draft to book progress on the allow_partial support.

If you go with the definition that DFAs are sets of words, then all operations are well-defined and relatively easy to implement. Most of the times the only required change is to replace loops like:

for symbol in input_symbols:
    tgt_a = transitions_a[symbol]

By the code:

for symbol in input_symbols:
    tgt_a = transitions_a.get(tgt_a, None)

I made the code such that it adds no overhead for complete DFAs compared to the current implementation. For partial DFAs, if the DFAs are sparse, then you will get a large performance boost, while if they are dense you will pay some penalty (but most of the times negligible).

I feel like we still need to discuss a few things:

  1. Reserve something (probably None) to represent the trap state. Basically, we add the invariant that if None is part of the set of states, then it must be a trap state. This is already implicitly assumed in _get_next_current_state and greatly simplifies the implementation logic. Because network graph does not allow for states labeled None, we replaced it with automatically generated integers
  2. We could think of making this allow_partial thing invisible to the users. Instead of passing allow_partial as a parameter, we compute internally the flag "is_partial". I go by the principle that we should not restrict our API unless it would lead to misuse.
  3. Now that support for partial DFAs is there, it is straightforward to extend operations to support DFAs with different alphabets

Missing:

  • Adjust/add tests
  • Add benchmarks

@eliotwrobson eliotwrobson mentioned this pull request May 11, 2023
@EduardoGoulart1
Copy link
Contributor Author

EduardoGoulart1 commented May 29, 2023

I'm currently working on finalizing this PR and I would like to hear your opinions before proceeding.

When converting from partial to complete DFAs, it is necessary to add a trap state. So far I was using None for this purpose, but it doesn't work well with networkx.DiGraph because that library doesn't allow nodes labeled None in the graph. It is not too difficult to work around this problem for the DFA class, but it becomes problematic when calling NFA.to_dfa (or other automata types) as this invariant would need to propagate throughout the code. So I need another approach to generate new trap states.

The only solution I can think of that does not require significant refactoring, is to assign the highest negative integer number that isn't already a state. This solution works well, but sometimes states are strings or frozen sets, which leads to a typing mismatch that isn't directly caused by the user. To some extent, this can be mitigated on our side, but I don't think that we can completely avoid it.

The more radical solution would be to make all DFAs partial. That would simplify a lot of the code, but would cost some performance for "dense" DFAs

@caleb531
Copy link
Owner

caleb531 commented May 29, 2023

@EduardoGoulart1 Regarding mixed state types, that's a fair question. @eliotwrobson I think we've updated the library to handle, for example, states of mixed types, correct? I believe some existing methods (maybe DFA.minify?) produce machines with such mixed state types.

If this is the case, then I'm fine with the highest negative integer number for the trap state name.

@eliotwrobson
Copy link
Collaborator

eliotwrobson commented May 29, 2023

@caleb531 yes, the library can handle mixed state types, and there are a couple of places this is used I think (see from_finite_language). The type annotations are also compatible with mixed-type states. @EduardoGoulart1 I think that's a fine convention, as there are a few places where we start assigning integer state names starting from 0 (in the regex code I believe).

@coveralls
Copy link

coveralls commented Jun 4, 2023

Coverage Status

coverage: 99.812% (-0.07%) from 99.883% when pulling 125d244 on EduardoGoulart1:proper_support_allow_partial into f1c3674 on caleb531:develop.

@EduardoGoulart1
Copy link
Contributor Author

EduardoGoulart1 commented Jun 4, 2023

Also, adapting the test is fairly easy, but some tests have hard-coded values for the expected result which for me does not make sense and complicates reusing it for testing partial DFAs. For instance the test below (test_union) is explicitly testing the union operator by constructing the DFA representing the result.

Is there a specific reason for that? I would have expected the test to verify its algebraic properties. For example something like A \subseteq (A \cup B) && B \subseteq (A \cub B) && (A \cub B) - A - B == \emptyset. If there is no such reason, then I will proceed to update such tests (this will also remove many lines of code)

image

@eliotwrobson
Copy link
Collaborator

@EduardoGoulart1 the reason for these is to verify the state names and types of the output (which when names are retained, is part of the API). These test cases should be kept with their hard-coded output. There actually is a test case testing for algebraic properties separately.

I'm not sure about the best way to test with allow partial for these. Is there a way to use parameterized test cases (like in pytest) with nose? The DFAs in those test cases are not partial anyway, so I don't think they can really be meaningfully adapted to test for the new behavior in this PR.

@caleb531
Copy link
Owner

caleb531 commented Jun 4, 2023

@eliotwrobson @EduardoGoulart1 It does look like nose2 supports test parameterization! Although since DFAs can be verbose to construct, I wonder if the decorator signature would get pretty large. But please feel free to play around to find something practical and maintainable:

https://docs.nose2.io/en/latest/params.html

@EduardoGoulart1
Copy link
Contributor Author

@caleb531 @eliotwrobson sure my initial idea was to extend the tests to use dfa.as_partial(). The only places where this does not work is for such hard-coded tests. The problem is that we do not expand all the states. But I will try my best with these constraints in mind :D

@eliotwrobson
Copy link
Collaborator

@EduardoGoulart1 just a heads up that, because of the size of the refactor and some weirdness that was uncovered, I want to wait until #129 is merged before merging this. I fixed the last major blocker there, so hopefully that will happen soon.

Even if a state in a partial DFA has no outgoing transitions, it still
needs to have a state function (i.e. an empty dict)

Also added f-strings to validation messages
@caleb531
Copy link
Owner

@EduardoGoulart1 @eliotwrobson Just merged #129! I'm working on resolving the merge conflicts now.

The semantics is defined such that if no transition is defined for a
certain symbol on a given state, we act as if there would be a
transition there leading to a trap state.

Reserve None to denote trap states.

The code is implemented such that there is little or no extra overhead
for complete DFAs. For partial DFAs, the code is implemented to perform
well with sparse DFAs. For most functions, the code will also perform
well with dense partial DFAs.
@caleb531 caleb531 force-pushed the proper_support_allow_partial branch from 3d92e07 to a48075d Compare June 14, 2023 19:51
automata/fa/dfa.py Outdated Show resolved Hide resolved
@eliotwrobson
Copy link
Collaborator

@EduardoGoulart1 Some merge conflicts came up when #152 was merged. I've resolved those and gotten the lint check passing in the latest push (with some minor style changes and optimizations). Please pull when you get the chance.

@eliotwrobson
Copy link
Collaborator

Would it be possible for you guys to work through the outstanding comment threads? I would personally prefer to review this PR only once these comment threads have been resolved, and I don't have anything more to say on the ones that are still open.

@caleb531 sounds good, I'm resolving some now and the remaining threads are fairly short.

@eliotwrobson
Copy link
Collaborator

@caleb531 all items are resolved and removed all TODOs. The only thing left is docs changes, but I'll leave that until after your review. Thanks!

automata/fa/dfa.py Outdated Show resolved Hide resolved
automata/fa/dfa.py Outdated Show resolved Hide resolved
Copy link
Owner

@caleb531 caleb531 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EduardoGoulart1 @eliotwrobson Left a few requested changes. Nothing major (at least from my perspective 😅).

Copy link
Collaborator

@eliotwrobson eliotwrobson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caleb531 Resolved all threads from your comments! If those were all the items you had questions about, feel free to merge when ready!

@caleb531
Copy link
Owner

caleb531 commented Jul 18, 2023

@EduardoGoulart1 @eliotwrobson This looks good to me! Will merge now.

Thank you both for all the work on this PR.

@caleb531 caleb531 merged commit 832bd5d into caleb531:develop Jul 18, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants