Add parameter to limit depth of recursion across relationships #97

grant-humphries · 2017-04-06T00:06:06Z

Hi folks, I'm attempting to upgrade from version 0.3.1 to 0.3.3. The recursion across all relationships to create schema nodes for more distantly related tables (as opposed to just primary relationships) is one of the features that is leading me to upgrade, but the ORM that I'm working with is pretty massive. When calling SQLAlchemySchemaNode against one of our tables I let the code run for a minute or so and add_nodes method got called just under 41,000 times. I've let it run for several minutes longer than that, but the task I was attempting never completed and python was chewing up a pretty good chunk of memory during that time.

In light of this I propose that a recursion depth parameter be added that limits how many tables away from the root an relationship can be and still be added to the schema. It may be a good idea to have a default value in place for this as well to avoid situations like the one I describe.

The text was updated successfully, but these errors were encountered:

tisdall · 2017-04-06T17:34:45Z

Ideally the best solution would be for it stop if it reaches a node it's already visited. However, that might be complicated to implement and requires keeping track of all visited nodes.

A recursion depth count would stop it running infinitely, but the end result coming from the method would probably be unusable. It'd probably be good to have it throw an exception if the recursion goes beyond a given depth. (this would probably good to have either way)

I haven't worked on this project in years, though, so I'm not likely the best person to implement this. A recursion depth counter shouldn't be too hard to implement and I'd accept a PR for that if it had tests and docs.

grant-humphries · 2017-04-13T19:44:11Z

Thanks for taking the time to respond to this @tisdall. This package is pretty useful to my company so I might be able to convince them to allow me some time to create a pull request for this. A couple of responses to your comments:

Ideally the best solution would be for it stop if it reaches a node it's already visited. However, that might be complicated to implement and requires keeping track of all visited nodes.

It seems like a straight forward solution to this would be add the sqlalchemy class names to a set after each node visit. Since sets use hash tables the look-ups should still be pretty quick even if the ORM is really large. Does anything come to mind off hand that would circumvent this approach? It looks like something similar has been done with in prevent circular dependencies with parent_ parameter.

A recursion depth count would stop it running infinitely, but the end result coming from the method would probably be unusable.

Can you elaborate on why the schema may be unusable in those cases? It seems like it would be possible to stop adding relationships once they are more than x levels away from the root table.

In our case I can't imagine a scenario where we want tables more the three or four levels from the root added to the schema so it would still be nice to be able prevent the creation of large parts of the schema that are never intended to be used.

tisdall · 2017-04-16T16:46:52Z

@grant-humphries - What I meant was that if you had data/schema that went 8 layers deep but returned only 4, the result wouldn't be very usable as it's missing data and if you used that to edit the data the end result would save only 4 layers and truncate the rest. So, I think it should be an exception and leave it up to the implementer to deal with it as they want. In other words, it should always be the case that objectify(dictify(x)) should always return x if there's no exception.

In your specific example, you'd specify the limit to recursion to be 4 since you expect nothing to be deeper than that. Anything that went deeper would be an exception.

tisdall · 2017-04-16T16:48:25Z

It's been a while since I used SQLAlchemy so I can't remember how it handles some things... I don't think you can just look at class names as you could have a table that has foreign keys to itself. I think you'd have to look at the table and the primary keys to identify if it's the exact same row you've already visited.

maparent · 2017-04-24T21:02:02Z

This may sound simple-minded, but what about immediately setting the __colanderalchemy__ property of the class in the SQLAlchemySchemaNode initializer? That node is not complete yet, but it is already a valid reference.
Then, when we get the relationship _class in get_schema_from_relationship (schema.py:460), we can check if the attribute is set and return it directly if so.
The parents_ attribute will not be valid with this approach, but it does not seem to be used except to (unsuccessfully) avoid the loop.

tisdall · 2017-04-25T12:09:06Z

@grant-humphries - can you provide a failing example that we can run tests against? Take a look at the 'tests' directory here for how the tests work and see if you can modify one to mimic your data structure and hopefully replicate the problem.

Joishi · 2017-11-02T17:03:36Z

@tisdall - I'll take a look, as I am encountering the same issue

Joishi · 2017-11-06T17:58:47Z

@tisdall - Actually, looking at my models and your tests and doing some debugging, I'm not noticing any sort of recursive circle issues. What I am noticing is that it's recursing as deeply down the stack as necessary to create a full schema. My ORM contains about 400 models, many of which are intertwined. To create a full schema for 1 model ends up creating over 250k schemas (many of which are duplicates, just at different levels in different relationships).

I have added some additional models to the tests and added one more test to allow for includes to support dot notation. I will likely submit a pull request this week for review.

tisdall · 2017-11-06T20:08:41Z

@Joishi - How are you planning on handling the issue in the PR?

Joishi · 2017-11-06T20:26:56Z

I'm not sure what you mean? This is the first time I've worked on a shared project in Github, so I'm not familiar with the expected workflow. I have forked the repo and made some local changes (not pushed yet) .. was going to submit a pull request from my fork. Or maybe I'm not understanding what you are asking correctly. :)

tisdall · 2017-11-06T20:34:13Z

@Joishi - well, are you adding a parameter to limit the recursion, or are you implementing something else?

Joishi · 2017-11-06T20:51:33Z

@tisdall - I'm going to add support for dot notation of includes.. For example:

includes = [
'col1',
'rel1',
'rel1.col1',
]

This will generate a schema that contains what you specify in the includes, and only those values. If you don't specify an includes, it will get everything by default (as it does now).

The above example would generate a schema that includes the relationship rel1, but only the col1 of that relationship. Any relationships THAT object has will not be traversed.

tisdall · 2017-11-06T21:09:05Z

@Joishi - Just be aware that I haven't touched anything related to this library for quite some time, so I can't guarantee that it'll be merged any time soon... I'll likely need some input from some other users before going ahead with a merge.

If test is ran to generate full Route schema, it creates 113 total SQLAlchemySchemaNode objects as it traverses all relationships. For small orms this may not be an issue, but the problem gets exponentially worse as the size of the orm grows. This commit will enable users to explicitly state which relationships and columns to include in subnodes by allowing for dot-notation syntax in the includes keyword arg. Resolves: stefanofontanelli#97

Joishi · 2017-11-07T20:55:54Z

@tisdall - I submitted the pull request but it failed on 2.6 support .. I'm guessing it has to do with the list comprehension I used. I will review what 2.6 support will require.

Oh ... or not. This is in the build log:

Exception: SQLAlchemy requires Python 2.7 or higher.

tisdall · 2017-11-08T13:16:53Z

Yeah, SQLA dropped support for Python 2.6 so we probably should too.

maparent mentioned this issue Apr 24, 2017

Set the __colanderalchemy__ class attribute early to cut recursion depth #100

Open

Joishi mentioned this issue Nov 7, 2017

Allow includes to support dot notation #105

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add parameter to limit depth of recursion across relationships #97

Add parameter to limit depth of recursion across relationships #97

grant-humphries commented Apr 6, 2017

tisdall commented Apr 6, 2017

grant-humphries commented Apr 13, 2017

tisdall commented Apr 16, 2017

tisdall commented Apr 16, 2017

maparent commented Apr 24, 2017 •

edited

Loading

tisdall commented Apr 25, 2017

Joishi commented Nov 2, 2017

Joishi commented Nov 6, 2017

tisdall commented Nov 6, 2017

Joishi commented Nov 6, 2017

tisdall commented Nov 6, 2017

Joishi commented Nov 6, 2017 •

edited

Loading

tisdall commented Nov 6, 2017

Joishi commented Nov 7, 2017 •

edited

Loading

tisdall commented Nov 8, 2017

Add parameter to limit depth of recursion across relationships #97

Add parameter to limit depth of recursion across relationships #97

Comments

grant-humphries commented Apr 6, 2017

tisdall commented Apr 6, 2017

grant-humphries commented Apr 13, 2017

tisdall commented Apr 16, 2017

tisdall commented Apr 16, 2017

maparent commented Apr 24, 2017 • edited Loading

tisdall commented Apr 25, 2017

Joishi commented Nov 2, 2017

Joishi commented Nov 6, 2017

tisdall commented Nov 6, 2017

Joishi commented Nov 6, 2017

tisdall commented Nov 6, 2017

Joishi commented Nov 6, 2017 • edited Loading

tisdall commented Nov 6, 2017

Joishi commented Nov 7, 2017 • edited Loading

tisdall commented Nov 8, 2017

maparent commented Apr 24, 2017 •

edited

Loading

Joishi commented Nov 6, 2017 •

edited

Loading

Joishi commented Nov 7, 2017 •

edited

Loading