Add support for pipelines with multiple inputs #30

Merged · 13 commits · Feb 8, 2018
Conversation

@lschr (Contributor) commented Oct 17, 2017

Implementation of the feature discussed in #29.

Still missing:
- docs
- tests
- Python 2 compatibility
@lschr (Contributor, Author) commented Oct 17, 2017

Here is a first working (at least on Python 3.6) implementation. Key features are

  • One can now pass multiple ancestors to the Pipeline class's __init__.
  • There is a new propagate_how argument (the name is, of course, up for discussion) that specifies how attributes are propagated from multiple inputs. It can be either an integer i, which causes propagation from the i-th ancestor only, or "first" or "last" (go through the list of ancestors starting with the first/last one until the attribute is found). It defaults (for now) to 0, but maybe "first" would be better. (A small sketch of this lookup follows the example below.)
  • The pipeline decorator now supports an argc argument which specifies the number of inputs. This defaults to 1 for backwards compatibility. "all" is also allowed, in which case introspection is used to determine the number of arguments to the function/class. This would be a nice default but is not backwards compatible.

Example:

import pims
from slicerator import pipeline

@pipeline(argc="all")
def add(a1, a2):
    return a1 + a2

i1 = pims.open("img1.tif")
i2 = pims.open("img2.tif")
i12 = add(i1, i2)
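
To make the propagate_how semantics concrete, the attribute lookup described above behaves roughly like this standalone sketch (illustration only, not code from this PR; the helper name find_attr is made up):

def find_attr(ancestors, name, propagate_how=0):
    # An integer selects exactly one ancestor; "first"/"last" search the
    # ancestor list in forward/reverse order until one has the attribute.
    if isinstance(propagate_how, int):
        return getattr(ancestors[propagate_how], name)
    order = ancestors if propagate_how == "first" else list(reversed(ancestors))
    for a in order:
        if hasattr(a, name):
            return getattr(a, name)
    raise AttributeError(name)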

@lschr (Contributor, Author) commented Jan 3, 2018

Since there were no objections, I finished and tested the implementation. I'd love to see this merged.

@lschr changed the title from "Add support for pipelines with multiple inputs (in progress)" to "Add support for pipelines with multiple inputs" on Jan 3, 2018
@caspervdw (Member)

Thanks a lot @lschr and sorry for the lack of attention to this project. I will review this within the next week.

@caspervdw (Member) left a review

@lschr Thanks for this fantastic generalization of the pipelines! I left some minor notes in the review, and also one use case that your code (probably) does not cover yet.

it specifies the index of the ancestor (in `ancestors`). If it is
'first', go through all ancestors starting with the first one until
one is found that has the attribute. If it is 'last', go through
the ancestors in reverse order. Defaults to 0.
@caspervdw (Member):

I would prefer to stick to int here and using negative integers for indexing from the end of the ancestors list. So 0 equals 'first' and -1 equals 'last'. Probably the implementation will simplify because this is normal Python list indexing.

@caspervdw (Member):

Sorry I now understand the necessity of this implementation, please ignore my comment.


 def __len__(self):
-    return self._ancestor.__len__()
+    return min(a.__len__() for a in self._ancestors)
@caspervdw (Member):

Why not len(a)?

@lschr (Contributor, Author):

I guess because it was self._ancestor.__len__() originally. Since I did not know why, I did not change it.

If True, don't modify `func`'s doc string to say that it has been
made lazy. Defaults to False
argc : int or 'all', optional
Number of arguments that are relevant for the pipeline. For instance,
@caspervdw (Member):

A bit unclear to me, maybe something like 'Number of arguments that are indexed by the pipeline', and the name itself, argc, could be turned into the more comprehensive indexed_arg_count? Or ancestor_count?

@lschr (Contributor, Author):

I fixed the doc and went with ancestor_count to keep it reasonably short.


Returns
-------
Pipeline
Lazy function evaluation :py:class:`Pipeline` for `func`.
"""
if isinstance(argc, str):
@caspervdw (Member):

There could be a Py2/3 issue here. Maybe better: if argc == 'all':
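
For context, the Python 2/3 issue being pointed out is roughly this (illustration only):

argc = u"all"
isinstance(argc, str)   # False on Python 2 (unicode is not str), True on Python 3
argc == "all"           # True on both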

@lschr (Contributor, Author):

Fixed.

 else:
     # Fall back on normal behavior of func, interpreting input
     # as a single image.
-    return cls([obj], *args, **kwargs)[0]
+    return cls(*(tuple([a] for a in ancestors) + args), **kwargs)[0]
@caspervdw (Member):

This probably doesn't cover the case of 1 indexable and 1 non-indexable ancestor?

@lschr (Contributor, Author):

This is (currently) not supported, and I think it would be non-trivial to implement. A user who needs this could either set ancestor_count=1 (with reduced flexibility with respect to the second ancestor) or manually turn the second ancestor into a Slicerator.
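
For illustration, the ancestor_count=1 workaround mentioned here could look roughly like this (a sketch only; the file name, array shape, and function are made up):

import numpy as np
import pims
from slicerator import pipeline

@pipeline(ancestor_count=1)
def subtract_background(frame, background):
    # Only `frame` is indexed lazily; `background` is passed through as an
    # ordinary, non-indexable argument.
    return frame - background

images = pims.open("movie.tif")
dark_frame = np.zeros((512, 512))
corrected = subtract_background(images, dark_frame)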

@caspervdw (Member):

I see, this is some kind of array casting. I am 👍 to make the absence of array casting explicit and not allow ancestors with unequal lengths inside a Pipeline. So that would include adding a check at the __init__ and adapting the __len__. What do you think?

or isinstance(obj, Pipeline):
def proc_func(x):
return func(x, *args, **kwargs)
if isinstance(argc, str):
@caspervdw (Member):

Same as above

@lschr (Contributor, Author):

Fixed.

@caspervdw (Member)

@lschr See my line comment. @danielballan do you want to check this as well before merging?

@lschr (Contributor, Author) commented Jan 16, 2018

@caspervdw I think you are right. Let the user deal with differently-sized ancestors themselves, since only they know how to do so properly anyway. Pipeline.__init__ will now raise a ValueError if the ancestors' sizes don't match.
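
A minimal sketch of that kind of check (the helper name is made up; this is not the PR's actual code):

def _check_ancestor_lengths(ancestors):
    # All ancestors must have the same length; implicit broadcasting
    # ("array casting") is intentionally not performed.
    lengths = {len(a) for a in ancestors}
    if len(lengths) > 1:
        raise ValueError(
            "ancestors have unequal lengths: {}".format(sorted(lengths)))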

@caspervdw (Member)

Thanks @lschr, good to merge from my side. @danielballan ?

@danielballan (Member)

I would like to look at this before merging. Swamped today, will try to look this weekend.

@danielballan (Member)

I haven't had a chance to look at this, but I don't want to keep blocking progress, so I will trust @caspervdw and push the green button. Sorry for the hold-up!

@danielballan merged commit bd9e1b4 into soft-matter:master on Feb 8, 2018