Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What should map.these() yield? #14718

Open
e-kayrakli opened this issue Jan 3, 2020 · 37 comments
Open

What should map.these() yield? #14718

e-kayrakli opened this issue Jan 3, 2020 · 37 comments

Comments

@e-kayrakli
Copy link
Contributor

map.these() yields keys in the map. This was initially based on python's
dictionary interface.

A good amount of discussion about this starts with this
comment

In sum, python's dict interface was designed that way so that there is a
symmetry between different uses of the in keyword.

some_key in some_map  # check if a key is in the map
for some_key in some_map # iterate keys in the map

But Chapel doesn't have in (#5034). So, maybe we shouldn't follow this
precedent. Maybe map.these can iterate k-v pairs, and if we have in in the
future we can make it take k-v as the lhs argument, as well.

Personally, just looking at the dict interface, I am not a big fan of python's
dictionary iterator yielding keys, and it is reasonable to think about making
Chapel's map.these() yield k-v pairs.

OTOH, I am reluctant because I am used to it from python, and it may be
annoying for people coming from pythonland.

@dlongnecke-cray
Copy link
Contributor

One disadvantage of yielding both keys and values: in the case that we decide to have a no-reference map, we'd ostensibly have to yield both keys and values by value.

I think it is reasonable to yield keys by value and to have the expectation that keys generally be small sized value types suitable for copying around. However I don't hold any such expectation for map values.

So if we decide (not saying we will) to prohibit yielding by reference, then there would be a cost to always yielding a value even if it isn't used.

However if we continue to exist in a world where we can yield references to values (and const references to keys), I think it would be totally reasonable to have map.these() yield both keys and values.


The above uncertainties I've expressed make me think that the safest choice would be to continue to have map.these() continue to yield only keys.

@mppf
Copy link
Member

mppf commented Jun 10, 2020

I think that yielding references is a separate problem from returning them because we historically have had the "no updating collection size while iterating" rule.

I personally prefer having map.these yield (key, value) tuples however I'm not sure if the language/compiler is up to the task of yielding modifyable values in this context. (Modifying the key in a loop over the map would definitely be a no-no - but modifying the value seems like something we should be able to support).

@dlongnecke-cray
Copy link
Contributor

@daviditen Any thoughts as to what map.these() should yield as the implementor of map?
@bradcray I thought I saw you give a thumbs up in the OP, do you have an opinion one way or the other?

@bradcray
Copy link
Member

I was really surprised by the current map.these() behavior, and I think it was my comment (linked in the OP above) that caused Engin to fork off this issue. (key,value) pairs is what I would've expected with no other information.

@daviditen
Copy link
Member

I have no objection to it yielding (key, value) tuples.

@dlongnecke-cray
Copy link
Contributor

dlongnecke-cray commented Jun 11, 2020

@ben-albrecht As a Python power-user, would you be comfortable or feel alarmed if map.these yielded key/value pairs instead of just keys? (Ah I just realized you thumbs-upped the OP, I take that to mean you're OK with the change.)

@lydia-duncan Ditto experience with Python - any preference?

@ben-albrecht
Copy link
Member

I am OK with key,value pairs, and like that looping over keys or values requires an explicit iterator: map.keys() or map.values()

@lydia-duncan
Copy link
Member

Seems reasonable to me!

@dlongnecke-cray
Copy link
Contributor

A new thought just occurred to me, which is that even if we enabled returning tuples by const ref as in #15973, we'd unfortunately still have to return the entire tuple from map.these() by const ref. This is because we require keys to be immutable, and we'd have no way of specifying the constness of the individual elements of a returned referential tuple.

A bit of a shame, because it means if a user wants to mutate a value in a loop on map.these(), they'd have to grab a mutable reference:

for (k, v) in myMap {
  ref v2 = myMap[k];
  v2 = foo;
}

So I suppose as long as you don't plan to mutate values, map.these() still ends up being fairly idiomatic.

@mppf
Copy link
Member

mppf commented Jul 14, 2020

I wonder if we should make it possible to specify the return intent for tuple elements individually?

E.g. proc map.these(): (const ref keyType, ref valType)

@dlongnecke-cray
Copy link
Contributor

Woah there. That's a very bold question you're asking...

@bradcray
Copy link
Member

I wonder if we should make it possible to specify the return intent for tuple elements individually?

This is reminiscent of a question @vasslitvinov asked some time ago when we were wrestling with tuple semantics, but I don't recall whether we didn't like it or just didn't have the incentive to chase after it at that time. To me, it also seems a little entangled with the ongoing desire to have ref fields in objects #8481.

@mppf
Copy link
Member

mppf commented Jul 14, 2020

To me, it also seems a little entangled with the ongoing desire to have ref fields in objects #8481.

@bradcray - is there some specific part of ref fields in objects that makes you concerned about this? For the tuple, it's more that it's an anonymous object, and the user wishes to make one field const ref and another ref.

For my part, I think the language could handle ref fields in objects at this point and getting that going is mainly an implementation effort.

@bradcray
Copy link
Member

I wouldn't say concerned, just that I think of our implementation of (heterogeneous) tuples as being record-like, which is why the two efforts seem related to me. I've been in favor of ref fields in general, just pointing out that implementing one may involve (or get us most of the way to) supporting the other.

@mppf
Copy link
Member

mppf commented Jul 14, 2020

Just to be clear - tuples already support ref fields but it's handled as special stuff in the compiler rather than a general feature.

@vasslitvinov
Copy link
Member

We have notes from a deep dive on tuples on 2015-05-19. Reference components were a major topic. The deep dive was inconclusive.

Back then, we were still debating "tuple as a shortcut for multiple variables" vs. "tuple as a lightweight record" and how to pass an array by reference when returning it as a part of a tuple. Brad expressed unexcitement about the syntax proc foo((const in i, ref A)). The syntaxes return (1, ref A) and new MyRecord(1, ref A) were also proposed, I vaguely recall the majority not liking it because it was "too much".

@lydia-duncan
Copy link
Member

lydia-duncan commented Jan 19, 2023

In our meeting on Tuesday, it looked like we were in favor of keeping the these iterator, but wanted to change it to return key, value pairs.

Voting:

  • map.these should exist: 7 👍, 1 ❤️
  • these() returns keys: 2 👍, 1 👎
  • these() returns values: 0
  • these() returns tuples of (key, value): 8 👍🏻

@dlongnecke-cray
Copy link
Contributor

dlongnecke-cray commented Jan 19, 2023

I'm OK with having these iterate over keys and values, but I think that makes it all the more imperative that we iron out whether/how tuples can return components by ref.

I think there is an unacceptable overhead to returning some map value types (and even key types, really) by value.

As well...it may not be possible when the key/value type is non-copyable, such as owned.

I'm sure this has already been brought up, but it would suck if the main iterator over our map collections just straight up didn't work for some element types because it was returning pairs by value.

@bradcray
Copy link
Member

Is anybody able to characterize the current status of tuples and ref components? I seem to remember there being some work to improve them lately, but can't remember who did it or where that left us.

@vasslitvinov
Copy link
Member

For one, tuples+ref components seem to contribute to the "begin + tuple" bug that Brad hit recently.

@dlongnecke-cray
Copy link
Contributor

dlongnecke-cray commented Jan 20, 2023

I have a branch that began the effort to transition ref (int, int) -> (ref int, ref int) semantically, but that effort was never merged because we didn't have the necessary design discussion first. This was when I was relatively new to the project. The branch is very stale, but I (or whomever takes on the task) could reference it in a followup effort as I doubt the production code for tuples has changed.

I think we need to decide if we are OK with ref (int, int) meaning (ref int, ref int). We have some discussions pending for that.

Furthermore, I think we need to seriously consider what is brought up here and in #15973. Specifically, the ability to specify storage and constness for individual tuple components.

I think this is important in the Map case less for the ability to return individual components by ref or value, but because we need the ability to specify the key is const but the value is mutable.

If we're OK with just having map.these() return const keys and values, then just the const ref (k, v) -> (const ref k, const ref v) effort will carry us for now, though.

@bradcray
Copy link
Member

For cases that want to yield by ref, would

for (k,v) in zip(myMap.keys(), myMap.values()) { ... }

have the same problem, or does it somehow dodge it? If it has the problem then the issue seems orthogonal to the these() iterator yielding tuples... If it doesn't, I'd like to understand better why it doesn't and these() would.

@dlongnecke-cray
Copy link
Contributor

I don't think that your example would work as hoped today. I expect it would copy the keys and values.

Let's say we slapped a ref (k, v) on the loop index. Today, that would compiler error or segfault because you're returning a ref to a temporary tuple value created by the zip.

So let's imagine that we went through with the (ref k, ref v) transformation.

Then the problem is that the keys cannot be yielded by anything but const ref (this is always the case for correctness). So we need to do const ref (k, v). Unless we can specify constness for individual elements, the tuple is always going to be restricted to const if a single component is const.

@bradcray
Copy link
Member

I don't think that your example would work as hoped today. I expect it would copy the keys and values.

Even if one or both of the iterators was written to yield references? If so, then why does:

for (i,a) in zip(A.domain, A) do
  a = i;

work? (in the sense of "modifies elements of A"?) I was thinking maybe it's just because we use the de-tupling syntax for the index variables, but that seems not to be the case: https://ato.pxeger.com/run?1=m70sOSOxIDVnwYKlpSVpuhY3W8sSixQcrRSiDfX0jGMVilITc6y50vKLFDQydRI1FTLzFKoyCzQc9VLycxMz83QUHDUVUvK5FBQSFWwVMq25yosyS1Jz8jQcNSG6MhNxa8lM1DDUBGrTBTIMNFH0QhwDdRPMbQA

@dlongnecke-cray
Copy link
Contributor

dlongnecke-cray commented Jan 20, 2023

It looks like you're right.

The loop index variable is an example of a referential tuple. I guess I forgot about that even though I wrote the spec piece on it...

I think my misplaced confidence was owing to a discussion I recollect we had (not sure what the issue is for this if any) about whether the x in for x in thing should always be a value or if it should be allowed to be a reference (and in what situations). It's all very hazy, though.

Regardless, the current behavior means that I was wrong and users can work around the issue by zipping keys and values separately.

@mppf
Copy link
Member

mppf commented Jan 20, 2023

For cases that want to yield by ref, would

for (k,v) in zip(myMap.keys(), myMap.values()) { ... }

have the same problem, or does it somehow dodge it? If it has the problem then the issue seems orthogonal to the these() iterator yielding tuples... If it doesn't, I'd like to understand better why it doesn't and these() would.

AFAIK it doesn't have the same problem because the compiler can combine tuple elements for zip in a way that preserves their individual ref nature (and, hopefully, const-ness). In other words, the compiler supports that pattern directly, even though we can't write an iterator that yields such a tuple without using pragmas.

@bradcray
Copy link
Member

OK, so based on that data, my (personal) overall take on this issue would be:

  • map.these() should yield (key, value) tuples
  • for cases where a these() iterator can't express the desired tuple ref-ness properly today, we should generate a compiler error, suggesting the user use the manual workaround of zipping keys() and values() in the meantime. My assumption is for something like a map from ints to ints we would not want to be yielding refs anyway—in what cases would do we want to?
  • we see how often that comes up in practice and how sore users are about it

Unless writing the these() iterator like:

iter these() {
  for (k,v) in zip(keys(), values() do yield (k, v);
}

would allow us to leverage the compiler's ability to create such tuples and things would "just work" (i.e., I'm not sure whether Michael's reference to needing pragmas is only if we were trying to write a new iterator that didn't rely on zippering, or that computed the ref-ness or not of the values manually?)

@mppf
Copy link
Member

mppf commented Jan 23, 2023

for cases where a these() iterator can't express the desired tuple ref-ness properly today, we should generate a compiler error, suggesting the user use the manual workaround of zipping keys() and values() in the meantime. My assumption is for something like a map from ints to ints we would not want to be yielding refs anyway—in what cases would do we want to?

You might want to be able to modify the values in the loop. For a contrived example, this one would increment each value by its corresponding key:

var m: map(int, int) = ...;
forall (key, value) in m {
  value += key;
}

I think simpler cases like "Set all values to 1" can be handled by a .values() that returns references.

Another reasonable approach to the compilation error would be for map.these() to yield the key and value both as const ref. We could say, if you want to modify the value, you need to use .values() (if you don't need the key) or the zippering pattern (if you do). I think this might cover the same cases as your proposal.

I'm pretty happy with either one of these compilation-error approaches here & using that to stabilize `map. But, I am still worried about #15973 from a language stability point of view. In particular, that issue identified

However, any code that makes assumption as to the memory layout of a ref tuple would break.

Of course, that is a side-issue here.

@vasslitvinov
Copy link
Member

Here are my considerations.

It is a very good design if iterating over any collection consistently yields the collection's elements by reference. For const collections or for sets that would yield by const ref. So I would really like these() over a map and or a vector to yield elements rather than pairs.

For map.items(), yielding copies of elements/keys may result in a hidden performance trap. I suggest we do not do that. Especially that yielding copies does not allow the loop body to modify the current element. Yielding both by const ref in a referential tuple is my preference.

@bradcray
Copy link
Member

I would really like these() over a map and or a vector to yield elements rather than pairs.

All I can say in response is that every time I've written code in which I say for ... in myMap do, I've wanted the index expression to be (k,v) rather than just k or v. My rationale for why it makes sense to be intentionally inconsistent with things like arrays (say) is that when iterating over a vector, list, or rectangular array, the values are ordered and the keys are obvious/inferrable based on that ordering. Whereas with a map, there's no way to guess what order the values are going to come out, and most often I find they're not particularly useful without their keys.

@mppf
Copy link
Member

mppf commented Jan 27, 2023

@vasslitvinov -- I don't think it's necessarily wrong to think of the "elements" in a map as key-value pairs. In fact that is how the C++ standard library type works. Is there some reason that does not work for you / is incompatible with other design choices we have in map ? To my understanding, we could use this viewpoint, and would just have to adjust a few documentation lines for existing functions to use the term "values" rather than "elements".

@vasslitvinov
Copy link
Member

@mppf - the choice does not matter to me as a Chapel user. My preference is towards the goal of having a well-designed API.

that is how the C++ standard library type works

If we want to follow precedents in other languages, I think we should follow Python over C++.

@bradcray - given that Chapel is consistently inconsistent, I can accept inconsistency in this case as well. Especially if my arguments are not convincing.

bmcdonald3 added a commit to bmcdonald3/chapel that referenced this issue Feb 13, 2023
In chapel-lang#14718, it was proposed that iterating over a map should yield
key-value pairs as opposed to just values, so this PR implements
that change.

Signed-off-by: Ben McDonald <46734217+bmcdonald3@users.noreply.github.com>
@bmcdonald3
Copy link
Member

In an attempt to wrap this implementation up today, I was trying to enable returning tuples of type (const ref, ref), which isn't possible today, where our only return intents are:

return-intent:
  'const'
  'const ref'
  'ref'
  'param'
  'type'

Trying with ref and const ref and speaking with @dlongnecke-cray I was seeing some pretty strange behavior that definitely was not what we would like for map.these() which led to the opening of #21647.

Since that isn't possible today and seems like a bigger issue to get to a place where that could work, I am thinking that I will have map.these() yield copies of the values for the sake of 2.0 stabilization, rather than references and issue a compiler warning in the case that they are not copyable types.

Please speak up if you disagree with that proposal! (the map module stabilization is planned to wrap up this release).

@mppf
Copy link
Member

mppf commented Feb 21, 2023

@bmcdonald3 -- I think we should avoid stabilizing on yielding copies here. However, I am not sure what it will take to fix #21647, so I am thinking that #21647 will probably prevent this from being resolved in this release.

@bmcdonald3
Copy link
Member

OK, if we don't want to stabilize with copies, it seems that our options are:

  1. mark map.these() unstable (I think that's the right 2.0 labelling?)
  2. have map.these() yield const ref, so it wouldn't be behaving as we'd expect, but it would be fixed eventually
  3. have map.these() yield ref, similar to the above, it would be incorrect today, but would be fixed eventually
  4. remove map.these() entirely for 2.0 and then add it once we get the referential tuples issues ironed out

It seems that marking unstable would be the most straightforward approach, but I am not sure what the stance around 2.0 is with things being marked unstable.

Also, I am trying to act on this during this sprint as map is set to be stabilized this release, but I'm not sure how acceptable it is to delay the map stabilization over this issue.

This is definitely not my area of expertise, so I would not be surprised if I am overlooking something here, but I can't think of any other approaches besides those listed above.

@lydia-duncan
Copy link
Member

Since most people on the team voted in favor of these continuing to exist, I think we would best honor that discussion by marking it unstable for 2.0

@vasslitvinov
Copy link
Member

Another option is to throw compiler magic at it. This way we will have map.these() and it will behave as desired. However users will not be able to reproduce it in a stable manner.

Short of that, need to make it "unstable". Ditto the items() iterator unstable because it is the same thing.

Non-magical Chapel code cannot yield the desired (const ref, ref) in the case where the map values are of primitive types like ints. This is because we cannot create a tuple that references an int that resides elsewhere, even with a const ref -- any user-created tuple will copy the int.

bmcdonald3 added a commit that referenced this issue Feb 22, 2023
[ reviewed by @lydia-duncan - thank you! ]

As discussed in #14718, what we would like both `Map.these()` and 
`Map.items()` to yield would be tuples of `(key, value)` pairs 
with return intent of `(const ref, ref)`, but, since that is not 
possible today, we are marking them both as unstable until the 
ability to return with the correct intent is enabled in Chapel.

- [x] paratest
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants