Should all the flavors (serial, parallel, distributed) of a given std datatype (list, set, map) be defined in a single module? #22218
Comments
For the first question, "a single module" seems pretty clearly preferable to me—both to keep the module name short and sweet, and because a single type per module seems unattractive to me. For the second, my intuition would be to make it plural (since it would contain multiple lists), but I know sometimes people have different takes on that question. I'm going to tag @mppf specifically, who I think has asked this question at times (maybe even for one of these cases, maybe because at the time it only had one type in it). |
I wouldn't expect these to be in a single module for the same reasons that the array distributions are not all in a single module. I would expect the different implementations (serial, parallel, distributed) to be very different. Another concern is that if they are combined, to some degree, compiling something that has e.g. All that said, this is not a strong preference, and in particular, in the standard library I can see the argument for convenience / needing to choose fewer module names. Perhaps we would choose an in-between solution such as having submodules in different files for the different data structures & re-exporting them. I wouldn't change the name of the |
I was imagining an opposite argument to some of yours: That in trying to keep a coherent interface between the three variations on a type, having them in a single module would simplify that sort of maintenance rather than distributing it across files. I also imagine that there will likely be some shared code between at least some of the variations on these types. The compilation time point is concerning, but seems like something we should be able to optimize going forward, at least to an extent. The comparison to distributions doesn't hold for me because we imagine an arbitrary number of distributions going forward, but only three flavors of these standard types.
I'm open to that as well. |
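The in-between solution mentioned above (submodules that are re-exported by a parent module) could be sketched roughly as follows. Every name here (`Lists`, `Serial`, `Parallel`, `serialList`, `parallelList`, `Client`) is hypothetical and chosen only for illustration; none of them is an existing standard library symbol, and the record bodies are placeholders.

```chapel
// Hypothetical layout, not the actual standard library organization.
module Lists {
  // In a real package each submodule could live in its own file and be
  // brought in with `include module Serial;` rather than nested inline.
  module Serial {
    record serialList {
      var numElts: int;   // placeholder field
    }
  }

  module Parallel {
    record parallelList {
      var numElts: int;   // placeholder field
    }
  }

  // Re-export the contents of both submodules through the parent module.
  public use this.Serial;
  public use this.Parallel;
}

module Client {
  use Lists;   // sees both record types via the re-exports

  proc main() {
    var s: serialList;
    var p: parallelList;
    writeln(s, " ", p);
  }
}
```

With this shape, the implementations stay in separate files while clients still have a single module name to remember. Whether it helps the compilation-time concern depends on how much of the parent module the compiler has to resolve, which this sketch does not address.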
We were considering support for Cyclic and BlockCyclic-style variants for list |
That still seems different to me—even if the standard modules support two flavors of distributed lists (whether as one type or two), that doesn't feel the same as the arbitrary number of distributions I expect arrays will support over time. To me, the better analogy to distributions would be "We don't want to put all collections in a single module because we'll support more and more collection types over time." |
I mean, depending on what other distributions we had in mind for arrays, I could see some of them being useful for lists as well. It just seems like it would be on a more case-by-case basis rather than applying to any distribution style. Put another way, in a bold distributed list future where it feels like a fully blessed and integrated type that we support, I would hope that any new distribution would consider whether it should be applied to both arrays and lists rather than only thinking about arrays. |
Prep for an upcoming ad-hoc subteam (anticipated to start during the August 22nd sprint)

1. What is the API being presented?

This issue is about the module organization and naming of Map.chpl, List.chpl and Set.chpl, especially motivated by the anticipated addition of a parallel and distributed version of each.

Map.chpl:

module Map {
  …
  record map { … }
  …
}

List.chpl:

module List {
  …
  record list { … }
  …
}

Set.chpl:

module Set {
  …
  record set { … }
  …
}

There are two questions to resolve:

How is it intended to be used?

The answer to this is really intrinsic to the answer to those questions.

How is it being used in Arkouda and CHAMPS?

Map:
List:
Set:
2. What's the history of the feature, if it already exists?

The Set module was added in early August of 2019, and was originally named Sets. When we added the Map module in late August of 2019, we decided to rename it to match Map instead of the other way around.

The List module originally contained a simple linked list implementation and was added in 2007. When we made the LinkedList module in March of 2019, we deprecated the List module in favor of it. There was also a Lists module added in May of 2019. When we renamed the Sets module in August of 2019, we also renamed the Lists module to List and removed the deprecated List module.

The Map module was added in late August of 2019 to replace using associative arrays like maps. The discussion around its addition led to us renaming the Lists and Sets modules and ensuring that we didn’t call the module Maps (#13749).

3. What's the precedent in other languages, if they support it?

Other languages either don’t support parallel or distributed versions, or if they do, they support them as separate types living in separate places. Several languages provide these collection types by default, which means they can be thought of as in the same place but not using a plural name to access them (or any module name really). When there are multiple implementations for a type, they typically live in their own location, though there are examples of living in the same module as another, more commonly used type and using that type’s name for the general module name.

a. Python

Python doesn’t handle these collection styles in the same way we do. Dictionaries are provided as a core part of the language.

b. C/C++

C/C++ don’t use the same namespace for headers and type names, so it’s a bit of an apples-to-oranges comparison, but the map, list, and set types all are provided by

c. Rust

Rust has a collections module which contains submodules for the individual collection types (https://doc.rust-lang.org/std/collections/index.html). There are separate modules when a type has multiple implementations, e.g.

Rust also has a crate for handling data parallelism on the collections (https://docs.rs/rayon/latest/rayon/), and a crate specifically for concurrent hash maps (https://docs.rs/chashmap/2.2.2/chashmap/index.html).

d. Swift

There’s a community-contributed Concurrent Collections package that contains a concurrent dictionary (https://github.com/peterprokop/SwiftConcurrentCollections/blob/master/Sources/SwiftConcurrentCollections/ConcurrentDictionary.swift), which is by definition in a different location than the dictionary type Swift normally provides.

e. Julia

I believe Julia provides its collections by default as part of the Base module. There are three different dictionary types defined there (https://docs.julialang.org/en/v1/base/collections/#Base.Dict, https://docs.julialang.org/en/v1/base/collections/#Base.IdDict and https://docs.julialang.org/en/v1/base/collections/#Base.WeakKeyDict), and two set types (https://docs.julialang.org/en/v1/base/collections/#Base.Set and https://docs.julialang.org/en/v1/base/collections/#Base.BitSet).

f. Go

I couldn’t find a set or list type. There are lots of packages for separate concurrent map implementations, and serial maps are provided by default.

4. Are there known GitHub issues with the feature?
5. Are there features it is related to? What impact would changing or adding this feature have on those other features?

The most natural feature to compare the collection types to is arrays and their distribution strategies. Today, the main crux of the array type is defined in

I don’t anticipate any changes we make here impacting our array organization for two reasons:

The other features to compare this to are:

None of that category of features have been stabilized today. Some are precursors to what we intend for the parallel or distributed version of the collection types (and so might go away). Some will be considered for stabilization in the future and so will probably be adjusted to be in line with whatever decision we make today when we are ready to stabilize them.

6. How do you propose to solve the problem?

For question A (Do we want the distributed and parallel versions to live in the same module as the serial version, or a separate one?):

A1. Put the distributed and parallel versions in their own module when we add them
A2. Put the distributed and parallel versions in the same module as the serial implementation

For question B (If we want them to live in the same module, should the module name be plural?):

B1. yes
B2. no

I think they should live in separate modules. If we do put them in the same module, it seems reasonable to me to rename the module to the plural form, though. |
I probably won't be able to attend the meeting, but here is my take on the two questions:

A. Do we want the distributed and parallel versions to live in the same module as the serial version, or a separate one?

I'd prefer to have them in separate modules, primarily based on Michael's point above about compilation time (i.e.,

A post 2.0 consideration: I'd prefer to have a more hierarchical module structure where other

B. If we want them to live in the same module, should the module name be plural? E.g. Maps.chpl, etc.

I don't think pluralizing is necessary from an aesthetic perspective. I.e., the following is not offensive to me:

import Map.parallelMap;

I think this concern could be alleviated by summarizing the various collection types and providing links to them at the beginning of the module's documentation. The name of the module itself still wouldn't indicate that there are multiple types within, but I don't think that particularly matters, as a new Chapel programmer who is looking for a Map collection (parallel, distributed or otherwise) would likely check in the

Generally, I don't think the churn caused by renaming such common modules would be worthwhile in this case. |
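To make the `import Map.parallelMap;` style concrete, here is a minimal sketch of a single singular-named module that holds more than one map type. `MyMap`, its `parallelMap` record, and the `Client` module are hypothetical stand-ins: the standard `Map` module does not define a `parallelMap` type, and the records below have no real implementation.

```chapel
// Hypothetical module; a stand-in for a singular-named module that
// contains several flavors of the same collection.
module MyMap {
  record map {
    type keyType;
    type valType;
  }

  record parallelMap {
    type keyType;
    type valType;
  }
}

module Client {
  // Import only the one type that is needed, by name.
  import MyMap.parallelMap;

  proc main() {
    var m: parallelMap(string, int);
    writeln(m.type:string);
  }
}
```

The import line reads the same whether the module is called `MyMap` or `MyMaps`, which is the aesthetic point being made: the singular name does not get in the way of naming a specific flavor.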
In our meeting today, we decided that the additional implementations of a particular collection would live in their own module. We checked with Brad offline and he was okay with this decision. We also polled on whether we would make the module name plural if we did put them in the same module, and decided against doing so.

Meeting notes:
module User {
use List; // not `use List only list;` or `import List.list;`
var parallelList: list;
proc foo() {
// attempts to use `parallelList` would maybe get confused?
}
}
A1. votes: (Lydia, Shreyas in favor; Ben M strongly in favor)
B1. votes: (Ben: strongly opposed) |
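For comparison with the `use List;` snippet in the notes, the narrower import it mentions can be sketched as below. This assumes a recent Chapel release in which the serial `list` method for adding an element is `pushBack` (older releases call it `append`); the `User` module and the variable name are taken from the notes, and `main` is added only so the example runs on its own.

```chapel
module User {
  // Bring in only the `list` type rather than everything `List` exports,
  // so no other symbol from that module can collide with names declared here.
  import List.list;

  var parallelList: list(int);

  proc foo() {
    // `parallelList` can only refer to the module-level variable above.
    parallelList.pushBack(1);
    writeln(parallelList.size);
  }

  proc main() {
    foo();
  }
}
```

With `import List.list;`, adding more types to the `List` module later cannot change what `parallelList` resolves to in `User`, which is one way to sidestep the confusion raised in the notes.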
Though we don't have more than drafts of them yet, our plan is to support three different versions of each of our standard datatypes: One for serial computing, one for parallel single-locale, and one for distributed multi-locale. This issue asks two questions:
Should these various variants be stored in a single module or a module per type? (e.g., List(s) vs. List, ParList, DistList)
If the answer to the previous question is "a single module", should the name be singular or plural? (e.g., List vs. Lists)