Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alist, Plist, hashtable API #81

Open
Fuco1 opened this issue Mar 16, 2014 · 13 comments
Open

Alist, Plist, hashtable API #81

Fuco1 opened this issue Mar 16, 2014 · 13 comments
Labels
enhancement Suggestion to improve or extend existing behavior

Comments

@Fuco1
Copy link
Collaborator

Fuco1 commented Mar 16, 2014

I've already discussed this a little bit with Magnar but I need to clear my thoughts a bit.

So what I'm proposing we do is an API for alists, plists and hashtables (I'll call these "maps" later). The rationale is simple: these maps, especially the first two are used ubiquitously all through emacs, yet they lack a pleasant and consistent API. One example for all, assoc key alist but plist-get plist key---who the heck should remember all that! The naming is atrocious, the calling conventions random, the functionality... not very developed.

Dash has proved itself extremely useful and successful API for lists, that's one reason I want this to be "under the same label"---it will get better recognition and the more people use it the better, we will fix bugs and add more functionality much faster. It would probably make sense to put it in a different repo, because the documentation would get reaaaaaally long otherwise.

I would only focus on emacs 24. I feel that 23 is too old to bother with, and lexical scope is sexy. 24.4. is coming soon and the next one is probably going to be 25... hopefully Debian would catch up. However, if you feel we should also support e23, I'm not hardcore opposed against it, it would only make the code a bit more messy.

There are couple existing solutions like ht.el by @Wilfred and emacs-kv by @nicferrier, and while good on their own, they suffer the same weaknesses---the API is mostly contingent, solving problems the author felt most pressing. The advantage of one unified API is clear---user needs to learn only one system to manipulate all of these data structures. I'd be very happy if the above people contributed to this library, as well as other dash contributors.

I have a temporary repository https://github.com/Fuco1/petulant-dangerzone with a couple stubs and drafts, I'll update it sometime in the near future to reflect all the discussion below. But I'd like the real thing be under Magnar's account, simply because he has billion followers and that way we can spread this much faster (talk about parasitism!)

Here's the basic overview of what I have in mind.

Distribution

I'd distribute this as three separate packages dash-pl, dash-al and dash-ht/hash, but they should all reside in one repo, so the updates can be synchronized better. As you'd see later, the documentation for all three will also be the same, so people using any of these can come to the same place to read the specs. Finally, need of one doesn't automatically imply need for the other, so it's also a bit more polite not to dump billion things on the user.

Prefix

We will have a single consistent API available for all three, only differentiated by a prefix. One issue I'm not so sure about is whether we should also keep the dash prefix. On one hand, it's a "trademark" and easily recognizable sign of the dash & friends, but on the other many people dislike it, some to the extend of simply refusing to use dash because "it has ugly names". The prefixes could be al, pl and ht if we keep the - as well, in other case it'd need to be something longer, but we can't use alist, plist or hash because that's already used. Personally, I'd favor the - solution, because it solves the namespace problem efficiently and I doubt any other package would ever use it. So it's like a dedicated namespace for us.

Basic calling conventions

The first argument will always be the map itself, so we could use the -> macro from dash to thread it through expressions. The key will be the last argument for a simple reason: we want to allow nested operations, so the last argument will actually be of &rest type allowing more than one key, it will then do the lookup recursively. The other alternative was having the functions take key or '(key1 key2 ...), but I find this really user-unfriendly---the enumeration simply feels much more natural. The value, if any, will come before the key. This is a little bit unusual, but I feel well justified by the above. Plus, we can read it as "insert/put value at key" which feels more natural than "at key insert/put value" or some other readings. If there are any more arguments, they will come inbetween but in fixed order.

Naming

All the names will follow couple conventions, so that we get a consistent overall design. Generalized functions will be designated by various suffixes. I'll demonstrate it on function insert. By default, it would take a map, a value and a key, and update or insert the value at key. The keys would be compared by equal and the value would simply be inserted.

We can generalize it to allow other equivalences, not just equal. If the function takes an equivalence relation on keys, it will have a suffix -by. Therefore, insert-by would take a map, an equivalence, a value and a key.

Further, we could make the new value in the map as a combination of the supplied value and the old value. For example, how many times you simply wanted to bump a counter 1+ in a p/alist? I know I do that a lot. So we could have a -with variant that would take a a -> a -> a function, combining the supplied and old value to give a new one. For example, to bump a counter by 1, you would call it with function + and value 1. Simple insert can be realized with function (-const value).

And finally, we may want to combine not only the values but also the key. The suffix for that would be -withkey, which adds key as the first argument to the above function, making it k -> a -> a -> a.

The -by suffix can be also reasonably used on retrieval. Further we might want to get the value not by a key, but a general predicate. For example, in an alist like auto-mode-alist, we might want to get the first value where key regexp-matches the alist key. I propose -at suffix for this, reading "get value at key which matches predicate". The predicate would be of type k -> Bool.

These prefixes can be combined to give variety of more or less generalized functions, but always in same order: -with[key], -by/-at.

Functions which serve as predicates will end with -p as is the standard elisp style and will also be aliased to a version ending in ? as in schemes.

Arguments

Arguments would follow a simple naming scheme

  • the map will be called map, be it alist, plist or hash
  • the key will be called key &rest keys,
  • the value will be called value
  • the argument required by -with[key] will be fun
  • the argument required by -by will be equiv
  • the argument required by -at will be pred
  • the argument implementing ordering on keys will be called comp

This allows us to have a single documentation for all three (and possibly other structures if someone cares to implement them). In rare case where some operation might not be possible for a certain DS, we would simply add a note somewhere informing the user about this, e.g. "This function is not available for hash tables".

Note that equiv should always implement equivalence relation and comp should always implement a total ordering, returning non-nil when keys are equal for equiv and when first is \leq to the second for comp.

Anaphora

We can support anaphoras in the same way dash does, using it and other for the arguments. I have not yet decided what would be an anaphora for ternary functions (like the one used in -withkey).

Destructive variants

I think it makes sense to have both non-destructive and destructive versions. Destructive versions can simply always be marked by suffix ! as is common in scheme and other lisps. But from the start, I'd focus on implementing "pure" versions (which means we still allow shared sub-structure, but all the necessary conses will be re-created). Destructive versions would probably provide a bit of a speed up and sometimes make the operations more natural, e.g. there would be no need to reassign the result to the variable: (setq map (-operation map ...))

Conclusion

I've learned a lot about APIs by contributing to dash, mostly that it's damn difficult to design. So I don't blame emacs devs for the mess, but I feel after 30 years it's time for something new. That said, the potentiality to get this to core is basically none, because "we already have cl" or similar argument. Meh. I don't care about core anymore anyway.

Thoughts?

@spacebat
Copy link

I think its a great idea, though I'd prefer them in the same package because to me the central idea is the ability to switch between implementations easily. To this end I'd also like to see a version of each function where possible that doesn't care what you pass in and dispatches based on type checking. The dispatch could be done in a macro or inline function so as to minimize the overhead. Something like the following, which defaults to treating nil as a plist, which is the simplest of the structures:

   (if (listp map)
       (if (consp (car map))
           '-al-func
         '-pl-func)
     (if (hash-table-p map)
         '-ht-func
       (error "what to do with %s" map)))

@Fuco1
Copy link
Collaborator Author

Fuco1 commented Mar 16, 2014

There are some problems with dispatch, for example, what do you do when the value is nil? It can be both plist and alist. One would need to provide some type information, as an argument or, let's say, giving it a [alist foo] or something similar. It would not even work with non-nils, consider an alist/plist indexed by lists.

The snippet above makes some assumptions and if we averted the users that this is how they have to use it (in case of the "smart dispatch" version), it might be useful, but possibly lead to much confusion.

I'm not strictly opposed to have it all in one package, let's see what others think.

@Wilfred
Copy link
Contributor

Wilfred commented Mar 16, 2014

For what it's worth, Nic and I discussed merging ht.el and kv.el, but decided against it: Wilfred/ht.el#2

The problem with alists and plists is that it's not possible to write a function equivalent to clrhash or remhash (so you won't be able to write -pl-delete-by, for example). It's also not always possible to look at a list and know if it's an alist or plist (some lists are well-formed alists and plists!), so you can't write a single generic function.

Proper hash tables are generally faster and use less memory, so I favour using them wherever possible (naturally I'm biased :) ).

Hope that helps. I'm interested to see what API you develop. How do you feel that ht.el's API could improve?

@nicferrier
Copy link
Contributor

From my pov I agree with Fuco. kv is a pita but it's useful stuff. Particularly since the muppet deprecated aget etc.

@Fuco1
Copy link
Collaborator Author

Fuco1 commented Mar 16, 2014

@Wilfred I don't see why you wouldn't be able to write -pl-delete-by. If you clear the entire list, you'll get nil, just like -remove does.

This is not about making "better DS", it's more about making something for stuff that already exists everywhere in emacs. The entire org mode, dired, ibuffer, file handling, text properties, it's all just plists and alists.

I'm not saying ht.el API is not good, but that it's not unified ("compatible") with other libraries---in fact there are no other libraries! It's better to have one system of calling conventions for all maps than having three different APIs for each different structure.

Also, speaking for me, I'm sort of "meh" concerning the generic dispatch. Elisp doesn't have type system and beating it into it will always end up with pain.

The major inspiration here is Data.Map from Haskell, adapted where necessary for sillyness of elisp.

@spacebat
Copy link

Yes the generic dispatch idea is probably a non-starter. I don't think it too bad that it assume the keys of a plist to be atoms so long as that's documented, the real killer is the way lisp lists can share structure and as in the ht-kv discussion, even mutating conses can't make a client's list reference nil, so there will always be differences.

@Wilfred
Copy link
Contributor

Wilfred commented Mar 16, 2014

If you clear the entire list, you'll get nil, just like -remove does.

Ah, gotcha. I was thinking of mutation, since you can't mutate a list to be empty (dash uses macros to work around this e.g. !cdr).

Your API looks nice at my first skim through :). A few thoughts:

  • Looking at your foo vs foo-by convention for specifying an equality function, how will you handle hash tables? They already store an equality function, which may not be equal.
  • I'd prefer predicate functions to end with ?, e.g. -pl-null?.
  • I'm not sure about putting the key last, I would expect map key value for any key setting function.
    (-plist-insert user-details "chocolate" 'favourite-ice-cream) reads oddly to me.
  • If I want to look up nested keys I'd generally use ->, I worry the calling convention will be too clever otherwise. Forcing key to always be the last argument is rather restrctive. Another alternative for nested keys would be supporting generalised variables, to allow writing (setf (-pl-get (-pl-get user-details 'favourites) 'ice-cream) "chocolate")

I'm really interested to see what you come up with. Dash.el is very polished (I love the documentation<->examples<->tests arrangment) so I'm sure these mapping datastructure functions will benefit from being part of the project.

That's my bikeshedding, hope it helps.

@Fuco1
Copy link
Collaborator Author

Fuco1 commented Mar 17, 2014

  • For hashtables it simply doesn't make much sense. We can implement it by getting the key list, finding the proper equal key, then looking that one up the conventional way. This of course means any speed advantage of hash table is lost. But then, I don't expect anybody doing that with a hash-table, it'd be generally silly.
  • Agreed, I'll add it to the write up: predicates will end with -p or ? just like in dash.
  • Well, I can see how -> works for lookup, but what about insertion, deletion, modification... We can have (setf (-pl-get (-pl-get user-details 'favourites) 'ice-cream) "chocolate") or (-pl-insert "chocolate" 'favourites 'ice-cream). I find the second much cleaner. You can read it as "insert chocolate into favourite ice-creams". The idea here is to minimze the "crap" user would need to use around this to modify nested structure. Another solution is to add some macro to properly set up the calls (something like (-with-keys (favourite ice-creams) (-pl-insert "chocolate"))), or use a list argument: (-pl-insert '(favourites ice-cream) "chocolate"). I'm not against these solutions, they just all seem inferior to me. But I'm glad there's a discussion, chances are if you find it weird more people would. Though the setf thing I'd rather avoid, it's... a bit too verbose.

Forcing key to always be the last argument is rather restrctive.

In what sense?

@Wilfred
Copy link
Contributor

Wilfred commented Mar 22, 2014

For hashtables it simply doesn't make much sense.

Ah, OK. I presume then that the API for hash tables won't take a predicate the way alists and plists will.

In what sense?

If key (or &rest keys) is always your last argument, you're in trouble if you ever want to write a function with an argument after key. Unlikely perhaps, but you are losing that flexibility.

@alphapapa
Copy link

Is there still interest in this? I really like Fuco's idea in his last comment. It would make modifying values in nested plists so much nicer.

@Fuco1
Copy link
Collaborator Author

Fuco1 commented Apr 6, 2017

Sadly I don't have the time now that I would require to push this somewhere, not in the near future anyway.

A review of changes in the core would also be welcome, I'm not sure if there aren't some new libraries (like seq for example) which would obsolete this proposal.

@alphapapa
Copy link

alphapapa commented Apr 7, 2017

I looked briefly at seq and it doesn't have anything for plists.

I looked at your "dangerous" repo. ;) -pl-insert sure looks useful...

@alphapapa
Copy link

This may not exactly be relevant, but I just (re?)discovered https://github.com/nicferrier/emacs-kv The kvplist->merge function looks very useful for updating plists.

@Fuco1 Fuco1 added the enhancement Suggestion to improve or extend existing behavior label Jul 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Suggestion to improve or extend existing behavior
Projects
None yet
Development

No branches or pull requests

5 participants