looking for nodes inside Groups #3523

ltalirz · 2019-11-07T20:53:47Z

Groups are a powerful tool for organising data in AiiDA - they can be created after the fact, modified, and they don't pollute the provenance.
One use case we have at the LSMO is to select data for visualizations (e.g. Materials Cloud DISCOVER sections), where they can replace multiple complicated (time-consuming) queries.

Groups have one downside, though: it can be difficult to find the node you are looking for in a group.

Here is an example:

$ verdi group show curated-cof_11040N2
-----------------  -------------------
Group label        curated-cof_11040N2
Group type_string  user
Group description  v2
-----------------  -------------------
# Nodes:
   PK  Type           Created
-----  -------------  ---------------
45690  Dict           24D:01h:56m ago
42318  Dict           24D:05h:28m ago
39798  WorkChainNode  24D:05h:49m ago
48582  Dict           23D:19h:35m ago
41719  Dict           24D:05h:30m ago
39794  WorkChainNode  24D:05h:49m ago
32063  Dict           27D:20h:19m ago
32018  CifData        27D:20h:31m ago
31962  WorkChainNode  27D:20h:39m ago
31961  Dict           27D:20h:39m ago
28710  WorkChainNode  28D:05h:49m ago
28708  Dict           28D:05h:49m ago
28494  CifData        28D:06h:51m ago

Obviously, it can be difficult to understand which node to pick here.

How about adding:

- a flag that adds a column with the node label
- a flag to show the value of a given attribute
- a flag to show the value of a given extra

This could address the issue on the command line.

Then there is a related issue in the Python API: once I have loaded a group, how can I easily address the nodes within it, e.g. by uuid/id/label (if unique)/extra (if unique)...?

Mentioning @giovannipizzi for comment
Mentioning @danieleongari for info


ltalirz · 2019-11-08T09:10:57Z

Daniele mentions it would also be useful to be able to generate graphs from nodes in a group, which would become available e.g. via #3436

giovannipizzi · 2019-11-12T22:08:50Z

Re adding flags to verdi node show - I'm in principle ok!
Do you have an example of how you would like them to behave? E.g.
verdi node show -e volume would show the extra volume (or None if it is not available, or an empty string, or a default you specify, e.g. with -e volume:-1, ...)?
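A minimal sketch of how the `-e volume:-1` syntax could be parsed, assuming the text after the first colon is the default and is kept as a string. `parse_field_spec` is a hypothetical helper, not an existing verdi function:

```python
from typing import Optional, Tuple


def parse_field_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split a hypothetical 'name[:default]' field spec.

    Returns (name, default); default is None when no ':' is present.
    The default stays a string; converting it (e.g. to int) is up to the caller.
    """
    # partition splits at the first ':' only
    name, sep, default = spec.partition(':')
    if not name:
        raise ValueError(f'empty field name in spec {spec!r}')
    return name, (default if sep else None)


print(parse_field_spec('volume'))     # ('volume', None)
print(parse_field_spec('volume:-1'))  # ('volume', '-1')
```

Distinguishing "no default given" (None) from "empty default" ('') leaves room for the None / empty-string choice discussed here.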

Re "I load a group but then how can I easily address nodes within that group, e.g. by uuid/id/label (if unique)/extra (if unique)...": I think the way to do this now is via the QueryBuilder. Here as well, a suggestion of how you would like to see this working would be useful. Maybe the discussion could be joined with #3535 (which is more focused on nodes). For groups, too, one could have a short syntax such as group.nodes_qb(f={'e.volume': {'<': 10}}), where .nodes_qb would roughly return

QueryBuilder().append(Group, filters={'id': group.pk}, tag='group').append(Node, with_group='group', filters={'extras.volume': {'<': 10}}, project='*')

so you can order, do .all(), .limit(), ...

Shorthand notations could be f= for filters, e. for filters on extras, a. for filters on attributes, l= for filters on labels; I don't know if we need more shorthands (probably id and uuid are short enough).
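To make the proposed shorthands concrete, here is a sketch of how `f=` and `l=` could be expanded into an ordinary filters dictionary before building the query. `build_node_filters` and its prefix table are hypothetical; only the long-form keys (`extras.<key>`, `attributes.<key>`, `label`) follow QueryBuilder conventions:

```python
from typing import Any, Dict, Optional

# Hypothetical shorthand prefixes, following the proposal above.
_PREFIXES = {'e.': 'extras.', 'a.': 'attributes.'}


def build_node_filters(f: Optional[Dict[str, Any]] = None,
                       l: Optional[Any] = None) -> Dict[str, Any]:
    """Expand the proposed shorthands into a QueryBuilder-style filters dict.

    'e.<key>' becomes 'extras.<key>', 'a.<key>' becomes 'attributes.<key>',
    other keys (e.g. 'id', 'uuid') pass through unchanged; l= filters on label.
    """
    filters: Dict[str, Any] = {}
    for key, condition in (f or {}).items():
        for short, full in _PREFIXES.items():
            if key.startswith(short):
                key = full + key[len(short):]
                break
        filters[key] = condition
    if l is not None:
        filters['label'] = l
    return filters


print(build_node_filters(f={'e.volume': {'<': 10}}, l='cof-1'))
# {'extras.volume': {'<': 10}, 'label': 'cof-1'}
```

The result could then be passed as `filters=` to the `Node` part of the query above, so the shorthand stays a thin layer over the existing QueryBuilder.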

I think the general question is whether this way of shortening queries would ultimately be useful or more confusing. Users' opinions would be very much appreciated, as would other simple but general ways of writing easy queries.

ltalirz · 2019-11-12T22:36:03Z

> Re adding flags to verdi node show - I'm in principle ok!
> Do you have an example of how you would like them to behave?

I guess you mean verdi group show.
How about directly passing through the field name, i.e. something like
verdi group show curated-cof_11040N2 --column uuid --column extras.abc
which would add two columns, one for the uuid and one for the extra abc.
Actually, perhaps it's even better to show only the columns uuid and extras.abc in this case. One can recreate the full view by adding --column PK, ... (for comparison, see e.g. the openstack volume show CLI, which uses commas as separators, though, i.e. --column uuid,extras.abc).

If the field is not present, we need some indication (but no exception).
I guess an empty string would be good for the human eye, but for grepping you would rather want some string that indicates "missing".
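The column behaviour discussed above can be sketched in plain Python, assuming each node has already been converted into a nested dict (e.g. from query projections) and using a hypothetical greppable `<missing>` marker for absent fields. `parse_columns`, `lookup` and `render` are illustrative names, not verdi internals:

```python
from typing import Any, Dict, Iterable, List

MISSING = '<missing>'  # hypothetical greppable placeholder for absent fields


def parse_columns(options: Iterable[str]) -> List[str]:
    """Accept both repeated --column flags and comma-separated values."""
    return [col.strip() for opt in options for col in opt.split(',') if col.strip()]


def lookup(record: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path like 'extras.abc'; return MISSING instead of raising."""
    value: Any = record
    for part in path.split('.'):
        if not isinstance(value, dict) or part not in value:
            return MISSING
        value = value[part]
    return value


def render(records: Iterable[Dict[str, Any]], columns: List[str]) -> str:
    """Render a header plus one tab-separated row per record."""
    lines = ['\t'.join(columns)]
    for record in records:
        lines.append('\t'.join(str(lookup(record, col)) for col in columns))
    return '\n'.join(lines)


nodes = [
    {'uuid': 'u1', 'label': 'cof1', 'extras': {'abc': 3}},
    {'uuid': 'u2', 'extras': {}},
]
print(render(nodes, parse_columns(['uuid,extras.abc', 'label'])))
```

Accepting both repeated flags and comma-separated values keeps either CLI convention open, and tab separation keeps the output easy to grep and cut.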

> I think now the way is via the query builder.

Adding a shorthand for the QueryBuilder may be useful (I agree that one has to make sure it's not too confusing). However, this comes at the cost of a query.

In our example, we know the nodes we need for a particular visualization and have put them into a group.
I thought it would be nice if we could have a performant way of loading them into a sort of dictionary that you can address (similar to what you would have with a folder & the file names in a file system).
E.g. we might want to say: give me a dictionary of nodes of this group, where the keys are the node uuid / node label / extra xyz / ...

Currently we are constructing this dictionary manually by looping over group.nodes and reading the extra of each node.
However, I believe this is quite inefficient - I don't know what the nodes iterator does (if it's clever it might do just one query but I didn't check) but certainly reading the extras will trigger one query per node.
In reality, I guess all of this could be done in a single query.

giovannipizzi · 2019-11-12T22:46:15Z

Well, you do that with a query similar to the one I wrote before, putting appropriate projections, no? This would do a single query.

ltalirz · 2019-12-06T14:41:47Z

> Well, you do that with a query similar to the one I wrote before, putting appropriate projections, no? This would do a single query.

You're right - something like this:

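from aiida.orm import Group, Node, QueryBuilder

def group_dict(group, attribute='id'):
    """Return dictionary of nodes inside group.

    Nodes can be addressed by any property shared by all nodes in the group
    (e.g. 'id', 'uuid', 'label' or 'extras.xyz').

    Note: This does a single query, and is therefore more performant than first loading all nodes and then reading their extras (=1 query per node).
    """
    qb = QueryBuilder()
    qb.append(Group, filters={'id': group.pk}, tag='group')
    # Join the nodes that are members of this group.
    qb.append(Node, with_group='group', project=[attribute, '*'])
    # Each row is [value of `attribute`, node]; if two nodes share a key, the last one wins.
    return dict(qb.all())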

We could make this function a bit more clever (e.g. to return only the nodes that have the specified attribute) and think about whether it is worth including in AiiDA, perhaps even on the Group object (something like Group.get_node_dict_by('uuid')).
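One way the function could be made "a bit more clever" is to post-process the `(key, node)` rows that the query returns: skip rows whose key is None (assuming that is what the query projects for nodes lacking the property) and raise on duplicate keys, so that a label or extra is only used as an address when it is unique. `rows_to_dict` is a hypothetical helper, shown here on plain tuples so it runs without a database:

```python
from typing import Any, Dict, Hashable, Iterable, Sequence


def rows_to_dict(rows: Iterable[Sequence[Any]]) -> Dict[Hashable, Any]:
    """Build {key: node} from (key, node) rows.

    Rows whose key is None (property absent) are skipped; a key seen twice
    raises ValueError, since it would not address a unique node.
    """
    result: Dict[Hashable, Any] = {}
    for key, node in rows:
        if key is None:
            continue
        if key in result:
            raise ValueError(f'key {key!r} is not unique within the group')
        result[key] = node
    return result


rows = [('cof-a', 'node-a'), (None, 'node-x'), ('cof-b', 'node-b')]
print(rows_to_dict(rows))  # {'cof-a': 'node-a', 'cof-b': 'node-b'}
```

Raising instead of silently overwriting matches the "(if unique)" condition from the original question.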

I think using groups as "folders" is an important use case, and this is a first step towards addressing it.

ltalirz added type/feature request status undecided type/question may redirect to mailinglist labels Nov 7, 2019

sphuber added the topic/groups label Nov 8, 2019
