Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AGE agtype_util.c] Fix issue #870 regarding orderability #994

Merged
merged 1 commit into from
Jun 29, 2023

Conversation

CapnSpek
Copy link
Contributor

Fixed issue #870
Odd behavior in context of orderability of different agtypes.

Implemented the solution suggested here:
#870 (comment)

@dehowef
Copy link
Member

dehowef commented Jun 15, 2023

Hey @CapnSpek , can you add some regression tests for this change that indicate the desired behavior and check for edge cases? I think this change might warrant further discussion. @jrgemignani @rafsun42 @muhammadshoaib Thoughts? Thanks

@rafsun42
Copy link
Member

@CapnSpek Are you following these documentation for orders:

Edge
Path
Map
Vertex
Edge
Array
String
Bool
Numeric, Integer, Float
NULL

@CapnSpek
Copy link
Contributor Author

@rafsun42 the issue only arose when I noticed the ambiguity in the position of edge in the documentation, regarding which I created an issue here apache/age-website#143

Then after discovering the odd behaviour I created the issue #870

In which I and John discussed how the original purpose of having orderability was "orderability was chosen over comparability in order to allow for the sorting, grouping, of output."

Based on this information, I extrapolated the currently present priority order (which did not include edge or path at all btw)
From

Object, Vertex, Array String, Bool, Numbers, Null

To

Object, (Path, Edge), Vertex, Array, String, Bool, Numbers, Null.

I reached on this decision because if the aim was orderability and grouping, then, it correctly orders types as:-

Objects, Collection of graph elements (path), Graph elements (edges and vertices), Array, String, Bool, Numbers, Null.

Because it only adds some priorities in between the previously defined priorities, and does not change the structure that previously was there, this change is backwards compatible, and fixes the bug adequately by defining the previously left ambiguous priorities.

I'll come up with regression tests soon.

@rafsun42 rafsun42 closed this Jun 16, 2023
@rafsun42 rafsun42 reopened this Jun 16, 2023
@rafsun42
Copy link
Member

@CapnSpek Sounds good. May I suggest the following order:

(Path, Edge, Vertex), (Object, Array), String, Bool, Numbers, Null.

I believe object refers to map. So, overall order would be: graph elements > composite types (map, list) > primitive types.

@CapnSpek
Copy link
Contributor Author

My only issue with it is that we're changing the original priorities that are currently part of the code and it can have some unwanted effects. In the other solution, only some priorities are added in between the existing ones.

I feel maybe having minimal changes to code can make it more backwards compatible and avoid any unwanted drawbacks.

What do you think?

@rafsun42
Copy link
Member

@CapnSpek I am okay with your argument. I am just curious about the unwanted effects. What are the side effects other than regression tests giving result in different order? For example, do we use the get_type_sort_priority function internally somewhere that changing it would break some other functionalities?

@CapnSpek
Copy link
Contributor Author

@rafsun42 I think not. I'll test it further and get back to you regarding this by Wednesday.

@CapnSpek
Copy link
Contributor Author

The function get_type_sort_priority is not used anywhere other than the function compare_agtype_containers_orderability which is the function that decides between 2 given agtypes (of the same type or different type) which one should be prioritized above other.

The function get_type_sort_priority is only used when deciding the orderability between different agtypes.
All other cases (of same agtypes) are handled with the function, or by other function compare_agtype_scalar_values.

I changed the get_type_sort_priority function to the suggested priority, reinstalled AGE, and ran the test cases. All the test cases have been passed.

It also solves all the problems that were given in issue #870 .

I think it is a good solution. Should I proceed to writing new regression tests?
@rafsun42

@jrgemignani
Copy link
Contributor

@CapnSpek All PRs should have [as complete as possible] regression tests for the work added, modified, or removed. You don't need to ask :)

@dehowef
Copy link
Member

dehowef commented Jun 22, 2023

Thank you for the putting in the work with your thorough explanation and documentation of this issue @CapnSpek . What you are saying makes sense to me.

@CapnSpek
Copy link
Contributor Author

While going through the regression tests relating to different agtypes (given in /regress/sql/agtype.sql )
I noticed all the tests are written in the following format: -

SELECT agtype_in( [some agtype ] ) < [other agtype;

I tried searching for a way to write an edge, or a vertex, or path that way, but I could not find anything.

So, I wanted to ask, would it be fine if I write the test case in the form of a cypher query?

@CapnSpek
Copy link
Contributor Author

I ran into a problem while testing it: -

Even after changing the priorities in the get_type_sort_priority, everything is good except for every query, object is the smallest agtype there is.

While debugging, I discovered this issue does not arise from that function, but instead from compare_agtype_containers_orderability function (agtype_util.c: line 231)

int compare_agtype_containers_orderability(agtype_container *a,
                                           agtype_container *b)
{
    agtype_iterator *ita;
    agtype_iterator *itb;
    int res = 0;

    ita = agtype_iterator_init(a);
    itb = agtype_iterator_init(b);

The function takes 2 agtype_container values as parameters (the two values to be compared).

Then it initializes an iterator for both, and declares the result variable in which the result of the comparison will be stored, then returned.

The issue takes place in the do-while loop: -

do
    {
        agtype_value va;
        agtype_value vb;
        agtype_iterator_token ra;
        agtype_iterator_token rb;

        ra = agtype_iterator_next(&ita, &va, false);
        rb = agtype_iterator_next(&itb, &vb, false);

The bottom two lines of the code above set both va.type & vb.type to AGTV_ARRAY in case of all agtypes except object, in which they set it to AGTV_OBJECT

The wrong AGTV_ARRAY assignment to all values only takes place in the first iteration of the loop. In the second iteration it corrects itself to the correct type.
The second iteration only takes place if both values have same agtype, thus this correction step is skipped when AGTV_OBJECT and AGTV_ARRAY are compared, and decision is readily made that AGTV_ARRAY > AGTV_OBJECT.

I believe it might be because agtype_container is an ARRAY or OBJECT type struct as written in its comment in agtype.h: line 224: -

/*
 * An agtype array or object node, within an agtype Datum.
 *
 * An array has one child for each element, stored in array order.
 *
 * An object has two children for each key/value pair.  The keys all appear
 * first, in key sort order; then the values appear, in an order matching the
 * key order.  This arrangement keeps the keys compact in memory, making a
 * search for a particular key more cache-friendly.
 */
typedef struct agtype_container
{
    uint32 header; /* number of elements or key/value pairs, and flags */
    agtentry children[FLEXIBLE_ARRAY_MEMBER];

    /* the data for each child node follows. */
} agtype_container;

Solution: -
Because it is a very special case (when both Agtypes differ, one is an array, other is an object)

The correction can be made into a special case below line 370 as follows

if(va.type == AGTV_ARRAY && vb.type == AGTV_OBJECT)
            {
                ra = agtype_iterator_next(&ita, &va, false);
            }
            else if(va.type == AGTV_OBJECT && vb.type == AGTV_ARRAY)
            {
                rb = agtype_iterator_next(&itb, &vb, false);
            }

That fixes the problem without any side effects.

@CapnSpek
Copy link
Contributor Author

@jrgemignani Can you take a look at this PR?

@jrgemignani
Copy link
Contributor

@CapnSpek Reviewed and requesting changes

@jrgemignani
Copy link
Contributor

@CapnSpek I would like to see some more regression tests for these changes. In particular, regression tests showing sorting/order by.

Copy link
Contributor

@jrgemignani jrgemignani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just add some more regression tests for sorting/order by.

@CapnSpek
Copy link
Contributor Author

So I wrote a test where a graph is created, then some vertices are added to that graph with a property having all the different agtypes for all the different vertices, and then I query for the vertices of that graph while also ordering them, and then the graph is dropped.

However, it seems it changes the graph_id for the creation of a graph in the next set of regression tests (catalog.out), so I would be modifying that as well here.

@jrgemignani
Copy link
Contributor

@CapnSpek

So I wrote a test where a graph is created, then some vertices are added to that graph with a property having all the different agtypes for all the different vertices, and then I query for the vertices of that graph while also ordering them, and then the graph is dropped.

However, it seems it changes the graph_id for the creation of a graph in the next set of regression tests (catalog.out), so I would be modifying that as well here.

You can add your new regression tests to the end of the sql file. Just make sure to state, in comments, what the regression tests are for, what they will do, and their expected results.

…dded regression tests

Fixed issue apache#870
 Odd behavior in context of orderability of different agtypes.

Implemented the solution suggested here:
apache#870 (comment)

Clearance given here:
apache#870 (comment)

-Added regression tests for both comparibility and orderability
@jrgemignani
Copy link
Contributor

@CapnSpek Looks good, adding @rafsun42 to get his opinion too.

Copy link
Member

@rafsun42 rafsun42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me as well. Thanks for adding the tests.

@CapnSpek
Copy link
Contributor Author

Thank you.
@jrgemignani Please merge.

@dehowef
Copy link
Member

dehowef commented Jun 29, 2023

I gave my review a week ago, but now that @rafsun42 and @jrgemignani have approved, I'll go ahead and merge it in.

@dehowef dehowef merged commit 88ead70 into apache:master Jun 29, 2023
M4rcxs pushed a commit to M4rcxs/age that referenced this pull request Jul 18, 2023
…dded regression tests (apache#994)

Fixed issue apache#870
 Odd behavior in context of orderability of different agtypes.

Implemented the solution suggested here:
apache#870 (comment)

Clearance given here:
apache#870 (comment)

-Added regression tests for both comparibility and orderability
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants