Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MERGE creates incomplete vertices after the first one #1709

Closed
mingfang opened this issue Mar 20, 2024 · 25 comments
Closed

MERGE creates incomplete vertices after the first one #1709

mingfang opened this issue Mar 20, 2024 · 25 comments
Labels
bug Something isn't working

Comments

@mingfang
Copy link

mingfang commented Mar 20, 2024

when creating multiple vertices like this

SELECT * FROM cypher('playground', $$
  UNWIND [{first: 'jon', last: 'snow'}, {first: 'ned', last: 'stark'}] AS map
  MERGE (v:PERSON {first: map.first})
  SET v=map
  RETURN v
$$) AS (v agtype)

returns 2 vertices, but all the vertices after the first one will be incomplete. e.g.

SELECT * FROM cypher('playground', $$
  MATCH (v:PERSON)                                                           
  RETURN v
$$) AS (v agtype)

returns

{"id": 3940649673949198, "label": "PERSON", "properties": {"last": "snow", "first": "jon"}}::vertex
{"id": 3940649673949199, "label": "PERSON", "properties": {"first": "ned"}}::vertex

Note the first vertex is correct but the second vertex is missing the "last" field. Very strange.

@mingfang mingfang added the bug Something isn't working label Mar 20, 2024
@mingfang mingfang changed the title MERGE creates incomplete vertices if the RETURN clause does not include the created vertices MERGE creates incomplete vertices after the first one Mar 20, 2024
@anfa-majid
Copy link

The issue you're encountering with MERGE creating incomplete vertices stems from the way the MERGE and SET commands are being applied. In your query, MERGE is used with a condition that solely relies on the first property to identify unique vertices. However, this approach can lead to unexpected results when the first property alone does not uniquely identify a vertex, as MERGE will match existing vertices based on the first property and might not correctly associate the last property with subsequent vertices.

To ensure each vertex is accurately created or matched with its complete set of properties (first and last), you need to adjust the MERGE condition to include all properties that together uniquely define a vertex. The combination of first and last names uniquely identifies a PERSON vertex, therefore both should be included in the MERGE condition to prevent the creation of incomplete vertices.

@mingfang
Copy link
Author

@anfa-majid
In my specific test case above, first does uniquely identify the vertex.
I think the problem is much deeper(and more serious) than what you described.

@Akaza20-03
Copy link

I think we can fix this issue by explicitly assign each property indivisually in merge operation

//previous line
SET v=map

//updated line
SET v.first=map.first , v.last=map.last

In this updated query, after the MERGE operation, we set each property first and last separately from the map variable, which should ensure that all vertices have both "first" and "last" fields correctly set.

@mingfang
Copy link
Author

@Akaza20-03
No that does not work either.
Please try it for yourself.

@jrgemignani
Copy link
Contributor

@mingfang It looks like maybe my PR #1718 might have fixed this one too?! This is in the master branch, btw.

psql-16.1-5432-pgsql=# SELECT create_graph('playground');
NOTICE:  graph "playground" has been created
 create_graph
--------------

(1 row)

psql-16.1-5432-pgsql=# SELECT * FROM cypher('playground', $$
psql-16.1-5432-pgsql$#   UNWIND [{first: 'jon', last: 'snow'}, {first: 'ned', last: 'stark'}] AS map
psql-16.1-5432-pgsql$#   MERGE (v:PERSON {first: map.first})
psql-16.1-5432-pgsql$#   SET v=map
psql-16.1-5432-pgsql$#   RETURN v
psql-16.1-5432-pgsql$# $$) AS (v agtype);
                                                  v
-----------------------------------------------------------------------------------------------------
 {"id": 844424930131969, "label": "PERSON", "properties": {"last": "snow", "first": "jon"}}::vertex
 {"id": 844424930131970, "label": "PERSON", "properties": {"last": "stark", "first": "ned"}}::vertex
(2 rows)

psql-16.1-5432-pgsql=#

@jrgemignani
Copy link
Contributor

@mingfang Oops, nvm, sigh,...

psql-16.1-5432-pgsql=# SELECT * FROM cypher('playground', $$
psql-16.1-5432-pgsql$#   MATCH (v:PERSON)
psql-16.1-5432-pgsql$#   RETURN v
psql-16.1-5432-pgsql$# $$) AS (v agtype);
                                                 v
----------------------------------------------------------------------------------------------------
 {"id": 844424930131969, "label": "PERSON", "properties": {"last": "snow", "first": "jon"}}::vertex
 {"id": 844424930131970, "label": "PERSON", "properties": {"first": "ned"}}::vertex
(2 rows)

psql-16.1-5432-pgsql=#

@jrgemignani
Copy link
Contributor

@mingfang I will look into this one. It looks like it is creating the correct return tuple, just not inserting the correct tuple. I sometimes think that I should just rewrite merge altogether.

@jrgemignani
Copy link
Contributor

@mingfang Question - Just curious, why are you doing it this particular way? If you use merge as follows, it works -

psql-16.1-5432-pgsql=# SELECT * FROM cypher('playground', $$
  UNWIND [{first: 'jon', last: 'snow'}, {first: 'ned', last: 'stark'}] AS map
  MERGE (v:PERSON {first: map.first, last: map.last}) RETURN v $$) AS (v agtype);
                                                  v
-----------------------------------------------------------------------------------------------------
 {"id": 844424930131971, "label": "PERSON", "properties": {"last": "snow", "first": "jon"}}::vertex
 {"id": 844424930131972, "label": "PERSON", "properties": {"last": "stark", "first": "ned"}}::vertex
(2 rows)

psql-16.1-5432-pgsql=# SELECT * FROM cypher('playground', $$
  MATCH (v:PERSON)
  RETURN v
$$) AS (v agtype);
                                                  v
-----------------------------------------------------------------------------------------------------
 {"id": 844424930131971, "label": "PERSON", "properties": {"last": "snow", "first": "jon"}}::vertex
 {"id": 844424930131972, "label": "PERSON", "properties": {"last": "stark", "first": "ned"}}::vertex
(2 rows)

psql-16.1-5432-pgsql=#

I'm thinking this might be an issue with SET, and not MERGE.

@mingfang
Copy link
Author

mingfang commented Apr 3, 2024

My real use case is to create a graph from a SQL query and I don't know ahead of time what fields are returned.
This is my function to run SQL and return Agtype

-- Apache AGE load_sql
CREATE OR REPLACE FUNCTION public.load_sql(query agtype) RETURNS SETOF agtype AS $$
BEGIN
    RETURN QUERY EXECUTE FORMAT($sql$
        WITH query AS (%s) 
        SELECT agtype_build_list(query) 
        FROM query
    $sql$, query::text);
END
$$ LANGUAGE plpgsql VOLATILE;

Then in cypher is would call that function and use MERGE to create graph.

SELECT * FROM cypher('movies', $$
    UNWIND public.load_sql('
        SELECT * FROM movies
    ') AS row
    MERGE (v:MOVIE {title: row.title})
    SET v=row
    RETURN count(v)
$$) AS (v agtype);

@jrgemignani
Copy link
Contributor

@mingfang I traced the code and found it skipping the insert for the latter tuple. This looks to be a transaction/command id issue.

@diangamichael
Copy link

The issue you're encountering is because of the way you're merging and setting properties in your Cypher query. When you use MERGE, it will only create a new node if it doesn't already exist with the specified properties. However, in your case, you're using SET after MERGE, which can overwrite existing properties on the node.

To fix this issue and ensure that properties are correctly set for all nodes, you should use ON CREATE SET in addition to SET. This will only set properties if the node is newly created during the MERGE operation.

Here's the corrected query:

SELECT * FROM cypher('playground', $$
UNWIND [{first: 'jon', last: 'snow'}, {first: 'ned', last: 'stark'}] AS map
MERGE (v:PERSON {first: map.first})
ON CREATE SET v=map
SET v.last = map.last
RETURN v
$$) AS (v agtype);
With this modification, each node will have its properties correctly set, even if they already exist in the database.

@jrgemignani
Copy link
Contributor

@diangamichael Could you, in addition to your response, include an example using Apache AGE showing this solves the issue?

@jrgemignani
Copy link
Contributor

@mingfang Yeah, it appears to be an issue with transaction/command ids. It likely has to do with the handoff between MERGE and SET as both are updates.

@sir-sa
Copy link

sir-sa commented Apr 4, 2024

The issue you're encountering is due to the behavior of the SET clause in your Cypher query. When you use SET v=map, you're overwriting all properties of the node v with the properties from the map. If the map doesn't contain a certain property (like "last" in the case of the second iteration), it will effectively remove that property from the node.

To fix this issue, you can modify your query to conditionally set properties only if they exist in the map. Here's how you can do it:

SELECT * FROM cypher('playground', $$
UNWIND [{first: 'jon', last: 'snow'}, {first: 'ned', last: 'stark'}] AS map
MERGE (v:PERSON {first: map.first})
SET v.first = map.first
SET v.last = CASE WHEN map.last IS NOT NULL THEN map.last ELSE v.last END
RETURN v
$$) AS (v agtype);

@jtomek
Copy link

jtomek commented Apr 4, 2024

I second the idea of implementing ON CREATE and ON MATCH.
This would significantly help with business logic needed to prevent duplicates.

A few neo4j examples as food for thought:

MERGE (keanu:Person {name: 'Keanu Reeves'})
ON CREATE
  SET keanu.created = timestamp()
ON MATCH
  SET keanu.lastSeen = timestamp()
RETURN keanu.name, keanu.created, keanu.lastSeen
MERGE (person:Person)
ON MATCH
  SET
    person.found = true,
    person.lastAccessed = timestamp()
RETURN person.name, person.found, person.lastAccessed

@jrgemignani
Copy link
Contributor

@sir-sa Respectfully, that is not how SET works internally nor is it the cause of this issue. What is interesting, though, is that your query works. But, it also works without the conditional part, highlighting what I've stated and pointing again to transaction/command ids.

@jrgemignani
Copy link
Contributor

@jtomek ON CREATE or ON MATCH may be good to add to Apache AGE. You should consider opening a separate discussion thread about them to see what others might think.

Unfortunately, for this particular issue, this is just a bug that needs to be fixed.

@jrgemignani
Copy link
Contributor

@mingfang From the example above I have a possible workaround for you while I figure out what the issue is with the ids -

psql-16.1-5432-pgsql=# SELECT * FROM cypher('playground', $$ MATCH (u) DELETE u $$) AS (u agtype);
 u
---
(0 rows)

psql-16.1-5432-pgsql=# SELECT * FROM cypher('playground', $$ MATCH (u) RETURN u $$) AS (u agtype);
 u
---
(0 rows)

psql-16.1-5432-pgsql=# SELECT * FROM cypher('playground', $$
UNWIND [{first: 'jon', last: 'snow'}, {first: 'ned', last: 'stark'}, {first: 'jane', last: 'doe'}] AS map
MERGE (v:PERSON {first: map.first})
SET v=map
SET v=map
RETURN v
$$) AS (v agtype);
                                                  v
-----------------------------------------------------------------------------------------------------
 {"id": 844424930132012, "label": "PERSON", "properties": {"last": "snow", "first": "jon"}}::vertex
 {"id": 844424930132013, "label": "PERSON", "properties": {"last": "stark", "first": "ned"}}::vertex
 {"id": 844424930132014, "label": "PERSON", "properties": {"last": "doe", "first": "jane"}}::vertex
(3 rows)

psql-16.1-5432-pgsql=# SELECT * FROM cypher('playground', $$ MATCH (u) RETURN u $$) AS (u agtype);
                                                  u
-----------------------------------------------------------------------------------------------------
 {"id": 844424930132012, "label": "PERSON", "properties": {"last": "snow", "first": "jon"}}::vertex
 {"id": 844424930132013, "label": "PERSON", "properties": {"last": "stark", "first": "ned"}}::vertex
 {"id": 844424930132014, "label": "PERSON", "properties": {"last": "doe", "first": "jane"}}::vertex
(3 rows)

psql-16.1-5432-pgsql=#

@mingfang
Copy link
Author

mingfang commented Apr 4, 2024

@jrgemignani
omg your workaround does work!
With this workaround and your other recent fixes, I'm now unblocked.
You da man!

@jrgemignani
Copy link
Contributor

@mingfang Don't get too carried away ;) it is just a temporary workaround. We shouldn't need to run SET twice to get it to work.

@jrgemignani
Copy link
Contributor

jrgemignani commented Apr 4, 2024

@mingfang I found the issue and corrected it. Although, I need to review that it is correct and there aren't other areas that may have a similar issue.

Here is a sample output that also highlights something interesting. I'm curious of your thoughts -

psql-16.1-5432-pgsql=# SELECT * FROM cypher('playground', $$ MATCH (v) RETURN v $$) AS (v agtype);
 v
---
(0 rows)

psql-16.1-5432-pgsql=# SELECT * FROM cypher('playground', $$
UNWIND [{first: 'jon', last: 'snow'}, {first: 'ned', last: 'stark'}, {first: 'jane', last: 'doe'}, {first: 'ned', last: 'flanders'}] AS map
MERGE (v:PERSON {first: map.first}) SET v=map RETURN v $$) AS (v agtype);
                                                   v
--------------------------------------------------------------------------------------------------------
 {"id": 844424930132054, "label": "PERSON", "properties": {"last": "snow", "first": "jon"}}::vertex
 {"id": 844424930132055, "label": "PERSON", "properties": {"last": "stark", "first": "ned"}}::vertex
 {"id": 844424930132056, "label": "PERSON", "properties": {"last": "doe", "first": "jane"}}::vertex
 {"id": 844424930132055, "label": "PERSON", "properties": {"last": "flanders", "first": "ned"}}::vertex
(4 rows)

psql-16.1-5432-pgsql=# SELECT * FROM cypher('playground', $$ MATCH (v) RETURN v $$) AS (v agtype);
                                                   v
--------------------------------------------------------------------------------------------------------
 {"id": 844424930132054, "label": "PERSON", "properties": {"last": "snow", "first": "jon"}}::vertex
 {"id": 844424930132056, "label": "PERSON", "properties": {"last": "doe", "first": "jane"}}::vertex
 {"id": 844424930132055, "label": "PERSON", "properties": {"last": "flanders", "first": "ned"}}::vertex
(3 rows)

psql-16.1-5432-pgsql=# 

Note that the correct data is stored in the end.

@jrgemignani
Copy link
Contributor

jrgemignani commented Apr 5, 2024

@mingfang PR #1721 should address this issue. It just needs the prerequisite review and merge.

@diangamichael
Copy link

Let's close this bug since the earlier suggestion works perfectly 🙂🙂🙂

@jrgemignani
Copy link
Contributor

@diangamichael Thank you for your input. However, this issue is a bug and the suggestion is a temporary workaround. This issue cannot be closed until the issue is resolved and @mingfang, who opened the issue, is satisfied with the solution.

@mingfang
Copy link
Author

mingfang commented Apr 5, 2024

@jrgemignani
Confirm fixed(by your PR). Thanks!

@mingfang mingfang closed this as completed Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

7 participants