InvalidSyntax: Invalid input '{': expected whitespace, comment or a label name #56

Open
mohummedalee opened this issue Jan 29, 2016 · 8 comments

Comments

@mohummedalee

I am using mongo-connector to do the initial bulk_upsert operation between MongoDB and Neo4j. At some point while querying with py2neo, an InvalidSyntax exception is raised, so nothing is inserted into the graph database. I believe the issue lies somewhere in the DocManager's translation to Cypher syntax (which is why I'm raising the issue here). I am running py2neo v2.0.8 and Neo4j v2.3.1.
Here is the detailed stack trace:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "//anaconda/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "//anaconda/lib/python2.7/site-packages/mongo_connector/util.py", line 85, in wrapped
    func(*args, **kwargs)
  File "//anaconda/lib/python2.7/site-packages/mongo_connector/oplog_manager.py", line 256, in run
    docman.upsert(doc, ns, timestamp)
  File "//anaconda/lib/python2.7/site-packages/mongo_connector/doc_managers/neo4j_doc_manager.py", line 66, in upsert
    tx.commit()
  File "//anaconda/lib/python2.7/site-packages/py2neo/cypher/core.py", line 333, in commit
    return self.post(self.__commit or self.__begin_commit)
  File "//anaconda/lib/python2.7/site-packages/py2neo/cypher/core.py", line 288, in post
    raise self.error_class.hydrate(error)
InvalidSyntax: Invalid input '{': expected whitespace, comment or a label name (line 1, column 20 (offset: 19))
"MERGE (d:Document: { _id: {parameters}._id})"

This is the first time I'm raising an issue on GitHub, so please go easy on me :)

@mohummedalee
Author

I've been able to diagnose this a little. If a nested JSON document has its own _id field, this error occurs.

@johnymontana

Thanks @mohummedalee for reporting this. Neo4j Doc Manager uses a key naming convention of xxx_id to identify relationships: the value of a property with key xxx_id is assumed to be an id referencing a document in collection xxx. This convention allows us to define relationships from the document data model and is documented here. I'm assuming the error is caused by Neo4j Doc Manager treating the nested document's _id field as a relationship without checking for an empty collection name (nothing appears before _id in the key). This is a bug, and we'll add a check for it to avoid the Cypher syntax error.
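
A minimal Python sketch of the convention and the failure mode, assuming a simplified query shape: `referenced_collection` and `merge_statement` are hypothetical helpers, not the Doc Manager's actual code, and how the real code turns a collection name into a label is an assumption here. It shows how a bare `_id` key produces an empty collection name, which becomes the empty label in `MERGE (d:Document: {...})`, and how a guard on the prefix avoids it.

```python
from typing import Optional


def referenced_collection(key: str) -> Optional[str]:
    """Return the collection named by an xxx_id key, or None.

    Hypothetical helper for the xxx_id convention: 'author_id' -> 'author'.
    A bare '_id' has an empty prefix, so it is rejected instead of being
    treated as a reference (the missing check described above).
    """
    if not key.endswith("_id"):
        return None
    prefix = key[: -len("_id")]
    return prefix or None  # empty prefix means "not a reference"


def merge_statement(label: str) -> str:
    """Simplified, hypothetical MERGE shaped like the one in the traceback."""
    return "MERGE (d:Document:%s { _id: {parameters}._id})" % label


# Without the check, '_id' yields an empty label, which is invalid Cypher:
print(merge_statement("_id"[: -len("_id")]))
# MERGE (d:Document: { _id: {parameters}._id})

# With the check, only genuine references produce a statement:
for key in ["author_id", "_id", "title"]:
    collection = referenced_collection(key)
    if collection is not None:
        print(key, "->", merge_statement(collection))
```

The unguarded string matches the statement in the traceback character for character, which supports the empty-prefix explanation.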

However, this brings up another issue with the way Neo4j Doc Manager handles node properties. When converting to a property graph model, each subdocument is extracted into its own node but keeps the _id property of the root document. Therefore, having an _id field on a subdocument becomes problematic when translating to the property graph model, so we will need to add some logic to deal with these naming collisions. Is it common practice to use an _id field in subdocuments in MongoDB?
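
To make the collision concrete, here is a hypothetical helper (not part of Neo4j Doc Manager) that walks a document and lists the subdocuments carrying their own `_id`. If each subdocument becomes a node that inherits the root `_id`, these are exactly the places where two values compete for one property name. Arrays of subdocuments are ignored for brevity, and the sample document is invented.

```python
from typing import Any, Dict, List


def find_id_collisions(doc: Dict[str, Any], path: str = "") -> List[str]:
    """List dotted paths of subdocuments that carry their own _id.

    Hypothetical helper: every subdocument node inherits the root _id,
    so any subdocument _id found here collides with it.
    Lists of subdocuments are not inspected in this sketch.
    """
    collisions: List[str] = []
    for key, value in doc.items():
        if isinstance(value, dict):
            sub_path = key if not path else path + "." + key
            if "_id" in value:
                collisions.append(sub_path)
            collisions.extend(find_id_collisions(value, sub_path))
    return collisions


order = {
    "_id": "o1",
    "customer": {"_id": "c7", "name": "Ada", "address": {"city": "Paris"}},
    "shipping": {"carrier": "UPS"},
}
print(find_id_collisions(order))  # ['customer']
```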

@mohummedalee
Author

Hi @johnymontana. Conventionally, a lot of people use the xxx_id format you specified. But I do think there should be some way to keep an _id associated with a node so people can query for objects by their unique identifiers; I believe that is a fairly common scenario.

johnymontana added a commit that referenced this issue Feb 3, 2016
If a subdocument contains an _id property, do not treat it as a
reference/relationship, instead ignore the _id property as the
_id from the root level document should take precedence. See #56
@johnymontana

Thanks for the feedback, @mohummedalee. Commit 8d8aab2 should solve the Cypher syntax error you experienced.

Regarding the collision of _id properties on both the root level document node and the subdocument node, I've decided to just ignore the _id property of the subdocument and continue to store the root level document _id. I'm open to discussing other approaches, though.
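
A minimal sketch of the behavior described above, assuming subdocument nodes are built from their scalar fields: `subdocument_node_properties` is a hypothetical name, not the function used in commit 8d8aab2. The subdocument's own `_id` is neither treated as a reference nor stored, and the root document's `_id` is written in its place.

```python
from typing import Any, Dict


def subdocument_node_properties(sub: Dict[str, Any], root_id: Any) -> Dict[str, Any]:
    """Scalar properties for a subdocument node under the adopted policy.

    Hypothetical sketch: the subdocument's own _id is dropped and the root
    document's _id is stored instead. Nested dicts are skipped because
    they become separate nodes.
    """
    props = {
        key: value
        for key, value in sub.items()
        if key != "_id" and not isinstance(value, dict)
    }
    props["_id"] = root_id  # root-level _id takes precedence
    return props


customer = {"_id": "c7", "name": "Ada", "address": {"city": "Paris"}}
print(subdocument_node_properties(customer, "o1"))
# {'name': 'Ada', '_id': 'o1'}
```

The trade-off is that "c7" no longer appears anywhere in the graph, so the subdocument cannot be looked up by its own identifier.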

@johnymontana

This fix is now available in version 0.1.1 on PyPI.

@johnymontana

I added a proposal for the behavior when encountering a subdocument _id property key in #57. I would appreciate any feedback. Would this fit your use case, @mohummedalee?

@mohummedalee
Author

Sorry for replying late, @johnymontana. After reviewing a bit of the architecture, we realized at my company that we needed to model relationships in a custom way, so we ended up writing our own DocManager with help from your code (which is pretty well written, thankfully).

Are you saying you're ignoring the subdocument _id altogether and dropping it off? If that's the case, it will circumvent the issue I reported but it won't let users query the subdocument by its _id. Please correct me if I'm wrong.

@kalami-f

CALL gds.alpha.eigenvector.stream
({
nodeProjection: 'User',
relationshipProjection: 'validated',
normalization: 'max'
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).code AS code, score
ORDER BY connections DESC limit 10

error:
Invalid input '{': expected "+" or "-" (line 2, column 2 (offset: 35))
"({"
^
