Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elasticsearch connector documentation inconsistencies #12880

Closed
fastcatch opened this issue May 30, 2019 · 11 comments
Closed

Elasticsearch connector documentation inconsistencies #12880

fastcatch opened this issue May 30, 2019 · 11 comments
Labels

Comments

@fastcatch
Copy link

While trying to connect Presto to an Elasticsearch index we have found a few errors / inconsistencies / rooms for improvement in the documentation (version 0.220 at this point). In some cases I believe they are actually bugs (i.e. the code needs fixing) but I'm new to this and not always positive.

  1. In table definition files we have found that the schemaName is not optional. If not defined Presto fails to start with [...] java.lang.IllegalArgumentException: schemaName is null or empty [...]. (Since there is a default definition section for this at the connector level and even that is supposedly optional I guess this is a bug.)

  2. The example table definition has a hostAddress property in it; the description below the table uses host (and the running code also expects host). We first copy-pasted the example to fill in and couldn't understand for a while what might have gone wrong.

  3. When the columns are not defined (note: they are marked optional) Presto will start but queries on the table fail with 'Internal error'. (SELECT * ... type queries as well as SHOW COLUMNS IN ... query.) They work fine if columns are defined.

  4. Field descriptions are not really helpful, especially at Column Metadata. (Some of this may be because we have not been very deep in Elastic either, though.) It took us quite a while to figure what to use for type at table definition. The JSONPath is still a bit of a mystery to us (what is it for and how is it used). Also we found most of the column metadata to be required in practice in spite of being marked as optional.

  5. The type in the column definitions looks outright wrong: for example it requires to be defined as varchar for Elastic's string. Maybe it is right, though, just the description is misleading?

And last but not least: there is an open issue about the connector being able to process only all-lowercase Elastic field names at this point. It took us nearly a day to figure why Presto did not return rows for a table while properly reporting its size. (I.e. SELECT COUNT(1) FROM table reported 13407, SELECT COUNT(Id) FROM table reported NULL and we knew that the Id field was filled in for each row...) I suggest the until this gets resolved there should be an emphasized note in the docs about it.

@rschlussel
Copy link
Contributor

@fastcatch We'd love a pr to improve the documentation! Would you like to work on this?
cc: @zhenxiao

@fastcatch
Copy link
Author

fastcatch commented May 31, 2019

Sure. I can fix 2, 3 (if it's to be a doc-fix which I guess it is), and 5. I can give a go for 4 as well but I may be too much of a novice here on this. Shall I do a separate PR for each or one together?

1 should be a code fix I guess that I'd give a pass for now.

I can also update the docs on the lowercase fieldname issue but honestly it's such a bummer that I'd rather see it solved. However, if you feel it's to take much longer to get sorted, I'll add the notice.

@zhenxiao
Copy link
Collaborator

zhenxiao commented Jun 1, 2019

thanks for bring this up, @fastcatch
would be nice if you could submit PRs to fix the documentation. You could have one commit with message 'fix elasticsearch connector documentation'

We are working in progress to fix the lowercase field name:
#12791

1, 2 seems like documentation issues, host actually refers to hostname of search node
3 seems more like a bug, when designing it, columns definitions should be optional. It is OK to have documentation saying it is required if currently behaves so

@fastcatch
Copy link
Author

I may not be able to completely follow you; I understand that you propose this:

  1. (schemaName not optional): should be fixed in the docs. This is not very intuitive to me: what is the default-schema-name connector level parameter is there for then?
  2. (hostAddress): doc fix; OK, I'll do it.
  3. (columns definitions are not optional): doc fix; OK, I'll do it. Do I understand it correctly that they should be optional (but they are not now and I'll fix it in the docs like that)?
  4. (field descriptions): you did not mention but I guess it's up to me to improve :)
  5. (type description links to the wrong place): you did not instruct me on this
  6. thanks

@zhenxiao
Copy link
Collaborator

zhenxiao commented Jun 2, 2019

Hi @fastcatch

  1. could you please provide a detail stack trace? this seems like a bug
  2. let's continue with ur PR
  3. could you please provide a detail stack trace? after looking at it, we could decide wether temporarily fix doc, or we fix the bug directly
  4. yep, go ahead with a PR with ur improvement :)
  5. sorry, I did not quite follow, what do you think is incorrect there?

thank you for working on it :)

@fastcatch
Copy link
Author

@zhenxiao: I split 1 and 3 so that this thread should not blow up. I'll proceed with a PR to 2 and 4. I'll attempt a fix to 5 as well and then you can perhaps better understand what my problem is.

@gurinderiitr
Copy link

@fastcatch
I have defined the columns but when I query the data it gives back NoNodeAvailableException[None of the configured nodes are available
Below is the .json file. Any pointers?
{ "tableName": "poc", "schemaName": "es_local", "host": "127.0.0.1", "port": "9300", "clusterName": "docker-cluster", "index": "presto", "indexExactMatch": true, "type": "poc", "client.transport.sniff": "true", "username": "elastic", "password": "changeme", "columns": [ { "name": "db", "type": "varchar", "jsonPath": "db", "jsonType": "varchar", "ordinalPosition": "0" } ] }

@fastcatch
Copy link
Author

I'm not sure but my first guess is that your Elastic may not be running on 127.0.0.1:9300. Can you confirm that it is available there?

@gurinderiitr
Copy link

Yes, I am able to connect to ES on 127.0.0.1:9200 via 'elastic-search head' plugin.
I change the port to 9300 when connecting via command line. I tried both 9200 and 9300 though, but they throw the same error.

@zhenxiao
Copy link
Collaborator

zhenxiao commented Jul 2, 2019

Hi @gurinderiitr seems like Presto could not connect to Elasticsearch
are you sure you are running Elasticsearch locally or setting up proxy so that it is approachable via 127.0.0.1?

@stale
Copy link

stale bot commented Jul 2, 2021

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.

@stale stale bot added the stale label Jul 2, 2021
@stale stale bot closed this as completed Jul 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants