Hive metastore integration with installation files and a sample application #5
Conversation
have a look @srikumar003
I wasn't able to get the Hive part running on minikube. It needs some more refinement before we can merge it.
src/dataset-operator/pkg/controller/dataset/dataset_controller.go
examples/hive/k8s/README.md
```
$ envsubst < conf/hive-site.tmpl | tee conf/hive-site.xml
```
```
$ envsubst < conf/metastore-site.tmpl | tee conf/metastore-site.xml
```
```
$ mv deploy/s3-secret.tmpl deploy/s3-secret.yaml
$ kubectl apply -f deploy/s3-secret.yaml
```
```
$ kubectl apply -f deploy/postgres-secret.yaml
```
These steps could probably be automated as part of the installation. I would also consider driving this installation from the main Makefile.
The Makefile for the Hive installation now takes care of all the configuration-file modifications. Fixed in https://github.com/srikumar003/dataset-lifecycle-framework/commit/2f40baf08e9f33f466e8c10283cf3cf4ce885ac4
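The template-rendering steps discussed above can be driven from a single helper. As a minimal sketch (not the PR's actual Makefile target), `render_template` below is a pure-shell stand-in for `envsubst`, useful where gettext is not installed:

```shell
#!/bin/sh
# Hypothetical helper sketching the config-rendering steps above.
# render_template expands ${VAR} references in a template from the current
# environment -- a pure-shell stand-in for envsubst. Caveat: because it
# uses eval with a heredoc, templates must not contain backticks or $(...).
render_template() {
    eval "cat <<__TEMPLATE_EOF__
$(cat "$1")
__TEMPLATE_EOF__"
}

# Example usage (paths from the README above; not run here):
# render_template conf/hive-site.tmpl > conf/hive-site.xml
# render_template conf/metastore-site.tmpl > conf/metastore-site.xml
```

A Makefile target could then call this helper once per template instead of documenting each `envsubst` invocation separately.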
examples/hive/sampleapp/populate.sh
```
DOCKER_REGISTRY_COMPONENTS="registry_used_in_installation"
HIVESERVER_IMAGE="hive-server:latest"

docker run -v $(PWD):/sampleapp -it ${DOCKER_REGISTRY_COMPONENTS}/${HIVESERVER_IMAGE} bin/beeline -u "jdbc:hive2://$HIVE_CLI_IP:$HIVE_CLI_PORT/;transportMode=http;httpPath=/cliservice" -f /sampleapp/sample.hql
```
Also, this last command returns:
```
20/03/13 09:26:22 [main]: WARN jdbc.HiveConnection: Failed to connect to 192.168.39.51:31321
Error: Could not open client transport with JDBC Uri: jdbc:hive2://192.168.39.51:31321/;transportMode=http;httpPath=/cliservice: Could not establish connection to jdbc:hive2://192.168.39.51:31321/;transportMode=http;httpPath=/cliservice: org.apache.http.conn.HttpHostConnectException: Connect to 192.168.39.51:31321 [/192.168.39.51] failed: Connection refused (Connection refused) (state=08S01,code=0)
```
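A "Connection refused" at the NodePort usually means `HIVE_CLI_IP` or `HIVE_CLI_PORT` was derived incorrectly. A minimal sketch of resolving the endpoint and composing the JDBC URL (the service name `hiveserver` is an assumption, not taken from the PR):

```shell
#!/bin/sh
# Sketch: build the beeline JDBC URL for an HTTP-mode HiveServer2 endpoint.
jdbc_url() {
    # $1 = host/IP, $2 = port
    printf 'jdbc:hive2://%s:%s/;transportMode=http;httpPath=/cliservice' "$1" "$2"
}

# On a real minikube cluster these would resolve the endpoint (not run here;
# the service name "hiveserver" is hypothetical):
# HIVE_CLI_IP=$(minikube ip)
# HIVE_CLI_PORT=$(kubectl get svc hiveserver -o jsonpath='{.spec.ports[0].nodePort}')
# bin/beeline -u "$(jdbc_url "$HIVE_CLI_IP" "$HIVE_CLI_PORT")" -f /sampleapp/sample.hql
```

Checking the resolved values with `kubectl get svc` before running beeline makes this class of failure easy to spot.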
Variable substitution fixed in https://github.com/srikumar003/dataset-lifecycle-framework/commit/74d2c763bc1bdb3190430c2ea89f60c7c6cfc999
Provided a convenience script for setting up the sample app.
examples/hive/sampleapp/README.md
### On Minikube with NooBaa

0. You will need an S3 client for this step. We will use `s3cmd`, which you can get on macOS with:
```
$ brew install s3cmd
```
and on Linux with your distribution's package manager, e.g. on Debian and Ubuntu:
```
$ apt-get install s3cmd
```
We need to configure `s3cmd` once. First, gather the connection information:
```
$ export S3_HOST=$(minikube service s3 --url | head -n1 | sed -e 's#^http://##')
$ export NOOBAA_HOME=<path_to_your_NooBaa_installation>
$ export AWS_ACCESS_KEY_ID=$(${NOOBAA_HOME}/noobaa status 2>/dev/null | grep AWS_ACCESS_KEY_ID | awk -F ": " '{print $2}')
$ export AWS_SECRET_ACCESS_KEY=$(${NOOBAA_HOME}/noobaa status 2>/dev/null | grep AWS_SECRET_ACCESS_KEY | awk -F ": " '{print $2}')
```
Then run
```
$ s3cmd --configure
```
which opens an interactive session for entering the S3 connection parameters stored in the environment variables above. Make sure you answer "no" to the question about HTTPS connections.
When that is done, check your connection by executing
```
$ s3cmd ls
```
which should return the buckets created by the example in the main installation guide.

To continue this example,

1. Create a bucket in S3 storage
```
$ s3cmd mb s3://book-test
```

2. Upload the given CSV file to the bucket
```
$ s3cmd put books.csv s3://book-test/
```
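Steps 0–2 above can also be run non-interactively: `s3cmd` accepts the endpoint and credentials as command-line options, so the `--configure` session can be skipped. A sketch, assuming the environment variables from step 0 are already exported:

```shell
#!/bin/sh
# Sketch: build the common s3cmd option set from the environment gathered
# in step 0, so each call carries the endpoint and credentials explicitly
# instead of relying on an interactively created ~/.s3cfg.
s3_flags() {
    printf -- '--access_key=%s --secret_key=%s --host=%s --host-bucket=%s --no-ssl' \
        "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" "$S3_HOST" "$S3_HOST"
}

# On a configured cluster (not run here):
# s3cmd $(s3_flags) mb s3://book-test
# s3cmd $(s3_flags) put books.csv s3://book-test/
```

`--host-bucket` is set to the same endpoint because NooBaa's S3 gateway is addressed path-style rather than via per-bucket DNS names.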
All this could be automated the same way we load example data into NooBaa for testing the regular datasets. There is a docker image with awscli that is built during the installation process, and you could use it to load the CSV into the new bucket. See `data-loader-noobaa.yaml` and the `build_data_loader` and `run_data_loader` functions in `noobaa_install.sh`.
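A `run_data_loader`-style helper for this example could look roughly like the sketch below; the registry argument, the `awscli` image name, and the `/data` mount point are assumptions for illustration, not taken from the repository:

```shell
#!/bin/sh
# Sketch of a data-loader helper: compose a docker run command that uses
# an awscli image to copy a local CSV into the example bucket.
# Registry, image name, and paths are hypothetical placeholders.
loader_cmd() {
    # $1 = registry, $2 = local file, $3 = bucket; S3_HOST from the environment
    printf 'docker run --rm -v %s:/data -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY %s/awscli aws --endpoint-url http://%s s3 cp /data/%s s3://%s/' \
        "$PWD" "$1" "$S3_HOST" "$2" "$3"
}

# Example usage (not run here):
# eval "$(loader_cmd my-registry books.csv book-test)"
```

Building the command as a string keeps the helper easy to log and dry-run before anything touches the cluster.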
Provided a convenience script for creating the sample application: https://github.com/srikumar003/dataset-lifecycle-framework/commit/ae9a885eed1c985fabbad5155531a2ea7734cb03
I am probably configuring something wrong in the hive example on the
However, when I try to do this As I understand the
Removed the need for configuring
There is still an issue with the main Makefile. Your branch contains the wrong values for the registry, secret, and namespace, which makes the process fail while installing on minikube.
Looks good now!
Commit messages:
1. Error in datashim doi
2. adding files generated by code-generator
3. adding a test file
4. changing apiclient to be its own module to handle dependencies
5. updating datasetinternals to datasetinternal