
Hive metastore integration with installation files and a sample application #5

Merged: 1 commit into datashim-io:master on Apr 9, 2020
Conversation

@srikumar003 (Collaborator)

No description provided.

@srikumar003 changed the title from "Hive metastore turnkey" to "Hive metastore integration with installation files and a sample application" on Feb 17, 2020.
@YiannisGkoufas (Collaborator) left a comment:

have a look @srikumar003

Resolved review threads (outdated): Makefile, examples/hive/k8s/Makefile, release-tools/Makefile
@christian-pinto (Collaborator) left a comment:

I wasn't able to get the Hive part running on minikube. It needs some more refinement before we can merge it.

Resolved review threads (outdated): examples/hive/k8s/README.md (4 threads)
Comment on lines 22 to 32
```
$ envsubst < conf/hive-site.tmpl | tee conf/hive-site.xml
```
```
$ envsubst < conf/metastore-site.tmpl | tee conf/metastore-site.xml
```
```
$ mv deploy/s3-secret.tmpl deploy/s3-secret.yaml
$ kubectl apply -f deploy/s3-secret.yaml
```
```
$ kubectl apply -f deploy/postgres-secret.yaml
```
A collaborator commented:

These steps could probably be automated as part of the installation. I would also consider driving this installation from the main Makefile.
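A minimal sketch of the kind of automation suggested here, assuming the template and secret file names from the snippet above (the `render` helper and the demo paths are hypothetical, not part of the repository):

```shell
#!/bin/sh
# Sketch: render the *.tmpl config files and apply the secrets in one step.
# 'render' is a hypothetical helper; the kubectl calls are left commented
# out because they need a live cluster.
set -e

render() {
  if command -v envsubst >/dev/null 2>&1; then
    envsubst < "$1" > "$2"       # substitute ${VAR} refs from the environment
  else
    sed -e "s|\${S3_ENDPOINT}|$S3_ENDPOINT|g" "$1" > "$2"   # crude fallback
  fi
}

# Demo with a throwaway template standing in for conf/hive-site.tmpl:
export S3_ENDPOINT=http://example:9000
printf 'endpoint=${S3_ENDPOINT}\n' > /tmp/hive-demo.tmpl
render /tmp/hive-demo.tmpl /tmp/hive-demo.conf
cat /tmp/hive-demo.conf          # endpoint=http://example:9000

# In the real flow this would be followed by:
#   kubectl apply -f deploy/s3-secret.yaml
#   kubectl apply -f deploy/postgres-secret.yaml
```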

The author (@srikumar003) replied:

The Makefile for the hive installation now takes care of all the configuration file modifications. Fixed in https://github.com/srikumar003/dataset-lifecycle-framework/commit/2f40baf08e9f33f466e8c10283cf3cf4ce885ac4

Resolved review threads (outdated): examples/hive/k8s/Makefile (3 threads), Makefile (2 threads)
```
DOCKER_REGISTRY_COMPONENTS="registry_used_in_installation"
HIVESERVER_IMAGE="hive-server:latest"

docker run -v $(PWD):/sampleapp -it ${DOCKER_REGISTRY_COMPONENTS}/${HIVESERVER_IMAGE} bin/beeline -u "jdbc:hive2://$HIVE_CLI_IP:$HIVE_CLI_PORT/;transportMode=http;httpPath=/cliservice" -f /sampleapp/sample.hql
```
A collaborator commented:

`$(PWD)` -> `${PWD}`

A collaborator commented:

Also, this last command returns:

```
20/03/13 09:26:22 [main]: WARN jdbc.HiveConnection: Failed to connect to 192.168.39.51:31321
Error: Could not open client transport with JDBC Uri: jdbc:hive2://192.168.39.51:31321/;transportMode=http;httpPath=/cliservice: Could not establish connection to jdbc:hive2://192.168.39.51:31321/;transportMode=http;httpPath=/cliservice: org.apache.http.conn.HttpHostConnectException: Connect to 192.168.39.51:31321 [/192.168.39.51] failed: Connection refused (Connection refused) (state=08S01,code=0)
```

The author (@srikumar003) replied:

Variable substitution fixed in https://github.com/srikumar003/dataset-lifecycle-framework/commit/74d2c763bc1bdb3190430c2ea89f60c7c6cfc999

Provided a convenience script for setting up the sample app.
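The beeline command shown earlier needs $HIVE_CLI_IP and $HIVE_CLI_PORT. A sketch of deriving them from a minikube service URL, using a hard-coded stand-in for the output of `minikube service <hive-service> --url` (the service name itself is not specified in this thread):

```shell
#!/bin/sh
# Sketch: split a minikube service URL into the host and port the JDBC
# connection string expects. HIVE_URL is a stand-in value, not a live lookup.
HIVE_URL="http://192.168.39.51:31321"

hostport=${HIVE_URL#http://}      # strip the scheme   -> 192.168.39.51:31321
HIVE_CLI_IP=${hostport%:*}        # drop the port      -> 192.168.39.51
HIVE_CLI_PORT=${hostport##*:}     # keep only the port -> 31321

echo "jdbc:hive2://$HIVE_CLI_IP:$HIVE_CLI_PORT/;transportMode=http;httpPath=/cliservice"
```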

Comment on lines 3 to 42
### On Minikube with NooBaa
0. You would need an S3 client for this step. We will use `s3cmd` which you can get on MacOS by:
```
$ brew install s3cmd
```
and on Linux using your distribution's package manager, e.g. on Debian and Ubuntu:
```
$ apt-get install s3cmd
```
We need to configure `s3cmd` once. So, first, we'll have to gather the connection information:
```
$ export S3_HOST=$(minikube service s3 --url | head -n1 | sed -e 's#^http://##')
$ export NOOBA_HOME=<path_to_your_Nooba_installation>
$ export AWS_ACCESS_KEY_ID=$(${NOOBA_HOME}/noobaa status 2>/dev/null | grep AWS_ACCESS_KEY_ID | awk -F ": " '{print $2}')
$ export AWS_SECRET_ACCESS_KEY=$(${NOOBA_HOME}/noobaa status 2>/dev/null | grep AWS_SECRET_ACCESS_KEY | awk -F ": " '{print $2}')
```
Then, run
```
$ s3cmd --configure
```
which will open up an interactive session to input various S3 connection parameters stored in the above environment variables. Also, please ensure that you say no to the question about HTTPS connections.
When all that is done, then check your connection by executing

```
$ mc config host rm local
$ mc config host add local $(minikube service minio-service --url) minio minio123
$ s3cmd ls
```
which should return the buckets created by the example provided in the main installation guide.

To continue this example,

1. Create a bucket in S3 storage

```
$ mc mb local/book-test
$ s3cmd mb s3://book-test
```

2. Upload the given CSV file to the bucket

```
$ mc cp books.csv local/book-test
$ s3cmd put books.csv s3://book-test/
```
A collaborator commented:

All this could be automated the same way we load example data into NooBaa for testing the regular datasets. There is a Docker image with awscli that is built during the installation process, and you could use it to load the CSV into the new bucket.

See `data-loader-noobaa.yaml` and the `noobaa_install.sh` functions `build_data_loader` and `run_data_loader`.
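A rough sketch of the loader Job the reviewer describes, modeled loosely on that approach; the image name, secret name, endpoint variable, and how books.csv gets into the container are all assumptions, not the repository's actual values:

```yaml
# Hypothetical data-loader Job: runs the awscli image built at install time
# to create the bucket and push books.csv. All names are placeholders, and
# books.csv would still need to be mounted or baked into the image.
apiVersion: batch/v1
kind: Job
metadata:
  name: hive-example-data-loader
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: loader
        image: <registry>/awscli:latest
        command: ["sh", "-c"]
        args:
        - >-
          aws --endpoint-url "$S3_ENDPOINT" s3 mb s3://book-test &&
          aws --endpoint-url "$S3_ENDPOINT" s3 cp /data/books.csv s3://book-test/
        envFrom:
        - secretRef:
            name: s3-secret   # supplies AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
```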

@YiannisGkoufas (Collaborator):
I am probably configuring something wrong in the hive example at the `s3cmd --configure` step. `s3cmd ls` works and shows the buckets:

```
2020-03-23 10:48  s3://first.bucket
2020-03-23 10:48  s3://my-bucket-5fc7f163-aeba-459b-8b66-69caeeec3887
```

However, when I try `s3cmd mb s3://book-test` I get the error:

```
ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you provided does not exist in our records.
```

As I understand it, `s3cmd --configure` creates a `~/.s3cfg` file which stores the configuration parameters. Maybe showing an example of `~/.s3cfg` would be helpful.
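For reference, a minimal `~/.s3cfg` of the kind requested might look like the following; every value is a placeholder to be filled from the environment variables gathered earlier, and `signature_v2` may need toggling depending on the NooBaa version:

```
[default]
access_key = <AWS_ACCESS_KEY_ID>
secret_key = <AWS_SECRET_ACCESS_KEY>
host_base = <S3_HOST>
host_bucket = <S3_HOST>/%(bucket)
use_https = False
signature_v2 = True
```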

@srikumar003 (Author):

> I am probably configuring something wrong in the hive example at the `s3cmd --configure` step. `s3cmd ls` works and shows the buckets:
>
>     2020-03-23 10:48  s3://first.bucket
>     2020-03-23 10:48  s3://my-bucket-5fc7f163-aeba-459b-8b66-69caeeec3887
>
> However, when I try `s3cmd mb s3://book-test` I get the error:
>
>     ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you provided does not exist in our records.
>
> As I understand it, `s3cmd --configure` creates a `~/.s3cfg` file which stores the configuration parameters. Maybe showing an example of `~/.s3cfg` would be helpful.

Removed the need for configuring s3cmd. We are using the Docker image for awscli that is wrapped in a convenience script to set up the sample application. Fixed in https://github.com/srikumar003/dataset-lifecycle-framework/commit/ae9a885eed1c985fabbad5155531a2ea7734cb03

@christian-pinto (Collaborator):

There is still an issue with the main Makefile. Your branch contains the wrong values for the registry, secret, and namespace, which make the installation fail on minikube.

@christian-pinto (Collaborator):

Looks good now!

@christian-pinto christian-pinto merged commit 7bea384 into datashim-io:master Apr 9, 2020
srikumar003 added a commit that referenced this pull request on Jan 19, 2022. Its squashed message combined five commits:

1. Error in datashim doi
2. adding files generated by code-generator
3. adding a test file
4. changing apiclient to be its own module to handle dependencies
5. updating datasetinternals to datasetinternal