
Hive metastore integration with installation files and a sample application #5

Merged: 1 commit into datashim-io:master on Apr 9, 2020
Conversation

@srikumar003 (Collaborator)

No description provided.

@srikumar003 changed the title from "Hive metastore turnkey" to "Hive metastore integration with installation files and a sample application" on Feb 17, 2020.
@YiannisGkoufas (Collaborator) left a comment:

have a look @srikumar003

Resolved review threads (outdated): Makefile, examples/hive/k8s/Makefile, release-tools/Makefile
@christian-pinto (Collaborator) left a comment:

I wasn't able to get the Hive part running on minikube. It needs some more refinement before we can merge it.

Resolved review threads (outdated): examples/hive/k8s/README.md (4 threads)
Comment on lines 22 to 32
```
$ envsubst < conf/hive-site.tmpl | tee conf/hive-site.xml
```
```
$ envsubst < conf/metastore-site.tmpl | tee conf/metastore-site.xml
```
```
$ mv deploy/s3-secret.tmpl deploy/s3-secret.yaml
$ kubectl apply -f deploy/s3-secret.yaml
```
```
$ kubectl apply -f deploy/postgres-secret.yaml
```
A collaborator commented:

These steps could probably be automated as part of the installation. I would also consider driving this installation from the main Makefile.
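A minimal sketch of the kind of automation suggested here, assuming the template and secret file names from the snippet above (the `render` helper and the demo paths are hypothetical, not part of the repository):

```shell
#!/bin/sh
# Sketch: render the *.tmpl config files and apply the secrets in one step.
# 'render' is a hypothetical helper; the kubectl calls are left commented
# out because they need a live cluster.
set -e

render() {
  if command -v envsubst >/dev/null 2>&1; then
    envsubst < "$1" > "$2"       # substitute ${VAR} refs from the environment
  else
    sed -e "s|\${S3_ENDPOINT}|$S3_ENDPOINT|g" "$1" > "$2"   # crude fallback
  fi
}

# Demo with a throwaway template standing in for conf/hive-site.tmpl:
export S3_ENDPOINT=http://example:9000
printf 'endpoint=${S3_ENDPOINT}\n' > /tmp/hive-demo.tmpl
render /tmp/hive-demo.tmpl /tmp/hive-demo.conf
cat /tmp/hive-demo.conf          # endpoint=http://example:9000

# In the real flow this would be followed by:
#   kubectl apply -f deploy/s3-secret.yaml
#   kubectl apply -f deploy/postgres-secret.yaml
```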

The author (@srikumar003) replied:

The Makefile for the hive installation now takes care of all the configuration file modifications. Fixed in https://github.com/srikumar003/dataset-lifecycle-framework/commit/2f40baf08e9f33f466e8c10283cf3cf4ce885ac4

Resolved review threads (outdated): examples/hive/k8s/Makefile (3 threads), Makefile (2 threads)
```
DOCKER_REGISTRY_COMPONENTS="registry_used_in_installation"
HIVESERVER_IMAGE="hive-server:latest"

docker run -v $(PWD):/sampleapp -it ${DOCKER_REGISTRY_COMPONENTS}/${HIVESERVER_IMAGE} bin/beeline -u "jdbc:hive2://$HIVE_CLI_IP:$HIVE_CLI_PORT/;transportMode=http;httpPath=/cliservice" -f /sampleapp/sample.hql
```
A collaborator commented:

`$(PWD)` -> `${PWD}`

A collaborator commented:

Also, this last command returns:

```
20/03/13 09:26:22 [main]: WARN jdbc.HiveConnection: Failed to connect to 192.168.39.51:31321
Error: Could not open client transport with JDBC Uri: jdbc:hive2://192.168.39.51:31321/;transportMode=http;httpPath=/cliservice: Could not establish connection to jdbc:hive2://192.168.39.51:31321/;transportMode=http;httpPath=/cliservice: org.apache.http.conn.HttpHostConnectException: Connect to 192.168.39.51:31321 [/192.168.39.51] failed: Connection refused (Connection refused) (state=08S01,code=0)
```

The author (@srikumar003) replied:

Variable substitution fixed in https://github.com/srikumar003/dataset-lifecycle-framework/commit/74d2c763bc1bdb3190430c2ea89f60c7c6cfc999

Provided a convenience script for setting up the sample app.
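The beeline command shown earlier needs $HIVE_CLI_IP and $HIVE_CLI_PORT. A sketch of deriving them from a minikube service URL, using a hard-coded stand-in for the output of `minikube service <hive-service> --url` (the service name itself is not specified in this thread):

```shell
#!/bin/sh
# Sketch: split a minikube service URL into the host and port the JDBC
# connection string expects. HIVE_URL is a stand-in value, not a live lookup.
HIVE_URL="http://192.168.39.51:31321"

hostport=${HIVE_URL#http://}      # strip the scheme   -> 192.168.39.51:31321
HIVE_CLI_IP=${hostport%:*}        # drop the port      -> 192.168.39.51
HIVE_CLI_PORT=${hostport##*:}     # keep only the port -> 31321

echo "jdbc:hive2://$HIVE_CLI_IP:$HIVE_CLI_PORT/;transportMode=http;httpPath=/cliservice"
```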

Comment on lines 3 to 42
### On Minikube with NooBaa
0. You would need an S3 client for this step. We will use `s3cmd` which you can get on MacOS by:
```
$ brew install s3cmd
```
and on Linux using your distribution's package manager, e.g. on Debian and Ubuntu:
```
$ apt-get install s3cmd
```
We need to configure `s3cmd` once. So, first, we'll have to gather the connection information:
```
$ export S3_HOST=$(minikube service s3 --url | head -n1 | sed -e 's#^http://##')
$ export NOOBA_HOME=<path_to_your_Nooba_installation>
$ export AWS_ACCESS_KEY_ID=$(${NOOBA_HOME}/noobaa status 2>/dev/null | grep AWS_ACCESS_KEY_ID | awk -F ": " '{print $2}')
$ export AWS_SECRET_ACCESS_KEY=$(${NOOBA_HOME}/noobaa status 2>/dev/null | grep AWS_SECRET_ACCESS_KEY | awk -F ": " '{print $2}')
```
Then, run
```
$ s3cmd --configure
```
which will open up an interactive session to input various S3 connection parameters stored in the above environment variables. Also, please ensure that you say no to the question about HTTPS connections.
When all that is done, then check your connection by executing

```
$ mc config host rm local
$ mc config host add local $(minikube service minio-service --url) minio minio123
$ s3cmd ls
```
which should return the buckets created by the example provided in the main installation guide.

To continue this example,

1. Create a bucket in S3 storage

```
$ mc mb local/book-test
$ s3cmd mb s3://book-test
```

2. Upload the given CSV file to the bucket

```
$ mc cp books.csv local/book-test
$ s3cmd put books.csv s3://book-test/
```
A collaborator commented:

All this could be automated the same way we load example data into NooBaa for testing the regular datasets. There is a Docker image with awscli that is built during the installation process, and you could use it to load the CSV into the new bucket.

See `data-loader-noobaa.yaml` and the `noobaa_install.sh` functions `build_data_loader` and `run_data_loader`.
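A rough sketch of the loader Job the reviewer describes, modeled loosely on that approach; the image name, secret name, endpoint variable, and how books.csv gets into the container are all assumptions, not the repository's actual values:

```yaml
# Hypothetical data-loader Job: runs the awscli image built at install time
# to create the bucket and push books.csv. All names are placeholders, and
# books.csv would still need to be mounted or baked into the image.
apiVersion: batch/v1
kind: Job
metadata:
  name: hive-example-data-loader
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: loader
        image: <registry>/awscli:latest
        command: ["sh", "-c"]
        args:
        - >-
          aws --endpoint-url "$S3_ENDPOINT" s3 mb s3://book-test &&
          aws --endpoint-url "$S3_ENDPOINT" s3 cp /data/books.csv s3://book-test/
        envFrom:
        - secretRef:
            name: s3-secret   # supplies AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
```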

@YiannisGkoufas (Collaborator):
I am probably configuring something wrong in the hive example at the `s3cmd --configure` step. `s3cmd ls` works and shows the buckets:

```
2020-03-23 10:48  s3://first.bucket
2020-03-23 10:48  s3://my-bucket-5fc7f163-aeba-459b-8b66-69caeeec3887
```

However, when I try `s3cmd mb s3://book-test` I get the error:

```
ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you provided does not exist in our records.
```

As I understand it, `s3cmd --configure` creates a `~/.s3cfg` file which stores the configuration parameters. Maybe showing an example of `~/.s3cfg` would be helpful.
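For reference, a minimal `~/.s3cfg` of the kind requested might look like the following; every value is a placeholder to be filled from the environment variables gathered earlier, and `signature_v2` may need toggling depending on the NooBaa version:

```
[default]
access_key = <AWS_ACCESS_KEY_ID>
secret_key = <AWS_SECRET_ACCESS_KEY>
host_base = <S3_HOST>
host_bucket = <S3_HOST>/%(bucket)
use_https = False
signature_v2 = True
```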

@srikumar003 (Author):

> I am probably configuring something wrong in the hive example at the `s3cmd --configure` step. `s3cmd ls` works and shows the buckets:
>
>     2020-03-23 10:48  s3://first.bucket
>     2020-03-23 10:48  s3://my-bucket-5fc7f163-aeba-459b-8b66-69caeeec3887
>
> However, when I try `s3cmd mb s3://book-test` I get the error:
>
>     ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you provided does not exist in our records.
>
> As I understand it, `s3cmd --configure` creates a `~/.s3cfg` file which stores the configuration parameters. Maybe showing an example of `~/.s3cfg` would be helpful.

Removed the need for configuring s3cmd. We are using the Docker image for awscli that is wrapped in a convenience script to set up the sample application. Fixed in https://github.com/srikumar003/dataset-lifecycle-framework/commit/ae9a885eed1c985fabbad5155531a2ea7734cb03

@christian-pinto (Collaborator):

There is still an issue with the main Makefile. Your branch contains the wrong values for the registry, secret, and namespace, which make the installation fail on minikube.

@christian-pinto (Collaborator):

Looks good now!

@christian-pinto christian-pinto merged commit 7bea384 into datashim-io:master Apr 9, 2020
srikumar003 added a commit that referenced this pull request on Jan 19, 2022. Its squashed message combined five commits:

1. Error in datashim doi
2. adding files generated by code-generator
3. adding a test file
4. changing apiclient to be its own module to handle dependencies
5. updating datasetinternals to datasetinternal