Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade Thrift and Client Libs #33

Merged
merged 36 commits into from
Jul 26, 2019
Merged

Upgrade Thrift and Client Libs #33

merged 36 commits into from
Jul 26, 2019

Conversation

xyu
Copy link
Contributor

@xyu xyu commented Jul 24, 2019

This repo is getting long in the tooth, let's upgrade Thrift and the associated libs for Hive/Impala and implement users for Impala.

xyu added 25 commits July 22, 2019 18:04
Now that we have cleaned up the namespace for thrift generated PHP files we no longer have collisions on the `Query` object so we can have both of the following:

```
\ThriftSQL\Query
\ThriftGenerated\Query
```

This cleans up the namespace a bit for our class.
This applies the diff in 61eb605 to our Thrift generated PHP files at build time.

> Rename `echo` method to `echo_str` in Beeswax Service
>
> This changes the interface of Beeswax exposed to PHP however it's needed
> because `echo` is a reserved keyword.
xyu added 2 commits July 24, 2019 19:09
Set a default username of `php-thrift-sql` when there is no username given for either Hive or Impala.

Also set a default password for Hive if none given. With SASL Hive just needs something set to not error out which prevents some common problems when using this lib. This closes #22 and closes #23.
@xyu xyu added this to the v0.3 milestone Jul 24, 2019
@xyu xyu marked this pull request as ready for review July 25, 2019 17:01
@xyu xyu requested review from anandnalya and bperson July 25, 2019 17:02
@xyu
Copy link
Contributor Author

xyu commented Jul 25, 2019

Summary of Changes

  • Added - Ability to send Impala queries with usernames
  • Added - Default usernames to help limit invalid configurations
  • Added - Automatically closing query handles after all results are read
  • Added - Canceling queries on the server when timeouts are reached
  • Upgraded - Libraries to Thrift@0.12.0, Hive@1.1.0, Impala@2.12.0
  • Improved - Switched to a different classmap generator to get rid of a needless monkey patch of Beeswax service interface
  • Improved - Better wrapping of underlying exceptions to ease debugging (Props @Napas / Fixed exceptions on some queries types. #15)
  • Improved - Reduced size of PHP namespace and reorder them to better avoid collisions
  • Improved - Strip down the distribution PHAR and removed extraneous artifacts from Packagist
  • Improved - A better build and contribution process

$this->_username = self::USERNAME_DEFAULT;
}
if ( empty( $this->_password ) ) {
$this->_password = 'ANY-PASSWORD';

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok to pass a non-null password values. Will this work when authentication is really enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure, there's very little documentation on any of this... I'm basically piecing together what needs to be done with Thrift calls using just whatever is exposed in the interfaces and looking through HiveServer2/Impala server code to figure out what can be done. In theory this change only sets a random password if none is already passed in which fixes an invalid use case of setting a username but no password with SASL in Hive connections. I'm not sure what happens when secured auth is actually turned on. We'll need to test when that happens I guess.

Copy link
Contributor

@bperson bperson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a brew install thrift and a make away is much cleaner than rebuilding it from git commits

LGTM 👍

}

public function close() {
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we wrap all exceptions in a \ThriftSQL\Exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm just catching and ignoring with a finally only the Impala service documents that it can throw an exception on close but feels like we can safely ignore any potential issues as there's no way to actually recover from not being able to close a query handle. (Because that just means something is totally broken and the query handle is invalid anyway.)

build.patch Outdated
@@ -0,0 +1,24 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put an explanation somewhere as to the why we need to patch this?

I assume it's coming from Thrift generation? is there a relevant issue somewhere maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point I'll see if I can add some comments to the patch file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So instead of commenting on why we need the patch I committed 0e856d2 which switches out the the existing classmap generator so that we no longer need to do this patch.

insert_final_newline = true
charset = utf-8
indent_style = space
indent_size = 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:ewwww:

( it's Friday, I gotta troll :trollface: )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I thought about changing the indents here but figured maybe that's a good first bug for someone else at some point. Just matching with what the existing code is doing. :)

xyu added 2 commits July 26, 2019 17:03
Start using `theseer/autoload` instead of `jesseschalken/autoload-generator` to generate classmap used for autoloading the PHAR. This is done because the jesseschalken version is quite old now, last update 2016, and is using `nikic/php-parser` 1.4 which has a bug during tokenization. That bug causes `echo()` method definitions to be confused with `echo()` function redefinitions forcing us to use a monkey patch to change the Thrift Beeswax interface to avoid using `BeeswaxServiceIf::echo()`.

By switching out to the newer `theseer/autoload` classmap generator, last update May 2019, we get to use PHP's "native" `ext/tokenizer` extension instead which does a better job tokenizing and does not have a problem with `echo()` methods allowing us to get rid of the hacky monkey patch and follow the Thrift service defination more closely. The only downside of this new classmap generator does not force preloading of `Thrift/Transport/TSocketPool.php` which the classmap generator did due to the use of `function_exists('apc_fetch')` in that file. In theory this lazyloading should not matter in our particular case but may change if `TSocketPool` starts trying to do something fancier then a noop ployfill of APC.
@xyu xyu merged commit 8dc9f5b into master Jul 26, 2019
@xyu xyu deleted the upgrade/thrift-libs branch July 26, 2019 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants