Releases: apache/druid
Druid 0.7.3 - Stable
This release mainly delivers dimension compression and a reworked Druid documentation. There are no update concerns with this version of Druid.
New Features
- Added support for dimension compression of segment columns, enabled by default. Compression is applied to the column storing the dimension value indices, but not to the dimension values themselves. This change only applies to single-value dimensions; multi-value dimensions are left uncompressed. With real-world data we have seen segment sizes reduced by 50% for some datasources, but actual compression ratios will vary based on the data. Sparse and repetitive columns will benefit the most, whereas more random and higher cardinality columns will benefit less. Old segments can be converted using the updated VersionConverterTask.
- Initial support for Microsoft Azure as a deep storage option has been added. Thanks @davrodpin!
Improvements
- Improved VersionConverterTask to allow for an IndexSpec and forced updates. This makes it possible to convert old segments to use dimension compression (a brief indexSpec sketch follows this list).
- Improved how the datasource metadata query filters on segments to scan.
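As a rough, hedged sketch of the conversion mentioned above, an indexSpec along these lines can be supplied to enable the new dimension compression; the field name and value shown here are assumptions for illustration and should be checked against the IndexSpec documentation for your version:
{
  "indexSpec" : {
    "dimensionCompression" : "lz4"
  }
}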
Bug Fixes
- Ignore rows with invalid interval for index task (#1264)
- Always re-upload self-contained snapshot jars to HDFS (#1261)
- Skip raising false alert when the coordinator loses leadership (#1224)
- Fix an issue where havingSpec was still applied after the broker forwarded a GroupByQuery to historicals (#1292). Thanks @guobingkun!
- Fix incorrect types in the update SQL statement for metadata storage (#1295). Thanks @anubhgup!
- Fix serde issue when pulling timestamps from cache (#1304)
Documentation
- Reworked the Druid documentation such that it can be consumed in order.
- Many documentation fixes and improvements thanks to @bobrik, @infynyxx, @rasahner, @b-slim, @textractor, @truenorth, @gknapp, and others we may have missed!
Druid 0.7.1.1 - Stable
New Features
- Group results by day of week, hour of day, etc.: We added support for time extraction functions, so you can group results by anything DateTimeFormatter supports (a sketch follows this list). For more details, see http://druid.io/docs/latest/DimensionSpecs.html#time-format-extraction-function
- Audit rule and dynamic configuration changes: Druid now provides support for remembering why a rule or configuration change was made, and who made the change. Note that you must provide the author and comment fields yourself. The IP which issued the configuration change will be recorded by default. For more details, see the "X-Druid-Author" and "X-Druid-Comment" headers on http://druid.io/docs/latest/Coordinator.html
- Provide support for a password provider for the metadata store: This enables people to write a module extension which implements the logic for getting a password to the metadata store.
- Enable servlet filters on Druid nodes: This enables people to write authentication filters for Druid requests.
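As a rough sketch of the time-format extraction function referenced in the first item above, a dimensionSpec along these lines groups rows by day of week; the output name is made up for illustration, and the exact JSON field names (for example, whether the extraction function is attached as "extractionFn" or "dimExtractionFn" in this version) should be checked against the DimensionSpecs documentation linked above:
{
  "type" : "extraction",
  "dimension" : "__time",
  "outputName" : "dayOfWeek",
  "extractionFn" : {
    "type" : "timeFormat",
    "format" : "EEEE"
  }
}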
Improvements
- Query parallelization on the broker for long interval queries: We’ve added the ability to break up a long interval query into multiple shorter interval queries that can be run in parallel. This should improve the performance of more expensive groupBys (a sketch follows this list). For more details, see "chunkPeriod" on http://druid.io/docs/latest/Querying.html#query-context
- Better schema exploration: The broker can now return the dimensions and metrics for a datasource broken down by interval.
- Improved code coverage: We’ve added numerous unit tests to improve code coverage and will be tracking coverage in the future with Coveralls.
- Additional ingestion metrics: Added additional metrics for failed persists and failed handoffs.
- Configurable InputFormat for batch ingestion (#1177)
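As a small sketch of the chunkPeriod option referenced in the query-parallelization item above, this fragment (added to an otherwise ordinary query body) asks the broker to break the query into month-long chunks; "P1M" is an ISO-8601 period and just an example value:
{
  "context" : {
    "chunkPeriod" : "P1M"
  }
}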
Bug Fixes
- Fixed a bug where sometimes the broker and coordinator would miss announcements of segments, leading to null pointer exceptions. (#1161)
- Fixed a bug where groupBy queries would fail when aggregators and post-aggregators were named the same (#1044)
- Fixed a bug where not including a pagingSpec in a select query generated an obscure NPE (#1165). Thanks to friedhardware!
- "bySegment" groupBy queries should now work (#1180)
- Honor ignoreInvalidRows in the reducer for Hadoop indexing
- Download dependencies from Maven Central over HTTPS
- Updated MySQL connector to fix issues with recent MySQL versions
- Fix timeBoundary query on union datasources (#1243)
- Fix Guice injections for DruidSecondaryModule (#1245)
- Fix log4j version dependencies (#1239)
- Fix NPE when partition number 0 does not exist (#1190)
- Fix arbitrary granularity spec (#1214) and ignore rows with invalid interval for index task (#1264)
- Fix thread starvation in AsyncQueryForwardingServletTest (#1233)
- More useful ZooKeeper log messages
- Various new unit tests for things
- Updated MapDB to 1.0.7 for bugfixes
- Fix re-uploading of self-contained SNAPSHOT jars when developing on Hadoop (#1261)
Documentation
- Reworked the flow of Druid documentation and fixed numerous errors along the way.
- Thanks to @infynxx for fixing many of our broken links!
- Thanks to @mrijke for many fixes with metrics and emitter configuration.
- Thanks to @b-slim, @bobrik and @andrewserff for documentation and example fixes
Misc
- Improved startup scripts thanks to @housejester.
Druid 0.7.0 - Stable
Updating to Druid 0.7.0 – Things to Be Aware Of
- New ingestion spec: Druid 0.7.0 requires a new ingestion spec format. Druid 0.6.172 supports both the old and new formats of ingestion and has scripts to convert from the old to the new format. This script can be run with 'tools convertSpec' using the same Main used to run Druid nodes. You can update your Druid cluster to 0.6.172, update your ingestion specs to the new format, and then update to Druid 0.7.0. If you update your cluster to Druid 0.7.0 directly, make sure your real-time ingestion pipeline understands the new spec.
- MySQL is no longer the default metadata storage: Druid now defaults to an embedded Apache Derby database, which was chosen mainly for testability purposes. However, we do not recommend using Derby in production. For anything other than testing, please use MySQL or PostgreSQL metadata storage. Configuration parameters for metadata storage were renamed from druid.db to druid.metadata.storage, and an additional druid.metadata.storage.type=<mysql|postgresql> is required to use anything other than Derby. The convertProps tool can assist you in converting all 0.6.x properties to 0.7 properties.
- Druid is now case-sensitive: Druid column names are now case-sensitive. We previously tried to be case-insensitive for queries and case-preserving for data, but we decided to make this change as there were numerous bugs related to various casing problems.
If you are upgrading from version 0.6.x:
- Please make sure the column casing in your queries matches the casing of your column names in your data and update your queries accordingly.
- One very important thing to note is that 0.6 internally lower-cased all column names at ingestion time and query time. In 0.7, this is no longer the case; however, we still strongly recommend that you use lowercase column names in 0.7 for simplicity.
- If you are currently ingesting data with mixed case column names as part of your data or ingestion schema:
  - For TSV or CSV data, simply lower-case your column names in your schema when you update to 0.7.0.
  - For JSON data with mixed case fields, if you were not specifying the names of the columns, you can use the jsonLowerCase parseSpec to lower-case the data for you at ingestion time and maintain backwards compatibility (a rough parseSpec sketch appears at the end of these update notes). For all other parse specs, you will need to lower-case the metric/aggregator names if you were using mixed case before.
- Batch segment announcement is now the default: Druid now uses batch segment announcement by default for all nodes. If you are already using batch segment announcement, you should be all set. If you have not yet updated to using batch segment announcement, please read this guide in the forum on how to update your current 0.6.x cluster to use batch announcement first.
- Kafka 0.7.x removed in favor of Kafka 0.8.x: If you are using Kafka 0.7, you will have to build the kafka-seven extension manually. It is commented out in the build, because Kafka 0.7 is not available in Maven Central. The Kafka 0.8 (kafka-eight) extension is unaffected.
- Coordinator endpoint changes: Numerous coordinator endpoints have changed. Please refer to the coordinator documentation for what they are. In particular: /info on the coordinator has been removed, and /health on historical nodes has been removed.
- Separate jar required for com.metamx.metrics.SysMonitor: If you currently have com.metamx.metrics.SysMonitor as part of your druid.monitoring.monitors configuration and would like to keep it, you will have to add the SIGAR library jar to your classpath. Alternatively, you can simply remove com.metamx.metrics.SysMonitor if you do not rely on the sys/.* metrics. We had to remove the direct dependency on SIGAR in order to move Druid artifacts to Maven Central, since SIGAR is currently not available there.
- Update Procedure: If you are running a version of Druid older than 0.6.172, please upgrade to 0.6.172 first. See the 0.6.172 release notes for instructions. In order to ensure a smooth rolling upgrade without downtime, nodes must be updated in the following order:
- historical nodes
- indexing service/real-time nodes
- router nodes (if you have any)
- broker nodes
- coordinator nodes
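As referenced in the case-sensitivity notes above, here is a rough sketch of a parser section using the jsonLowerCase parseSpec; the timestamp column name is hypothetical and the surrounding ingestion-spec fields are omitted, so check the ingestion documentation for the exact structure:
"parser" : {
  "type" : "string",
  "parseSpec" : {
    "format" : "jsonLowerCase",
    "timestampSpec" : { "column" : "timestamp", "format" : "auto" },
    "dimensionsSpec" : { "dimensions" : [] }
  }
}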
New Features
- Long metric column support: Until now, Druid stored all metrics as single precision floating point values, which could introduce rounding errors and unexpected results with queries using longSum aggregators, especially for groupBy queries (an example aggregation appears after this list).
- Pluggable metadata storage: MySQL, PostgreSQL, and Derby (for testing) are now supported out of the box. Derby only supports a single master and should not be used for high-availability production; use MySQL or PostgreSQL with failover for that.
- Simplified data ingestion API: We completely redid Druid’s data ingestion API.
- Switch compression for metric columns from LZF to LZ4: Initial performance tests show it may be between 15% and 25% faster, and results in segments about 3-5% smaller on typical data sets.
- Configurable inverted bitmap indexes: Druid now supports Roaring Bitmaps in addition to the default Concise Bitmaps. Initial performance tests show Roaring may be up to 20% faster for certain types of queries, at the expense of segments being 20% larger on average.
- Integration tests: We have added a set of integration tests that use Docker to spin up a Druid cluster to run a series of indexing and query tests.
- New Druid Coordinator console: We introduced a new Druid console that should hopefully provide a better overview of the status of your cluster and be a bit more scalable if you have hundreds of thousands of segments. We plan to expand this console to provide more information about the current state of a Druid cluster.
- Query Result Context: Result contexts can report errors during queries in the query headers. We are currently using this feature for internal retries, but hope to expand it to report more information back to clients.
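As a small example of the long metric column support above, a longSum aggregation over an ingested count metric avoids the single-precision rounding issues; the metric names here are made up for illustration:
{ "type" : "longSum", "name" : "totalCount", "fieldName" : "count" }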
Improvements
- Faster query speeds: Lots of speed improvements thanks to a faster compression format, small optimizations in column structure, and optimizations of queries with multiple aggregations, as well as numerous groupBy query performance improvements. Overall, some queries can be up to twice as fast using the new index format.
- Druid artifacts in Maven Central: Druid artifacts are now available in Maven Central to make your own builds and deployments easier.
- Common Configuration File: Druid now has a common.runtime.properties file where you can declare all global properties as well as all of your external dependencies. This avoids repeated configuration across multiple nodes and will hopefully make setting up a Druid cluster a little less painful.
- Default host names, ports, and service names: Default host names, ports, and service names for all nodes mean a lot less configuration is required upfront if you are happy with the defaults. It also means you can run all node types on a single machine without fiddling with port conflicts.
- Druid column names are now case sensitive: Death to casing bugs. Be aware of the dangers of updating to 0.7.0 if you have mixed case columns and are using 0.6.x. See above for more details.
- Query Retries: Druid will now automatically retry queries for certain classes of failures.
- Background caching: For certain types of queries, especially those that involve distinct (HyperLogLog) counts, this can improve performance by over 20%. Background caching is disabled by default.
- Reduced coordinator memory usage (by up to 50%): This fixes a problem where a coordinator would sometimes lose leadership due to frequent GCs.
- Metrics can now be emitted to SSL endpoints
- Additional AWS credentials support. Thanks @gnethercutt!
- Additional persist and throttle metrics for real-time ingestion: This should help diagnose when real-time ingestion is being throttled and how long persists are taking. These metrics provide a good indication of when it is time to scale up real-time ingestion.
- Broker initialization endpoint: Brokers now provide a status endpoint at /druid/broker/v1/loadstatus to indicate whether they are ready to be queried, making rolling upgrades/restarts easier.
Bug Fixes
- Support multiple shards of the same datasource on the same realtime node. Thanks @zhaown
- HDFS task logs should now work as expected. Thanks @flowbehappy.
- Possible deadlock condition fixed in the Broker.
- Various fixes for GZIP compression in returning results.
- druid.host should now support IPv6 addresses as well.
Documentation
- New tutorials.
- New ingestion documentation.
- New configuration documentation.
- Improvements to rule documentation. Thanks @mrijke
Known issues
- Merging segments with different types of bitmap indices is currently not possible, so if you have both types of indices in your cluster, you must set druid.coordinator.merge.on to false ('false' is the default value of the config).
- https://github.com/druid-io/d...
Druid 0.6.172 - Stable
Druid 0.6.172 fixes a few bugs to make the upgrade path towards Druid 0.7.0 seamless:
- Fixes ingestion schema forward-compatibility with 0.7.0
- Fixes dynamic worker configuration and worker affinity settings for the indexing service
Updating
If you are not already running 0.6.171, please see the 0.6.171 release notes for important notes on the upgrade procedure.
Druid 0.6.171 - Stable
Druid 0.6.171 is a bug-fix stable release mainly meant to enable a less painful update to Druid 0.7.0. Going forward, we will be backporting fixes to 0.6.x as required for the community and continuing to develop major features on 0.7.x.
Download
http://static.druid.io/artifacts/releases/druid-services-0.6.171-bin.tar.gz
Updating: Things to Be Aware Of
Both this version and 0.7.0-RC1 provide much better out of the box support for PostgreSQL as a metadata store. In order to provide this functionality, we had to make some small changes to the way data is stored in metadata storage for MySQL setups.
Before updating to 0.6.171, please make sure that:
- All Druid MySQL metadata tables are using UTF-8 encoding for all string/text columns.
- The default character set for the Druid MySQL database has been changed to UTF-8.
Druid Coordinator and Overlord will refuse to start if the database default character set is not UTF-8.
To check column character encoding, use
SHOW CREATE TABLE <table>;
If the default table encoding is not UTF-8 or if any columns are encoded using anything other than UTF-8 you will need to convert those tables.
To check the database default encoding, use
SHOW VARIABLES LIKE 'character_set_database';
If you are not already using UTF-8 encoding for your columns, you can convert your tables and change the database default using the following commands. Please keep in mind that table conversion can take a while (order of minutes) and segment loading / handoff will be interrupted for the duration of the upgrade.
Make a backup of your database before performing the upgrade!
ALTER TABLE druid_config CONVERT TO CHARSET utf8;
ALTER TABLE druid_rules CONVERT TO CHARSET utf8;
ALTER TABLE druid_segments CONVERT TO CHARSET utf8;
ALTER TABLE druid_tasks CONVERT TO CHARSET utf8;
ALTER TABLE druid_tasklogs CONVERT TO CHARSET utf8;
ALTER TABLE druid_tasklocks CONVERT TO CHARSET utf8;
-- replace druid with your Druid database name here
ALTER DATABASE druid DEFAULT CHARACTER SET utf8;
Improvements
- We introduced several query optimizations, mainly for topNs and HLLs
- The overlord can now optionally choose what worker to send tasks to (#904)
- Improved retry logic for realtime plumbers when handoffs fail during the final merge step
Bug Fixes
- Fixed searching with same value in multiple columns
- Fixed jetty defaults to increase number of threads and prevent lockups
- Fixed query/wait metrics being emitted twice
- Fixed default dimension exclusions for timestamp and aggregators in ingestion schema
- Fixed missing origin in cache key for period granularities
- Fixed default FilteredServerView to actually be filtered
- Fixed files not cleaning up correctly in segment cache directory
- Fixed results sometimes coming in out of order
- Fixed bySegment TopN queries not returning at the broker level
- Fixed a few bugs related to filtered aggregators
- Fixed crazy amounts of logging when coordinator loses leadership
- Updated jetty and spymemcached libraries for various fixes
- Fixed cardinality aggregator caching schema problem
- Fixed the Coordinator and Overlord '/status' pages being redirected to the leader instances
- Made postgres actually work out of the box in 0.6.x
Druid 0.6.160 - Stable
Improvements
- Broker nodes now only start up after reading all information about segments in Zookeeper
- Nested groupBy queries should now work with post aggregations.
- Nested groupBy queries should now work with complex metrics.
- The overlord in the indexing service can now assign tasks to workers based on strategies.
- Local firehose can now find all files under a directory.
- Timestamp and metrics are now automatically added to dimension exclusions.
- Improved failure handling during real-time hand-offs.
- Parallel downloading of segments. Multiple threads can now be used to download segments from deep storage.
- Segments can be announced and queried as a node is initially loading up.
- Native filtered aggregators for selector type filters
- Custom Broker selection strategy for Router can now be written in JavaScript
Documentation
- Example Hadoop Configuration now available
- Best Practices and Recommendations now updated
- Experimental Router node is now documented (druid.io/docs/latest/Router.html).
- Local firehose is now documented (http://druid.io/docs/latest/Firehose.html).
- Numerous improvements to FAQs; segment metadata docs improved, ingest firehose docs improved, full cluster view explained. Thanks @pdeva!
- Updates to Cassandra documentation. Thanks @lexicalunit!
Bug Fixes
- Added a workaround for a jetty half open connection issue that appears when client connections terminate a long running query. The symptoms when this bug appears are that the cluster appears stuck and unresponsive. Another workaround for this issue is to simply use query context timeouts.
- Fixed merging results from partitions with time gaps, which could cause out of order unmerged results (#796).
- HDFS should now work for non-default filesystems. Thanks @flowbehappy!
- Multiple spatial dimensions can now be ingested.
- Fixed a bug with approximate histograms not working with groupBy queries.
- Fixed last 8kb not working for non-s3 task logs.
- Fixed dynamic configuration not working for replication throttling.
- Fix search queries throwing exceptions if querying for non-existing dimensions
- Fix ingest firehose breaking for non-present dimensions.
- Select queries now work if you specify non-existing dimensions (#778)
- groupBy cache now works with complex metrics
- Fixed some serde problems that existed with RabbitMQ (#794)
Druid 0.6.146 - Stable
New features
- Reschema capabilities added. You can now ingest an existing Druid segment and change the name, dimensions, metrics, rollup, etc. of the segment. (More info: http://druid.io/docs/0.6.146/Ingestion-FAQ.html)
- Approximate histograms and quantiles. We’ve open sourced a new module, druid-histogram, that includes a new aggregator to build approximate distributions and can be used for quantiles. Depending on the accuracy of the desired results, this aggregator can be slower than the other Druid aggregators. This feature is still somewhat experimental, but we would really love to work with the community to make it more production stable. (More info: http://druid.io/docs/0.6.146/ApproxHisto.html)
- Query timeout and cancellation. You can now specify an optional "timeout" key and a long value in the Druid query context to cancel queries that have been running for too long (a sketch follows this list). You can also issue explicit query cancellation. (More info: http://druid.io/docs/0.6.146/Querying.html)
- groupBy and select query caching (disabled by default). Select and groupBy queries do not cache by default. This is to prevent large result sets from these queries overflowing the cache. However, if your workload generates groupBy results of reasonable size and you’d like to enable the cache for these queries, you can override the default values for druid.*.cache.unCacheable (http://druid.io/docs/0.6.146/Broker-Config.html).
- Middle-managers can now be blacklisted. This allows for rolling updates of middleManagers. See new docs on rolling Druid updates. (http://druid.io/docs/0.6.146/Rolling-Updates.html)
- S3 credentials can now be read from file. Thanks @metacret!
- HDFS task logs for the indexing service now supported. Thanks @realfun!
- Index tasks now support manual specification of shardSpecs and the ability to skip the determine partitions step.
- TimeBoundary queries can now return just the max or min time. http://druid.io/docs/0.6.146/TimeBoundaryQuery.html
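As a small sketch of the query timeout feature above, adding a "timeout" (in milliseconds) to the query context cancels a query that runs longer than the given value; the rest of the query body is omitted here and the value is just an example:
{
  "context" : {
    "timeout" : 60000
  }
}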
Improvements
- Nested groupBy queries now support post aggregators and all functionality of normal groupBy queries.
- groupBy queries now support cardinality aggregators.
- Port finding strategies for peons are smarter and can now reuse ports.
- Existing complete sinks will now try to be handed off much sooner after real-time updates or restarts.
- More flexible userData for indexing service autoscaling on EC2 that is no longer tied to our deployment environment.
- The async logic in the Druid router was improved significantly.
- Routers now support optional routing strategy overrides.
- Druid 0.6.x deployments now work with Apache Whirr. We are going to create a way of deploying Druid with docker soon as well.
- Cleaned up some redundant configs in the indexing service.
- A whole bunch of query and caching unit tests were added.
- Explicit job properties can now be added for Hadoop ingestion tasks.
Docs
- There are now docs about how to do rolling Druid updates and restarts. http://druid.io/docs/0.6.146/Rolling-Updates.html
- New docs for configuring logging in Druid. http://druid.io/docs/0.6.146/Logging.html
- Kafka 8 docs now added. Thanks @r4j4h! http://druid.io/docs/0.6.146/Kafka-Eight.html
- Added docs for inverted topNs. http://druid.io/docs/0.6.146/TopNMetricSpec.html#inverted-topnmetricspec
- Updated Cassandra documentation. Thanks @lexicalunit! https://github.com/metamx/druid/pull/680
Misc
- Curator version bumped to 2.6.0
- Jetty version bumped to 9.2.2
- Guava version bumped to 16.0.1
- Logging for coordinator and historical nodes is now less verbose
Druid 0.6.121 - Stable
This is a small release with mainly stability and performance updates.
Updating
- If updating from 0.6.105, no particular steps need to be taken.
- If updating from an older release, see the notes for Druid 0.6.105
Release Notes
New features
- new cardinality estimation aggregator: uses hyperUnique (the optimized HyperLogLog aggregator) to estimate the cardinality of a dimension
- we have completely redone the ingestion schemas to consolidate batch and real-time ingestion. Everything is backwards compatible for the time being, and we hope to have new examples and tutorials that show how to use the new schema. It should hopefully simplify ingestion.
- alphanumeric sorted topNs
- a new union query (right now this only works if there are commonly named columns and metrics among your datasources; a sketch follows this list)
- allow config-based overriding of Hadoop job properties for batch ingestion
- multi-threaded the coordinator cost balancing algorithm for faster load balancing decisions (the number of threads to use is dynamically configurable, it is 1 by default)
- added a context parameter to force a 2-pass topN optimization algorithm (previously this was done via a heuristic that was rarely used)
- additional coordinator endpoints to return more info about cluster state
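As a sketch of the new union query mentioned above, the dataSource of an ordinary query can be replaced with a union over datasources that share column and metric names; the datasource names here are made up for illustration:
"dataSource" : {
  "type" : "union",
  "dataSources" : [ "events_us", "events_eu" ]
}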
Improvements
- improved real-time ingestion memory usage. Depending on the number of total segments in your cluster, much less memory can now be used for real-time ingestion.
- faster batch ingestion when there are numerous individual raw data files. Thanks @deepujain.
- more resilient rabbitMQ firehoses. Thanks @tucksaun.
- JavaScript aggregator now supports multi-valued dimensions.
- inverted topN now works with lexicographic sorting
- lexicographic topN now supports dimension extraction functions
Bug Fixes
- several fixes for hyperUnique aggregator where large errors in estimates could be reported in certain edge cases
- fixed an edge case race condition in the coordinator where it could load/drop segments incorrectly when disconnecting/reconnecting from Zookeeper
- fixed an edge condition with real-time ingestion where a bad sink can be created with delayed events
- updated jetty to 9.1.5 for a fix of a half-open connection problem that occurs occasionally (it’s been extremely difficult for us to reproduce this -- but when it occurs nodes appear to have their jetty threads stalled while writing to a channel that is already closed)
- fixed a bug where cached results would get combined in arbitrary order
- fixed additional casing bugs
- Druid now passes tests with Java 8
Documentation
- new documentation about possible hardware for production nodes and configuration for them. Look for more improvements to configuration coming soon.
- Fixed several broken links in docs. Thanks @jcollum.
druid-0.6.120
[maven-release-plugin] copy for tag druid-0.6.120
Druid 0.6.105 - Stable
Updating
When updating Druid with no downtime, we highly recommend updating historical nodes and real-time nodes before updating the broker layer. Changes in queries are typically compatible with an old broker version and a new historical node version, but not vice versa. Our recommended rolling update process is:
- indexing service/real-time nodes
- historical nodes (with a wait in between each node, the wait time corresponds to how long it takes for a historical node to restart and load all locally cached segments)
- broker nodes
- coordinator nodes
Release Notes
- Historical nodes can now use and maintain a local cache (disabled by default). This cache can either be heap based or memcached. This allows historical nodes to merge results locally and reduces much of the memory pressure seen on brokers while pulling a large number of results from the cache. Populating the cache is also now done in an asynchronous manner.
- Experimental router node. We’ve been experimenting with a fully asynchronous router node that can route queries to different brokers depending on the actual query. Currently, the router node makes decisions about which broker to talk to based on rules from the coordinator node. It is our goal to at some point merge the router and broker logic and move towards hierarchical brokers.
- Post aggregation optimization. We’ve optimized calculations of post aggregations (previously post aggs were being calculated more than necessary). In some initial benchmarks, this can lead to 20%-30% improvement in queries that involve post aggregations.
- Support hyperUnique in groupBys. We’ve fixed a reported problem where groupBys would report incorrect results when using complex metrics (especially hyperUnique). A sketch of a hyperUnique aggregator follows this list.
- Support dimension extraction functions in groupBy
- Persist and persist-n-merge threads now no longer block each other during real-time ingestion. We added a parameter for throttling real-time ingestion a few months ago, and what we’ve seen is that very high ingestion rates that lead to a high number of intermediate persists can be blocked while waiting for a hand-off operation to complete. This behavior has now been improved. You are also now able to set maxPendingPersists in the plumber.
- hyperUnique performance optimizations: ~30-50% faster aggregations
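As a sketch of hyperUnique in a groupBy, the aggregator simply references a column that was ingested with the hyperUnique aggregator at indexing time; the column and output names here are hypothetical:
{ "type" : "hyperUnique", "name" : "uniqueUsers", "fieldName" : "user_unique" }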
Miscellaneous other things
- Fix integer overflow in hash based partitions
- Support for arbitrary JSON objects in query context
- Request logs now include query timing statistics
- Hadoop 2.3 support by default
- Update to Jetty 9
- Do not require valid database connections for testing
- Gracefully handle NaN / Infinity returned by compute nodes
- Better error reporting for cases where the ChainedExecutionQueryRunner throws NPEs
Extensions:
- HDFS Storage should now work better with Cloudera CDH4
- S3 Storage: object ACLs now consistently default to "bucket owner full control"