Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for cloud data to xdmod-shredder and xdmod-ingestor #739

Merged

Conversation

eiffel777
Copy link
Contributor

This PR adds support for shredding and ingesting cloud event logs in either the OpenStack or Eucalyptus(soon to be renamed Generic Cloud) log file formats.

For shredding the cloud log files two shredder classes are added that implement the shredDirectory function that is used when xdmod-shredder is called using the -d option. Shredding cloud logs by file using the -i flag is not supported and throws an exception.

For ingesting the shredded data a new command line option called datatype is added that allows you to specify what sort of data you want to ingest. Currently only openstack and genericcloud are supported.

The aggregate option now takes an optional parameter to specify what data you want to aggregate. Currently this is either jobs or cloud. If no parameter is specified it will attempt to aggregate both jobs and cloud data.

If no command line options are specified when running xdmod-ingestor then it will try to ingest both cloud and jobs data. There may be cases where only the jobs realm is being used but xdmod-ingestor with no command line options is run. In this example xdmod-ingestor will attempt to ingest jobs data and cloud data. In order to prevent errors from trying to ingest cloud data that doesn't exist the new function realmEnabled is used in order to only attempt ingestion of realms that are enabled.

There are some pipeline actions that needed to exist outside of the jobs pipelines becuase of foreign key constraints on tables used to ingest resources. This was done for actions pertaining to ingesting organizations and resource_types from their respective config files.

The runEtlPipeline function was added to /ETL/Utilities class. It is similar to the runEtlPipeline function in DataWarehouseInitializer but is not private and does not require you to instantiate the DataWarehouseInitializer class and the two other classes that are passed to the DataWarehouseInitializer constructor but not used in runEtlPipeline. In a subsequent pull request I will remove the runEtlPipeline function from the DataWarehouseInitializer class and point anything that uses it to the ETL\Utilities::runEtlPipeline().

Motivation and Context

We want to have a unified way of shredding and ingesting all types of data that xdmod needs.

Tests performed

Manual testing in docker and added new tests for shredder classes that test to make sure if a directory that does not exist is passed to the shredder then false is returned.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project as found in the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

…d etl pipeline function. removed previously added command line flags
… Generalcloud.php to Genericcloud.php. replaces etl_overseer script calls in bootstrap.sh to use xdmod-shredder and xdmod-ingestor instead
…gest data for that realm. moved ingestion of organizations and resource_types to their own etl json files so they can be ingested independent of the jobs pipelines.
@eiffel777 eiffel777 self-assigned this Nov 30, 2018
@eiffel777 eiffel777 added Category:ETL Extract Transform Load Category:Cloud Cloud Realm labels Nov 30, 2018
@eiffel777 eiffel777 added this to the 8.1.0 milestone Nov 30, 2018
@@ -108,6 +111,7 @@ function main()
break;
case 'aggregate':
$aggregate = true;
$realmToAggregate = $value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs the $value for an optional argument is false if it is not specified. Assuming the documentation is correct you'll need to check for === false rather than === null on line 287. Please confirm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpwhite4 Yeah, that does seem to be the case. It should be changed now.

@eiffel777 eiffel777 force-pushed the add-cloud-ingestion-to-ingestor-script branch from 902b7ba to e228e6d Compare December 3, 2018 19:06
bin/xdmod-ingestor Outdated Show resolved Hide resolved
@eiffel777 eiffel777 force-pushed the add-cloud-ingestion-to-ingestor-script branch from edd9609 to 64abdda Compare December 14, 2018 18:43
@@ -233,7 +233,7 @@ function main()
$logger->notice("Job errors written to '$jobErrorLogFile'");
}

if (!$dryRun) {
if (!$dryRun && in_array($format, array('pbs', 'slurm', 'lsf', 'sge', 'uge'))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please can you explain the reason for this change.

@@ -323,4 +323,54 @@ public static function quoteVariables(array $variables, VariableStore $variableS

return $localVariableMap;
} // quoteVariables()

public static function runEtlPipeline(array $pipelines, $logger, array $params = array())
Copy link
Contributor

@plessbd plessbd Dec 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this should be a separate pull request that also removes similar code from the code base:

$overseer = new EtlOverseer($overseerOptions, $this->logger);

$overseer = new EtlOverseer($overseerOptions, $this->logger);

$overseer = new \ETL\EtlOverseer($overseerOptions, $logger);

$overseer = new EtlOverseer($overseerOptions, $log);

Specifically so we dont have the many places this is done in our code base and we dont forget about it.

You do state in the description that there is a subsequent PR for this can you please link to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the branch, https://github.com/eiffel777/xdmod/tree/move-runEtlPipeline-references-to-etl-utilities-class, that has the changes talked about in the last paragraph of the description. It's a bit behind xdmod8.1 right now so there is a some extra code in the diff that can be ignored

@@ -7,14 +7,14 @@

namespace OpenXdmod\Shredder;

use CCR\DB\iDatabase;

class OpenStack extends aCloud
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the filename match the class name OpenStack vs Openstack

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch. It's changed now.

private function realmEnabled($realm)
{
$realms = $this->warehouseDb->query("SELECT * FROM moddb.realms WHERE display = :realm", [':realm' => $realm]);
return (count($realms) > 0) ? true : false;
Copy link
Contributor

@plessbd plessbd Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return (count($realms) > 0) ? true : false;
return (count($realms) > 0);

@@ -234,11 +234,16 @@ function main()
}

if (!$dryRun) {
$logger->notice('Normalizing data');
$logger->notice('Normalizing data!');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am so excited we are normalizing this data, before I was just fine.

@eiffel777 eiffel777 merged commit 0febec4 into ubccr:xdmod8.1 Dec 20, 2018
@eiffel777 eiffel777 added the new feature New functionality label Mar 28, 2019
@smgallo smgallo changed the title Add cloud data to xdmod-shredder and xdmod-ingestor Add support for cloud data to xdmod-shredder and xdmod-ingestor Apr 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category:Cloud Cloud Realm Category:ETL Extract Transform Load new feature New functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants