Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operation result #774

Merged
merged 43 commits into from
May 21, 2018
Merged

Operation result #774

merged 43 commits into from
May 21, 2018

Conversation

beckjake
Copy link
Contributor

This PR is a superset of #768 and can replace it.

This generates a catalog through the new comand dbt catalog generate to create a file named "catalog.json" in the target directory, which has the column and table attributes discussed in #763. It uses the operation type internally instead of raw SQL embedded in the adapter's get_catalog method, and adds a run_operation method to adapters that accepts the manifest as an argument.

This feature is only implemented for PostgreSQL adapters.

Copy link
Member

@cmcarthur cmcarthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is close, but there are some nitpicky things i want you to take a look at before this is ready.


# This is a lot of magic, have to know the magic name is 'catalog'.
# TODO: How can we make this part of the data set? Could we make it
# the operation's name/unique ID somehow instead?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is out of date


# TODO: should I get the return value here in case future operations
# want to return some string? Jinja (I think) stringifies the results
# so it's not super useful. Status, I guess?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, i think this is fine as is. can you remove this comment?

Return an an AttrDict with three attributes: 'table', 'data', and
'status'. 'table' is an agate.Table.
"""
operation = manifest.find_operation_by_name(operation_name, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe you can pass "dbt" as the package name here instead of None

# Abstract methods involving the flat graph
###
@classmethod
def get_catalog(cls, profile, project_cfg, run_operation):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's some naming inconsistency here. the comment should read manifest instead of flat graph, run_operation should be manifest to be consistent with the underlying implementations


def find_operation_by_name(self, name, package):
return self._find_by_name(name, package, 'macros',
[NodeType.Operation])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love this

dbt/main.py Outdated
catalog_sub = subs.add_parser('catalog', parents=[base_subparser])
catalog_subs = catalog_sub.add_subparsers()
# it might look like catalog_sub is the correct parents entry, but that
# will cause weird errors about 'conflicting option strings'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drewbanin should this be docs? dbt catalog generate vs dbt docs generate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking dbt docs generate here. Might be worth changing the GenerateTask in generate.py to DocsTask and docs.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I'll fix the catalog -> docs.

Is this actually a "docs task"? I assume the point of having generate be a subcommand of docs is that there are other docs tasks planned, right? If not, maybe this could just be dbt docs?


@property
def skipped(self):
return self.skip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks to me like you don't need to override errored, failed, or skipped

return structured


# derive from BaseTask as I don't really want any result interpretation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment isn't necessary, if you feel that this is confusing, you should add docstrings to BaseTask and RunnableTask explaining the differences

# project (I think?) - it assumes that you have an existing and
# populated project['modules-path'], but 'catalog generate' shouldn't
# require that. It might be better to suppress the exception in
# dependency_projects if that's reasonable, or make it a flag.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you say existing and populated project['modules-path'], does that mean that the directory exists, and that dbt deps has been run to pull in the project's dependencies?

imo, dbt deps should be run before dbt catalog generate -- I can't imagine this working any other way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. As it is, this works in the sample talk project I have without having to run dbt deps - the only thing it needs the manifest for at all is an internally defined operation and internally defined macros (right?). So it shouldn't have to look at any non-global dependencies.

But maybe that could be because it's such a simple project, I don't know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's just a side effect of it being such a simple project. There are cases where there are actually models in the dependency project that end up in the manifest, and the corresponding tables in the database should end up in the catalog as well. But I think this code is fine as written. It's perfectly fine to assert that the user has to run dbt deps before one or both of these commands will work even if it's not theoretically necessary to generate the catalog right now.

@cmcarthur
Copy link
Member

@beckjake this looks good to me.

@beckjake beckjake merged commit e9f5cdb into dev/betsy-ross May 21, 2018
@beckjake beckjake deleted the operation-result branch May 21, 2018 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants