Skip to content

Commit

Permalink
Merge pull request #2481 from fishtown-analytics/fix/docs-perf
Browse files Browse the repository at this point in the history
Improve docs performance (#2480)
  • Loading branch information
beckjake authored May 22, 2020
2 parents f4c6bf3 + 0d825eb commit 6dac4c7
Show file tree
Hide file tree
Showing 15 changed files with 392 additions and 199 deletions.
4 changes: 2 additions & 2 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ version: 2.1
jobs:
unit:
docker: &test_only
- image: fishtownanalytics/test-container:5
- image: fishtownanalytics/test-container:7
environment:
DBT_INVOCATION_ENV: circle
steps:
Expand Down Expand Up @@ -30,7 +30,7 @@ jobs:
destination: dist
integration-postgres-py36:
docker: &test_and_postgres
- image: fishtownanalytics/test-container:5
- image: fishtownanalytics/test-container:7
environment:
DBT_INVOCATION_ENV: circle
- image: postgres
Expand Down
4 changes: 3 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,12 @@
- Fixed a number of issues with globally-scoped vars ([#2473](https://github.com/fishtown-analytics/dbt/issues/2473), [#2472](https://github.com/fishtown-analytics/dbt/issues/2472), [#2469](https://github.com/fishtown-analytics/dbt/issues/2469), [#2477](https://github.com/fishtown-analytics/dbt/pull/2477))
- Fixed DBT Docker entrypoint ([#2470](https://github.com/fishtown-analytics/dbt/issues/2470), [#2475](https://github.com/fishtown-analytics/dbt/pull/2475))
- Fixed a performance regression that occurred even when a user was not using the relevant feature ([#2474](https://github.com/fishtown-analytics/dbt/issues/2474), [#2478](https://github.com/fishtown-analytics/dbt/pull/2478))
- Substantial performance improvements for parsing on large projects, especially projects with many docs definition. ([#2480](https://github.com/fishtown-analytics/dbt/issues/2480), [#2481](https://github.com/fishtown-analytics/dbt/pull/2481))
- Expose Snowflake query id in case of an exception raised by connector ([#2201](https://github.com/fishtown-analytics/dbt/issues/2201), [#2358](https://github.com/fishtown-analytics/dbt/pull/2358))

Contributors:
- [@dmateusp](https://github.com/dmateusp) ([#2475](https://github.com/fishtown-analytics/dbt/pull/2475))
- [@ChristianKohlberg](https://github.com/ChristianKohlberg) (#2358](https://github.com/fishtown-analytics/dbt/pull/2358))

## dbt 0.17.0rc1 (May 12, 2020)

Expand Down Expand Up @@ -67,7 +70,6 @@ Contributors:
- Add a 'depends_on' attribute to the log record extra field ([#2316](https://github.com/fishtown-analytics/dbt/issues/2316), [#2341](https://github.com/fishtown-analytics/dbt/pull/2341))
- Added a '--no-browser' argument to "dbt docs serve" so you can serve docs in an environment that only has a CLI browser which would otherwise deadlock dbt ([#2004](https://github.com/fishtown-analytics/dbt/issues/2004), [#2364](https://github.com/fishtown-analytics/dbt/pull/2364))
- Snowflake now uses "describe table" to get the columns in a relation ([#2260](https://github.com/fishtown-analytics/dbt/issues/2260), [#2324](https://github.com/fishtown-analytics/dbt/pull/2324))
- Expose Snowflake query id in case of an exception raised by connector ([#2201](https://github.com/fishtown-analytics/dbt/issues/2201), [#2358](https://github.com/fishtown-analytics/dbt/pull/2358))
- Sources (and therefore freshness tests) can be enabled and disabled via dbt_project.yml ([#2283](https://github.com/fishtown-analytics/dbt/issues/2283), [#2312](https://github.com/fishtown-analytics/dbt/pull/2312), [#2357](https://github.com/fishtown-analytics/dbt/pull/2357))
- schema.yml files are now fully rendered in a context that is aware of vars declared in from dbt_project.yml files ([#2269](https://github.com/fishtown-analytics/dbt/issues/2269), [#2357](https://github.com/fishtown-analytics/dbt/pull/2357))
- Sources from dependencies can be overridden in schema.yml files ([#2287](https://github.com/fishtown-analytics/dbt/issues/2287), [#2357](https://github.com/fishtown-analytics/dbt/pull/2357))
Expand Down
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ RUN apt-get update && \
apt-get install -y --no-install-recommends \
netcat postgresql curl git ssh software-properties-common \
make build-essential ca-certificates libpq-dev \
libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit \
libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit libyaml-dev \
&& \
add-apt-repository ppa:deadsnakes/ppa && \
apt-get install -y \
Expand Down
20 changes: 20 additions & 0 deletions core/dbt/clients/jinja.py
Original file line number Diff line number Diff line change
Expand Up @@ -493,13 +493,33 @@ def _requote_result(raw_value: str, rendered: str) -> str:
return f'{quote_char}{rendered}{quote_char}'


# performance note: Local benmcharking (so take it with a big grain of salt!)
# on this indicates that it is is on average slightly slower than
# checking two separate patterns, but the standard deviation is smaller with
# one pattern. The time difference between the two was ~2 std deviations, which
# is small enough that I've just chosen the more readable option.
_HAS_RENDER_CHARS_PAT = re.compile(r'({[{%]|[}%]})')


def get_rendered(
string: str,
ctx: Dict[str, Any],
node=None,
capture_macros: bool = False,
native: bool = False,
) -> str:
# performance optimization: if there are no jinja control characters in the
# string, we can just return the input. Fall back to jinja if the type is
# not a string or if native rendering is enabled (so '1' -> 1, etc...)
# If this is desirable in the native env as well, we could handle the
# native=True case by passing the input string to ast.literal_eval, like
# the native renderer does.
if (
not native and
isinstance(string, str) and
_HAS_RENDER_CHARS_PAT.search(string) is None
):
return string
template = get_template(
string,
ctx,
Expand Down
15 changes: 14 additions & 1 deletion core/dbt/clients/yaml_helper.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,17 @@
from typing import Any

import dbt.exceptions

import yaml
import yaml.scanner

# the C version is faster, but it doesn't always exist
YamlLoader: Any
try:
from yaml import CSafeLoader as YamlLoader
except ImportError:
from yaml import SafeLoader as YamlLoader


YAML_ERROR_MESSAGE = """
Syntax error near line {line_number}
Expand Down Expand Up @@ -44,9 +53,13 @@ def contextualized_yaml_error(raw_contents, error):
raw_error=error)


def safe_load(contents):
return yaml.load(contents, Loader=YamlLoader)


def load_yaml_text(contents):
try:
return yaml.safe_load(contents)
return safe_load(contents)
except (yaml.scanner.ScannerError, yaml.YAMLError) as e:
if hasattr(e, 'problem_mark'):
error = contextualized_yaml_error(contents, e)
Expand Down
3 changes: 2 additions & 1 deletion core/dbt/context/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
from dbt import flags
from dbt import tracking
from dbt.clients.jinja import undefined_error, get_rendered
from dbt.clients import yaml_helper
from dbt.contracts.graph.compiled import CompiledResource
from dbt.exceptions import raise_compiler_error, MacroReturn
from dbt.logger import GLOBAL_LOGGER as logger
Expand Down Expand Up @@ -383,7 +384,7 @@ def fromyaml(value: str, default: Any = None) -> Any:
-- ["good"]
"""
try:
return yaml.safe_load(value)
return yaml_helper.safe_load(value)
except (AttributeError, ValueError, yaml.YAMLError):
return default

Expand Down
28 changes: 28 additions & 0 deletions core/dbt/context/providers.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
ParsedMacro, ParsedSourceDefinition, ParsedSeedNode
)
from dbt.exceptions import (
CompilationException,
InternalException,
ValidationException,
RuntimeException,
Expand Down Expand Up @@ -130,6 +131,19 @@ def _repack_args(
else:
return [package, name]

def validate_args(self, name: str, package: Optional[str]):
if not isinstance(name, str):
raise CompilationException(
f'The name argument to ref() must be a string, got '
f'{type(name)}'
)

if package is not None and not isinstance(package, str):
raise CompilationException(
f'The package argument to ref() must be a string or None, got '
f'{type(package)}'
)

def __call__(self, *args: str) -> RelationProxy:
name: str
package: Optional[str] = None
Expand All @@ -140,6 +154,7 @@ def __call__(self, *args: str) -> RelationProxy:
package, name = args
else:
ref_invalid_args(self.model, args)
self.validate_args(name, package)
return self.resolve(name, package)


Expand All @@ -148,12 +163,25 @@ class BaseSourceResolver(BaseResolver):
def resolve(self, source_name: str, table_name: str):
pass

def validate_args(self, source_name: str, table_name: str):
if not isinstance(source_name, str):
raise CompilationException(
f'The source name (first) argument to source() must be a '
f'string, got {type(source_name)}'
)
if not isinstance(table_name, str):
raise CompilationException(
f'The table name (second) argument to source() must be a '
f'string, got {type(table_name)}'
)

def __call__(self, *args: str) -> RelationProxy:
if len(args) != 2:
raise_compiler_error(
f"source() takes exactly two arguments ({len(args)} given)",
self.model
)
self.validate_args(args[0], args[1])
return self.resolve(args[0], args[1])


Expand Down
Loading

0 comments on commit 6dac4c7

Please sign in to comment.