Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graph visualization fails because of unescaped config values #1200

Closed
MG-MW opened this issue Oct 23, 2024 · 1 comment
Closed

graph visualization fails because of unescaped config values #1200

MG-MW opened this issue Oct 23, 2024 · 1 comment
Labels
triage label for issues that need to be triaged.

Comments

@MG-MW
Copy link
Contributor

MG-MW commented Oct 23, 2024

Depending on given configuration, graph visualization fails

Current behavior

import hamilton.driver
import hamilton.ad_hoc_utils


def world(hello: str) -> str:
    return f"{hello} world"


temp_module = hamilton.ad_hoc_utils.create_temporary_module(world)


driver = hamilton.driver.Builder().with_modules(temp_module).with_config({"hello": "Hi"}).build()
# does not fail
driver.display_upstream_of("world")

driver = hamilton.driver.Builder().with_modules(temp_module).with_config({"hello": "<"}).build()
# throws error
driver.display_upstream_of("world")

Stack Traces

Stack trace is not really helpful --------------------------------------------------------------------------- CalledProcessError Traceback (most recent call last) File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/backend/execute.py:88, in run_check(cmd, input_lines, encoding, quiet, **kwargs) 87 try: ---> 88 proc.check_returncode() 89 except subprocess.CalledProcessError as e:

File ~/.local/share/mise/installs/python/3.12.3/lib/python3.12/subprocess.py:502, in CompletedProcess.check_returncode(self)
501 if self.returncode:
--> 502 raise CalledProcessError(self.returncode, self.args, self.stdout,
503 self.stderr)

CalledProcessError: Command '[PosixPath('dot'), '-Kdot', '-Tsvg']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

CalledProcessError Traceback (most recent call last)
File ~/.envs/py312test/lib/python3.12/site-packages/IPython/core/formatters.py:977, in MimeBundleFormatter.call(self, obj, include, exclude)
974 method = get_real_method(obj, self.print_method)
976 if method is not None:
--> 977 return method(include=include, exclude=exclude)
978 return None
979 else:

File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/jupyter_integration.py:98, in JupyterIntegration.repr_mimebundle(self, include, exclude, **_)
96 include = set(include) if include is not None else {self._jupyter_mimetype}
97 include -= set(exclude or [])
---> 98 return {mimetype: getattr(self, method_name)()
99 for mimetype, method_name in MIME_TYPES.items()
100 if mimetype in include}

File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/jupyter_integration.py:112, in JupyterIntegration._repr_image_svg_xml(self)
110 def _repr_image_svg_xml(self) -> str:
111 """Return the rendered graph as SVG string."""
--> 112 return self.pipe(format='svg', encoding=SVG_ENCODING)

File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/piping.py:104, in Pipe.pipe(self, format, renderer, formatter, neato_no_op, quiet, engine, encoding)
55 def pipe(self,
56 format: typing.Optional[str] = None,
57 renderer: typing.Optional[str] = None,
(...)
61 engine: typing.Optional[str] = None,
62 encoding: typing.Optional[str] = None) -> typing.Union[bytes, str]:
63 """Return the source piped through the Graphviz layout command.
64
65 Args:
(...)
102 '<?xml version='
103 """
--> 104 return self._pipe_legacy(format,
105 renderer=renderer,
106 formatter=formatter,
107 neato_no_op=neato_no_op,
108 quiet=quiet,
109 engine=engine,
110 encoding=encoding)

File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/_tools.py:171, in deprecate_positional_args..decorator..wrapper(*args, **kwargs)
162 wanted = ', '.join(f'{name}={value!r}'
163 for name, value in deprecated.items())
164 warnings.warn(f'The signature of {func.name} will be reduced'
165 f' to {supported_number} positional args'
166 f' {list(supported)}: pass {wanted}'
167 ' as keyword arg(s)',
168 stacklevel=stacklevel,
169 category=category)
--> 171 return func(*args, **kwargs)

File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/piping.py:121, in Pipe._pipe_legacy(self, format, renderer, formatter, neato_no_op, quiet, engine, encoding)
112 @_tools.deprecate_positional_args(supported_number=2)
113 def _pipe_legacy(self,
114 format: typing.Optional[str] = None,
(...)
119 engine: typing.Optional[str] = None,
120 encoding: typing.Optional[str] = None) -> typing.Union[bytes, str]:
--> 121 return self._pipe_future(format,
122 renderer=renderer,
123 formatter=formatter,
124 neato_no_op=neato_no_op,
125 quiet=quiet,
126 engine=engine,
127 encoding=encoding)

File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/piping.py:149, in Pipe._pipe_future(self, format, renderer, formatter, neato_no_op, quiet, engine, encoding)
146 if encoding is not None:
147 if codecs.lookup(encoding) is codecs.lookup(self.encoding):
148 # common case: both stdin and stdout need the same encoding
--> 149 return self._pipe_lines_string(*args, encoding=encoding, **kwargs)
150 try:
151 raw = self._pipe_lines(*args, input_encoding=self.encoding, **kwargs)

File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/backend/piping.py:212, in pipe_lines_string(engine, format, input_lines, encoding, renderer, formatter, neato_no_op, quiet)
206 cmd = dot_command.command(engine, format,
207 renderer=renderer,
208 formatter=formatter,
209 neato_no_op=neato_no_op)
210 kwargs = {'input_lines': input_lines, 'encoding': encoding}
--> 212 proc = execute.run_check(cmd, capture_output=True, quiet=quiet, **kwargs)
213 return proc.stdout

File ~/.envs/py312test/lib/python3.12/site-packages/graphviz/backend/execute.py:90, in run_check(cmd, input_lines, encoding, quiet, **kwargs)
88 proc.check_returncode()
89 except subprocess.CalledProcessError as e:
---> 90 raise CalledProcessError(*e.args)
92 return proc

CalledProcessError: Command '[PosixPath('dot'), '-Kdot', '-Tsvg']' returned non-zero exit status 1. [stderr: "Error: : syntax error in line 9 near 'subgraph'\n"]

Library & System Information

hamilton version: 1.81.0

$ dot --version
dot - graphviz version 12.0.0 (20240803.0821)

Expected behavior

Escape representations of nodes

Additional context

I could rectify the behaviour by patching this code:

hamilton/hamilton/graph.py

Lines 258 to 273 in bc7cbbf

def _get_node_label(
n: node.Node,
name: Optional[str] = None,
type_string: Optional[str] = None,
) -> str:
"""Get a graphviz HTML-like node label. It uses the DAG node
name and type but values can be overridden. Overriding is currently
used for materializers since `type_` is stored in n.tags.
ref: https://graphviz.org/doc/info/shapes.html#html
"""
name = n.name if name is None else name
if type_string is None:
type_string = get_type_as_string(n.type) if get_type_as_string(n.type) else ""
return f"<<b>{name}</b><br /><br /><i>{type_string}</i>>"

like this

    def _get_node_label(
        n: node.Node,
        name: Optional[str] = None,
        type_string: Optional[str] = None,
    ) -> str:
        """Get a graphviz HTML-like node label. It uses the DAG node
        name and type but values can be overridden. Overriding is currently
        used for materializers since `type_` is stored in n.tags.

        ref: https://graphviz.org/doc/info/shapes.html#html
        """
        import html

        name = n.name if name is None else name
        if type_string is None:
            type_string = get_type_as_string(n.type) if get_type_as_string(n.type) else ""

        escaped_type_string=type_string

        # unrelated but my config values can be pretty long
        if len(escaped_type_string) > 80:
            escaped_type_string = escaped_type_string[:80] + "[...]"

        escaped_type_string=html.escape(escaped_type_string, quote=True)
        return f"<<b>{name}</b><br /><br /><i>{escaped_type_string}</i>>"

Given that a python class repr and a configuration value in particular can contain any chars imaginable, these should probably be escaped.

I pretty busy this week, but maybe I can throw a PR with a proper testcase together by next week.

@MG-MW MG-MW added the triage label for issues that need to be triaged. label Oct 23, 2024
@zilto
Copy link
Collaborator

zilto commented Oct 23, 2024

Thanks for raising the issue! Here are a few notes to add to your assessment:

  • The issue is associated with this line f"<<b>{name}</b><br /><br /><i>{type_string}</i>>" which is used for formatting (bold and italic). However, keeping f"{name} {type_string}" could still raise errors
  • It is specific to config values because the type_string is actually a string representation of the config value. This is not the case for other types of nodes.
  • The problem could be encountered frequently because some library use the format <MyClass(...)> as __repr__().
  • In most cases, removing <, > wouldn't hurt. We could log a warning if escaped_type_string != type_string to indicate config could be odd looking. Probably a rare enough scenario.
  • Graphviz's syntax isn't actually HTML, but probably not an issue in this case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage label for issues that need to be triaged.
Projects
None yet
Development

No branches or pull requests

2 participants