Make `rich` traceback and REPL handling more configurable #1728

merelcht · 2022-07-27T12:30:45Z

Is it worth to add a section to pyproject.toml to configure settings for rich, e.g. [tool.kedro.rich]? Would this allow users to for example change show_locals to be set to True? (See #1712 for why we set it to False)

The text was updated successfully, but these errors were encountered:

datajoely · 2022-07-27T12:32:44Z

I'm really excited for this, it would make things a lot more configurable :)

We also have had reports that the logging mode on databricks is unpleasant, so maybe we should have a fallback to no rich logging too.

antonymilne · 2022-07-28T09:20:23Z

Agree that it would be nice to make the rich settings more configurable, but I don't think pyproject.toml would be the right place for this. I have some ideas for how to do it - will post more later.

antonymilne · 2022-08-01T13:53:15Z

Currently the rich setup does three slightly different things (see _ProjectLogging):

logging: rich.logging.RichHandler in logging.yml. This is both framework- and project-side
traceback handling: rich.traceback.install. This is just framework-side
pretty rendering on REPL: rich.pretty.install. This is just framework-side (and not so important - it's only really relevant for the interactive workflow)

Since the logging handler is already configurable on the project-side, the question of configuring the setup concerns the latter two points. Possibly in future we'll want some way for people to customise other things (e.g. turn on/off rich progress bar, change theme more easily), but let's leave that aside for now.

I think there are three possible places where this sort of thing could be configured.

1. settings.py

This is the only way that works currently. You can put your own calls to rich.traceback.install and rich.pretty.install in there, using whichever arguments you like:

rich.traceback.install(...)
rich.pretty.install(...)

Or effectively uninstall the rich traceback/REPL handling altogether:

import sys
sys.excepthook = sys.__excepthook__
sys.displayhook = sys.__displayhook__ 

# The above won't actually do anything in IPython.
# We need the below to undo pretty.install.
from IPython.core.formatters import PlainTextFormatter
ip = get_ipython()
ip.display_formatter.formatters["text/plain"] = PlainTextFormatter()

# And to undo traceback.install in IPython (doesn't work in Databricks - see below).
from IPython.core.interactiveshell import InteractiveShell
ip._showtraceback = InteractiveShell()._showtraceback
ip.showtraceback = InteractiveShell().showtraceback
ip.showsyntaxerror = InteractiveShell().showsyntaxerror

Note. We could make this much smoother by building a better uninstall functionality into rich. This got the thumbs up in principle: Textualize/rich#2461.

2. Hooks

Arguably setting the traceback handling would be best done through a hook in the first place. However, if a hook that does rich traceback handling is enabled by default, we'd have to think about the best way to implement that (i.e. how to register a built-in hook, how a user would disable or change hook from settings.py).

Another possible use case for built-in hook might be #1563 (Maybe in future SparkHooks will live somewhere within the kedro package so that users can just do from kedro.extras.hooks.pyspark import SparkHooks and not need to define the hook themselves). Although in that case the hook would presumably be disabled by default, which is a bit easier.

3. logging.yml

Not currently possible but this would be a sensible place to put configuration for other rich things. e.g.

rich.traceback.install:
  kwarg1: value1
  kwarg2: value2
  ...

# or some way to do rich.traceback.install: off

When you configure Python logging with dictConfig then these extra keys will be ignored, but we can use them in our _ProjectLogging setup like rich.traceback.install(**logging["rich.traceback.install"]).

Notes

solution 1 feels temporary, but could maybe be made better by introducing something like a clearer RICH_TRACEBACK_ENABLED boolean variable
advantage of solution 1 is that we already have settings.py which centralises dependency injection and app configuration
solution 3 could be changed per-environment (but in reality would never be) whereas solutions 1 and 2 couldn't
solution 3 only works well for kwargs that can be specified in yaml, not those that require Python. This is not quite all of the arguments since we currently need Python to find the right argument for suppress argument in rich.traceback.install
need to think about which of these solutions would work well with multiprocessing - off the top of my head, maybe only solution 3 since only configure_logging is called by _bootstrap_subprocess. This might be the deciding factor actually

antonymilne · 2022-08-01T14:00:25Z

Also I should say that I don't think this is urgent since I don't think it's going to be a common request, and solution 1 is a reasonable workaround. Maybe as we continue integrating rich further there will be more call for it though.

noklam · 2022-08-01T15:01:46Z

Agree with many of the points. In short, I am in favor of solution 1.

I tend to think of pyproject.toml as a static config which is mainly DevOps related (Build/dependecies/Linting/Coverage Configuration).

settings.py is the entrypoint of more advanced kedro features, i.e. hooks/TemplatedConfigLoader/Custom context/Session, it feels right to put rich into this category.

One of the values that kedro is to get the best practice right out of the box, so most users shouldn't have to touch these configurations at all. If Rich Logging doesn't work well on Databricks(I think it's because of the coloring?). If colored logs is impossible on Databricks, I am in favor go all the way to disable rich by default on databricks, since most of the users are either on databricks or with the local development environment. It doesn't feel right to have an awful experience by default.

CC @AntonyMilneQB

antonymilne · 2022-08-01T15:42:41Z

Thanks for the comments @noklam. Completely agree that we should try to make it so that people don't have to touch any settings at all by default, although in the databricks case I would point out:

regardless of rich, users currently have to edit logging.yml to disable file-based logging if using Databricks repos due to the read-only file system used there. Hopefully Databricks make repos writeable some time (apparently they plan to); otherwise we might need to come up with a system to make this smoother in the future
I don't think there are any fundamental problems with using rich on databricks, just a few small niggles which we should be able to fix

And just for reference (to explain more why, as you point out, pyproject.toml would not be the right place for this):

pstanisl · 2022-08-03T09:26:07Z

For me the rich.traceback does not work in Databricks because it prevents detecting an error by Databricks, i.e., the issue in the code does not cause a notebook cell execution error. The following cells run because everything seems okay. Automation of that notebook (i.e., like a job in Databricks) with that error handling is almost impossible.

Is there a workaround for my case to revert the installation of rich.traceback?

noklam · 2022-08-03T09:56:22Z

@pstanisl Can you explain it a little bit more? How would it behave differently without rich.traceback? There is a workaround though not elegant at the moment.

pstanisl · 2022-08-03T10:32:31Z

In the image below, you can see the difference. In the first cell, the Databricks properly catch the exception and highlight the error cell, i.e., stop the execution of the notebook if required. After installing rich.traceback, the exception is invisible for the Databricks because it is as "just" a print of some message, i.e., the cell is finished without any error.

So imagine, I have a Kedro project deployed as a Databricks job by dbx. I orchestrate it from Azure Data Factory. Suppose there is any issue in the Kedro pipeline. In that case, the job will end up successfully, and Azure Data Factory will continue with the following steps because no error is returned from Databricks.

noklam · 2022-08-03T12:34:07Z

@pstanisl The lost of color log is definitely not ideal, but I think it is still a proper Exception and it shouldn't have any effect on error handling.

pstanisl · 2022-08-03T12:48:28Z

@noklam The color is not a problem at all. The problem is running following cells, see video below

Zaznam.obrazovky.2022-08-03.v.14.44.57.mov

Same for Kedro project and orchecstration.

mjmare · 2022-08-05T19:40:21Z

I have the impression that PyCharm is also confused by the rich backtrace.
I was too lazy to revert to Kedro 0.18.1 today to verify, but normally PyCharm parses the backtrace and makes the line with the error clickable, very convenient. This doesn't work anymore, very inconvenient. That, plus the pages and pages of boxes....

noklam · 2022-08-05T21:37:25Z

@mjmare Sorry that this is no the smoothest experience, we shoule have highlights this. In Pycharm you need to set emulate terminal and it would become a clickable traceback again.

https://kedro.readthedocs.io/en/stable/development/set_up_pycharm.html?highlight=Emulate#set-up-run-configurations

mjmare · 2022-08-06T20:48:02Z

Unfortunately that does not work. Enabling the "emulate terminal in output console" option (which I have disabled by default) does nothing for me except make the mess colourful.
The only thing that worked was:

import sys
sys.excepthook = sys.__excepthook__

in settings.py.

See: #1705 (reply in thread)

pstanisl · 2022-08-08T10:13:48Z

@AntonyMilneQB and @noklam for Jupyter/Databricks notebook workaround #1 does not work because sys.excepthook is not replaced, see

https://github.com/Textualize/rich/blob/2ba277ac54a153d3030b496698133ec2e4f67e1d/rich/traceback.py#L137-L146

What is working is to save iPython._showtraceback before installing rich.traceback (i.e., before using bootstrap_project) and then revert it after installing.

ip = get_ipython()
ip__showtraceback = ip._showtraceback

from kedro.framework.startup import bootstrap_project

ip._showtraceback = ip__showtraceback

It will revert rich.traceback for Jupyter/Databricks notebook.

noklam · 2022-08-08T10:26:41Z

@mjmare This works for me, I am not sure if the version of PyCharm matters.

OS: MacOS

Settings for PyCharm

antonymilne · 2022-08-08T10:54:57Z

@pstanisl thank you very much for this very clear explanation and the workaround. I definitely understand why this would be a problem. It only seems to happen on Databricks (not plain Jupyter), so must be something to do with how they wrap the default IPython exception handling 😬 We could try to ask rich to fix this but it might be a tall order unfortunately.

I had also figured out that reverting traceback.install would need to be different in IPython but hadn't yet figured out the right way to do it yet. It would be nice if there were some way of doing this "in one go" after the Kedro project has been bootstrapped (which is possible for undoing the pretty.install). Let me see if I can figure that out - otherwise I'll raise an issue on the rich repo for it.

mjmare · 2022-08-08T11:11:29Z

Hi @noklam,

I'm using PyCharm:
PyCharm 2022.2 (Professional Edition)
Build #PY-222.3345.131, built on July 27, 2022

on OSX. I use ZSH as my shell.

As soon as I turn on emulate terminal option, everything turns into a mess.

antonymilne · 2022-08-08T11:11:50Z

@pstanisl please could you try this out in Databricks for me? 🙂

from IPython.core.interactiveshell import InteractiveShell

ip._showtraceback = InteractiveShell()._showtraceback
ip.showtraceback = InteractiveShell().showtraceback
ip.showsyntaxerror = InteractiveShell().showsyntaxerror

This should undo all of the modifications rich makes to the tracebacks. The problem might be that it replaces the ip.show_traceback etc. with the IPython default, which is not necessarily the same as the databricks default since I guess they might have customised this already.

Ultimately, though, I think we will probably just have to turn off rich traceback handling in databricks (or add something to rich that does that).

noklam · 2022-08-08T11:13:55Z

@mjmare Would you be able to drop a gif or a screenshot how does it looks like with emulated mode?

mjmare · 2022-08-08T11:17:09Z

Emulate OFF:

Emulate ON:

pstanisl · 2022-08-08T11:20:19Z

@AntonyMilneQB It reverts the visual appearance of the error in the cell, but it does not fix the problem with the error.

When I was exploring the workaround, only ip._showtraceback made the reverting.

antonymilne · 2022-08-08T11:36:33Z

Thanks for checking @pstanisl. Last try - I think the correct equivalent on Databricks would actually be this:

from PythonShell import InteractiveShell
ip._showtraceback = InteractiveShell()._showtraceback
ip.showtraceback = InteractiveShell().showtraceback
ip.showsyntaxerror = InteractiveShell().showsyntaxerror

pstanisl · 2022-08-08T11:44:53Z

@AntonyMilneQB same result (original formatting of the error but still ignoring it), but with the message after importing

/databricks/python_shell/scripts/PythonShell.py:46: ImportWarning: PythonShell.py is a launch script, which shall not be imported directly. You might want to import PythonShellImpl instead. Call stack:

  File "/databricks/python_shell/scripts/PythonShell.py", line 29, in <module>
    launch_process()
  File "/databricks/python_shell/scripts/PythonShellImpl.py", line 1233, in launch_process
    shell.executor.run()
  File "/databricks/python_shell/scripts/PythonShellImpl.py", line 268, in run
    self.shell.shell.run_cell(command_id, cmd, store_history=True)
  File "/databricks/python_shell/scripts/PythonShellImpl.py", line 756, in run_cell
    super(IPythonShell, self).run_cell(raw_cell, store_history, silent, shell_futures)
  File "/databricks/python/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2894, in run_cell
    result = self._run_cell(
  File "/databricks/python/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2940, in _run_cell
    return runner(coro)
  File "/databricks/python/lib/python3.8/site-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "/databricks/python/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3165, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "/databricks/python/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3357, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "/databricks/python/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<command-3483914239647235>", line 1, in <module>
    from PythonShell import InteractiveShell
  File "/databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py", line 167, in import_patch
    original_result = python_builtin_import(name, globals, locals, fromlist, level)
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/databricks/python_shell/scripts/PythonShell.py", line 42, in <module>
    traceback.print_stack(file=call_stack)

  warnings.warn(

with

from PythonShellImpl import InteractiveShell

ip._showtraceback = InteractiveShell()._showtraceback
ip.showtraceback = InteractiveShell().showtraceback
ip.showsyntaxerror = InteractiveShell().showsyntaxerror

the same behavior (original formatting of the error but still ignoring it) without warning.

antonymilne · 2022-08-08T14:41:43Z

Thanks very much for all your testing @pstanisl, much appreciated. Thanks also for raising the issue on the rich repo - I was just about to do so and saw that you had already! Textualize/rich#2455

I've created a draft PR for kedro which would disable rich traceback handling on databricks so you don't need to do so yourself: #1769. Hopefully this gets fixed on the rich side so we don't need to merge it, but if they can't fix it then this should do it.

antonymilne · 2022-08-11T13:47:16Z

We have the thumbs up in principle to adding/improving an uninstall functionality on rich itself: Textualize/rich#2461.

deepyaman · 2022-08-22T13:40:10Z

I've expressed this before, but I personally like the approach libraries like Typer take in enabling rich tracebacks if it's installed, and not using them if not installed, since there's no need to push the use of rich tracebacks upon everyone.

PedroAbreuQB · 2022-08-24T16:52:22Z

Adding here that we're facing the exact same problem as @pstanisl, if it helps in any way with raising the priority. The issue is currently blocking us from detecting errors in kedro pipelines executed on Databricks:

can't detect failed jobs easily when they fail inside a kedro run;
can't run integration tests on Databricks because if they fail, Databricks still doesn't see it as a failure and as such reports "success" back to the CI system.

datajoely · 2022-08-24T17:03:01Z

One possibly related point - by default, logging goes into stderr, for some reason RichHandler changes that.

To enforce this behaviour you have to do:

RichHandler(console=Console(stderr=True))

antonymilne · 2022-08-26T10:38:27Z

FYI @pstanisl since it's not clear how long it will take to fix this on rich, we're merging a fix in kedroi to disable rich tracebacks on databricks. #1769. This will be released with 0.18.3 (probably within the next week or so).

antonymilne · 2022-09-23T11:51:55Z

Just for the record, I think there's going to be much less call for making the rich settings configurable following the changes made in Kedro 0.18.3. I'm leaving this discussion open because I think it's something we should still think about and I'm interested in hearing if users continue to ask for such a feature, but I don't think it's at all urgent at the moment. We should deal with this as part of Logging improvements part 2 which I'll develop further in #1461 or a new issue in due course. Given the other things that might need to be changed there for Kedro 0.19, I think ultimately the right setup here might be different from all three options I outlined above and instead rely on hooks.

antonymilne · 2023-01-16T11:56:36Z

Superseded by #2206.

noklam mentioned this issue Jul 28, 2022

[Parent] - Logging related issues #1735

Closed

6 tasks

antonymilne mentioned this issue Jul 29, 2022

Jupyter nb output in PyCharm IDE breaks with Kedro 0.18.2 #1719

Closed

antonymilne changed the title ~~Add section to pyproject.toml to configure rich~~ Make rich setup more configurable Aug 1, 2022

antonymilne changed the title ~~Make rich setup more configurable~~ Make rich traceback and REPL handling more configurable Aug 1, 2022

antonymilne mentioned this issue Aug 3, 2022

Rich output is too long #1750

Closed

antonymilne mentioned this issue Aug 8, 2022

Disable rich tracebacks on Databricks #1769

Merged

5 tasks

noklam mentioned this issue Aug 26, 2022

Packaged kedro pipeline does not work well on Databricks #1807

Closed

antonymilne added this to the Improve the setup of logging milestone Sep 23, 2022

antonymilne mentioned this issue Sep 23, 2022

Improve logging setup in Kedro #1461

Closed

18 tasks

This was referenced Jan 13, 2023

Drastically simplify the default project template #2149

Closed

Make framework-side logging.yml user-specifiable #2206

Closed

Make rich markup compatible with file-based logs #2207

Closed

antonymilne closed this as completed Jan 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make `rich` traceback and REPL handling more configurable #1728

Make `rich` traceback and REPL handling more configurable #1728

merelcht commented Jul 27, 2022

datajoely commented Jul 27, 2022

antonymilne commented Jul 28, 2022

antonymilne commented Aug 1, 2022 •

edited

Loading

antonymilne commented Aug 1, 2022

noklam commented Aug 1, 2022 •

edited

Loading

antonymilne commented Aug 1, 2022

pstanisl commented Aug 3, 2022

noklam commented Aug 3, 2022 •

edited

Loading

pstanisl commented Aug 3, 2022 •

edited

Loading

noklam commented Aug 3, 2022

pstanisl commented Aug 3, 2022 •

edited

Loading

mjmare commented Aug 5, 2022

noklam commented Aug 5, 2022

mjmare commented Aug 6, 2022

pstanisl commented Aug 8, 2022

noklam commented Aug 8, 2022

antonymilne commented Aug 8, 2022

mjmare commented Aug 8, 2022

antonymilne commented Aug 8, 2022

noklam commented Aug 8, 2022

mjmare commented Aug 8, 2022

pstanisl commented Aug 8, 2022

antonymilne commented Aug 8, 2022

pstanisl commented Aug 8, 2022 •

edited

Loading

antonymilne commented Aug 8, 2022 •

edited

Loading

antonymilne commented Aug 11, 2022

deepyaman commented Aug 22, 2022

PedroAbreuQB commented Aug 24, 2022

datajoely commented Aug 24, 2022

antonymilne commented Aug 26, 2022 •

edited

Loading

antonymilne commented Sep 23, 2022

antonymilne commented Jan 16, 2023

Make rich traceback and REPL handling more configurable #1728

Make rich traceback and REPL handling more configurable #1728

Comments

merelcht commented Jul 27, 2022

datajoely commented Jul 27, 2022

antonymilne commented Jul 28, 2022

antonymilne commented Aug 1, 2022 • edited Loading

1. settings.py

2. Hooks

3. logging.yml

Notes

antonymilne commented Aug 1, 2022

noklam commented Aug 1, 2022 • edited Loading

antonymilne commented Aug 1, 2022

pstanisl commented Aug 3, 2022

noklam commented Aug 3, 2022 • edited Loading

pstanisl commented Aug 3, 2022 • edited Loading

noklam commented Aug 3, 2022

pstanisl commented Aug 3, 2022 • edited Loading

mjmare commented Aug 5, 2022

noklam commented Aug 5, 2022

mjmare commented Aug 6, 2022

pstanisl commented Aug 8, 2022

noklam commented Aug 8, 2022

Settings for PyCharm

antonymilne commented Aug 8, 2022

mjmare commented Aug 8, 2022

antonymilne commented Aug 8, 2022

noklam commented Aug 8, 2022

mjmare commented Aug 8, 2022

pstanisl commented Aug 8, 2022

antonymilne commented Aug 8, 2022

pstanisl commented Aug 8, 2022 • edited Loading

antonymilne commented Aug 8, 2022 • edited Loading

antonymilne commented Aug 11, 2022

deepyaman commented Aug 22, 2022

PedroAbreuQB commented Aug 24, 2022

datajoely commented Aug 24, 2022

antonymilne commented Aug 26, 2022 • edited Loading

antonymilne commented Sep 23, 2022

antonymilne commented Jan 16, 2023

Make `rich` traceback and REPL handling more configurable #1728

Make `rich` traceback and REPL handling more configurable #1728

antonymilne commented Aug 1, 2022 •

edited

Loading

noklam commented Aug 1, 2022 •

edited

Loading

noklam commented Aug 3, 2022 •

edited

Loading

pstanisl commented Aug 3, 2022 •

edited

Loading

pstanisl commented Aug 3, 2022 •

edited

Loading

pstanisl commented Aug 8, 2022 •

edited

Loading

antonymilne commented Aug 8, 2022 •

edited

Loading

antonymilne commented Aug 26, 2022 •

edited

Loading