Orca integration for static image export #1120

jonmmease · 2018-08-19T09:26:50Z

Overview

This PR integrates orca into plotly.py to support exporting figures as static images 🎉

cc @chriddyp @jackparmer @nicolaskruchten @cldougl @Kully @etpinard @malmaud

Even if you don't have time to look at the code or test out the branch, I'd appreciate any feedback on the architecture and API notes.

Here goes...

Background

See #1105 for background information and discussion of related work.

Architecture

In this PR I went with method (3) from the issue above, "Use orca in server mode".

The first time an image export operation is performed, an orca server process is launched in the background (as a non-blocking subprocess). Image export requests are posted to the server on a local port.

By default, the server process runs until the main process exits. But there is also a timeout configuration option (more on configuration options below) that allows a user to specify that the server should be automatically shut down after a certain period of inactivity.

Regardless of whether a timeout is set, the server may also be manually shut down and manually started.

Implementation Notes

Starting the server

The server subprocess is launched using subprocess.Popen to create a long-running background process. The server is launched in --graph-only mode to be as lean as possible (this avoids running processes for exporting thumbnails, dashboards, etc.)
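As a rough sketch of this launch step: the command shape below (`orca serve -p <port> --graph-only`) is taken from the status output shown later in this description, while the helper names `find_free_port`, `build_orca_command`, and `launch_server` are hypothetical and are not the functions added by this PR.

```python
import socket
import subprocess


def find_free_port():
    """Ask the OS for an unused local port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]


def build_orca_command(executable="orca", port=None):
    """Build the argument list for a lean, graph-only orca server."""
    if port is None:
        port = find_free_port()
    return [executable, "serve", "-p", str(port), "--graph-only"]


def launch_server(cmd):
    """Start the server without blocking; the caller keeps the Popen handle."""
    return subprocess.Popen(cmd)
```

Popen returns immediately, so the calling code can go on to post a request while the server is still starting up.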

Communicating with the Server

Communication with the server is done using requests.post. The request function is wrapped in the @retrying.retry decorator to handle the automatic retrying of failed requests. The retrying logic is very convenient, as it allows an image request to be made right after the server process is launched and the request will simply block until the server responds.
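To illustrate why retrying makes the startup race harmless, here is a standard-library-only sketch. It does not use `requests` or `retrying`: the `retry` decorator below is a simplified stand-in for `@retrying.retry`, and `FlakyServer` and `request_image` are hypothetical names that simulate a server which refuses the first few connections.

```python
import functools
import time


def retry(max_wait_s=5.0, interval_s=0.1, exceptions=(OSError,)):
    """Retry the wrapped call until it succeeds or max_wait_s elapses."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + max_wait_s
            while True:
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if time.monotonic() >= deadline:
                        raise  # give up and surface the last error
                    time.sleep(interval_s)
        return wrapper
    return decorator


class FlakyServer:
    """Stand-in for a server that is not accepting connections yet."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def post(self, payload):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionRefusedError("server not ready yet")
        return {"status": 200, "echo": payload}


server = FlakyServer(failures=3)


@retry(max_wait_s=2.0, interval_s=0.01, exceptions=(ConnectionError,))
def request_image(payload):
    # The caller simply blocks here until the server starts answering.
    return server.post(payload)


response = request_image({"format": "png"})
```

From the caller's point of view there is a single blocking call; the refused attempts are absorbed by the decorator.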

Shutting down the server

It's possible to terminate the particular process created using subprocess.Popen with the Popen.terminate method. Unfortunately, this isn't always enough to actually shut down the server. The trouble is that typical orca entry points (orca.sh, orca.js, orca.cmd) are simply wrapper scripts that call the main orca/electron executable. In my testing on OS X, Linux, and Windows I found that Popen.terminate generally only terminates the shell/wrapper process, leaving the orca server running. This is definitely not acceptable, as a user could end up with a new orca process each time they restart their kernel and export images.

I initially tried some workarounds involving process groups, and sending different signals, but the result ended up being platform dependent and still not fully reliable. I settled on introducing the psutil library as a new optional dependency. psutil provides a platform agnostic API for iterating over the children of a process, and then terminating them. In my testing, this psutil approach has been fully reliable in terminating the server processes across platforms. Since our CI test suite is Linux only at this point, I'm especially glad to not need to introduce any OS X/Windows specific process management logic.
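A minimal sketch of the psutil approach, assuming psutil is installed. The function name `terminate_process_tree` and the escalation to `kill()` after a timeout are illustrative choices, not necessarily what this PR does.

```python
def terminate_process_tree(pid, timeout=3):
    """Terminate a process and all of its descendants; return their pids."""
    import psutil  # optional dependency, imported only when it is needed

    try:
        parent = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return []

    # Collect the children first: once the wrapper script dies they may be
    # re-parented and become harder to find.
    procs = parent.children(recursive=True) + [parent]
    for proc in procs:
        try:
            proc.terminate()
        except psutil.NoSuchProcess:
            pass  # already gone

    # Wait briefly, then force-kill anything that ignored the request.
    _gone, alive = psutil.wait_procs(procs, timeout=timeout)
    for proc in alive:
        try:
            proc.kill()
        except psutil.NoSuchProcess:
            pass
    return [proc.pid for proc in procs]
```

Passing the pid of the Popen wrapper process is enough, because the real orca/electron executable shows up among its descendants.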

Shutdown server after timeout

If a timeout is configured when the server process is launched, a threading.Timer object is created to call the shutdown function after timeout seconds.

Each time an image render request is made, any existing Timer object is canceled, and a new Timer is created.

Importantly, each timer thread has the daemon property set to True. This prevents the main process from waiting for the timer to complete before exiting.
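The cancel-and-recreate pattern described above can be sketched as follows. The class name `InactivityTimer` and its methods are hypothetical; only `threading.Timer` and the `daemon` attribute come from the description.

```python
import threading


class InactivityTimer:
    """Call on_timeout after `timeout` seconds without a reset()."""

    def __init__(self, timeout, on_timeout):
        self.timeout = timeout          # seconds, or None to disable
        self.on_timeout = on_timeout    # e.g. the server shutdown function
        self._timer = None
        self._lock = threading.Lock()

    def reset(self):
        """Cancel any pending timer and start a fresh one."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            if self.timeout is None:
                return
            timer = threading.Timer(self.timeout, self.on_timeout)
            timer.daemon = True  # do not keep the interpreter alive at exit
            timer.start()
            self._timer = timer

    def cancel(self):
        """Stop the pending timer, if any, without firing it."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

Calling `reset()` on every image request keeps pushing the shutdown back, so the callback only fires after a full `timeout` seconds of inactivity.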

Shutdown on exit

The shutdown function is annotated with the @atexit.register decorator to ensure that the server is properly shut down when the main Python process exits.

API Design

This PR introduces the beginning of the plotly.io module.

Image export

Two image export functions are introduced. These functions follow the export conventions proposed in #1098.

plotly.io.write_image(fig, file, format=None, scale=None, width=None, height=None)

This function works very much like the matplotlib savefig function. fig is a Figure or compatible dict. file may be a string referring to a local filesystem path, or a file-like object to be written to. If file is a string, then the file extension is used to infer the image format if possible. The format argument may be used to explicitly specify the format, and it is required if file is not a string with a common extension. Supported formats are png, jpeg (jpg extension supported as well), webp, svg, pdf, and eps (with poppler installed). scale, width, and height work as you would expect.
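The format inference rule can be sketched like this. `infer_format` and `SUPPORTED_FORMATS` are hypothetical names used for illustration, and the exact error messages in the PR may differ.

```python
import io
import os

SUPPORTED_FORMATS = ("png", "jpeg", "webp", "svg", "pdf", "eps")


def infer_format(file, format=None):
    """Return the image format, inferring it from a path when needed."""
    if format is None:
        if not isinstance(file, str):
            raise ValueError("format is required when file is not a path string")
        ext = os.path.splitext(file)[1]
        if not ext:
            raise ValueError("cannot infer format: %r has no extension" % file)
        format = ext.lstrip(".")

    format = format.lower()
    if format == "jpg":
        format = "jpeg"  # jpg is accepted as an alias
    if format not in SUPPORTED_FORMATS:
        raise ValueError("unsupported image format: %r" % format)
    return format


# A path carries its own format; a file-like object needs an explicit one.
from_path = infer_format("figures/scatter.jpg")
from_buffer = infer_format(io.BytesIO(), format="svg")
```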

plotly.io.to_image(fig, format=None, width=None, height=None, scale=None)

This function may be used to return the binary representation of the image directly (no temp files or messing with io.BytesIO!). This can be used in conjunction with IPython.display.Image to display static images directly in the notebook or QtConsole.

Orca management

If users install orca using conda or npm, they should be able to use the above methods immediately, without additional configuration. But for more technical users, and for general users if things go wrong, there is a new plotly.io.orca module.

Manual server management

The server may be manually started using plotly.io.orca.ensure_orca_server(), and it may be manually shut down using plotly.io.orca.shutdown_orca_server()

Orca config

plotly.io.orca.config is an orca configuration/settings object. Here are the properties that can be configured:

orca configuration
------------------
    port: None
    executable: orca
    timeout: None
    default_width: None
    default_height: None
    default_scale: 1
    default_format: png
    mathjax: https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js
    topojson: None
    mapbox_access_token: None

constants
---------
    plotlyjs: /path/to/plotly/package_data/plotly.min.js 
    config_file: /Users/username/.plotly/.orca

If automatic port selection is not desirable, an explicit port value may be set here. If an executable named orca cannot be found on the path, then the executable property may be set to the absolute path to an orca executable. The timeout property sets the number of seconds of inactivity after which the server is automatically shut down (None disables the timeout). The default width, height, scale, and format control the default values used by to_image when not otherwise specified.

I took the liberty of supplying a default mathjax CDN. This way, latex image export just works as long as the user is online. For offline use, the mathjax property can be set to the path to a local mathjax installation. When topojson is None the plot.ly CDN will be used, but a local path can be supplied if working offline. Finally, the mapbox_access_token property can store a mapbox token that will automatically be applied when exporting mapbox traces.

Properties can be set using property assignment

plotly.io.orca.config.mapbox_access_token = 'xyz...'

or using the update method

plotly.io.orca.config.update(mapbox_access_token='xyz...')

The constants are not settable and are listed for informational purposes.
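To make the settable-property versus read-only-constant distinction concrete, here is a toy configuration object. `OrcaConfigSketch` is a hypothetical class written for illustration; it covers only a few of the properties listed above and is not the class added in this PR.

```python
class OrcaConfigSketch:
    """Toy config object: settable properties plus read-only constants."""

    _defaults = {
        "port": None,
        "executable": "orca",
        "timeout": None,
        "default_format": "png",
        "mapbox_access_token": None,
    }
    _constants = {"config_file": "~/.plotly/.orca"}

    def __init__(self):
        # Bypass our own __setattr__ to create the backing dict.
        object.__setattr__(self, "_props", dict(self._defaults))

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        props = object.__getattribute__(self, "_props")
        if name in props:
            return props[name]
        if name in self._constants:
            return self._constants[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self._constants:
            raise AttributeError("%s is a read-only constant" % name)
        if name not in self._defaults:
            raise AttributeError("unknown config property: %s" % name)
        self._props[name] = value

    def update(self, **kwargs):
        """Set several properties at once, with the same validation."""
        for name, value in kwargs.items():
            setattr(self, name, value)


config = OrcaConfigSketch()
config.mapbox_access_token = "xyz..."
config.update(timeout=30, port=59997)
```

Routing `update` through `setattr` means both styles share one validation path, so a misspelled property name fails loudly either way.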

Saving configuration properties

The config values may optionally be saved to the ~/.plotly settings directory as ~/.plotly/.orca using the plotly.io.orca.config.save() method. If present, these settings are automatically loaded on import.

Orca status

The current status of the orca server process can be displayed using the plotly.io.orca.status object.

At initial startup the state will be unvalidated

orca status
-----------
    executable: None
    version: None
    port: None
    pid: None
    state: unvalidated
    command: None

After a valid orca executable has been found, and the server is not yet running, the state will be validated

orca status
-----------
    executable: /anaconda3/envs/plotly_dev/bin/orca
    version: 1.1.0
    port: None
    pid: None
    state: validated
    command: None

Here the user can see which orca executable was found on the path, and what version it is.

When the server process is currently running, the state will be running

orca status
-----------
    executable: /anaconda3/envs/plotly_dev/bin/orca
    version: 1.1.0
    port: 59997
    pid: 83079
    state: running
    command: ['orca', 'serve', '-p', '59997', '--graph-only', '--plotly', '/path/to/plotly/package_data/plotly.min.js', '--mathjax', 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js', '--mapbox-access-token', 'pk...']

Here the user can see the details of the running process (port, pid) and the exact command line arguments that were passed to the orca server at startup.

Error messages

There are a lot of things that can potentially go wrong here, so I've tried to make the error messages as helpful as possible. For example here's the error that is raised if the orca executable cannot be found on the path:

The orca executable is required in order to export figures as static images,
but it could not be found on the system path.

Searched for executable 'orca2' on the following path:
    /anaconda3/envs/plotly_dev/bin
    /usr/local/bin
    /usr/bin
    /bin
    /usr/sbin
    /sbin
    /Applications/VMware Fusion.app/Contents/Public
    /Library/TeX/texbin

If you haven't installed orca yet, you can do so using conda as follows:

    $ conda install -c plotly plotly-orca

After installation is complete, no further configuration should be needed. 
For other approaches to installing orca, see the orca project README at
https://github.com/plotly/orca.

If you have installed orca, then for some reason plotly.py was unable to
locate it. In this case, set the `plotly.io.orca.config.executable`
property to the full path to your orca executable. For example:

    >>> plotly.io.orca.config.executable = '/path/to/orca'

After updating this executable property, try the export operation again.
If it is successful then you may want to save this configuration so that it
will be applied automatically in future sessions. You can do this as follows:

    >>> plotly.io.orca.config.save() 

If you're still having trouble, feel free to ask for help on the forums at
https://community.plot.ly/c/api/python

Testing

I've added two new test suites.

plotly/tests/test_orca/test_orca_server.py. These tests cover the logic for locating and validating the orca executable, and for launching the server and shutting it down. They rely on psutil to check that the process with the right pid is running and then not running, and on pinging the server to make sure it's running on the right port and that it stops responding once it has been shut down.
plotly/tests/test_orca/test_to_image.py. These tests cover the image conversion logic. I've generated a set of reference images to compare against. These ensure that valid images are produced where they should be, and that the topojson and mathjax configuration is working properly. Unfortunately, the images are not exactly reproducible between my local mac and CircleCI, so for the time being there is a separate directory of reference images for OS X and Linux (though I'm not sure "Linux" is fine-grained enough).

These tests are working on CircleCI. The new tests follow a new conda environment pathway so that orca can be installed using conda. The tests are run with Python 2.7, 3.5, and 3.7.

Performance

The whole reason for using this more complex client/server architecture is to improve image export performance. So how well does it do?

This is not an extensive performance comparison, but I did an initial comparison of matplotlib, this branch, and bokeh (setup instructions). The test was to create a 1000 point scatter plot with varying point size and color and then save it to a png.

So after the orca server is running, the export time here is right on par with matplotlib (~215ms), and much faster than bokeh (~1.7s).

Being on par with matplotlib here is really exciting, and opens up a lot of new use cases for plotly.py. I'm thinking, in particular, of the possibility of a static image backend for interactive use outside of the notebook/browser context.

Side note: bokeh isn't doing anything wrong here. This is just how expensive it is to launch a web browser from scratch. This is also about how long it takes the orca server to start up the first time. The advantage with this orca approach is that the server only needs to start up once per session, instead of once per image.

Produced images

And here are the images produced by matplotlib, this branch, and bokeh

TODO

Various things still to do/look into:

Add validate option to to_image and write_image
Look into validating poppler installation, or providing better error message on eps failure.

The old approach required OS-specific process management and it still didn't kill the child process for orca installed with npm. Now all of the OS-specifics are in psutil. psutil is an optional import that is checked when the server is first requested.

…Mathjax CDN - Add topojson files to `plotly/package_data` - Add new config settings for plotly.js bundle (use local by default), topojson, mathjax, and mapbox access token - Add image tests for `topojson` images and mathjax images - Remove saving of orca config to ~/.plotly. Need a more holistic settings solution that handles environments - Shutdown server when setting config parameters that won't be active until server restarts (e.g. plotlyjs bundle) - Make default timeout None. So shutting down the server due to inactivity is now opt-in.

jonmmease · 2018-08-25T21:43:33Z

Alright, time to merge this thing!

jonmmease added 30 commits August 10, 2018 20:13

Experiments with orca and io module

9ffd55a

Added simple pyqt5 window backend

de10887

Works in QtConsole if you initialize the qt event loop with: from PyQt5.QtWebEngineWidgets import QWebEngineView %gui qt QWebEngineView import must precede the %gui qt command.

Added OrcaStatus singleton to hold status of orca server process

38127fa

Added orca server process management tests

More progress on the orca module.

fd32d64

- Save/load settings from ~/.plotly/.orca file - More validation - write image - add image options (format, size, scale)

Added update method to orca config object

49130b1

Remove unused import

264d19b

Remove autostart and hostname configuration parameters

9b6f8c8

Let's save some complexity and not support using an external orca server for now.

Docstrings and cleanup

8d31c9b

Don't auto import any of the show/backend logic

af10ff6

We could leave the plotly.io._show module in place, so people could experiment with the image backend concept.

auto import the plotly.io module

eb21a70

Capture server process output.

0183d43

It emits some errors when children are killed, but these are harmless

Merge branch 'master' into orca_integration

f75652c

Initial Python 2 support

fd3e65c

Run orca auto-shutdown timer as daemon thread

15e62a0

This way program exit won't wait for it to complete

Added to_image tests against reference images

2d5a074

Added image generation tests, converted orca server tests to use pytest

2df38cc

Add orca tests to 3.6 optional test suite

0d1e0eb

Cannot install orca globally

f81e63c

Adds psutil dependency and try to add orca to path before running tox

6f78063

Forgot && in tox command

fc35eff

Try global install of orca in tox environment

48d4de5

Local install in tox

88b293a

Tox whitelist export command

86fc540

whitelist ls and echo

702dd59

Set extend PATH in tox setenv block

196b58b

If search for orca fails, try orca.js. Also display search path

2f08714

details in error message.

Changing course. Try creating a conda environment

2217628

Attempt to cache miniconda directory

a6a255a

jonmmease added 21 commits August 22, 2018 09:11

Reorder properties in config and status repr to be a bit more logical

1fa866b

Merge branch 'master' into orca_integration

cff8f1c

Add validate option to to_image and write_image

9c70b3a

to bypass figure dict validation. Also improve presentation of orca error messages and added a special check for EPS failures that might be due to the needed poppler dependency

Added more helpful (at least friendlier) error message when plotly.py fails to communicate with the orca server process.

8d97381

Add poppler dependency to circle conda environment

c58d688

Needed for EPS tests

Create failed directory and print failed image path

8967568

Fix store_artifacts directory

c5dd2f3

Update EPS images for CircleCI

dab4e14

Re-enable all tests

3137b0c

Extend time for orca startup to help CI robustness

005299f

Remove explicit encoding when opening orca settings for Python 2 compat

5f8b813

Fix executable version check error messages

94c1bab

Make orca executable detection more specific.

2b4fc0c

Python 2 test fix [ci skip]

54d38d4

Use status.executable when building the orca command.

f38beb5

On windows, this avoids Popen being unable to find the orca executable when it is on the environment path. [ci skip]

Use os.pathsep so we don't mess up Windows paths

e383c57

[skip ci]

Add helpful message when orca returns a 'plotly.js error' on mapbox

88b37a0

If orca returns a 525: 'plotly.js error', and the figure contains at least one mapbox trace, and no mapbox_access_token is configured, then include an error message explaining what to do.

Removed prototype plotly.io._show module.

0a0adac

revert tox changes (not using tox to test orca after all)

f37682f

Remove topojson from packagedata (not distributing it after all)

d8958aa

Full PR review

25f1f81

jonmmease merged commit 46e0683 into master Aug 25, 2018

jonmmease mentioned this pull request Mar 21, 2019

plotly.io HTML functions, modular renderers framework, future flags system #1474

Merged

1 task

nicolaskruchten deleted the orca_integration branch June 19, 2020 16:10
