Appreciate the detailed release notes, does this release change the focus of Perspective more towards |
We've just released Perspective v3.0.0, the latest major release of the open-source high performance data-visualization and analytics component underlying Prospective.co.
Perspective 3.0.0 is, by definition, the worst version of Perspective ever released, on account of the Major Version tick which indicates we've intentionally broken the API of this previously-working product. Until today, Perspective 2.10.1 had matured to confident stability on the back of 25 minor and patch version releases in the 2.x.x series. Now ... well, Perspective 2.10.1 still exists technically, but we've also introduced a shiny new version that is guaranteed (numerically) to be the worst version yet!
There are a lot of breaking API changes in this release, some behavior has been changed or deprecated, and a few new features have been added as well. Read on to learn some more about why we went and broke this perfectly good open-source project, or skip to the end to get a summary of the actual API changes.
How did we get here?
Perspective used to look like ****. It still does, but it used to also.
In ancient times (2016), Perspective's data engine bore little resemblance to its modern OSS version. Its API was complex, often requiring dozens of classes to configure even simple data queries, and much of its implementation was buggy, incomplete or just plain non-existent. Data ingestion had to be written by hand, cell-by-cell. The developer experience was overall cumbersome, error-prone and frustrating, especially for implementing UI workflows with dynamic (not hard-coded) queries.
Perspective OSS did away with most of that nonsense when porting this component to WebAssembly. In its new life as a data engine for a high-performance Browser-based JavaScript UI, Perspective needed a simpler API that could be queried from a single simple serializable query configuration, rather than a long sequence of oft-quirky method calls. Manual, cell-by-cell data ingestion was replaced with efficient support for formats data actually comes in, such as CSV, JSON and Apache Arrow.
The overall API was reduced to just 2 classes comprising 39 methods in its JavaScript incarnation. This simplicity allowed us to architect a UI which was mostly decoupled from the features of the engine, only interacting with it asynchronously and windowing data requests to avoid extracting the entire dataset at once in-memory.
Everything was fixed and no further changes were ever needed!
Further changes were needed
As the project improved, new features demanded more and more of our simple data engine architecture.
Perspective soon added WebWorker support, allowing queries against the data engine to run in parallel with the UI rendering. Accompanying this change came a rudimentary JSON-based RPC protocol, based on the newly-simplified 39 method data engine JavaScript API.
Later, Python, Jupyter and Node.js engine bindings were added via a WebSocket transport, leading to an explosion of Client and Server implementations for this ad-hoc JSON RPC protocol. The duplicate protocol implementations suffered from sometimes-subtle system idiosyncrasies as well as copious opportunity for simple programmer error. Lacking a spec, each implementation grew its own extensions and exceptions - DataFrame support in Python, Date object support in Node.js, etc. As behavior between implementations began to drift, bugs began to emerge and Perspective's test suite struggled to cope.
Further, network transports like the WebSockets API have message size, update rate and connection durability limitations not present in the relatively straightforward WebWorker API. Suddenly, our simple RPC protocol needed features such as batching, chunking, throttling, error transmission and multiplexing. Features which needed to be duplicated, tested and documented for each combination of binding Language, Client and Server.
This design limited how much we could extend the platform, and how quickly. It locked us into the existing API patterns, and the tight coupling between engine and RPC protocol constrained what features we could deliver.
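To make the chunking and multiplexing requirements above concrete, here is a minimal Python sketch of what each hand-written binding had to get right. It is not Perspective's 2.x protocol: the frame fields (id, seq, total, data), the 16-byte limit and the "view"/"schema" message shapes are all assumptions made up for illustration.

```python
import json
from collections import defaultdict

MAX_FRAME_BYTES = 16  # hypothetical transport limit, kept tiny for illustration


def to_frames(msg_id, payload, limit=MAX_FRAME_BYTES):
    """Chunking: split one payload into frames small enough for the transport."""
    chunks = [payload[i:i + limit] for i in range(0, len(payload), limit)] or [b""]
    return [
        {"id": msg_id, "seq": seq, "total": len(chunks), "data": chunk.decode("ascii")}
        for seq, chunk in enumerate(chunks)
    ]


class Reassembler:
    """Multiplexing: collect interleaved frames and rebuild each message by id."""

    def __init__(self):
        self._parts = defaultdict(dict)

    def feed(self, frame):
        parts = self._parts[frame["id"]]
        parts[frame["seq"]] = frame["data"].encode("ascii")
        if len(parts) < frame["total"]:
            return None  # message still incomplete
        del self._parts[frame["id"]]
        return b"".join(parts[i] for i in range(frame["total"]))


# Two hypothetical requests whose frames share one connection.
request_a = json.dumps({"method": "view", "args": {"group_by": ["region"]}}).encode()
request_b = json.dumps({"method": "schema", "args": {}}).encode()
frames_a = to_frames(1, request_a)
frames_b = to_frames(2, request_b)

# Interleave the frames as a busy connection might deliver them.
wire = []
for i in range(max(len(frames_a), len(frames_b))):
    wire.extend(frames[i] for frames in (frames_a, frames_b) if i < len(frames))

reassembler = Reassembler()
completed = {}
for frame in wire:
    payload = reassembler.feed(frame)
    if payload is not None:
        completed[frame["id"]] = json.loads(payload)
```

Even this toy version has to agree on frame fields, ordering and completion rules on both ends, which hints at why duplicating such logic per language and per Client/Server pair was error-prone.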
In the end, at least we were confident the complexity was justified by the engine's unique WebAssembly support.
WebAssembly support is no longer unique
When Perspective was open-sourced circa 2017, there weren't many options available for the plucky web developer seeking a high performance browser-side query engine. Today, that is no longer the case. Perspective's own data engine continues to offer a unique mix of high-performance data ingestion & processing, OLAP-style multi-axis pivoting and streaming support - but other offerings are starting to grow their own unique features in this space. As the ecosystem matures, we'd like to extend our UI to the feature sets of new data engines available in the browser, allowing high performance visualization on top of shared data in engines like DuckDB, rather than serializing and copying as must be done today.
On the server (in Python for example), the environment is much more competitive. There are great options available for streaming, and basically any other data consideration is catered to with mature and performant solutions. While Perspective on the server is still quite fast, its simple design limits the feature set and overall scalability - and for ecosystems with a mature data solution already in place, Perspective on the server becomes an extra in-memory bottleneck for distributed or out-of-memory Tables. Supporting these platforms via a common high-performance virtual data API, without copying into Perspective's in-memory model, would enable a Perspective UX with the raw query performance, table size and features of your server-side data engine of choice.
What do we want to change?
The Perspective project's goal is to be a great data visualization tool. To the extent that a data engine is core to the overall user experience, it is core to Perspective. Metrics like CPU/memory performance, query features, data size limits and idempotent streaming/static ultimately limit the problems that can be solved with Perspective, making it less useful and introducing the need for specialized cohorts for "big", "fast" or "complex" data sets.
In order to support pluggable data engines, we need a stable, minimal, unambiguously simple API - to minimize the developer effort required to implement this API on the Server side for new data engines such as DuckDB, SQLite and Polars. It needs to be easy to implement this API with good performance by default, and easy to implement the entire API correctly so that Perspective's UX doesn't degrade between engines. It needs to be tested, documented, and properly versioned, so integrations can be robust as the feature set expands.
We need a portable design which makes it easy to support Perspective's myriad current-and-future language bindings. The cost of this complexity in 2.10.1 made certain feature choices (like SQL join operations) intimidating, because any extension of the RPC API would need multiple platform-specific implementations. The more code we can write once and compile to multiple platforms, the less code there is to test, document, or potentially break.
As I said in the title, any change of this magnitude will make this the worst version of Perspective ever released. It would therefore be helpful if this API (RPC and developer-facing) can be iterated on rapidly. We will add new features to Perspective's UX and data engine, as well as begin to model the feature sets of other engines. All of these will likely require elaborations (or consolidations) of this new RPC API. The lack of a reference implementation of 2.10.1's API meant that we relied on ancestral stability to maintain order, and changing a published API meant tedious and risky multi-language refactoring and re-building (and 3 new docs sites to update and publish!).
Overall, we want a solution that is rigorous, testable, typeable, assertable, and portable.
Design decisions
Rust
The friction with the original API led us to the decision to write our new Client in a compiled language that would be portable and easily bindable to all the different target languages we want to support. Rust has the best story for cross-language compatibility right now: it's fast (it can make memory-efficient use of a binary message protocol, for example), and it embeds in everything (we use pyo3 and wasm-bindgen). It has excellent build tooling and (subjectively) the best WebAssembly compatibility story, along with the most mature WebAssembly ecosystem.
Writing the Client bindings in Rust resulted in a subsystem that behaves the same in every language. It has allowed us to re-use our benchmark and test suites interchangeably across all languages we support, and eliminated thousands of lines of duplicated code between languages. It promises to keep ongoing maintenance of language bindings low, and has already allowed us to add native bindings to Rust itself.
It's also fun! But that wasn't the point (... unless?)
Protobuf
Choice of wire format can be a contentious affair for a project, especially when performance is of prime concern as it often correlates inversely with developer sanity. For Perspective 3.0.0 however, that question was easy and the answer was Protobuf.
We wanted something with efficient serialization and broad platform support, but more importantly we wanted something rigorous and dependable. The shared Client/Server design allows us to potentially swap out this serialization format for all client/server/languages at once. The appeal of Protobuf came largely from its many high-quality implementations, long industry track record and general aura of warm indifference it inspires. No one ever got fired for suggesting Protobuf, as I'm sure someone important said once.
Using a binary protocol allows target languages like Python to parse and generate the message stream efficiently and off-GIL, so that communications overhead won't impact the runtime performance of the Python server. In WebAssembly, it allows us to avoid extensive allocation on the JavaScript heap when interacting with messages (even though the runtime is currently single-threaded).
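The point about swapping the serialization format behind a shared Client/Server boundary can be illustrated with a small Python sketch. Everything here is hypothetical: ToyServer, ToyClient and both codecs are invented for this example, the binary codec uses the standard struct module as a stand-in rather than Protobuf, and none of it reflects Perspective's actual message schema.

```python
import json
import struct


class JsonCodec:
    """Text wire format: a JSON array of numbers."""

    def encode(self, values):
        return json.dumps(values).encode("utf-8")

    def decode(self, data):
        return json.loads(data.decode("utf-8"))


class BinaryCodec:
    """Binary wire format: a uint32 count followed by little-endian float64s."""

    def encode(self, values):
        return struct.pack(f"<I{len(values)}d", len(values), *values)

    def decode(self, data):
        (count,) = struct.unpack_from("<I", data, 0)
        return list(struct.unpack_from(f"<{count}d", data, 4))


class ToyServer:
    """Hypothetical server: opaque bytes in, opaque bytes out."""

    def __init__(self, codec):
        self.codec = codec

    def handle(self, request):
        values = self.codec.decode(request)
        return self.codec.encode([sum(values)])


class ToyClient:
    """Hypothetical client: only needs a codec and a way to send bytes."""

    def __init__(self, codec, send):
        self.codec = codec
        self.send = send

    def total(self, values):
        response = self.send(self.codec.encode(values))
        return self.codec.decode(response)[0]


# Swapping the codec changes nothing in the client or server logic.
results = {}
for codec in (JsonCodec(), BinaryCodec()):
    server = ToyServer(codec)
    client = ToyClient(codec, send=server.handle)
    results[type(codec).__name__] = client.total([1.5, 2.5, 4.0])
```

Because the client and server only ever exchange bytes, the wire format is confined to one place, which is the property the shared design relies on.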
What's changed?
Rust library
Perspective 3.0.0 adds a native Rust library (perspective on crates.io) alongside the Python and JavaScript versions. See the new rust-axum example which embeds Perspective in an axum server.
C++ Server
The Perspective Server (also packaged as perspective-server on crates.io) has been ported almost entirely into C++ for everything (there is no JS or Python wrapping layer), and the API is simply two methods: receive_request and send_response, which each take a protobuf payload. As a result, binding the server into other languages is dead simple, and binding the client to an arbitrary message transport (Socket, external message queue, smoke signal, etc.) similarly only requires the send and receive implementations in the host language.
Many features needed to be de-duplicated from Python/JavaScript into an idiomatic C++ implementation:
- JSON parsing (including Python Dict and JavaScript Object forms via language-native stringification).
- "date" and "datetime" parsing (uses Apache Arrow's built-in parser, plus many of the legacy formats from 2.10.1).
- View config parsing and creation of the internal View object.
- PerspectiveManager (Python), WebSocketServer (JavaScript), etc. state management for collections of Tables and Views is now built in to the Client/Server API.
As a result, new features like improved "datetime" type string parsing behave consistently across languages by default.
Docs
Since the language bindings in Perspective 2.10.1 were hand-written, so too was the documentation, a mix of Sphinx, JSDoc, TypeScript and Markdown, all self-hosted without proper versioning. As all language bindings are now written in Rust, and Rust has excellent documentation tooling built-in, Perspective now has consistent documentation across languages that is properly versioned.
Performance
Some APIs, such as JSON ingestion, are now much faster even in single-threaded mode.
Performance improvements were not the goal of 3.0.0 development. Nevertheless, we diligently track Perspective's CPU performance across versions to protect against performance regressions, and the 3.0.0 release has recorded an overall CPU time improvement on every method we benchmark, JSON ingestion benchmarks in particular.
While Perspective lacks comprehensive concurrent benchmarks so far, ad-hoc testing of 3.0.0 seems to exhibit much better thread utilization than 2.10.1, especially for paths such as JSON ingestion where logic has been moved from Python to C++ (and the GIL is released). We expect to be able to iterate quickly with the new API design, and we've even merged some early multi-core optimization on the back of this work.
No, like, what's actually changed, like in the API?
In all languages:
- Python's PerspectiveManager, the browser's PerspectiveWorker, etc., have been replaced by Server, an explicitly instanced engine API. A Server hosts Tables and shares no state with other Servers in the same process, aside from a global executor pool on platforms which support threading.
- A Client is needed to send commands to a Server. Methods like JavaScript's perspective.websocket() now return an instance of Client, so for the most part the user experience here is unchanged from 2.10.1. However, Client can be implemented for an arbitrary transport (like a Socket) with only a few methods, analogous to "send protobuf" and "receive protobuf". This will make the process of extending Perspective to new platforms (like Rust!) much easier.
- Loading JSON (JavaScript) and Dict/List (Python) data has been streamlined. This was previously implemented internally through the legacy cell-by-cell batch update API, leading to bad performance and behavior drift. In 3.0.0, JSON is now parsed and generated entirely in C++ via RapidJSON. While the browser generally has excellent JSON parsing performance, the resulting JavaScript objects need further processing to be arranged as a Perspective Table. With this change, loading data in JSON format is substantially faster, more so if you can pass the JSON data as a String data type rather than a JavaScript Object or Python Dict (internally we'll now stringify the latter types and use RapidJSON to parse them, and this is still much faster than 2.10.1!).
- Previously, Perspective supported (Date, Datetime) (JS), and (date, datetime, pandas.Timestamp) (Python) in JSON/Dict format. However, these types are not JSON serializable, which was a source of implementation inconsistency in 2.10.1. In 3.0.0, these types are no longer directly supported. See the language-specific notes on this data type below.
- In JSON input modes, Perspective used to perform platform-specific coercion of string types to "date" and "datetime" types. This is now standardized in 3.0.0, so some parsing behavior may be slightly different.
- Partial updates used to be supported on a row-by-row basis using JSON null vs undefined values to differentiate "reset" and "ignore" update behavior, respectively. However, this was difficult to support consistently in formats like CSV and Apache Arrow, which lack a distinct "ignore" value like JavaScript's undefined. In 3.0.0, partial updates are still supported, but the entire column must now be omitted per update batch; if you need to apply a partial update with mixed missing columns per row, you'll need to split the batch manually before calling Table.update().
- Perspective's ExprTK integration has some extensions, e.g. for handling string columns & literals, which were implemented per-platform in the client (for some reason). This behavior has been rewritten in C++, so all Perspective servers behave the same now - but legacy applications with complex expressions may find some "valid" expressions in the Table.view and Table.validate_expr commands no longer work. See Python-specific notes below.
JavaScript:
- All JavaScript Perspective packages are now ES Modules (type: "module" in package.json). This requires <script type="module"> tags when importing the CDN versions, or a bundler that properly understands ES Modules. While we still only officially provide a bundler plugin for esbuild, in 3.0.0 it should be much easier to write bindings for new bundlers, as there is very little JavaScript left and the bootstrap process without a bundler is much simpler.
- The perspective.table() constructor no longer supports schema inference for JSON columns with Date and Datetime values. These non-JSON compatible types can still be coerced into Perspective (or rather - the browser will auto coerce these to numeric types, which Perspective can coerce further), but a schema must be provided to the constructor to inform Perspective to do so, because JSON.stringify will coerce these to number which causes Perspective to infer them as integer. This change simplifies the API quite a bit as well as making it consistent in behavior between Python, JavaScript and Rust.
- perspective.worker() and perspective.websocket() are now asynchronous and must be await-ed.
- View.on_update(), View.on_remove(), View.on_delete() now return callback ID values that must be provided to their reciprocal View.remove_update() (etc., respectively).
- perspective.memory_usage() is renamed perspective.system_info().
- <perspective-viewer> must now be imported with <script type="module">. In order to call methods on a <perspective-viewer> custom element from a script, you must either import "@finos/perspective-viewer"; or await customElements.whenDefined("perspective-viewer"); to await the WebAssembly module compilation.
Python:
- Python wheels now target abi3-py39, adding support for Python 3.12 and beyond, but deprecating 3.8 and below. perspective-python can still be built from source for these platforms.
- The perspective module no longer exports or instantiates a default Server, instead you must create a Server and synchronous Client before you can create a Table. We made this change to keep the API consistent across platforms, and to minimize confusion in contexts where you may accidentally create alternative Server instances, such as passing data to the PerspectiveWidget Jupyter widget constructor. TODO A default Server may be added to the root in 3.1.0.
- date, datetime and pandas.Timestamp (in Dicts at least) are not supported by the Table constructor anymore (as described above). Internally, Perspective uses json.dumps() to stringify input, which can be somewhat configured globally if you choose. In 3.0.0, it is best to leave JSON in string format if you can! Since JSON parsing now occurs in C++ rather than Python, it is now off-GIL and threadsafe, in addition to improved single-threaded performance.
- pandas.DataFrame is no longer directly supported by perspective.table(), but they may still be loaded internally if pyarrow is available in the environment. This load path is both dramatically less code and faster than 2.x, but pyarrow is much more stringent about type coercion/inference than Perspective 2.x is. TODO We plan to remove this from the engine entirely.
- Table.validate_expr had a different return format (a List of Lists, but really tuples) from the JavaScript version. This behavior now uses the JavaScript version's List of Dict output. See docs.
In Jupyter:
- PerspectiveWidget constructor kwargs server and client have been replaced with binding_mode, which can be either "server" (the default) or "client-server". Previously, client=False, server=False was a nonsensical case, and just client=True relied on custom Python data marshalling to JSON (causing non-idiomatic behavior & performance).
- When passing data (in Perspective-compatible format) to PerspectiveWidget, instead of an instance of a Table, a new Server and Client are implicitly created internally (because the global Server instance has been removed). In order to access the Table, you must use the PerspectiveWidget.table or the newly added Perspective.client properties.
The long view
Befitting a release canonically the worst since launch, the Perspective 3.x.x series is only going to get better! 3.0.0 is not just a major internal change, it is a stable platform upon which we intend to rapidly innovate. We'll post more about our future plans soon!
We think Perspective 3.0.0's design unlocks the next stage of the project's evolution. It has reduced the accidental complexity related to language bindings and the client-server API, and allowed us to increase the breadth and depth of our test coverage with a single suite to cover all our implementations. We'll be able to add new features more quickly, and we'll have fewer bugs in fewer subsystems. In Perspective 3.0, we've been able to take advantage of better tech and simpler design to make 3.0 easier to integrate into your enterprise systems.
Special Thanks!
A special thanks to the main architects/contributors for this release, @timbess, @sinistersnare and @tomjakubowski.
Full Changelog: v2.10.1...v3.0.0
This discussion was created from the release v3.0.0.