Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exec: decimal encoding support for Arrow representation #37044

Closed
jordanlewis opened this issue Apr 23, 2019 · 6 comments
Closed

exec: decimal encoding support for Arrow representation #37044

jordanlewis opened this issue Apr 23, 2019 · 6 comments
Labels
A-sql-vec SQL vectorized engine C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@jordanlewis
Copy link
Member

jordanlewis commented Apr 23, 2019

Decimal serialization is currently not supported for distributed vectorized flow execution. We just need to figure out a way to quickly encode apd.Decimals, which we can then send as a serialized arrow byte slice.

Jira issue: CRDB-4453

@asubiotto asubiotto self-assigned this Apr 23, 2019
@jordanlewis jordanlewis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-vec SQL vectorized engine labels May 2, 2019
@asubiotto
Copy link
Contributor

cc @mjibson any thoughts on what the best way to encode apd.Decimals would be?

@maddyblue
Copy link
Contributor

apd just got Compose and Decompose support which will be used in an upcoming Go release for database/sql serialization of decimals. You could use that for now.

It's not the fastest possibility (because it does some endian swapping), but it is already written and works with primitive types. For highest speed work I'd recommend copying those functions but removing the byte swapping.

@arctica
Copy link

arctica commented Nov 19, 2019

We have a big table with a lot of DECIMAL columns for timeseries purposes. Support for DECIMAL in vectorized execution could be very interesting.

Additionally we also have a TIMESTAMP column which prevents vectorized execution. Should I open an issue requesting support for this data type?

@asubiotto
Copy link
Contributor

asubiotto commented Nov 19, 2019

Hi @arctica,

Thanks for your interest in vectorized execution! Supporting DECIMAL is a priority for us although we're working on adding external storage to support spilling to disk for large queries first. We'll update this issue once we start working on DECIMAL support.

With respect to TIMESTAMP, we added support for it in #42416, which should be in the 19.2.1 release (to be released in the next few weeks). Note that this is barebones support (binary expressions with timestamps are still not supported, for example). There is an issue for that here: #42044

craig bot pushed a commit that referenced this issue Jan 14, 2020
43311: colserde: add simple serialization for DECIMALs r=yuzefovich a=yuzefovich

This commit adds simple serialization of DECIMALs using the provided
MarshalText method.

Touches: #37044.

Release note (sql change): Previously, DECIMALs could not be sent over
the network when the computation was performed by the vectorized engine.
This has been fixed, and the vectorized engine now fully supports
DECIMAL type.

43793: sql: properly enforce uniqueness of key columns for FK references r=lucy-zhang a=lucy-zhang

We introduced a bug in 19.2 that would allow, e.g., a unique index on `(a, b,
c)` to be used as an index that is supposed to enforce uniqueness for a foreign
key constraint pointing only to `(a, b)`. This PR reintroduces a check that the
indexed columns match the FK columns exactly.

Fixes #42680.

Release note (bug fix): Fixed a bug introduced in 19.2 that would allow foreign
keys to use a unique index on the referenced columns that indexed more columns
than were included in the columns used in the FK constraint, which allows
potentially violating uniqueness in the referenced columns themselves.

43956: build: post issues from local roachtests r=andreimatei a=tbg

This wasn't happening due to some missing env vars. I noticed this since
there are many failures of acceptance/version-upgrade not reflected via
created issues (I get email notifications from TC).

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Lucy Zhang <lucy-zhang@users.noreply.github.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@jordanlewis
Copy link
Member Author

Did we solve this with flat decimal? @yuzefovich

@yuzefovich
Copy link
Member

We do support encoding decimals since #43311, but the speed of serialization of decimals is much to be desired (e.g. here are up-to-date numbers). I think we can close this issue and #44195 tracks the improvement to the speed of serialization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-vec SQL vectorized engine C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
None yet
Development

No branches or pull requests

6 participants