-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cassandra serializes Dependencies when only dependencies.links is ever read #1008
Comments
@yurishkuro any thoughts on this? |
The other fields of this struct are also used. Spark job does this:
and Cassandra store looks at these timestamps when filtering iirc. That's not to say that it must stay this way, a cleaner approach would've been to extend the table schema and store those values as proper fields in the table. Is anything else dependent on the Link struct? If not, this could be a candidate for de-thrifting as well, e.g. by using native vector types that Cassandra supports (not sure how easy it is to write those from Spark). As far as transition from old schema to new, I would instead go with a simple migration script that would copy from thrift-based dependencies table into a new table with clean schema, without thrift. Then people can upgrade both the server and Spark job and still have all the historical data (e.g. we have no TTL on the dependencies table, we keep it forever). |
@yurishkuro nothing in the CQL deserializes thrift. While it is true that we store these dates, they aren't used anywhere except to make epochDayMillis, which could be done directly. I'd like to separate complete model overhaul from the topic of extra fields, if possible. |
@yurishkuro ps I opened #1009 on a thrift-less schema. If you don't mind, please jot your thoughts about what it would look like there, so the comments don't end up lost. cheers! |
This may be a historical thing, but the Dependencies type is only used by Cassandra, and only its links field is ever used. I'd recommend we remove the type and change the data stored in cassandra to just serialize links.
Here's a plan:
CassandraDependencyStore.storeDependencies(epochDayMillis: Long, links: Seq[DependencyLink])
which serializes only the link list as a byte buffer.Dependencies
,DependencyStore.storeDependencies(dependencies: Dependencies)
Once this code is out, zipkin-dependencies-spark could release a version which moves to the new method, which serializes only the links object. The instructions could imply a minimum zipkin-query-server version.
Note that this is different than changes around spans, which often imply affecting end-user apps. Currently, only zipkin-dependencies-spark is dependent on this data structure and directly affected by this change.
The text was updated successfully, but these errors were encountered: