
Cassandra serializes Dependencies when only dependencies.links is ever read #1008

codefromthecrypt opened this issue Mar 1, 2016 · 4 comments


@codefromthecrypt

This may be a historical thing, but the Dependencies type is only used by Cassandra, and only its links field is ever read. I'd recommend we remove the type and change the data stored in Cassandra to serialize just the links.

Here's a plan:

  • Add CassandraDependencyStore.storeDependencies(epochDayMillis: Long, links: Seq[DependencyLink]), which serializes only the link list as a byte buffer.
  • Deprecate Dependencies and DependencyStore.storeDependencies(dependencies: Dependencies).
    • Note that Cassandra is currently the only implementation of storeDependencies anyway, and its only caller, the Spark job, uses CassandraDependencyStore directly.
  • Change the deserializer to peek at the first byte to tell whether the stored bytes are a bare list of links or a Dependencies object (like #995, Supports reading multiple spans per Kafka message); a sketch follows this list. That means the query server can operate on both formats without a schema change.
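
Here's a minimal sketch of the peek, assuming TBinaryProtocol encoding and that Dependencies and DependencyLink are the existing thrift-generated types: a serialized Dependencies struct starts with a field header whose first byte is a field-type code (TType.I64 for start_ts), while a bare list of DependencyLink starts with the element-type byte TType.STRUCT followed by the element count. The two decode functions are stand-ins for whatever thrift codec the store already uses:

  import org.apache.thrift.protocol.TType

  def readLinks(bytes: Array[Byte],
                decodeLinkList: Array[Byte] => Seq[DependencyLink],
                decodeDependencies: Array[Byte] => Dependencies): Seq[DependencyLink] =
    if (bytes(0) == TType.STRUCT) decodeLinkList(bytes) // new format: bare list of links
    else decodeDependencies(bytes).links                // old format: whole Dependencies struct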

Once this code is out, zipkin-dependencies-spark could release a version that moves to the new method, serializing only the link list. Its instructions would then note a minimum zipkin-query-server version.

Note that this is different from changes around spans, which often affect end-user apps. Currently, only zipkin-dependencies-spark depends on this data structure and is directly affected by this change.

@codefromthecrypt

@yurishkuro any thoughts on this?

@yurishkuro

The other fields of this struct are also used. The Spark job does this:

  def saveToCassandra(sc: SparkContext, keyspace: String, dependencies: DependenciesInfo): Unit = {
    val thrift = dependencies.toDependencies(startTs = startTs, endTs = endTs).toThrift

and the Cassandra store looks at these timestamps when filtering, iirc. That's not to say it must stay this way; a cleaner approach would have been to extend the table schema and store those values as proper fields in the table.

Is anything else dependent on the Link struct? If not, it could be a candidate for de-thrifting as well, e.g. by using the native collection types that Cassandra supports (not sure how easy it is to write those from Spark).

As far as the transition from the old schema to the new goes, I would instead go with a simple migration script that copies from the thrift-based dependencies table into a new table with a clean, thrift-free schema; a rough sketch follows. Then people can upgrade both the server and the Spark job and still have all the historical data (e.g. we have no TTL on the dependencies table; we keep it forever).
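
Something like the following, using the DataStax Java driver (2.x-era API) from Scala. The table and column names (dependencies, dependencies_v2, day, links) and the decodeLinks helper are assumptions for illustration, not the actual Zipkin schema:

  import com.datastax.driver.core.Cluster
  import java.nio.ByteBuffer
  import scala.collection.JavaConverters._

  object MigrateDependencies extends App {
    // stand-in: decode the old thrift Dependencies blob, re-encode only its links
    def decodeLinks(thrift: ByteBuffer): ByteBuffer = ???

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect("zipkin")
    session.execute(
      "CREATE TABLE IF NOT EXISTS dependencies_v2 (day timestamp PRIMARY KEY, links blob)")
    val insert = session.prepare("INSERT INTO dependencies_v2 (day, links) VALUES (?, ?)")
    // copy every historical row, stripping the thrift envelope along the way
    for (row <- session.execute("SELECT day, dependencies FROM dependencies").asScala) {
      session.execute(insert.bind(row.getDate("day"), decodeLinks(row.getBytes("dependencies"))))
    }
    cluster.close()
  }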

@codefromthecrypt

@yurishkuro nothing in the CQL deserializes thrift. While it's true that we store these dates, they aren't used anywhere except to make epochDayMillis, which could be computed directly (sketch below).
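
For example, something like this one-liner would do, assuming the day key is UTC midnight in epoch millis (a sketch, not the exact code in the store):

  import java.util.concurrent.TimeUnit.DAYS

  // floor a timestamp to the start of its UTC day, in epoch millis
  def epochDayMillis(epochMillis: Long): Long =
    epochMillis - (epochMillis % DAYS.toMillis(1))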

I'd like to separate a complete model overhaul from the topic of extra fields, if possible.

@codefromthecrypt

@yurishkuro PS: I opened #1009 on a thrift-less schema. If you don't mind, please jot your thoughts on what it would look like over there, so the comments don't end up lost. Cheers!
