Parsing individual map fields #284

Closed
gregorskii opened this issue Feb 25, 2020 · 2 comments

@gregorskii

gregorskii commented Feb 25, 2020

Hi there,

I am trying to use Avro to store data that is used by multiple systems. We store analytics data in Bigtable as a textual Avro representation, and we have defined a schema for how the data is stored. We can parse the stored data with this library and validate that it matches the schema, but when we read the payload back we can't find a way to turn it into a plain JSON representation.

We have this avsc:

{
  "type" : "record",
  "name" : "Metric",
  "namespace" : "com.project.classes.avro",
  "fields" : [ {
    "name" : "name",
    "type" : "string",
    "default" : "default"
  }, {
    "name" : "value",
    "type" : [ "null", {
      "type" : "map",
      "values" : [ "long", "double", "bytes" ]
    } ],
    "default" : null
  }, {
    "name" : "valueType",
    "type" : {
      "type" : "enum",
      "name" : "ValueType",
      "symbols" : [ "LONG", "DOUBLE", "BYTES" ]
    }
  } ]
}

This is the raw data in Bigtable:

  "{\"name\":\"event_name\",\"value\":{\"map\":{\"total\":{\"long\":1},\"dimension1\":{\"long\":1},\"dimension2\":{\"long\":1},\"dimension4\":{\"long\":1},\"dimension5\":{\"long\":1},\"dimension6\":{\"long\":1},\"dimension7\":{\"long\":1}}},\"valueType\":\"LONG\"}"
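The stored string is Avro's JSON encoding, in which every non-null union value is wrapped in a single-key object naming its branch. A minimal illustration, assuming only Node.js and no avsc: parsing the string with plain `JSON.parse` keeps those wrappers, so the nullable `value` union shows up as `{ map: {...} }` and each map entry as `{ long: 1 }`.

```javascript
// The stored payload, copied verbatim from the raw data above.
const raw = "{\"name\":\"event_name\",\"value\":{\"map\":{\"total\":{\"long\":1},\"dimension1\":{\"long\":1},\"dimension2\":{\"long\":1},\"dimension4\":{\"long\":1},\"dimension5\":{\"long\":1},\"dimension6\":{\"long\":1},\"dimension7\":{\"long\":1}}},\"valueType\":\"LONG\"}";

// Plain JSON parsing: union wrappers are kept as ordinary objects.
const parsed = JSON.parse(raw);

// For every map entry, record which union branch it was encoded with.
const branches = Object.entries(parsed.value.map).map(
  ([key, wrapped]) => [key, Object.keys(wrapped)[0]]
);

console.log(branches);
```

Every entry names its branch explicitly, which is the same information avsc preserves in its decoded output.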

And this is the parsed result:

Metric {
  name: 'event_name',
  value:
   { total: Branch$ { long: 1 },
     'dimension1': Branch$ { long: 1 },
     'dimension2': Branch$ { long: 1 },
     'dimension3': Branch$ { long: 1 },
     'dimension4': Branch$ { long: 1 },
     'dimension5': Branch$ { long: 1 },
     'dimension6': Branch$ { long: 1 },
     'dimension7': Branch$ { long: 1 } },
  valueType: 'LONG' }

We used this code:

const avsc = require('avsc');
const fs = require('fs');

let metricType;

try {
  // Load the .avsc schema and build an avsc Type from it.
  const typeString = fs.readFileSync('./avro/Metric.avsc', 'utf8');
  metricType = avsc.Type.forSchema(JSON.parse(typeString));
} catch (error) {
  console.error(error);
}

const validate = (value) => {
  return metricType.isValid(value);
};

// `row` is a row read from Bigtable; row.data.value holds the string shown above.
const getData = (row) => {
  const valueString = row.data.value;
  // Decode the JSON-encoded Avro string into a Metric record.
  const value = metricType.fromString(valueString);
  if (validate(value)) return value;
};

Are we misunderstanding how to convert the value map into regular JSON? We would expect:

{
  "name": "event_name",
  "value": {
    "total": 1,
    "dimension1": 1,
    "dimension2": 1,
    "dimension3": 1,
    "dimension4": 1,
    "dimension5": 1,
    "dimension6": 1,
    "dimension7": 1
  }
}
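One generic way to get from the parsed result to this shape is to replace every single-key branch object with its payload. A minimal sketch, assuming each wrapped map value exposes its payload under one own property named after its branch (as the `Branch$ { long: 1 }` output suggests); `unwrapBranch` and `toPlainMetric` are hypothetical helpers, and plain object literals stand in for avsc's decoded values:

```javascript
// Replace a wrapped union value such as { long: 1 } with its payload (1).
// Assumes exactly one own enumerable property, named after the branch.
function unwrapBranch(wrapped) {
  const keys = Object.keys(wrapped);
  if (keys.length !== 1) {
    throw new Error(`expected a single-branch object, got keys: ${keys}`);
  }
  return wrapped[keys[0]];
}

// Copy a decoded Metric, unwrapping every entry of its nullable map.
function toPlainMetric(record) {
  const value = record.value === null
    ? null
    : Object.fromEntries(
        Object.entries(record.value).map(([k, v]) => [k, unwrapBranch(v)])
      );
  return { ...record, value };
}

// Stand-in for the decoded record (plain objects, not avsc instances).
const decoded = {
  name: 'event_name',
  value: { total: { long: 1 }, dimension1: { long: 1 }, dimension2: { long: 1 } },
  valueType: 'LONG',
};

console.log(JSON.stringify(toPlainMetric(decoded)));
// {"name":"event_name","value":{"total":1,"dimension1":1,"dimension2":1},"valueType":"LONG"}
```

This keeps `valueType`, but it throws away exactly the long/double distinction that the wrapper exists to preserve, so it only fits consumers that don't need it.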

Thanks!

@mtth
Owner

mtth commented Feb 28, 2020

The dimension values are wrapped inside an extra object to keep track of their type. JavaScript doesn't have typed numbers the same way other languages (e.g. Java) do: everything is a double. With the representation you expected, we wouldn't be able to know whether the 1 values are longs or doubles -- at least without additional context-specific information. This section of the API doc has more information in case you are curious.
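To make the "everything is a double" point concrete, a small Node.js check (an illustration, not part of avsc) shows that a long `1` and a double `1.0` are the same JavaScript value and serialize identically, and, as a related consequence, that doubles cannot represent every 64-bit long:

```javascript
// A long 1 and a double 1.0 are indistinguishable at runtime...
console.log(Object.is(1, 1.0)); // true
// ...and serialize identically, so a bare 1 can't say which branch it came from.
console.log(JSON.stringify({ total: 1.0 })); // {"total":1}
// Doubles only hold integers exactly up to 2^53 - 1; beyond that they collide.
console.log(Number.isSafeInteger(2 ** 53)); // false
console.log(2 ** 53 + 1 === 2 ** 53); // true
```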

Note that in your case, it looks like you can use additional information present in your record to infer the type. IIUC, the valueType enum indicates which branch is used by the dimensions of the same metric. You can use this in combination with logical types to achieve your ideal representation. This comment has an example of how to do so as well as a bit more context.
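A minimal plain-JavaScript sketch of the two conversions such a logical type would need to perform: stripping the branch wrapper when reading, using the lowercased `valueType` as the branch name, and adding it back when writing. The helper names `unwrapWithValueType` and `wrapWithValueType` are hypothetical; this does not use avsc's logical-type mechanism, and it assumes map values look like `{ long: 1 }` as in the output above.

```javascript
// Read direction: { total: { long: 1 } } with valueType 'LONG' -> { total: 1 }.
function unwrapWithValueType(record) {
  const branch = record.valueType.toLowerCase(); // 'LONG' -> 'long'
  if (record.value === null) return { ...record };
  const value = {};
  for (const [key, wrapped] of Object.entries(record.value)) {
    // Every dimension must use the branch that valueType announces.
    if (!(branch in wrapped)) {
      throw new Error(`entry ${key} is not a ${branch} branch`);
    }
    value[key] = wrapped[branch];
  }
  return { ...record, value };
}

// Write direction: { total: 1 } with valueType 'LONG' -> { total: { long: 1 } }.
function wrapWithValueType(plain) {
  const branch = plain.valueType.toLowerCase();
  if (plain.value === null) return { ...plain };
  const value = {};
  for (const [key, v] of Object.entries(plain.value)) {
    value[key] = { [branch]: v };
  }
  return { ...plain, value };
}

const stored = {
  name: 'event_name',
  value: { total: { long: 1 }, dimension1: { long: 1 } },
  valueType: 'LONG',
};
const plain = unwrapWithValueType(stored);
console.log(plain.value); // { total: 1, dimension1: 1 }
console.log(JSON.stringify(wrapWithValueType(plain)) === JSON.stringify(stored)); // true
```

Because the branch is recovered from `valueType` rather than from each value, the round trip is lossless as long as every dimension of a metric really shares one branch; the read direction throws when one doesn't.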

@mtth mtth added the question label Feb 28, 2020
@gregorskii
Author

Thank you for the detailed reply. We figured this was the case. I will look into the notes you provided.

I tried to get logical types working with a resolver, but without luck; I think I was trying to convert the whole value rather than just the individual doubles. I ended up using a map with the lowercase valueType as the selector. I'll try to get it working the proper way.

🙏

@mtth mtth closed this as completed May 10, 2020