-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Datasource *sometimes* adds null
attribute to a feature when it lacks an attribute another feature has
#668
Comments
More info: I'm iterating through the datasource's features according to the example in the docs: var datasource = new mapnik.Datasource(datasourceOptions);
var features = datasource.featureset();
var feature = features.next();
while (feature) {..} Also, here's the sample GeoJSON that is producing these odd results: {
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"name": "foo",
"power": 6,
"mixed": 7,
"numbers": [1, 2, 3],
"astonishing": true
},
"geometry": {
"type": "Point",
"coordinates": [
84.72656249999999,
55.7765730186677
]
}
},
{
"type": "Feature",
"properties": {
"name": "bar",
"mixed": "7",
"power": 99
},
"geometry": {
"type": "LineString",
"coordinates": [
[
84.462890625,
55.32914440840507
],
[
86.0009765625,
55.751849391735284
],
[
86.044921875,
56.04749958329888
]
]
}
},
{
"type": "Feature",
"properties": {
"name": "baz"
},
"geometry": {
"type": "Point",
"coordinates": [
83.49609375,
56.48676175249086
]
}
},
{
"type": "Feature",
"properties": {
"name": "haha"
},
"geometry": {
"type": "Point",
"coordinates": [
85.341796875,
56.8249328650072
]
}
},
{
"type": "Feature",
"properties": {
"name": "hoho",
"numbers": [1, 2, 3],
"data": 6
},
"geometry": {
"type": "LineString",
"coordinates": [
[
85.8251953125,
55.50374985927514
],
[
86.4404296875,
55.677584411089505
],
[
86.7919921875,
55.65279803318956
]
]
}
},
{
"type": "Feature",
"properties": {
"name": "hehe",
"power": 86,
"data": {
"foo": "bar"
},
"onlyNull": null,
"description": "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
80.947265625,
57.397624055000456
],
[
80.5078125,
56.145549500679074
],
[
81.650390625,
56.145549500679074
],
[
81.650390625,
56.559482483762245
],
[
80.947265625,
57.397624055000456
]
]
]
}
}
]
} Attributes that are having Attributes that not not having |
@davidtheclark Mapnik (core) reads the first 5 features, by default, to assemble the schema of possible attributes. The default of 5 is conservative for performance reasons on very large feature collections. Can you try passing |
Seems like the same thing is happening when I add |
@davidtheclark - I'll take a look tomorrow, thanks for reporting. |
Great! To be clear: what I'd like would be for no null-valued attributes to be inserted into the features. That way I can distinguish between an attribute that was actually set to null in the feature and an attribute that was not a part of (undefined for) the feature.
|
Using for f in ds.all_features():
... print(f["astonishing"],f["mixed"],f["numbers"],f["power"],f["name"])
...
True 7 [1,2,3] 6 foo
None 7 None 99 bar
None None None None baz
None None None None haha
None None [1,2,3] None hoho
None None None 86 hehe ^ looks reasonable to me - features that don't have particular attribute return >>> for f in ds.all_features():
... f.to_geojson()
...
'{"type":"Feature","id":1,"geometry":{"type":"Point","coordinates":[84.7265625,55.7765730186677]},"properties":{"astonishing":true,"mixed":7,"name":"foo","numbers":"[1,2,3]","power":6}}'
'{"type":"Feature","id":2,"geometry":{"type":"LineString","coordinates":[[84.462890625,55.3291444084051],[86.0009765625,55.7518493917353],[86.044921875,56.0474995832989]]},"properties":{"mixed":"7","name":"bar","power":99}}'
'{"type":"Feature","id":3,"geometry":{"type":"Point","coordinates":[83.49609375,56.4867617524909]},"properties":{"name":"baz"}}'
'{"type":"Feature","id":4,"geometry":{"type":"Point","coordinates":[85.341796875,56.8249328650072]},"properties":{"name":"haha"}}'
'{"type":"Feature","id":5,"geometry":{"type":"LineString","coordinates":[[85.8251953125,55.5037498592751],[86.4404296875,55.6775844110895],[86.7919921875,55.6527980331896]]},"properties":{"data":6,"name":"hoho","numbers":"[1,2,3]"}}'
'{"type":"Feature","id":6,"geometry":{"type":"Polygon","coordinates":[[[80.947265625,57.3976240550005],[80.5078125,56.1455495006791],[81.650390625,56.1455495006791],[81.650390625,56.5594824837622],[80.947265625,57.3976240550005]]]},"properties":{"data":"{foo:\\"bar\\"}","description":"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.","name":"hehe","power":86}}' Note, "astonishing" appears only once as in original ^ I think some logic is faulty on JS ( |
After some digging it turned out that the best place to fix this is in https://github.com/mapnik/mapnik mapnik/mapnik@3397b8f#diff-7e10cfbdff7fca38fb1824d1ce771d88R42 ^ I modified |
@artemp to prepare for @davidtheclark testing this I tried node-mapnik master with latest mapnik master but I'm seeing:
Can you replicate these failures? |
@springmeyer - yes, confirming I'm seeing those as well ^. Back to debugging :) |
Doh ;) Or add “special” value type, lets call it “undefined” where value doesn’t exists. |
To distill even further: If we want to support "null" values (i think we do because they are valid in GeoJSON) then we can't rely on |
> d={"name":null}
{ name: null }
> d["name"]
null
> d["undefined"]
undefined
> In JavaScript null is a distinct type that can be assigned ^ |
To properly support Meanwhile, I added { "first": 123, "second":null } { "first": 123} var mapnik = require('mapnik');
mapnik.register_datasource('/opt/mapnik/lib/mapnik/input/geojson.input');
var ds = new mapnik.Datasource({type:'geojson',file:'test.json'});
var featureset = ds.featureset();
var feat = featureset.next();
while (feat)
{
var props = feat.properties();
console.log(props);
console.log("------------NO NULL----------");
props = feat.properties(false);
console.log(props);
console.log("-----------------------------");
feat = featureset.next();
} Sample output { astonishing: true,
data: null,
description: null,
mixed: 7,
name: 'foo',
null: null,
numbers: '[1,2,3]',
onlyNull: null,
power: 6 }
------------NO NULL----------
{ astonishing: true,
mixed: 7,
name: 'foo',
numbers: '[1,2,3]',
power: 6 }
-----------------------------
{ astonishing: null,
data: null,
description: null,
mixed: '7',
name: 'bar',
null: null,
numbers: null,
onlyNull: null,
power: 99 }
------------NO NULL----------
{ mixed: '7', name: 'bar', power: 99 }
-----------------------------
{ astonishing: null,
data: null,
description: null,
mixed: null,
name: 'baz',
null: null,
numbers: null,
onlyNull: null,
power: null }
------------NO NULL----------
{ name: 'baz' }
-----------------------------
{ astonishing: null,
data: null,
description: null,
mixed: null,
name: 'haha',
null: null,
numbers: null,
onlyNull: null,
power: null }
------------NO NULL----------
{ name: 'haha' }
-----------------------------
{ astonishing: null,
data: 6,
description: null,
mixed: null,
name: 'hoho',
null: null,
numbers: '[1,2,3]',
onlyNull: null,
power: null }
------------NO NULL----------
{ data: 6, name: 'hoho', numbers: '[1,2,3]' }
-----------------------------
{ astonishing: null,
data: '{foo:"bar"}',
description: 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.',
mixed: null,
name: 'hehe',
null: null,
numbers: null,
onlyNull: null,
power: 86 }
------------NO NULL----------
{ data: '{foo:"bar"}',
description: 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.',
name: 'hehe',
power: 86 }
----------------------------- |
Thanks for plugging away at this @artemp. |
@davidtheclark: quick update: @artemp and I worked today to get the latest Mapnik core version (needed for his fix for you) working against latest node-mapnik. Still a few things needed to test, but getting close. Once there we'll be ready to further test the patch here and get you a dev build. ETA later this week. Feel free to ping for an update anytime you are thinking about this. |
@springmeyer Any update on this? |
@davidtheclark - dug into this issue just now. There is some other bug lurking so it is not as simple as updating the Mapnik core version like I thought (which is now done). I'm seeing unpredictable results when the geojson is indexed (index created with |
@artemp as part of testing this (https://gist.github.com/springmeyer/5f9ad5831402587c258116f1e627895c) I also noticed that:
Is round tripping as |
@springmeyer - great catch before The issue was missing quoting for nested objects (mapnik/mapnik#3491) and it's not related to indexing so please verify your setup ^. We already have comprehensive tests covering all JSON properties types but the expected data was bogus (corrected in mapnik/mapnik@5c11fe4) |
Absolutely. |
@artemp @springmeyer Has there been any progress on this issue? Asking because I'm starting work on another project where this has come up. |
@davidtheclark In my mind what still stands here is the problem of null reporting. Mapnik creates (as it parses features) a "context". This "context" is a map of every property key found and an index to where it is. This allows for more efficient storage of strings to reduce memory in featuresets with a lot of features. The impact of this is that we loose the ability to decern So overall I think the question is whether round tripping is important enough to change the way Mapnik works. The perf/memory optimizations that originally motivated the use of a "context" in Mapnik only apply when catching features in memory. Most modern use cases of Mapnik don't actually do this anymore, so we could consider changing this behavior. Let's schedule a time to talk more about this in 2017? In the meantime ponder if you think you can workaround this fully or if you'd like it fixed. |
I don't see a strong reason why the distinction between a That being said...
... this becomes counter-productive when each feature has a handful of properties from a much larger set of possible properties. Think of fully expanding a sparse matrix. I don't know whether such use case is common, but it might be worth considering. Regarding feature memory usage, replacing This would make round-tripping possible, but I'd rather not store |
I'm noticing this oddity when I create a Datasource and iterate through the features.
Let's say feature FA in my GeoJSON has one attribute:
foo: 1
. And FB has one attribute:bar: 2
. When I create a Datasource and check these features within that structure, FA might have two attributes now:foo: 1, bar: null
; and same for FB:bar: 2, foo: null
.So it might seem that Mapnik is giving each feature all the attributes available but with
null
as the value if the feature did not actually have that feature in the original data.But this is not happening consistently. In some sample data I've been using, I've found that Mapnik adds the
null
attributes only sometimes. Whether or not this happens does not seem connected with data type, or the number of times the attribute shows up.I'm not sure what to make of this, or whether I should file this upstream in Mapnik itself.
The text was updated successfully, but these errors were encountered: