-
Notifications
You must be signed in to change notification settings - Fork 598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can't store entities with embedded entities which have properties > 1500 bytes #1916
Comments
Thanks for reporting. @pcostell is it possible in Datastore to exclude individual properties within an Entity from indexing? |
FWIW I've had a play with the datastore datastore.insert({
key: datastore.key('Stuff'),
data: [{
name: 'description',
value: [{
name: 'text',
value: Buffer.alloc(1501, 'a').toString(),
excludeFromIndexes: true
}, {
name: 'long',
value: true
}]
}]
}, handler); obviously though that wouldn't work as an interface as it would break any properties that have array values e.g. this would break: datastore.insert({
key: datastore.key('Stuff'),
data: [{
name: 'foo',
value: [ 1, 2, 3 ]
}]
}, handler); For reference, this is the hacked reduction (previously here) function to make this work: function encodeValues(acc, data) {
var value;
if (is.array(data.value)) {
value = {
entityValue: {
properties: data.value.reduce(encodeValues, {})
}
};
} else {
value = entity.encodeValue(data.value);
if (is.boolean(data.excludeFromIndexes)) {
var excluded = data.excludeFromIndexes;
var values = value.arrayValue && value.arrayValue.values;
if (values) {
values = values.map(propAssign('excludeFromIndexes', excluded));
} else {
value.excludeFromIndexes = data.excludeFromIndexes;
}
}
}
acc[data.name] = value;
return acc;
} |
Excluding a top-level entity value will exclude it and all its sub-values from indexing. However, it still must be a valid entity. This means that properties that cannot be indexed should be marked accordingly. In the Datastore proto API, this is done by passing excludeFromIndexes at the appropriate levels. This is important from an API perspective because it means an entity value is a valid entity (and could be moved to a top-level entity if desired). |
@pcostell so am I right in thinking that if a top level value was marked as to be excluded from indexing, it would be safe for this SDK to mark all sub-values to be excluded? Given that even if not explicitly excluded they will still be as they're nested inside a value that was excluded. |
That would be safe, but it would represent a departure from the design of the API. |
I'm guessing it would be hard to introduce a syntax that would allow excludeFromIndexes on sub properties, while maintaining backwards compatibility @stephenplusplus? Until that's achieved, would passing down exclude when set on the top level property be viable? I've got the beginnings of it in a branch, happy to make a PR |
Yes, this would be breaking, however we could try something like: datastore.insert({
+ raw: true,
key: datastore.key('Stuff'),
data: [
{
name: 'description',
value: [
{
name: 'text',
value: Buffer.alloc(1501, 'a').toString(),
excludeFromIndexes: true
},
{
name: 'long',
value: true
}
]
}
]
}, handler); If |
@stephenplusplus I don't think that would work for storing array values? What about the datastore.insert({
key: datastore.key('Stuff'),
data: {
description: {
text: Buffer.alloc(1501, 'a').toString(),
long: true
},
values: [ 1, 2, 3 ]
}
} the user could then transform into a raw value by: datastore.insert({
raw: true,
key: datastore.key('Stuff'),
data: Datastore.encodeValue({
description: Datastore.encodeValue({
text: Datastore.encodeValue(Buffer.alloc(1501, 'a').toString(), { exclude: true }),
long: Datastore.encodeValue(true)
}),
values: Datastore.encodeValue([ 1, 2, 3 ])
})
} Not sure if I've got the syntax right, but that's the idea. Also more likely the user would do it programmatically. |
Sorry, I've had my head in many APIs lately. Would you mind showing a bit of code that would highlight this problem? |
@stephenplusplus sure, so if I wanted to store the following JS Object: var personEntity = {
name: "Mary",
metadata: {
bio: "..." // Some long string > 1500 bytes
},
subjects: [{
name: "history",
credits: 2
}, {
name: "french",
credits: 4
}, {
name: "maths",
credits: 4
}]
} I need to have var rawPersonEntity = [{
name: "name",
value: "Mary"
}, {
name: "metadata",
value: [{
name: "bio",
value: "...", // Some long string > 1500 bytes
excludeFromIndexes: true
}]
}, {
name: "subjects",
value: [] // What do I do here? An array will indicate explicit syntax of { name, value }
}] |
Got it, thank you! Would it make sense to recurse the var rawPersonEntity = {
key: datastore.key('Person'),
data: [
{
name: 'name',
value: 'Mary'
},
{
name: 'metadata',
data: [
{
excludeFromIndexes: true,
name: 'bio',
value: '...'
}
]
},
{
name: 'subjects',
value: [
{
name: 'history',
credits: 2
},
// ...
]
}
]
} Otherwise, I think the var rawPersonEntity = {
key: datastore.key('Person'),
data: [
{
name: 'name',
value: 'Mary'
},
{
name: 'metadata',
raw: true,
value: [
{
excludeFromIndexes: true,
name: 'bio',
value: '...'
}
]
},
{
name: 'subjects',
raw: false, // (the default value, but explicitly set for demonstration)
value: [
{
name: 'history',
credits: 2
},
// ...
]
}
]
} Do either of these have any quirks, and if not, which is better? If neither, we can still keep trying :) |
@stephenplusplus thanks for the options - I think the It's pretty fringe edge case at this point - and I'm not sure if it's even possible to do, but if you had an array of objects, could you index them? e.g. var rawPersonEntity = {
key: datastore.key('Person'),
data: [
{
name: 'name',
value: 'Mary'
},
{
name: 'metadata',
data: [
{
excludeFromIndexes: true,
name: 'bio',
value: '...'
}
]
},
{
name: 'subjects',
value: [
{
excludeFromIndexes: true, // Is this something that can be done? Is there a reason why it would be done?
name: 'history',
credits: 2
},
// ...
]
}
]
} Other than that, I can't see any issue with them...Another option would be making an datastore.insert({
key: datastore.key('Person'),
data: datastore.value({
type: datastore.Entity, // Defaults to 'auto'
data: {
name: datastore.value({
type: String,
value: 'Mary'
}),
metadata: datastore.value({
type: datastore.Entity,
value: {
bio: datastore.value({
type: String,
excludeFromIndexes: true,
value: '...'
})
}
}),
subjects: datastore.value({
type: Array,
value: [
datastore.value({
type: datastore.Entity,
value: {
name: datastore.value({
type: String,
value: 'history',
}),
credits: datastore.value({
type: Number,
value: 2
})
}
},
datastore.value({
type: datastore.Entity,
value: {
name: datastore.value({
type: String,
value: 'french',
}),
credits: datastore.value({
type: Number,
value: 4
})
}
},
datastore.value({
type: datastore.Entity,
value: {
name: datastore.value({
type: String,
value: 'maths',
}),
credits: datastore.value({
type: Number,
value: 4
})
}
}
]
})
}
}); Ha - after writing that all out, I realise how incredibly over the top that looks, but it feels robust. Perhaps not as a candidate for quickly writing out simple entities, but perhaps as a way that will work well with a helper utility? e.g. for my use case, I'm converting incoming JSON to an entity, and I want to recursively step through it and add |
@stephenplusplus Is this a bug in the client, a bug in the service, or user support to workaround a documented limitation? |
Looking forward to seeing an improvement on this front. I also noticed a nested array type is forced to be indexed. Also I was hoping the API would let me store nested JSON but I can't figure out if it's possible to do with this API. Exposing something like a datastore.entry factory method would make sense. |
Any updates on this? Happy to make a PR if needed Also @kzahel it will allow nested JSON if you just give it a JSON object as a property, but just can't specify |
Looking back over this, how about using a Symbol to specify You could then use regular ol' syntax in the datastore.insert({
key: datastore.key('Stuff'),
data: {
description: {
text: Buffer.alloc(1501, 'a').toString(),
long: true,
[ datastore.EXCLUDE_FROM_INDEXES ]: true
},
values: [ 1, 2, 3 ]
}
} much like how I still think a factory method is more explicit, and cleaner, but symbol might be a bit easier to integrate with the current setup. |
I'm intrigued! And thank you for your offer to help. @callmehiphop what do you think about using a Symbol the way @bedeoverend demo'd above? |
Using the symbol is clever and makes sense, I think. I already have to use datastore.KEY symbol to get the actual key from a result. |
On this point (and this might be diverging from the original issue), could you now just use the e.g. datastore.insert({
[datastore.KEY]: datastore.key('Stuff'),
foo: 'bar'
}); Therefore flattening it out. I guess that'd be tricky for backwards compatibility? Technically you could do it: i.e. if (entity[datastore.KEY]) {
// Using new syntax
} else {
// Using old syntax
} But it's a bit magical... |
Using a symbol seems fine to me. I prefer the more verbose syntax first proposed over the |
We can't write string properties over 1500 chars. Is this expected? Is there some way to explicitly mark a property as 'text' instead of 'string'? |
The question is: how to write >1500 chars as a "text"? |
Could I ask what the status of this issue is? I have some entities that have embedded entities that have properties that are >1500 Bytes. When I try setting |
Just piping in that I'd opened an issue for this internally several days ago, but I'd routed it to the wrong place. Corrected, and looks like @stephenplusplus is aware. |
@stephenplusplus This issue may be related to a fixed issue in python: |
For me, it happens in the web console as well - is that related or am I missing something? |
I followed the following steps to exclude from index for indexed properties of an Embedded Entity type from datastore:
However when I look at the Entities overview page, each of the properties of the Embedded Entity are still indexed even after the update, is this the right behavior? From the docs for excluding indexes it mentions "The index entries for any existing entities with that property will continue to exist until the entities are updated or deleted." |
I just pumped in to this issue. We have data structure where under main entity there is embedded entity (or actually array of entities) that has a property that is longer than 1500 bytes. As we can't exclude this >1500b property from indexing, we can't store even the main entity. Is there any progress on this? Basically datastore is unusable for us at this point |
@jniemin I had the same issue and found 2 workarounds: One is utilizing the hierarchy supported in Datastore - one entity can be a child of another so if the embedded entity is complex - I suggest you take it out into a new entity where the parent of it is the original one. Another (sort of) workaround I used is just simplifying the embedded entity by taking some properties back to the parent itself - not ideal but in my case it made sense. |
Thanks for your patience, everyone. There are a couple of solutions we've talked about, but I'd like to get feedback on one more that might be the most simple. Can you see any "gotchas" using an "excludeFromIndexes" array on the top level to define where the unindexed properties are? (cc @bedeoverend) datastore.insert({
key: datastore.key('Person'),
excludeFromIndexes: [
'description',
'metadata.bio',
'subjects.name'
],
value: {
description: '...',
metadata: {
bio: '...'
},
subjects: [
{
name: '...',
credits: 2
}
]
}
}) This seems like it would also be more simple than the current "data"/verbose definition syntax for non-nested properties, i.e. // before
datastore.insert({
key: datastore.key('Person'),
data: [
{
name: 'description',
value: '...',
excludeFromIndexes: true
}
]
})
// after
datastore.insert({
key: datastore.key('Person'),
excludeFromIndexes: ['description'],
// `data` remains a plain JavaScript object:
data: {
description: '...'
}
}) |
Summoning @JayGoldberg for feedback also. We can implement this if there are no objections to @stephenplusplus's proposed solution. |
Small amendment to the above suggestion, since datastore.insert({
key: datastore.key('Person'),
excludeFromIndexes: [
'description',
'metadata.bio',
- 'subjects.name'
+ 'subjects[].name'
],
value: {
description: '...',
metadata: {
bio: '...'
},
subjects: [
{
name: '...',
credits: 2
}
]
}
}) |
Is that amendment necessary? It is unambiguous even in your previous construction. It is not clear to me how this information gets sent up to Datastore. If that is the correct Datastore format, then obviously it makes sense. If we are recursively going and adding a property, though, there's no reason we need the |
I think it follows standard embedded property notation, to differentiate between "property of an object" and "property of an object inside of an array". I've also seen this in API response documentation on the various Google API official docs sites. So no, not necessary, but should be more familiar. Willing to be downvoted on that assumption. |
PR sent in #2497. Please take a look! |
@stephenplusplus thanks for this! Looks good to me, I like the general concept. So two potential things:
{
data: [{
foo: '...'
}, {
bar: '...'
}]
} Should this solution also try address not indexing one particular entity's property? e.g. a syntax like: Other than that, I think it looks good 👍 |
I am pretty sure that trying to do
I do agree that some concept of |
Prior art, got it. I am convinced. |
Fair enough - I guess there could be the scenario of similar / same schemas, but wanting to exclude only one item from being indexed. But, I think that's a big edge case, and I think it would be a bad pattern regardless, so happy to ignore that.
Sounds good to me 👍 |
@lukesneeringer @stephenplusplus Thanks for the "fix", but can this please be updated in the documentation? Personally, I think it's weird that I have to exclude every single subproperty from indexing but the bigger issue is that the documentation currently says that "If you exclude this value from indexing, then all subproperties are also excluded from indexing". This is very misleading (at least it was for me) and it took quite a while to find this issue. EDIT: Another question that came up. How do you exclude properties that are not written in camel case?
|
Here's the issue for the catch-all feature request: #2510
This worked for me: datastore.save({
key: ...,
data: {
test: {
'This is a key': 'This is a string > 1500 bytes ...'
}
},
excludeFromIndexes: [
'test.This is a key'
]
}, ...) |
Seems that excluding a top level property doesn't exclude an embedded entities properties from being indexed - on top of that, I don't believe there's a way to exclude an embedded entities individual properties, that would solve my problem too.
Steps to reproduce
Just use this script (I'm running the datastore emulator)
I'd expect there to be no error - the embedded entity (
{ text: '...' }
) shouldn't be indexed I believe? But it throw an error saying:Error: The value of property "text" is longer than 1500 bytes.
Environment
The text was updated successfully, but these errors were encountered: