Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate and add _id field in Doc if not passed #32

Merged
merged 8 commits into from
Jan 17, 2023
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,12 @@
/**
* Shred an incoming JSON document into the data we need to store in the DB, and then de-shred.
*
* <p>This will be based on the ideas in the python lab, and extended to do things like make better
* decisions about when to use a hash and when to use the actual value. i.e. a hash of "a" is a lot
* longer than "a".
* <p>Implementation is based on the ideas from the earlier prototype, and extended to do things
* like make better decisions about when to use a hash and when to use the actual value. i.e. a hash
* of "a" is a lot longer than "a".
*
* <p>Note that currently document id ({@code _id}) is auto-generated using UUID random method if
* incoming JSON does not contain it (otherwise passed-in {@code _id} is used as-is).
*/
@ApplicationScoped
public class Shredder {
Expand All @@ -44,34 +47,40 @@ public WritableShreddedDocument shred(JsonNode doc, Optional<UUID> txId) {
ErrorCode.SHRED_BAD_DOCUMENT_TYPE.getMessage(), doc.getNodeType()));
}

ObjectNode docOb = (ObjectNode) doc;

// We will extract id if there is one; stored separately, removed so that we won't process
// it, add path or such (since it has specific meaning separate from general properties)
JsonNode idNode = docOb.remove("_id");
String id = null;
// We will extract id if there is one; stored separately, but also included in JSON document
// before storing in persistence. Need to make copy to avoid modifying input doc
ObjectNode docWithoutId = ((ObjectNode) doc).objectNode().setAll((ObjectNode) doc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to remove id and add it back again to new document? Can just do docOb.get("_id") and later do docOb.put("_id", id)? So don't need to create new ObjectNode (docWithoutId and docWithId)

Copy link
Contributor Author

@tatu-at-datastax tatu-at-datastax Jan 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Forgot to add a note on that: while not mandatory, was thinking that we want to order things so that _id is always the first property. Just a preference, but I think this is what happens with Mongo.

But there is also the additional complexity in traversal: code later on assumes that _id is not passed so there won't be index fields for it. So if it was not removed, more checks would be needed not to expose _id via callbacks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added note: underlying ObjectNode uses LinkedHashMap to preserve ordering. While this is bit of an implementation detail it is very unlikely to change for Jackson (esp. 2.x) since changing that would be major compatibility change in practice.

JsonNode idNode = docWithoutId.remove("_id");
String id;

// Cannot require existence as id often auto-generated when inserting: caller has to verify
// if existence required
if (idNode != null) {
if (!idNode.isTextual()) {
// We will use `_id`, if passed (but must be JSON String or Number); if not passed,
// need to generate
if (idNode == null) {
id = UUID.randomUUID().toString();
} else {
if (!idNode.isTextual() && !idNode.isNumber()) {
throw new DocsException(
ErrorCode.SHRED_BAD_DOCID_TYPE,
String.format(
"%s: Document Id must be a JSON String, instead got %s",
"%s: Document Id must be a JSON String or JSON Number, instead got %s",
ErrorCode.SHRED_BAD_DOCID_TYPE.getMessage(), idNode.getNodeType()));
}
id = idNode.asText();
}

// We will re-serialize document; gets rid of pretty-printing (if any);
// unifies escaping.
// NOTE! For now we will remove `_id` if one was included; we may alternatively
// want to keep-or-insert it, based on what we really want.
final String docJson = docOb.toString();
// NOTE! Since we removed "_id" if it existed (and generated if it didn't),
// need to add back. Moreover we want it as the FIRST field anyway so need
// to reconstruct document.
ObjectNode docWithId = docWithoutId.objectNode(); // simple constructor, not linked
docWithId.put("_id", id);
docWithId.setAll(docWithoutId);

final String docJson = docWithId.toString();
final WritableShreddedDocument.Builder b =
WritableShreddedDocument.builder(new DocValueHasher(), id, txId, docJson);
traverse(doc, b, JsonPath.rootBuilder());
traverse(docWithoutId, b, JsonPath.rootBuilder());
return b.build();
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -133,7 +133,7 @@ public void findOneById() {
}
}
""";
String expected = "{\"username\": \"user1\"}";
String expected = "{\"_id\":\"doc1\", \"username\":\"user1\"}";
given()
.header(HttpConstants.AUTHENTICATION_TOKEN_HEADER_NAME, getAuthToken())
.contentType(ContentType.JSON)
Expand All @@ -155,7 +155,7 @@ public void findOneByColumn() {
}
}
""";
String expected = "{\"username\": \"user1\"}";
String expected = "{\"_id\":\"doc1\", \"username\":\"user1\"}";
given()
.header(HttpConstants.AUTHENTICATION_TOKEN_HEADER_NAME, getAuthToken())
.contentType(ContentType.JSON)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import javax.inject.Inject;
import org.junit.jupiter.api.Nested;
import org.junit.jupiter.api.Test;
Expand All @@ -32,7 +33,7 @@ public class ShredderTest {
@Nested
class OkCases {
@Test
public void simpleShredFromPathExample() throws Exception {
public void shredSimpleWithId() throws Exception {
final String inputJson =
"""
{ "_id" : "abc",
Expand Down Expand Up @@ -64,11 +65,9 @@ public void simpleShredFromPathExample() throws Exception {
assertThat(doc.arrayEquals()).hasSize(1);
assertThat(doc.arrayContains()).hasSize(2);

// Also, the document should be the same, except for removed _id:
// Also, the document should be the same, including _id:
JsonNode jsonFromShredded = objectMapper.readTree(doc.docJson());
ObjectNode origJsonMinusId = (ObjectNode) inputDoc.deepCopy();
origJsonMinusId.remove("_id");
assertThat(jsonFromShredded).isEqualTo(origJsonMinusId);
assertThat(jsonFromShredded).isEqualTo(inputDoc);

// Sub-documents (Object values): none in this example
assertThat(doc.subDocEquals()).hasSize(0);
Expand All @@ -84,6 +83,44 @@ public void simpleShredFromPathExample() throws Exception {
.isEqualTo(Collections.singletonMap(JsonPath.from("name"), "Bob"));
assertThat(doc.queryNullValues()).isEqualTo(Collections.singleton(JsonPath.from("nullable")));
}

@Test
public void shredSimpleWithoutId() throws Exception {
final String inputJson =
"""
{
"age" : 39,
"name" : "Chuck"
}
""";
final JsonNode inputDoc = objectMapper.readTree(inputJson);
WritableShreddedDocument doc = shredder.shred(inputDoc);

assertThat(doc.id()).isNotEmpty();
// should be auto-generated UUID:
assertThat(UUID.fromString(doc.id())).isNotNull();
List<JsonPath> expPaths = Arrays.asList(JsonPath.from("age"), JsonPath.from("name"));

assertThat(doc.existKeys()).isEqualTo(new HashSet<>(expPaths));
assertThat(doc.arraySize()).isEmpty();
assertThat(doc.arrayEquals()).isEmpty();
assertThat(doc.arrayContains()).isEmpty();
assertThat(doc.subDocEquals()).hasSize(0);

// Also, the document should be the same, except for added _id:
ObjectNode jsonFromShredded = (ObjectNode) objectMapper.readTree(doc.docJson());
JsonNode idNode = jsonFromShredded.remove("_id");
assertThat(idNode).isNotNull();
assertThat(idNode.textValue()).isEqualTo(doc.id());
assertThat(jsonFromShredded).isEqualTo(inputDoc);

// Then atomic value containers
assertThat(doc.queryBoolValues()).isEmpty();
assertThat(doc.queryNullValues()).isEmpty();
assertThat(doc.queryNumberValues())
.isEqualTo(Map.of(JsonPath.from("age"), BigDecimal.valueOf(39)));
assertThat(doc.queryTextValues()).isEqualTo(Map.of(JsonPath.from("name"), "Chuck"));
}
}

@Nested
Expand All @@ -108,7 +145,7 @@ public void docBadDocIdType() {
assertThat(t)
.isNotNull()
.hasMessage(
"Bad type for '_id' property: Document Id must be a JSON String, instead got ARRAY")
"Bad type for '_id' property: Document Id must be a JSON String or JSON Number, instead got ARRAY")
.hasFieldOrPropertyWithValue("errorCode", ErrorCode.SHRED_BAD_DOCID_TYPE);
}
}
Expand Down