TASK-7134 - Re-implement Aggregations Stats for all Catalog Browsers #100
Merged
53 commits
- 18c46e5 datastore: add facet support in mongodb datastore, #TASK-7151, #TASK-… (jtarraga)
- 5495d80 datastore: improve code, #TASK-7151, #TASK-7134 (jtarraga)
- 304603e datastore: implement the MongoDB to FacetField converter, #TASK-7151,… (jtarraga)
- b537e6f datastore: fix MongoDB document to FacetField converter, #TASK-7151, … (jtarraga)
- 17f83b2 datastore: change long to Long in FacetField, #TASK-7151, #TASK-7134 (jtarraga)
- 865b94a datastore: set range format to field[start..end]:step, #TASK-7151, #T… (jtarraga)
- 880f2c6 datastore: use JsonInclude.Include.NON_NULL, #TASK-7151, #TASK-7134 (jtarraga)
- 75dc002 datastore: fix pom.xml, #TASK-7151, #TASK-7134 (jtarraga)
- 25cbd91 datastore: restore FacetField to previous change, #TASK-7151, #TASK-7134 (jtarraga)
- 9f0d9b9 datastore: change count to Number, #TASK-7151, #TASK-7134 (jtarraga)
- 0f3a24d test: add JUnit tests for facets, #TASK-7151, #TASK-7134 (jtarraga)
- f2b080c mongodb: rename converter, use Long instead Number, #TASK-7151, #TASL… (jtarraga)
- ea3906c mongodb: support lists using accumulators, #TASK-7151, #TASK-7134 (jtarraga)
- e68c30e mongodb: fix sonnar issues, #TASK-7151, #TASK-7134 (jtarraga)
- 84d1f92 mondodb: add 'sum' to aggregation operators enum (imedina)
- 26c9628 mondodb: fix 'sum' aggregation operator (imedina)
- 10a7f0c mondodb: fix 'sum' aggregation operator (imedina)
- 7943e1b mondodb: fix 'sum' aggregation operator (imedina)
- e8159f3 mondodb: fix check style (imedina)
- 005c45e datastore: fix the accumulator 'sum' in MongoDB facets, #TASK-7151, #… (jtarraga)
- 31424d8 mongodb: aggregation test. To be reverted. (imedina)
- 57f2138 mongodb: aggregation test 2. To be reverted. (imedina)
- 3cca26f mongodb: aggregation test 3. To be reverted. (imedina)
- e177dd7 mongodb: aggregation test 4. To be reverted. (imedina)
- b15ed9a mongodb: revert all tests (imedina)
- a7c86e0 mongodb: fix aggregation regex (imedina)
- eb5b519 mongodb: aggregation style improvement (imedina)
- a073e84 mongodb: fix aggregation regex (imedina)
- 3f9386f mongodb: fix aggregation (imedina)
- 13b3e59 mongodb: fix aggregation parse (imedina)
- dd39812 datastore: implement the facet following the example:bioformat:sum(si… (jtarraga)
- de98cda Merge branch 'TASK-7134' of https://github.com/opencb/java-common-lib… (jtarraga)
- ac66d66 datastore: fix facet 'format:count(size)' to behaviour as 'count(form… (jtarraga)
- 421d5ce datastore: improve MongoDB facets for arrays by using unwind, #TASK-7… (jtarraga)
- 1184be3 datastore: fix MongoDB facet parser, #TASK-7151, #TASK-7134 (jtarraga)
- 7255b42 datastore: fix the converter by replacing '.' by '.' in the facet… (jtarraga)
- 0d6f430 datastore: support facets for 'dates' in MongoDB, #TASK-7151, #TASK-7134 (jtarraga)
- 0dde11a datastore: improve MongoDB facet exception message, #TASK-7151, #TASK… (jtarraga)
- fc5dd92 datastore: fix checkstyle, #TASK-7151, #TASK-7134 (jtarraga)
- 8010403 datastore: improve facets for dates, #TASK-7151, #TASK-7134 (jtarraga)
- 9de45bb datastore: rename the separator '_and_' to '_' in MongoDB facet resul… (jtarraga)
- 6413bed datastore: use '__' as separator, #TASK-7151, #TASK-7134 (jtarraga)
- 52ae7ee datastore: fix sonnar issues, #TASK-7151, #TASK-7134 (jtarraga)
- 9243ebb datastore: fix MongoDB facets when combining multiple fields, #TASK-7… (jtarraga)
- d3666fc datastore: add more JUnit tests for MongoDB facets, #TASK-7151, #TASK… (jtarraga)
- dc4fc6d datastore: sort dates facets, #TASK-7151, #TASK-7134 (jtarraga)
- 35a6e20 datastore: fix checkstyle, #TASK-7151, #TASK-7134 (jtarraga)
- 6373935 datastore: use date format '01 Jan 2025', #TASK-7151, #TASK-7134 (jtarraga)
- 7185175 Merge branch 'develop' into TASK-7134 (jtarraga)
- 5da23b3 datastore: sort facets results in descending order (counts), #TASK (jtarraga)
- 6278507 datastore: improve MongoDB facets for range, #TASK-7151, #TASK-7134 (jtarraga)
- 7b69623 datastore: improve MongoDB facets for ranges by filling with zeros, #… (jtarraga)
- e0b5d3f datastore: change Long to long, and fix JUnit tests, #TASK-7151, #TAS… (jtarraga)
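Two of the commits above shape range facets: one sets the range format to `field[start..end]:step`, and a later one fills empty ranges with zeros. The sketch below is a hypothetical, self-contained helper (`RangeFacetSpec` and its methods are illustrative names, not part of java-common-libs) that parses that syntax and lists zero-filled buckets. It derives each lower bound from an integer index (`start + k * step`) instead of repeatedly adding `step`, which keeps floating-point error from accumulating, and it assumes, as the converter's `i <= end` loop does, that a bucket starts at `end` when `end` is reachable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of java-common-libs: parses "field[start..end]:step"
// and produces zero-filled range buckets.
final class RangeFacetSpec {

    private static final String NUMBER = "-?\\d+(?:\\.\\d+)?";
    // group 1: field, group 2: start, group 3: end, group 4: step (non-negative)
    private static final Pattern RANGE = Pattern.compile(
            "^([^\\[\\]]+)\\[(" + NUMBER + ")\\.\\.(" + NUMBER + ")\\]:(\\d+(?:\\.\\d+)?)$");

    final String field;
    final double start;
    final double end;
    final double step;

    private RangeFacetSpec(String field, double start, double end, double step) {
        this.field = field;
        this.start = start;
        this.end = end;
        this.step = step;
    }

    static RangeFacetSpec parse(String spec) {
        Matcher m = RANGE.matcher(spec.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected field[start..end]:step but got: " + spec);
        }
        double start = Double.parseDouble(m.group(2));
        double end = Double.parseDouble(m.group(3));
        double step = Double.parseDouble(m.group(4));
        if (step <= 0 || end < start) {
            throw new IllegalArgumentException("Invalid range: " + spec);
        }
        return new RangeFacetSpec(m.group(1), start, end, step);
    }

    // Lower bound of every bucket, computed from an index so rounding error does not accumulate
    List<Double> bucketStarts() {
        long last = (long) Math.floor((end - start) / step + 1e-9);
        List<Double> starts = new ArrayList<>();
        for (long k = 0; k <= last; k++) {
            starts.add(start + k * step);
        }
        return starts;
    }

    // Every bucket is present in the result; ranges without matching documents get a count of 0
    Map<Double, Long> zeroFilled(Map<Double, Long> observedCounts) {
        Map<Double, Long> result = new LinkedHashMap<>();
        for (double bound : bucketStarts()) {
            result.put(bound, observedCounts.getOrDefault(bound, 0L));
        }
        return result;
    }
}
```

For `size[0..100]:20` this yields six buckets (0, 20, 40, 60, 80, 100), and any bound missing from the observed counts is reported as 0.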
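One of the commits above switches date facets to labels such as `01 Jan 2025`; the converter in this PR builds them from a hand-written `MONTH_MAP`. As an alternative sketch, `java.time` can produce the same labels from the year, month and day parts of a bucket value. The `__` separator is assumed from the "use '__' as separator" commit, and `DateBucketLabels` is a hypothetical name.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical alternative to a hand-written month table: formats date facet bucket
// values such as "2025__01" or "2025__01__07" (the "__" separator is an assumption).
final class DateBucketLabels {

    private static final String SEPARATOR = "__";
    // Locale.ENGLISH keeps abbreviations like "Jan" independent of the JVM default locale
    private static final DateTimeFormatter MONTH_LABEL = DateTimeFormatter.ofPattern("MMM yyyy", Locale.ENGLISH);
    private static final DateTimeFormatter DAY_LABEL = DateTimeFormatter.ofPattern("dd MMM yyyy", Locale.ENGLISH);

    private DateBucketLabels() {
    }

    // "2025__01" -> "Jan 2025"
    static String monthLabel(String bucketValue) {
        String[] parts = bucketValue.split(SEPARATOR);
        return YearMonth.of(Integer.parseInt(parts[0]), Integer.parseInt(parts[1])).format(MONTH_LABEL);
    }

    // "2025__01__07" -> "07 Jan 2025"
    static String dayLabel(String bucketValue) {
        String[] parts = bucketValue.split(SEPARATOR);
        LocalDate date = LocalDate.of(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
        return date.format(DAY_LABEL);
    }
}
```

Sorting should still happen on the raw `yyyy__MM__dd` values before relabeling, as the converter does, because the formatted labels do not sort chronologically ("Apr" sorts before "Jan").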
228 changes: 228 additions & 0 deletions
...main/java/org/opencb/commons/datastore/mongodb/MongoDBDocumentToFacetFieldsConverter.java
```java
package org.opencb.commons.datastore.mongodb;

import org.apache.commons.lang3.StringUtils;
import org.bson.Document;
import org.opencb.commons.datastore.core.ComplexTypeConverter;
import org.opencb.commons.datastore.core.FacetField;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.*;

import static org.opencb.commons.datastore.mongodb.GenericDocumentComplexConverter.TO_REPLACE_DOTS;
import static org.opencb.commons.datastore.mongodb.MongoDBQueryUtils.Accumulator.*;
import static org.opencb.commons.datastore.mongodb.MongoDBQueryUtils.*;

public class MongoDBDocumentToFacetFieldsConverter implements ComplexTypeConverter<List<FacetField>, Document> {

    private static final Map<String, String> MONTH_MAP = new HashMap<>();

    static {
        MONTH_MAP.put("01", "Jan");
        MONTH_MAP.put("02", "Feb");
        MONTH_MAP.put("03", "Mar");
        MONTH_MAP.put("04", "Apr");
        MONTH_MAP.put("05", "May");
        MONTH_MAP.put("06", "Jun");
        MONTH_MAP.put("07", "Jul");
        MONTH_MAP.put("08", "Aug");
        MONTH_MAP.put("09", "Sep");
        MONTH_MAP.put("10", "Oct");
        MONTH_MAP.put("11", "Nov");
        MONTH_MAP.put("12", "Dec");
    }

    @Override
    public List<FacetField> convertToDataModelType(Document document) {
        if (document == null || document.isEmpty()) {
            return Collections.emptyList();
        }

        String facetFieldName;
        List<FacetField> facets = new ArrayList<>();
        for (Map.Entry<String, Object> entry : document.entrySet()) {
            String key = entry.getKey();
            List<Document> documentValues = (List<Document>) entry.getValue();
            if (key.endsWith(COUNTS_SUFFIX) || key.endsWith(FACET_ACC_SUFFIX) || key.endsWith(YEAR_SUFFIX) || key.endsWith(MONTH_SUFFIX)
                    || key.endsWith(DAY_SUFFIX)) {
                facetFieldName = key.split(SEPARATOR)[0].replace(TO_REPLACE_DOTS, ".");

                List<FacetField.Bucket> buckets = new ArrayList<>(documentValues.size());
                long total = 0;
                for (Document documentValue : documentValues) {
                    long counter = documentValue.getInteger(count.name());
                    String bucketValue = "";
                    Object internalIdValue = documentValue.get(INTERNAL_ID);
                    if (internalIdValue instanceof String) {
                        bucketValue = (String) internalIdValue;
                    } else if (internalIdValue instanceof Boolean
                            || internalIdValue instanceof Integer
                            || internalIdValue instanceof Long
                            || internalIdValue instanceof Double) {
                        bucketValue = internalIdValue.toString();
                    } else if (internalIdValue instanceof Document) {
                        bucketValue = StringUtils.join(((Document) internalIdValue).values(), SEPARATOR);
                        if (key.endsWith(COUNTS_SUFFIX)) {
                            facetFieldName = key.substring(0, key.indexOf(COUNTS_SUFFIX));
                        }
                    }

                    List<FacetField> bucketFacetFields = null;
                    if (key.endsWith(FACET_ACC_SUFFIX)) {
                        String[] split = key.split(SEPARATOR);
                        String name = split[2];
                        String aggregationName = split[1];
                        Double value;
                        if (documentValue.get(aggregationName) instanceof Integer) {
                            value = 1.0d * documentValue.getInteger(aggregationName);
                        } else if (documentValue.get(aggregationName) instanceof Long) {
                            value = 1.0d * documentValue.getLong(aggregationName);
                        } else {
                            value = documentValue.getDouble(aggregationName);
                        }
                        List<Double> aggregationValues = Collections.singletonList(value);
                        FacetField facetField = new FacetField(name.replace(TO_REPLACE_DOTS, "."), aggregationName, aggregationValues);
                        // Perhaps it's redundant, as it is also set in the bucket
                        facetField.setCount(counter);
                        bucketFacetFields = Collections.singletonList(facetField);
                    }

                    buckets.add(new FacetField.Bucket(bucketValue, counter, bucketFacetFields));
                    total += counter;
                }
                FacetField facetField = new FacetField(facetFieldName, total, buckets);
                facetField.setAggregationName(count.name());
                if (key.endsWith(YEAR_SUFFIX) || key.endsWith(MONTH_SUFFIX) || key.endsWith(DAY_SUFFIX)) {
                    Collections.sort(buckets, Comparator.comparing(FacetField.Bucket::getValue));
                    if (key.endsWith(MONTH_SUFFIX)) {
                        for (FacetField.Bucket b : buckets) {
                            String[] split = b.getValue().split(SEPARATOR);
                            b.setValue(MONTH_MAP.get(split[1]) + " " + split[0]);
                        }
                    } else if (key.endsWith(DAY_SUFFIX)) {
                        for (FacetField.Bucket b : buckets) {
                            String[] split = b.getValue().split(SEPARATOR);
                            b.setValue(split[2] + " " + MONTH_MAP.get(split[1]) + " " + split[0]);
                        }
                    }
                    // Remove the date field name and keep year, month and day
                    List<String> labels = new ArrayList<>(Arrays.asList(key.split(SEPARATOR)));
                    labels.remove(0);
                    facetField.setAggregationName(StringUtils.join(labels, SEPARATOR).toLowerCase(Locale.ROOT));
                }
                facets.add(facetField);
            } else if (key.endsWith(RANGES_SUFFIX)) {
                List<FacetField.Bucket> buckets = new ArrayList<>(documentValues.size());
                int total = 0;

                String[] split = key.split(SEPARATOR);
                double start = Double.parseDouble(split[1].replace(TO_REPLACE_DOTS, "."));
                double end = Double.parseDouble(split[2].replace(TO_REPLACE_DOTS, "."));
                double step = Double.parseDouble(split[3].replace(TO_REPLACE_DOTS, "."));

                int other = 0;
                for (double i = start; i <= end; i += step) {
                    int bucketCount = getBucketCountFromRanges(i, documentValues);
                    FacetField.Bucket bucket = new FacetField.Bucket(String.valueOf(roundToTwoSignificantDecimals(i)), bucketCount, null);
                    buckets.add(bucket);
                    total += bucketCount;
                }

                for (Document value : documentValues) {
                    if (value.get(INTERNAL_ID) instanceof String && OTHER.equals(value.getString(INTERNAL_ID))) {
                        other = value.getInteger(count.name());
                    }
                }
                facetFieldName = key.split(SEPARATOR)[0].replace(TO_REPLACE_DOTS, ".");
                if (other > 0) {
                    FacetField.Bucket bucket = new FacetField.Bucket("Other", other, null);
                    buckets.add(bucket);
                    total += bucket.getCount();
                }
                FacetField facetField = new FacetField(facetFieldName, total, buckets)
                        .setStart(start)
                        .setEnd(end)
                        .setStep(step);
                facets.add(facetField);
            } else {
                Document documentValue = ((List<Document>) entry.getValue()).get(0);
                MongoDBQueryUtils.Accumulator accumulator = getAccumulator(documentValue);
                switch (accumulator) {
                    case sum:
                    case avg:
                    case max:
                    case min:
                    case stdDevPop:
                    case stdDevSamp: {
                        List<Double> fieldValues = new ArrayList<>();
                        if (documentValue.get(accumulator.name()) instanceof Integer) {
                            fieldValues.add(1.0d * documentValue.getInteger(accumulator.name()));
                        } else if (documentValue.get(accumulator.name()) instanceof Long) {
                            fieldValues.add(1.0d * documentValue.getLong(accumulator.name()));
                        } else if (documentValue.get(accumulator.name()) instanceof List) {
                            List<Number> list = (List<Number>) documentValue.get(accumulator.name());
                            for (Number number : list) {
                                fieldValues.add(number.doubleValue());
                            }
                        } else {
                            fieldValues.add(documentValue.getDouble(accumulator.name()));
                        }
                        long count = 0;
                        if (documentValue.containsKey("count")) {
                            count = Long.valueOf(documentValue.getInteger("count"));
                        }
                        facetFieldName = documentValue.getString(INTERNAL_ID).replace(TO_REPLACE_DOTS, ".");
                        facets.add(new FacetField(facetFieldName, count, accumulator.name(), fieldValues));
                        break;
                    }
                    default: {
                        // Other accumulators (e.g. count) are not converted here;
                        // unknown ones have already made getAccumulator() throw
                    }
                }
            }
        }
        return facets;
    }

    private MongoDBQueryUtils.Accumulator getAccumulator(Document document) {
        for (Map.Entry<String, Object> entry : document.entrySet()) {
            try {
                MongoDBQueryUtils.Accumulator accumulator = MongoDBQueryUtils.Accumulator.valueOf(entry.getKey());
                return accumulator;
            } catch (IllegalArgumentException e) {
                // Not an accumulator key, keep looking
            }
        }
        throw new IllegalArgumentException("No accumulators found in facet document: " + StringUtils.join(document.keySet(), ", ")
                + ". Valid accumulator functions: " + StringUtils.join(Arrays.asList(count, sum, max, min, avg, stdDevPop, stdDevSamp), ", "));
    }

    @Override
    public Document convertToStorageType(List<FacetField> facetFields) {
        throw new RuntimeException("Not yet implemented");
    }

    private static double roundToTwoSignificantDecimals(double value) {
        if (value == 0) {
            return 0;
        }

        BigDecimal bd = new BigDecimal(value);
        int integerDigits = bd.precision() - bd.scale();
        int scale = Math.max(0, 2 + integerDigits);
        return bd.setScale(scale, RoundingMode.HALF_UP).doubleValue();
    }

    private int getBucketCountFromRanges(double inputRange, List<Document> documentValues) {
        for (Document document : documentValues) {
            if (!OTHER.equals(document.get(INTERNAL_ID))) {
                if (inputRange == document.getDouble(INTERNAL_ID)) {
                    return document.getInteger(count.name());
                }
            }
        }
        return 0;
    }
}
```
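The converter repeats the same `instanceof Integer / Long / Double` chain in two places to turn accumulator results into doubles. Because `Integer`, `Long` and `Double` all extend `java.lang.Number`, a single helper could cover both spots. A minimal sketch, assuming values arrive either as a `Number` or as a `List` of numbers (the shapes the converter handles); `org.bson.Document` implements `Map<String, Object>`, so a plain map stands in for it here to keep the example dependency-free. `FacetValues.toDoubles` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: normalises an accumulator result (number or list of numbers) to doubles.
final class FacetValues {

    private FacetValues() {
    }

    static List<Double> toDoubles(Map<String, Object> document, String key) {
        Object raw = document.get(key);
        if (raw instanceof Number) {
            // Integer, Long and Double all extend Number, so one branch covers them
            return Collections.singletonList(((Number) raw).doubleValue());
        }
        if (raw instanceof List) {
            List<Double> values = new ArrayList<>();
            for (Object item : (List<?>) raw) {
                if (!(item instanceof Number)) {
                    throw new IllegalArgumentException("Non-numeric element in '" + key + "': " + item);
                }
                values.add(((Number) item).doubleValue());
            }
            return values;
        }
        // Missing keys and unexpected types fail fast instead of yielding a null value
        throw new IllegalArgumentException("Expected a number or a list of numbers for '" + key + "' but got: " + raw);
    }
}
```

Unlike the `getDouble` fallback in the converter, which returns `null` for a missing key, this sketch throws, so a malformed facet document surfaces as an error rather than as a `null` entry in the facet values.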
I think this doesn't need to be merged