-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor/rename metrics related classes for NaN support #1829
Conversation
@@ -115,12 +120,30 @@ public short getShort(int ordinal) { | |||
|
|||
@Override | |||
public int getInt(int ordinal) { | |||
return struct.get(ordinal, Integer.class); | |||
Object integer = struct.get(ordinal, Object.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for changing this class is discussed in this thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Yan for the patch.
} | ||
|
||
public static MetricsModes.MetricsMode getMetricsMode(Schema inputSchema, MetricsConfig metricsConfig, int fieldId) { | ||
String columnName = inputSchema.findColumnName(fieldId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inputschema can be null at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a validation check to ensure they are not null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Let's add checks to fail when metricsConfig
or inputSchema
is null. This is a useful method so I think it is worth keeping it public, but we would ideally not fail with a NullPointerException
. This is not called in a tight loop, so it should be fine to add the checks on each call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry I overwrote my own earlier change that addressed this when I renamed this file...
|
||
public static MetricsModes.MetricsMode getMetricsMode(Schema inputSchema, MetricsConfig metricsConfig, int fieldId) { | ||
String columnName = inputSchema.findColumnName(fieldId); | ||
return metricsConfig.columnMode(columnName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for metrics config.
private MetricsUtil() { | ||
} | ||
|
||
public static Map<Integer, Long> getNanValueCounts( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: add javadoc.
* exceptions when they are accessed. | ||
*/ | ||
public class ParquetFieldMetrics extends FieldMetrics { | ||
public class NaNFieldMetrics extends FieldMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we expect to add lower and upper bounds to this class (since Parquet and ORC do not ignore NaN values), I think this should probably be called FloatFieldMetrics
. That way we don't need to rename when this adds support for lower and upper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good; I'll keep the javadoc as is since they currently reflect reality, and will update comments to mention tracking float fields when the class starts to track upper/lower bound
} | ||
|
||
@Override | ||
public ByteBuffer upperBound() { | ||
throw new IllegalStateException( | ||
"Shouldn't access upperBound() within ParquetFieldMetrics, as this metric is tracked by Parquet footer. "); | ||
"Shouldn't access upperBound() within NaNOnlyFieldMetrics, as this metric is tracked in file statistics. "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This demonstrates the drawback to adding too much context to an exception message. The method and class should be in the stack trace and renames require updating this message. Let's keep the good error message, but remove the class and method names.
b7d88f2
to
4bb427f
Compare
c0c758d
to
37a8681
Compare
Thanks for updating this, @yyanyy! I'll merge it. |
StructInternalRow