-
Notifications
You must be signed in to change notification settings - Fork 708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add partitioned sources for Parquet thrift / scrooge #1590
Changes from 2 commits
e5dcc2d
057eb56
8863cd8
4140d57
a2d3468
5488d10
363fed0
b02c281
09a8b75
82d1387
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,16 +29,19 @@ import com.twitter.scalding.source.{ DailySuffixSource, HourlySuffixSource } | |
import java.io.Serializable | ||
import org.apache.thrift.{ TBase, TFieldIdEnum } | ||
|
||
import scala.reflect.ClassTag | ||
|
||
object ParquetThrift extends Serializable { | ||
type ThriftBase = TBase[_ <: TBase[_, _], _ <: TFieldIdEnum] | ||
} | ||
|
||
trait ParquetThriftBase[T] extends LocalTapSource with HasFilterPredicate with HasColumnProjection { | ||
|
||
def mf: Manifest[T] | ||
implicit def ct: ClassTag[T] | ||
|
||
def config: ParquetValueScheme.Config[T] = { | ||
val config = new ParquetValueScheme.Config[T].withRecordClass(mf.runtimeClass.asInstanceOf[Class[T]]) | ||
def config(implicit ct: ClassTag[T]): ParquetValueScheme.Config[T] = { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why are we accepting There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shall remove this, added this on one of my prior passes over the code when I was seeing some compile errors. Not needed anymore. |
||
val clazz = implicitly[ClassTag[T]].runtimeClass.asInstanceOf[Class[T]] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can't we just do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm yeah, that's cleaner. |
||
val config = new ParquetValueScheme.Config[T].withRecordClass(clazz) | ||
val configWithFp = withFilter match { | ||
case Some(fp) => config.withFilterPredicate(fp) | ||
case None => config | ||
|
@@ -110,13 +113,13 @@ trait ParquetThrift[T <: ParquetThrift.ThriftBase] extends ParquetThriftBaseFile | |
*/ | ||
class DailySuffixParquetThrift[T <: ParquetThrift.ThriftBase]( | ||
path: String, | ||
dateRange: DateRange)(implicit override val mf: Manifest[T]) | ||
dateRange: DateRange)(implicit override val ct: ClassTag[T]) | ||
extends DailySuffixSource(path, dateRange) with ParquetThrift[T] | ||
|
||
class HourlySuffixParquetThrift[T <: ParquetThrift.ThriftBase]( | ||
path: String, | ||
dateRange: DateRange)(implicit override val mf: Manifest[T]) | ||
dateRange: DateRange)(implicit override val ct: ClassTag[T]) | ||
extends HourlySuffixSource(path, dateRange) with ParquetThrift[T] | ||
|
||
class FixedPathParquetThrift[T <: ParquetThrift.ThriftBase](paths: String*)(implicit override val mf: Manifest[T]) | ||
class FixedPathParquetThrift[T <: ParquetThrift.ThriftBase](paths: String*)(implicit override val ct: ClassTag[T]) | ||
extends FixedPathSource(paths: _*) with ParquetThrift[T] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for reads, is it required that P be a String or a TupleN of Strings? Seems like it'd have to be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can also be a string (if its just "%s"). If you see the unit tests, there P ends up being a String.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder if we should clarify that in the docs? Could add an example for the %s case cause it isn't a tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so P must be either: String or tuple of Strings right? That'd be good to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll update the docs to indicate this and also include an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible, maybe a geek out, to have a macro that checks the format string at compile time?
this would allow it to be really clear that the string matches the tuple type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can look into this. Not super familiar with Scala macros so this can be a learning opportunity :-). Currently we do check this (at runtime though) - in TemplatePartition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if you do something like:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it would be nice to just call the code cascading uses to verify it at compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll dig into this