Project with some basic examples of the Flink Java DataStream API.
The recommended IDE for executing this project is IntelliJ.
In order to execute this project correctly in IntelliJ, the following option must be selected:
With this, the IDE will have into account the dependencies included in the maven profile with id add-dependencies-for-IDEA
,
included in the pom.xml.
- Generating Watermarks (official documentation)
- Builtin Watermark Generators (official documentation)
Both timestamps and watermarks are specified as milliseconds since the Java epoch of 1970-01-01T00:00:00Z.
Specifying a TimestampAssigner is optional and in most cases you don’t actually want to specify one.
For example, when using Kafka or Kinesis you would get timestamps directly from the Kafka/Kinesis records.
- Introducing Stream Windows in Apache Flink (blog post)
- Windows (official documentation)
- Streaming Analytics (official documentation)
ReduceFunction and AggregateFunction can be executed more efficiently (see State Size section) because Flink can incrementally aggregate the elements for each window as they arrive.
A ProcessWindowFunction gets an Iterable for all the elements contained in a window and additional meta information about the window to which the elements belong.
Therefore, a windowed transformation with a ProcessWindowFunction cannot be executed as efficiently as the other cases because Flink has to buffer all elements for a window internally before invoking the function.
WindowFunction is an older version of ProcessWindowFunction that provides less contextual information and does not have some advances features, such as per-window keyed state. This interface will be deprecated at some point.