[SPARK-7056][Streaming] Make the Write Ahead Log pluggable #5645
Changes from 14 commits
WriteAheadLog.java (new file, `@@ -0,0 +1,60 @@`):

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.streaming.util;

import java.nio.ByteBuffer;
import java.util.Iterator;

/**
 * This abstract class represents a write ahead log (aka journal) that is used by Spark Streaming
 * to save the received data (by receivers) and associated metadata to a reliable storage, so that
 * they can be recovered after driver failures. See the Spark documentation for more information
 * on how to plug in your own custom implementation of a write ahead log.
 */
@org.apache.spark.annotation.DeveloperApi
public abstract class WriteAheadLog {
  /**
   * Write the record to the log and return a record handle, which contains all the information
   * necessary to read back the written record. The time is used to index the record,
   * such that it can be cleaned later. Note that implementations of this abstract class must
   * ensure that the written data is durable and readable (using the record handle) by the
   * time this function returns.
   */
  abstract public WriteAheadLogRecordHandle write(ByteBuffer record, long time);

  /**
   * Read a written record based on the given record handle.
   */
  abstract public ByteBuffer read(WriteAheadLogRecordHandle handle);

  /**
   * Read and return an iterator of all the records that have been written but not yet cleaned up.
   */
  abstract public Iterator<ByteBuffer> readAll();

  /**
   * Clean all the records that are older than the threshold time. It can wait for
   * the completion of the deletion.
   */
  abstract public void clean(long threshTime, boolean waitForCompletion);

  /**
   * Close this log and release any resources.
   */
  abstract public void close();
}
```
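To make the contract above concrete, here is a minimal in-memory sketch of a custom WAL. The names `InMemoryWriteAheadLog` and `InMemoryRecordHandle` are hypothetical, invented for illustration; a real plugin would extend Spark's `WriteAheadLog` and `WriteAheadLogRecordHandle`, and would need durable storage rather than a heap map. The sketch mirrors the method shapes on standalone classes so it runs without Spark on the classpath:

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration of the pluggable WAL contract. A real plugin would
// extend org.apache.spark.streaming.util.WriteAheadLog; this standalone class
// mirrors the same method shapes, backed by a sorted in-memory map (which is
// NOT durable -- a real WAL must survive driver failure).
class InMemoryWriteAheadLog {

  // Handle equivalent: identifies a record by the time that indexed it.
  // (Serializable, matching the WriteAheadLogRecordHandle requirement.)
  static final class InMemoryRecordHandle implements java.io.Serializable {
    final long time;
    InMemoryRecordHandle(long time) { this.time = time; }
  }

  // Sorted by write time so clean(threshTime) can drop a prefix.
  // Simplification: one record per timestamp.
  private final ConcurrentSkipListMap<Long, byte[]> records = new ConcurrentSkipListMap<>();

  InMemoryRecordHandle write(ByteBuffer record, long time) {
    byte[] copy = new byte[record.remaining()];
    record.get(copy);                    // copy out: the caller may reuse the buffer
    records.put(time, copy);
    return new InMemoryRecordHandle(time);
  }

  ByteBuffer read(InMemoryRecordHandle handle) {
    byte[] data = records.get(handle.time);
    return data == null ? null : ByteBuffer.wrap(data);
  }

  Iterator<ByteBuffer> readAll() {
    return records.values().stream().map(ByteBuffer::wrap).iterator();
  }

  void clean(long threshTime, boolean waitForCompletion) {
    // headMap(threshTime) is the view of keys strictly older than the
    // threshold; clearing it deletes them. In-memory deletion is synchronous,
    // so waitForCompletion is moot here.
    records.headMap(threshTime).clear();
  }

  void close() {
    records.clear();
  }
}
```

Note how `read()` returns null for a cleaned-up record; as seen later in this PR, the RDD's `compute()` explicitly checks for a null read result.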
WriteAheadLogRecordHandle.java (new file, `@@ -0,0 +1,30 @@`):

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.streaming.util;

/**
 * This abstract class represents a handle that refers to a record written in a
 * {@link org.apache.spark.streaming.util.WriteAheadLog WriteAheadLog}.
 * It must contain all the information necessary for the record to be read and returned by
 * an implementation of the WriteAheadLog class.
 *
 * @see org.apache.spark.streaming.util.WriteAheadLog
 */
@org.apache.spark.annotation.DeveloperApi
public abstract class WriteAheadLogRecordHandle implements java.io.Serializable {
}
```
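The `java.io.Serializable` requirement matters because a handle is embedded in each RDD partition on the driver and deserialized on the executor that reads the record. A quick sketch of the round trip a handle must survive; `FileSegmentHandle` is a hypothetical stand-in (loosely modeled on a path/offset/length file segment), not Spark's actual handle class:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical handle carrying path/offset/length, like a file-segment handle.
// It must be Serializable because the driver ships it to executors inside
// each RDD partition.
class FileSegmentHandle implements Serializable {
  final String path;
  final long offset;
  final int length;
  FileSegmentHandle(String path, long offset, int length) {
    this.path = path; this.offset = offset; this.length = length;
  }
}

class HandleRoundTrip {
  // Serialize and deserialize with Java serialization, as task shipping would.
  static FileSegmentHandle roundTrip(FileSegmentHandle h) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
        out.writeObject(h);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
        return (FileSegmentHandle) in.readObject();
      }
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
```

A handle that forgets `implements Serializable`, or holds a non-serializable field, would fail at task-serialization time rather than at read time.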
WriteAheadLogBackedBlockRDD.scala (modified):

```diff
@@ -16,57 +16,61 @@
  */
 package org.apache.spark.streaming.rdd
 
 import java.nio.ByteBuffer
 
 import scala.reflect.ClassTag
+import scala.util.control.NonFatal
 
 import org.apache.hadoop.conf.Configuration
+import org.apache.commons.io.FileUtils
 
 import org.apache.spark._
 import org.apache.spark.rdd.BlockRDD
 import org.apache.spark.storage.{BlockId, StorageLevel}
-import org.apache.spark.streaming.util.{HdfsUtils, WriteAheadLogFileSegment, WriteAheadLogRandomReader}
+import org.apache.spark.streaming.util._
 
 /**
  * Partition class for [[org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD]].
  * It contains information about the id of the blocks having this partition's data and
  * the segment of the write ahead log that backs the partition.
  * @param index index of the partition
  * @param blockId id of the block having the partition data
- * @param segment segment of the write ahead log having the partition data
+ * @param walRecordHandle Handle of the record in a write ahead log having the partition data
  */
 private[streaming]
 class WriteAheadLogBackedBlockRDDPartition(
     val index: Int,
     val blockId: BlockId,
-    val segment: WriteAheadLogFileSegment)
+    val walRecordHandle: WriteAheadLogRecordHandle)
   extends Partition
 
 
 /**
  * This class represents a special case of the BlockRDD where the data blocks in
- * the block manager are also backed by segments in write ahead logs. For reading
+ * the block manager are also backed by data in write ahead logs. For reading
  * the data, this RDD first looks up the blocks by their ids in the block manager.
- * If it does not find them, it looks up the corresponding file segment.
+ * If it does not find them, it looks up the corresponding data in the write ahead log.
  *
  * @param sc SparkContext
  * @param blockIds Ids of the blocks that contains this RDD's data
- * @param segments Segments in write ahead logs that contain this RDD's data
- * @param storeInBlockManager Whether to store in the block manager after reading from the segment
+ * @param walRecordHandles Record handles in write ahead logs that contain this RDD's data
+ * @param storeInBlockManager Whether to store in the block manager after reading
+ *                            from the WAL record
  * @param storageLevel storage level to store when storing in block manager
  *                     (applicable when storeInBlockManager = true)
  */
 private[streaming]
 class WriteAheadLogBackedBlockRDD[T: ClassTag](
     @transient sc: SparkContext,
     @transient blockIds: Array[BlockId],
-    @transient segments: Array[WriteAheadLogFileSegment],
+    @transient walRecordHandles: Array[WriteAheadLogRecordHandle],
     storeInBlockManager: Boolean,
     storageLevel: StorageLevel)
   extends BlockRDD[T](sc, blockIds) {
 
   require(
-    blockIds.length == segments.length,
+    blockIds.length == walRecordHandles.length,
     s"Number of block ids (${blockIds.length}) must be " +
-      s"the same as number of segments (${segments.length}})!")
+      s"the same as number of WAL record handles (${walRecordHandles.length}})!")
 
   // Hadoop configuration is not serializable, so broadcast it as a serializable.
   @transient private val hadoopConfig = sc.hadoopConfiguration
@@ -75,13 +79,13 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
   override def getPartitions: Array[Partition] = {
     assertValid()
     Array.tabulate(blockIds.size) { i =>
-      new WriteAheadLogBackedBlockRDDPartition(i, blockIds(i), segments(i))
+      new WriteAheadLogBackedBlockRDDPartition(i, blockIds(i), walRecordHandles(i))
     }
   }
 
   /**
    * Gets the partition data by getting the corresponding block from the block manager.
-   * If the block does not exist, then the data is read from the corresponding segment
+   * If the block does not exist, then the data is read from the corresponding record
    * in write ahead log files.
    */
   override def compute(split: Partition, context: TaskContext): Iterator[T] = {
@@ -96,10 +100,29 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
         logDebug(s"Read partition data of $this from block manager, block $blockId")
         iterator
       case None => // Data not found in Block Manager, grab it from write ahead log file
-        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
-        val dataRead = reader.read(partition.segment)
-        reader.close()
-        logInfo(s"Read partition data of $this from write ahead log, segment ${partition.segment}")
+        var dataRead: ByteBuffer = null
+        var writeAheadLog: WriteAheadLog = null
+        try {
+          val dummyDirectory = FileUtils.getTempDirectoryPath()
+          writeAheadLog = WriteAheadLogUtils.createLogForReceiver(
+            SparkEnv.get.conf, dummyDirectory, hadoopConf)
+          dataRead = writeAheadLog.read(partition.walRecordHandle)
+        } catch {
+          case NonFatal(e) =>
+            throw new SparkException(
+              s"Could not read data from write ahead log record ${partition.walRecordHandle}", e)
+        } finally {
+          if (writeAheadLog != null) {
+            writeAheadLog.close()
+            writeAheadLog = null
+          }
+        }
+        if (dataRead == null) {
+          throw new SparkException(
+            s"Could not read data from write ahead log record ${partition.walRecordHandle}, " +
+              s"read returned null")
+        }
+        logInfo(s"Read partition data of $this from write ahead log, record handle " +
+          partition.walRecordHandle)
         if (storeInBlockManager) {
           blockManager.putBytes(blockId, dataRead, storageLevel)
           logDebug(s"Stored partition data of $this into block manager with level $storageLevel")
@@ -111,14 +134,20 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
 
   /**
    * Get the preferred location of the partition. This returns the locations of the block
-   * if it is present in the block manager, else it returns the location of the
-   * corresponding segment in HDFS.
+   * if it is present in the block manager, else if FileBasedWriteAheadLogSegment is used,
+   * it returns the location of the corresponding file segment in HDFS.
   */
   override def getPreferredLocations(split: Partition): Seq[String] = {
     val partition = split.asInstanceOf[WriteAheadLogBackedBlockRDDPartition]
     val blockLocations = getBlockIdLocations().get(partition.blockId)
-    blockLocations.getOrElse(
-      HdfsUtils.getFileSegmentLocations(
-        partition.segment.path, partition.segment.offset, partition.segment.length, hadoopConfig))
+    blockLocations.getOrElse {
+      partition.walRecordHandle match {
+        case fileSegment: FileBasedWriteAheadLogSegment =>
+          HdfsUtils.getFileSegmentLocations(
+            fileSegment.path, fileSegment.offset, fileSegment.length, hadoopConfig)
+        case _ =>
+          Seq.empty
+      }
+    }
   }
 }
```

Review discussion:

On `val dummyDirectory = FileUtils.getTempDirectoryPath()`:

> **Reviewer:** Why here need to use …
>
> **Author:** So the default WAL is file based, so a log directory is needed for it to work. However, the log directory is really not needed for reading a particular record. But to read a single record you have to create a FileBasedWriteAheadLog object, which needs a log directory. Hence I am providing a dummy directory for this. I know that this is a little awkward. This is the cost of defining a single interface for both writing and reading single records. Earlier there were two independent classes (WALWriter and WALRandomReader) that were used for these two purposes, which have different requirements. But since I am trying to make a single interface that can be used for all reading and writing, the log directory must be provided in the constructor of the default file-based WAL. This results in the awkwardness. I don't quite like it myself, but it may practically be okay as long as we ensure that the FileBasedWAL does not create unnecessary directories/files when only reading a single record. I can add a test to ensure that.

On `WriteAheadLogUtils.createLogForReceiver(...)`:

> **Reviewer:** Also, IIUC, if the journal system is not Hadoop based, hadoopConf may not be available.
>
> **Author:** hadoopConf is always available through the SparkContext. Irrespective of whether a Hadoop file system is used, a Hadoop conf is created by the SparkContext, which is passed on to this location. If the WAL is not the default FileBasedWAL, then this parameter is just ignored (see the method …
>
> **Reviewer:** What I'm thinking is: do we need to have this parameter in the interface? Can we hide it inside file-based WAL implementations?
>
> **Author:** The log directory needs to be passed through the … IMO that duplicates code everywhere and is uglier than this dummy dir approach. And also, this does not handle …

On `writeAheadLog.close()`:

> **Reviewer:** Maybe reset writeAheadLog to null after close to avoid unexpected behavior :)
>
> **Author:** Done.

On the `logInfo` call:

> **Reviewer:** Would be better to use string interpolation here.
>
> **Author:** Since it has to be on the next line, using s"$..." is strictly more characters :)

On `getPreferredLocations`:

> **Reviewer:** It might make sense to add location info to the WALRecordHandle interface itself. That way, systems that are not HDFS but still benefit from preferred locations can use it.
>
> **Author:** That's a good point. I wasn't super sure whether it is a good idea to have it in the interface in this version. We can add it later and maintain binary compatibility, as the RecordHandle is an abstract class. Also, it is still a developer API. For now, I am going to merge this in to unblock #5732.
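The fallback branch in `compute()` follows a deliberate shape: acquire the log, read, always close in `finally`, then treat a null read as an error rather than propagating it. A standalone sketch of just that shape; `WalReadPattern` and its nested `Log` interface are hypothetical names for illustration, not Spark API:

```java
import java.nio.ByteBuffer;

// Standalone sketch of the read pattern used in compute(): open, read,
// close in finally, and fail loudly on a null result instead of returning it.
class WalReadPattern {
  interface Log {                      // hypothetical minimal log interface
    ByteBuffer read(long handle);
    void close();
  }

  static ByteBuffer readOrThrow(Log log, long handle) {
    ByteBuffer dataRead = null;
    try {
      dataRead = log.read(handle);
    } finally {
      log.close();                     // always release the log, even on failure
    }
    if (dataRead == null) {
      // Mirrors the SparkException thrown when read returns null.
      throw new IllegalStateException("read returned null for handle " + handle);
    }
    return dataRead;
  }
}
```

The null check after the `finally` block is what the review thread below debates: the nulls never escape the method, which is the author's argument for preferring them over allocating wrapper objects.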
On the null initialization of `dataRead` and `writeAheadLog` in `compute()`:

> **Reviewer:** I feel dirty seeing nulls in Scala.
>
> **Author:** Why allocate two (at least) objects when it is completely obvious that they are not going to be used? The null does not get exposed to anything outside the function, and hence is okay to have. If you look at the rest of the Spark source code, we don't strictly adhere to the Scala way of doing things; rather, we balance code understandability (limit the levels of functional nesting) and efficiency (while loops instead of for when perf matters) with Scala styles.