Replies: 11 comments 40 replies
-
The Livy multi-node HA for batch sessions
-
KPIP-4, let's send a discuss mail to dev to bring more attention?
-
Is the batch id a JSON string, e.g. {"pubid": "xxx", "secretId": "xxx", "optionalType": "xxx"}?
-
curl -X GET /batches/state -d
curl -X GET /batches/log -d
Maybe such an API is good for the Kyuubi BE?
-
If we follow this format and if we introduce operations in a session, ...
-
Does the batches listing API support filters, such as batchType?
-
Do files and jars support remote filesystems, or only the Kyuubi Server's local filesystem?
-
Because the batchId can not be a ..., how about refactoring the API to ...
-
On second thought, the batch id could be designed as a public id, as we can buffer the secret id on the server side as a static string. When the batch id is transformed into handles, we always use this static secret id.
-
I also want to enable batch job submission with Kyuubi beeline after the RESTful API is finished. TODO:
-
What's a batch job? A sequence of SQLs |
-
Motivation
Apache Kyuubi is a distributed multi-tenant JDBC server for large-scale data processing and analytics.
It now supports spark-sql and spark-scala queries.
We want to enable Kyuubi to support submitting batch jobs, so that we can provide a Spark job submission service instead of delivering the Spark binary per Spark release.
Core domain objects and relationships
BatchesResource: the ApiRequestContext for batch job submission.
KyuubiBatchSession: the session for the batch use case.
BatchJobSubmission: the operation for batch job submission.
SparkBatchProcessBuilder: the process builder for Spark job submission.
API Design
GET /batches
Returns all the active batch sessions.
Request Parameters
Response Body
POST /batches
Request Body
Response Body
The created Batch object.
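To make the request shape concrete, here is a hypothetical POST /batches body sketched in Python. The field names (batchType, resource, className, conf, args) are illustrative assumptions, not the finalized API contract.

```python
import json

# Hypothetical POST /batches request body. Field names here are
# assumptions for illustration, not the finalized Kyuubi API.
request_body = {
    "batchType": "SPARK",                       # engine type of the batch job
    "resource": "hdfs://nn:8020/apps/app.jar",  # main resource on a shared FS
    "className": "org.example.SparkPi",         # main class of the Spark app
    "conf": {"spark.executor.instances": "2"},  # engine conf passed through
    "args": ["100"],                            # application arguments
}

payload = json.dumps(request_body)
print(payload)
```

The server would answer with the created Batch object (see the Batch description below).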
GET /batches/{batchId}
Returns the batch session information.
Response Body
The Batch.
DELETE /batches/{batchId}
Kills the Batch job.
GET /batches/{batchId}/localLog
Gets the local log lines from this batch.
Request Parameters
Response Body
Batch: contains fields such as id, Kyuubi instance URL, and batch URL.
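A minimal sketch of the Batch object in Python, assuming only the fields named above (id, Kyuubi instance URL, batch URL); the attribute names and sample values are illustrative, not the exact wire format.

```python
from dataclasses import dataclass, asdict

# Sketch of the Batch response object described above; the attribute
# names and example values are illustrative assumptions.
@dataclass
class Batch:
    id: str              # unique batch id (UUID)
    kyuubiInstance: str  # URL of the Kyuubi server instance owning the batch
    batchUrl: str        # URL for tracking the batch job (e.g. YARN proxy)

batch = Batch(
    id="7d2f9c1e-1111-4222-8333-444455556666",
    kyuubiInstance="http://kyuubi-1:10099",
    batchUrl="http://rm:8088/proxy/application_1700000000000_0001/",
)
print(asdict(batch))
```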
Implementation
In this section, we describe the implementation of the Kyuubi BatchesResource.
Batch Job Submission
Batch job conf
We need to predefine some common parameters and a batch job conf ignore list:
the kyuubi.batchConf.<engine>. prefix for batch jobs
spark.master and spark.yarn.deployMode specified by the user
Batch Job ProcessBuilder (for Spark currently)
For Spark, since the batchId is unique (a UUID), we look up the unique YARN_TAG from the ResourceManager to get an ApplicationReport.
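The tag-based lookup can be sketched as follows; the KYUUBI_ tag prefix and the mocked ApplicationReport structure are assumptions for illustration, not Kyuubi's actual implementation.

```python
import uuid

# Sketch: derive a unique YARN tag from the batch id, then find the
# matching application among (mocked) RM application reports by tag.
# The "KYUUBI_" prefix and report structure are assumptions.
def yarn_tag(batch_id: str) -> str:
    return f"KYUUBI_{batch_id}"

def find_application(reports, batch_id):
    tag = yarn_tag(batch_id)
    for report in reports:
        if tag in report["applicationTags"]:
            return report
    return None  # application not (yet) registered with the RM

batch_id = str(uuid.uuid4())
reports = [
    {"applicationId": "application_1700000000000_0001",
     "applicationTags": {yarn_tag(batch_id)}},
    {"applicationId": "application_1700000000000_0002",
     "applicationTags": set()},
]
app = find_application(reports, batch_id)
print(app["applicationId"])
```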
KyuubiBatchSessionImpl
To align with the current session-operation architecture, one KyuubiBatchSessionImpl maps to one BatchJobSubmission operation.
We will validate the batch job submission conf in the KyuubiBatchSessionImpl.
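A sketch of that conf validation, assuming spark.master and spark.yarn.deployMode make up the ignore list described above; the function name and key set are hypothetical.

```python
# Sketch of validating batch job conf against an ignore list, per the
# "batch job conf ignore list" above. The specific keys are
# illustrative assumptions.
IGNORE_LIST = {"spark.master", "spark.yarn.deployMode"}

def validate_batch_conf(user_conf: dict) -> dict:
    # Drop conf entries the server reserves for itself; everything
    # else passes through to the Spark process builder.
    return {k: v for k, v in user_conf.items() if k not in IGNORE_LIST}

conf = validate_batch_conf({
    "spark.master": "local",        # dropped: the server controls the master
    "spark.executor.memory": "2g",  # kept: ordinary engine conf
})
print(conf)
```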
BatchSession HA across kyuubi server instances
This part designs HA only for batch sessions across Kyuubi server instances, not for interactive Kyuubi sessions; i.e., users can access a batch session through any of the Kyuubi instances at any time. It only covers YARN mode, with ZooKeeper as the batch session store.
First, the batch session information across all Kyuubi instances should be kept in sync and coherent, so we need a shared state store: ZooKeeper. Once a batch session is created on a Kyuubi instance, it should be accessible to Kyuubi users through all other Kyuubi instances. Batch session information can change at any time during its lifetime, and a change made on one Kyuubi instance should be visible through all others.
In case a Kyuubi instance fails, the other instances should keep behaving normally. Once the failed instance recovers, it should read the batch session information and recover those sessions.
During the recovery phase, the instance should not accept any new request until the recovery process ends.
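The shared state store described above can be sketched as follows, with a plain in-memory dict standing in for ZooKeeper; the class and method names are illustrative assumptions, not Kyuubi's actual interface.

```python
# Sketch of the shared batch-session state store. A plain dict stands
# in for ZooKeeper here; the interface names are assumptions.
class BatchSessionStore:
    def __init__(self):
        self._store = {}  # stand-in for a ZooKeeper znode tree

    def save(self, batch_id: str, metadata: dict) -> None:
        # Persist session info so any Kyuubi instance can see it.
        self._store[batch_id] = metadata

    def recover_all(self) -> dict:
        # On restart, an instance reads back all persisted batch
        # sessions before accepting new requests.
        return dict(self._store)

store = BatchSessionStore()
store.save("batch-1", {"state": "RUNNING", "owner": "kyuubi-1"})
recovered = store.recover_all()
print(recovered)
```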
Limitation
The resources/jars/files used for batch jobs should be on a global filesystem, such as HDFS or S3.