ODP-2049 - CVE | HSBC | Standalone Binaries for Spark #35

Merged 28 commits on Sep 13, 2024

Commits on Sep 4, 2024

  1. Commit 8d0bc50
  2. Merge pull request #28 from acceldata-io/ODP-2187

    ODP-2187 Upgrade snakeyaml version to 2.0
    senthh authored Sep 4, 2024
    Commit 02b3f43
  3. [SPARK-35579][SQL] Bump janino to 3.1.7

    ### What changes were proposed in this pull request?
    
    Upgrade janino from 3.0.16 to 3.1.7.
    
    ### Why are the changes needed?
    
    - The proposed version contains a bug fix in janino by maropu.
       - janino-compiler/janino#148
    - It also contains the `getBytecodes` method, which can simplify how bytecodes are obtained from ClassBodyEvaluator in the CodeGenerator#updateAndGetCompilationStats method (by LuciferYang).
       - apache#32536
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Existing UTs
    
    Closes apache#37202 from singhpk234/upgrade/bump-janino.
    
    Authored-by: Prashant Singh <psinghvk@amazon.com>
    Signed-off-by: Sean Owen <srowen@gmail.com>
    
    (cherry picked from commit 29ed337)
    Prashant Singh authored and senthh committed Sep 4, 2024
    Commit ed20b70
  4. [SPARK-40633][BUILD] Upgrade janino to 3.1.9

    ### What changes were proposed in this pull request?
    This PR aims to upgrade janino from 3.1.7 to 3.1.9.
    
    ### Why are the changes needed?
    This version brings some improvements and bug fixes, and janino 3.1.9 no longer tests against Java 12, 15, and 16 because these STS versions have reached EOL:
    
    - janino-compiler/janino@v3.1.7...v3.1.9
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    - Pass GitHub Actions
    - Manually tested this PR with Scala 2.13; all tests passed
    
    Closes apache#38075 from LuciferYang/SPARK-40633.
    
    Lead-authored-by: yangjie01 <yangjie01@baidu.com>
    Co-authored-by: YangJie <yangjie01@baidu.com>
    Signed-off-by: Sean Owen <srowen@gmail.com>
    
    (cherry picked from commit 49e102b)
    LuciferYang authored and senthh committed Sep 4, 2024
    Commit fc00bef
  5. Commit 8d830cf
  6. Merge pull request #29 from acceldata-io/ODP-2167

    ODP-2167 Upgrade janino version
    senthh authored Sep 4, 2024
    Commit 62589f6

Commits on Sep 5, 2024

  1. Commit d373fea
  2. Merge pull request #30 from acceldata-io/ODP-2190

    ODP-2190 Upgrade guava version to 32.1.3-jre
    senthh authored Sep 5, 2024
    Commit e9220a5
  3. Commit d5ee21c
  4. Commit f783536

Commits on Sep 6, 2024

  1. Merge pull request #31 from acceldata-io/ODP-2193

    ODP-2193 | ODP-2194 Upgrade jettison version to 1.5.4 and wildfly-openssl to 1.1.3
    senthh authored Sep 6, 2024
    Commit 2aa244d
  2. Commit 4095f60
  3. Commit b6f0a73
  4. Commit f776c04
  5. Commit 13a9e8a
  6. Commit eb5d103
  7. Commit 9695174
  8. Commit 00b03f8

Commits on Sep 9, 2024

  1. Commit 7ca39ca
  2. Merge pull request #34 from acceldata-io/ODP-2175_1

    ODP-2175|SPARK-47018 Upgrade libthrift version and hive version
    senthh authored Sep 9, 2024
    Commit b34fee6
  3. [SPARK-39688][K8S] getReusablePVCs should handle accounts with no PVC permission
    
    ### What changes were proposed in this pull request?
    
    This PR aims to handle `KubernetesClientException` in the `getReusablePVCs` method so that accounts with no PVC permission, including `listing`, are handled gracefully.
    
    ### Why are the changes needed?
    
    To prevent a regression in Apache Spark 3.4.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Pass the CIs with the newly added test case.
    
    Closes apache#37095 from dongjoon-hyun/SPARK-39688.
    
    Authored-by: Dongjoon Hyun <dongjoon@apache.org>
    Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
    
    (cherry picked from commit 79f133b)
    dongjoon-hyun authored and senthh committed Sep 9, 2024
    Commit dcd700c
  4. [SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

    ### What changes were proposed in this pull request?
    
    Bump kubernetes-client version from 5.12.3 to 6.1.1 and clean up all the deprecations.
    
    ### Why are the changes needed?
    
    To keep up with kubernetes-client [changes](fabric8io/kubernetes-client@v5.12.3...v6.1.1).
    As this upgrade changes the major version, I have cleaned up all the deprecations.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    #### Unit tests
    
    #### Manual tests for submit and application management
    
    Started an application in a non-default namespace (`bla`):
    
    ```
    ➜  spark git:(SPARK-40458) ✗ ./bin/spark-submit \
        --master k8s://http://127.0.0.1:8001 \
        --deploy-mode cluster \
        --name spark-pi \
        --class org.apache.spark.examples.SparkPi \
        --conf spark.executor.instances=5 \
        --conf spark.kubernetes.namespace=bla \
        --conf spark.kubernetes.container.image=docker.io/kubespark/spark:3.4.0-SNAPSHOT_064A99CC-57AF-46D5-B743-5B12692C260D \
        local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar 200000
    ```
    
    Check that we cannot find it in the default namespace, even with a glob, when no namespace is specified:
    
    ```
    ➜  spark git:(SPARK-40458) ✗ minikube kubectl -- config set-context --current --namespace=default
    Context "minikube" modified.
    ➜  spark git:(SPARK-40458) ✗ ./bin/spark-submit --status "spark-pi-*" --master k8s://http://127.0.0.1:8001
    Submitting a request for the status of submission spark-pi-* in k8s://http://127.0.0.1:8001.
    No applications found.
    ```
    
    Then check we can find it by specifying the namespace:
    ```
    ➜  spark git:(SPARK-40458) ✗ ./bin/spark-submit --status "bla:spark-pi-*" --master k8s://http://127.0.0.1:8001
    Submitting a request for the status of submission bla:spark-pi-* in k8s://http://127.0.0.1:8001.
    Application status (driver):
             pod name: spark-pi-4c4e70837c86ae1a-driver
             namespace: bla
             labels: spark-app-name -> spark-pi, spark-app-selector -> spark-c95a9a0888214c01a286eb7ba23980a0, spark-role -> driver, spark-version -> 3.4.0-SNAPSHOT
             pod uid: 0be8952e-3e00-47a3-9082-9cb45278ed6d
             creation time: 2022-09-27T01:19:06Z
             service account name: default
             volumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-wxnqw
             node name: minikube
             start time: 2022-09-27T01:19:06Z
             phase: Running
             container status:
                     container name: spark-kubernetes-driver
                     container image: kubespark/spark:3.4.0-SNAPSHOT_064A99CC-57AF-46D5-B743-5B12692C260D
                     container state: running
                     container started at: 2022-09-27T01:19:07Z
    ```
    
    Changing the namespace to `bla` with `kubectl`:
    
    ```
    ➜  spark git:(SPARK-40458) ✗  minikube kubectl -- config set-context --current --namespace=bla
    Context "minikube" modified.
    ```
    
    Checking that we can find it with a glob, without specifying the namespace:
    ```
    ➜  spark git:(SPARK-40458) ✗  ./bin/spark-submit --status "spark-pi-*" --master k8s://http://127.0.0.1:8001
    Submitting a request for the status of submission spark-pi-* in k8s://http://127.0.0.1:8001.
    Application status (driver):
             pod name: spark-pi-4c4e70837c86ae1a-driver
             namespace: bla
             labels: spark-app-name -> spark-pi, spark-app-selector -> spark-c95a9a0888214c01a286eb7ba23980a0, spark-role -> driver, spark-version -> 3.4.0-SNAPSHOT
             pod uid: 0be8952e-3e00-47a3-9082-9cb45278ed6d
             creation time: 2022-09-27T01:19:06Z
             service account name: default
             volumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-wxnqw
             node name: minikube
             start time: 2022-09-27T01:19:06Z
             phase: Running
             container status:
                     container name: spark-kubernetes-driver
                     container image: kubespark/spark:3.4.0-SNAPSHOT_064A99CC-57AF-46D5-B743-5B12692C260D
                     container state: running
                     container started at: 2022-09-27T01:19:07Z
    ```
    
    Killing the app:
    ```
    ➜  spark git:(SPARK-40458) ✗  ./bin/spark-submit --kill "spark-pi-*" --master k8s://http://127.0.0.1:8001
    Submitting a request to kill submission spark-pi-* in k8s://http://127.0.0.1:8001. Grace period in secs: not set.
    Deleting driver pod: spark-pi-4c4e70837c86ae1a-driver.
    ```
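
    The namespace handling exercised in the manual tests above can be sketched in plain shell. This is only an illustration of the observed behaviour (an optional `namespace:` prefix, otherwise the current context's namespace, followed by a glob on the pod name), not the code Spark uses; the function names `submission_namespace`, `submission_glob`, and `matches_submission` are hypothetical.

    ```shell
    # Illustrative sketch only: these helpers are hypothetical and do not exist in Spark.

    # Print the namespace part of "[namespace:]name-glob", or the fallback
    # (standing in for the current kubectl context namespace) if there is none.
    submission_namespace() {
      case "$1" in
        *:*) printf '%s\n' "${1%%:*}" ;;
        *)   printf '%s\n' "$2" ;;
      esac
    }

    # Print the name (glob) part of the submission ID.
    submission_glob() {
      printf '%s\n' "${1#*:}"
    }

    # Args: submission-id current-namespace pod-namespace pod-name
    # Succeeds if the pod is in the requested namespace and its name matches the glob.
    matches_submission() {
      sub_ns="$(submission_namespace "$1" "$2")"
      sub_glob="$(submission_glob "$1")"
      [ "$3" = "$sub_ns" ] || return 1
      # An unquoted variable in a case pattern is matched as a glob.
      case "$4" in
        $sub_glob) return 0 ;;
        *)         return 1 ;;
      esac
    }

    # Namespace given explicitly: the driver pod in "bla" is found.
    matches_submission "bla:spark-pi-*" default bla spark-pi-4c4e70837c86ae1a-driver && echo "found"
    # No namespace given while the context points at "default": nothing matches.
    matches_submission "spark-pi-*" default bla spark-pi-4c4e70837c86ae1a-driver || echo "not found"
    ```

    With the context namespace switched to `bla`, the second call would succeed as well, which mirrors the last `--status` check above.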
    
    Closes apache#37990 from attilapiros/SPARK-40458.
    
    Authored-by: attilapiros <piros.attila.zsolt@gmail.com>
    Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
    
    (cherry picked from commit fa88651)
    attilapiros authored and senthh committed Sep 9, 2024
    Commit eb05549
  5. [SPARK-36462][K8S] Add the ability to selectively disable watching or polling
    
    ### What changes were proposed in this pull request?
    
    Add the ability to selectively disable watching or polling
    
    Updated version of apache#34264
    
    ### Why are the changes needed?
    
    Watching or polling for pod status on Kubernetes can place additional load on etcd. With a large number of executors and jobs, this can have a negative impact, and executors register themselves with the driver under normal operation anyway.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Two new config flags.
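
    As a rough sketch of how the two flags might be passed, the snippet below prints (rather than runs) a `spark-submit` command line with both pod status sources turned off. The flag names `spark.kubernetes.executor.enableApiWatcher` and `spark.kubernetes.executor.enableApiPolling` are not stated in this commit message and are an assumption to verify against the Spark version in use; the master URL is reused from the manual tests elsewhere in this PR, and `build_submit_command` is a hypothetical helper.

    ```shell
    # Dry run: print the command line instead of executing it.
    # Assumption: the two flag names below are the ones introduced by this change.
    build_submit_command() {
      printf '%s ' ./bin/spark-submit \
        --master k8s://http://127.0.0.1:8001 \
        --deploy-mode cluster \
        --conf spark.kubernetes.executor.enableApiWatcher=false \
        --conf spark.kubernetes.executor.enableApiPolling=false
      printf '\n'
    }

    build_submit_command
    ```

    Either flag can be left at its default to disable only one of the two mechanisms.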
    
    ### How was this patch tested?
    
    New unit tests + manually tested a forked version of this on an internal cluster with both watching and polling disabled.
    
    Closes apache#36433 from holdenk/SPARK-36462-allow-spark-on-kube-to-operate-without-watchers.
    
    Lead-authored-by: Holden Karau <holden@pigscanfly.ca>
    Co-authored-by: Holden Karau <hkarau@netflix.com>
    Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
    
    (cherry picked from commit 5bffb98)
    holdenk authored and senthh committed Sep 9, 2024
    Commit db280be
  6. Commit 1f69e5a

Commits on Sep 10, 2024

  1. [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode
    
    Backport of the fix for SPARK-41958 to the 3.3 branch from apache#39474.
    The description below is from the original PR.
    
    --------------------------
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to disallow arbitrary custom classpath with proxy user in cluster mode by default.
    
    ### Why are the changes needed?
    
    To avoid arbitrary classpaths in a Spark cluster.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. Users who need this feature must re-enable it by setting `spark.submit.proxyUser.allowCustomClasspathInClusterMode`.
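
    A minimal sketch of opting back in, assuming the setting is a boolean passed via `--conf`. The snippet prints the command line instead of executing it; the master, the proxy user `alice`, the class name, and both paths are hypothetical placeholders, and `build_proxy_user_command` is a made-up helper name.

    ```shell
    # Dry run: print the command line instead of executing it.
    # Assumption: "=true" re-enables custom classpaths for proxy users.
    build_proxy_user_command() {
      printf '%s ' ./bin/spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --proxy-user alice \
        --conf spark.submit.proxyUser.allowCustomClasspathInClusterMode=true \
        --conf spark.driver.extraClassPath=/opt/libs/custom.jar \
        --class org.example.Main \
        /path/to/app.jar
      printf '\n'
    }

    build_proxy_user_command
    ```

    Since the change exists to keep arbitrary classpaths out of the cluster, the override is best limited to jobs that genuinely need it.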
    
    ### How was this patch tested?
    
    Manually tested.
    
    Closes apache#39474 from Ngone51/dev.
    
    Lead-authored-by: Peter Toth <peter.toth@gmail.com>
    Co-authored-by: Yi Wu <yi.wu@databricks.com>
    Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
    
    (cherry picked from commit 909da96)
    
    Closes apache#41428 from degant/spark-41958-3.3.
    
    Lead-authored-by: Degant Puri <depuri@microsoft.com>
    Co-authored-by: Peter Toth <peter.toth@gmail.com>
    Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
    2 people authored and senthh committed Sep 10, 2024
    Commit bd710aa

Commits on Sep 12, 2024

  1. Commit 62f9e94
  2. Commit b483b9a

Commits on Sep 13, 2024

  1. Commit 14731b1