
pyspark.sql.utils.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: gs://<bucket-name>-data_spark_metadata #9518

Closed
jmilagroso opened this issue Jul 8, 2021 · 2 comments
jmilagroso commented Jul 8, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v1.0.1 on linux_amd64

Affected Resource(s)

  • google_dataproc_job

Terraform Configuration Files

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "3.74.0"
    }
  }
}

resource "google_dataproc_cluster" "mycluster" {
  name     = "mycluster"
  project  = "my-project-id"
  region   = "us-central1"
  graceful_decommission_timeout = "120s"


  cluster_config {

    master_config {
      num_instances = 1
      machine_type  = "n2-standard-2"
      disk_config {
        boot_disk_type    = "pd-ssd"
        boot_disk_size_gb = 30
      }
    }

    worker_config {
      num_instances = 2
      machine_type  = "n2-standard-2"
      disk_config {
        boot_disk_size_gb = 30
        num_local_ssds    = 1
      }
    }

    preemptible_worker_config {
      num_instances = 0
    }

    software_config {
      image_version = "2.0-debian10"
      override_properties = {
        "dataproc:dataproc.allow.zero.workers" = "true"
      }
    }
  }
}

resource "google_dataproc_job" "pyspark" {
  project      = google_dataproc_cluster.mycluster.project
  region       = google_dataproc_cluster.mycluster.region
  force_delete = true
  placement {
    cluster_name = google_dataproc_cluster.mycluster.name
  }

  pyspark_config {
    main_python_file_uri = "gs://<PROJECT-ID>-scripts/main.py"

    args = [
        "app arg here"
    ]

    properties = {
      "spark.logConf" = "true"
    }
  }
}

output "pyspark_status" {
  value = google_dataproc_job.pyspark.status[0].state
}
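The malformed URI in the error message, `gs://<bucket-name>-data_spark_metadata`, is missing a `/` between the bucket name and the `_spark_metadata` object that Spark structured streaming writes, which suggests an output or checkpoint path was built by concatenating bucket and object without a separator. As a hedged illustration (the bucket name `my-bucket-data` and the helper `validate_gcs_uri` are hypothetical, not part of this configuration), a small check like the following catches that class of mistake before the job is submitted:

```python
from urllib.parse import urlparse

def validate_gcs_uri(uri: str) -> str:
    """Raise ValueError if a gs:// URI has no object path after the
    bucket name, a common symptom of joining bucket and object
    without a '/'."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"expected gs:// scheme, got: {uri}")
    if not parsed.path or parsed.path == "/":
        raise ValueError(f"no object path after the bucket name: {uri}")
    return uri

# A well-formed path passes...
validate_gcs_uri("gs://my-bucket-data/_spark_metadata")

# ...while the shape seen in the error message is rejected:
try:
    validate_gcs_uri("gs://my-bucket-data_spark_metadata")
except ValueError as exc:
    print("rejected:", exc)
```

This is only a sketch of the failure mode; the actual bad path in this issue was produced inside the reporter's `main.py`, which is not shown.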

Debug Output

Panic Output

https://gist.github.com/jmilagroso/5d4681f911b0514fbcc72676edb9a5a6

Expected Behavior

The PySpark job runs successfully with no error.

Actual Behavior

Throws pyspark.sql.utils.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: gs://<bucket-name>-data_spark_metadata

Steps to Reproduce

  1. terraform apply
  2. Go to Google Cloud Console, Dataproc dashboard
  3. See Jobs

Important Factoids

Submitting the same PySpark job with the gcloud CLI succeeds and does not produce the error:

gcloud dataproc jobs submit pyspark gs://<bucket-name>-scripts/main.py \
    --cluster=mycluster \
    --region=us-central1 \
    -- my_app_arg_here 
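For context, the tokens after `--` in the gcloud command reach the script the same way as Terraform's `pyspark_config.args`: as positional arguments in `sys.argv`. A minimal, hypothetical `main.py` skeleton (the `main` function and its argument handling are assumptions for illustration, not the reporter's actual script):

```python
import sys

def main(argv):
    # argv[0] is the script path; everything after it is either
    # pyspark_config.args (Terraform) or the tokens following "--"
    # in the gcloud command.
    app_arg = argv[1] if len(argv) > 1 else None
    print("app arg:", app_arg)
    return app_arg

if __name__ == "__main__":
    main(sys.argv)
```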

References

https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/dataproc_job

@jmilagroso jmilagroso added the bug label Jul 8, 2021
@jmilagroso jmilagroso changed the title pyspark.sql.utils.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: gs://<PROJECT-ID>-data_spark_metadata pyspark.sql.utils.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: gs://<bucket-name>-data_spark_metadata Jul 8, 2021
@jmilagroso (Author)

Closing this. After creating another Python script for the PySpark job, I was able to reproduce the run and got the expected result.


github-actions bot commented Aug 8, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 8, 2021