
Document issues at scale #20

Open
eric-czech opened this issue Oct 22, 2020 · 9 comments


eric-czech commented Oct 22, 2020

This is a broad issue that I'll use to document some of the problems that have come up when scaling out the UKB pipeline.

For convenience, here is a list of issues that have been filed as a result of these problems:

Some of these are discussed more below. Others were filed directly in upstream repos.


eric-czech commented Oct 22, 2020

Over the course of a day, here are a few problems I discovered while trying to ramp up to ~20 nodes in a cluster to run jobs. This included deleting and creating clusters several times throughout the day. It isn't clear to me which of these are definitely due to the larger cluster and which are just spurious; I'll log them in case they continue to occur.

  • When scaling from 5 to 20 nodes, kubernetes became unreachable for ~15 minutes. I saw this during that time period:
(base) eczech@ukb-experiments:~/repos/ukb-gwas-pipeline-nealelab$ kubectl get services
The connection to the server 34.75.250.168 was refused - did you specify the right host or port?

Normally the cluster is immediately accessible after a resize.
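One low-tech mitigation is to poll until the API server answers again before submitting work; a sketch (the retry interval is arbitrary):

```shell
# Poll until the Kubernetes API server becomes reachable again after a resize
until kubectl get nodes >/dev/null 2>&1; do
  echo "API server unreachable; retrying in 10s"
  sleep 10
done
```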

  • A simple filtering task (i.e. read zarr, apply filter, rechunk and write zarr) failed at the end of a script with AttributeError: 'FSMap' object has no attribute 'missing_exceptions'. This same script worked with no failures on a 5 node cluster and on a second attempt in a 20 node cluster.
Full Trace
2020-10-21 14:06:27,368 | __main__ | INFO | Filter summary:
Traceback (most recent call last):
  File "scripts/gwas_dev.py", line 300, in <module>
    #     logger.info("Done")
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "scripts/gwas_dev.py", line 232, in run_qc_1
    logger.info('Applying QC filters')
  File "scripts/gwas_dev.py", line 112, in apply_filters
    v = v.compute()
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/xarray/core/dataarray.py", line 834, in compute
    return new.load(**kwargs)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/xarray/core/dataarray.py", line 808, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/xarray/core/dataset.py", line 654, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/client.py", line 1986, in gather
    return self.sync(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/client.py", line 832, in sync
    return sync(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/utils.py", line 340, in sync
    raise exc.with_traceback(tb)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/utils.py", line 324, in f
    result[0] = yield future
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/client.py", line 1851, in _gather
    raise exception.with_traceback(traceback)
  File "/opt/conda/lib/python3.8/site-packages/dask/array/core.py", line 102, in getter
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/_asarray.py", line 85, in asarray
  File "/opt/conda/lib/python3.8/site-packages/xarray/core/indexing.py", line 495, in __array__
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/_asarray.py", line 85, in asarray
  File "/opt/conda/lib/python3.8/site-packages/xarray/core/indexing.py", line 560, in __array__
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/_asarray.py", line 85, in asarray
  File "/opt/conda/lib/python3.8/site-packages/xarray/coding/variables.py", line 70, in __array__
  File "/opt/conda/lib/python3.8/site-packages/xarray/coding/variables.py", line 218, in _scale_offset_decoding
  File "/opt/conda/lib/python3.8/site-packages/xarray/coding/variables.py", line 70, in __array__
  File "/opt/conda/lib/python3.8/site-packages/xarray/coding/variables.py", line 138, in _apply_mask
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/_asarray.py", line 85, in asarray
  File "/opt/conda/lib/python3.8/site-packages/xarray/core/indexing.py", line 560, in __array__
  File "/opt/conda/lib/python3.8/site-packages/xarray/backends/zarr.py", line 54, in __getitem__
  File "/opt/conda/lib/python3.8/site-packages/xarray/backends/zarr.py", line 51, in get_array
  File "/opt/conda/lib/python3.8/site-packages/zarr/hierarchy.py", line 338, in __getitem__
  File "/opt/conda/lib/python3.8/site-packages/zarr/core.py", line 124, in __init__
  File "/opt/conda/lib/python3.8/site-packages/zarr/core.py", line 141, in _load_metadata
  File "/opt/conda/lib/python3.8/site-packages/zarr/core.py", line 150, in _load_metadata_nosync
  File "/opt/conda/lib/python3.8/site-packages/fsspec/mapping.py", line 133, in __getitem__
AttributeError: 'FSMap' object has no attribute 'missing_exceptions'
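This particular AttributeError (missing_exceptions is an attribute set by newer fsspec versions on FSMap) looks like a client/worker package-version mismatch rather than a genuine bug. A quick way to compare environments is to collect versions locally and on each worker, then diff them; a minimal stdlib sketch of the collection step (the package list is illustrative):

```python
import importlib.metadata as md

def report_versions(packages):
    """Installed version for each package, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = md.version(name)
        except md.PackageNotFoundError:
            versions[name] = None
    return versions

# On a live cluster, run this locally and on each worker (e.g. via
# distributed's client.run) and diff the two results; skew in any of
# these packages can produce errors like the AttributeError above.
print(report_versions(["fsspec", "gcsfs", "zarr", "dask", "distributed"]))
```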
  • Similar to the above, another script failed randomly at the end with the error aiohttp.client_exceptions.ServerDisconnectedError. This also worked on a second attempt.
Full Log
(gwas-dev) eczech@ukb-experiments:~/repos/ukb-gwas-pipeline-nealelab$ python scripts/gwas_dev.py run_qc_1 --input-path=rs-ukb/prep/gt-imputation/ukb_chr21.zarr --output-path=rs-ukb/prep/gt-imputation-qc/ukb_chr21.zarr
2020-10-21 15:46:30,108 | __main__ | INFO | Initialized script with dask client:

2020-10-21 15:46:30,109 | __main__ | INFO | Running stage 1 QC
2020-10-21 15:46:43,305 | __main__ | INFO | Loaded dataset:

Dimensions:                         (alleles: 2, genotypes: 3, samples: 487409, variants: 1261158)
Dimensions without coordinates: alleles, genotypes, samples, variants
Data variables:
    call_genotype_probability_mask  (variants, samples, genotypes) bool dask.array
    sample_id                       (samples) int32 dask.array
    sample_sex                      (samples) uint8 dask.array
    variant_allele                  (variants, alleles) |S101 dask.array
    variant_contig                  (variants) int64 dask.array
    variant_contig_name             (variants) |S2 dask.array
    variant_id                      (variants) |S115 dask.array
    variant_info                    (variants) float32 dask.array
    variant_maf                     (variants) float32 dask.array
    variant_minor_allele            (variants) |S101 dask.array
    variant_position                (variants) int32 dask.array
    variant_rsid                    (variants) |S115 dask.array
    call_genotype_probability       (variants, samples, genotypes) float16 dask.array
    call_dosage                     (variants, samples) float16 dask.array
    call_dosage_mask                (variants, samples) bool dask.array
Attributes:
    contig_index:  20
    contig_name:   21
    contigs:       ['21']
2020-10-21 15:46:43,305 | __main__ | INFO | Applying QC filters
2020-10-21 15:46:43,384 | __main__ | INFO | Filter summary:
2020-10-21 15:46:43,911 | __main__ | INFO |     high_info: {False: 896362, True: 364796}
2020-10-21 15:46:44,550 | __main__ | INFO | Filter summary:
2020-10-21 16:05:33,221 | __main__ | INFO |     nonzero_stddev: {True: 364796}
2020-10-21 16:05:34,199 | __main__ | INFO | Saving dataset to rs-ukb/prep/gt-imputation-qc/ukb_chr21.zarr:

Dimensions:                         (alleles: 2, genotypes: 3, samples: 487409, variants: 364796)
Dimensions without coordinates: alleles, genotypes, samples, variants
Data variables:
    call_genotype_probability_mask  (variants, samples, genotypes) bool dask.array
    sample_id                       (samples) int32 dask.array
    sample_sex                      (samples) uint8 dask.array
    variant_allele                  (variants, alleles) |S101 dask.array
    variant_contig                  (variants) int64 dask.array
    variant_contig_name             (variants) |S2 dask.array
    variant_id                      (variants) |S115 dask.array
    variant_info                    (variants) float32 dask.array
    variant_maf                     (variants) float32 dask.array
    variant_minor_allele            (variants) |S101 dask.array
    variant_position                (variants) int32 dask.array
    variant_rsid                    (variants) |S115 dask.array
    call_genotype_probability       (variants, samples, genotypes) float16 dask.array
    call_dosage                     (variants, samples) float16 dask.array
    call_dosage_mask                (variants, samples) bool dask.array
    variant_dosage_std              (variants) float32 dask.array
Attributes:
    contig_index:  20
    contig_name:   21
    contigs:       ['21']
Traceback (most recent call last):
  File "scripts/gwas_dev.py", line 304, in <module>
    fire.Fire()
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "scripts/gwas_dev.py", line 227, in run_qc_1
    save_dataset(ds, output_path)
  File "scripts/gwas_dev.py", line 65, in save_dataset
    ds.to_zarr(store, mode="w", consolidated=True)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/xarray/core/dataset.py", line 1652, in to_zarr
    return to_zarr(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/xarray/backends/api.py", line 1369, in to_zarr
    writes = writer.sync(compute=compute)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/xarray/backends/common.py", line 155, in sync
    delayed_store = da.store(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/dask/array/core.py", line 981, in store
    result.compute(**kwargs)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/dask/base.py", line 167, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/client.py", line 1986, in gather
    return self.sync(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/client.py", line 832, in sync
    return sync(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/utils.py", line 340, in sync
    raise exc.with_traceback(tb)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/utils.py", line 324, in f
    result[0] = yield future
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/client.py", line 1851, in _gather
    raise exception.with_traceback(traceback)
  File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 75, in loads
  pass
  File "/opt/conda/lib/python3.8/site-packages/zarr/core.py", line 1946, in __setstate__
  File "/opt/conda/lib/python3.8/site-packages/zarr/core.py", line 124, in __init__
  File "/opt/conda/lib/python3.8/site-packages/zarr/core.py", line 141, in _load_metadata
  File "/opt/conda/lib/python3.8/site-packages/zarr/core.py", line 150, in _load_metadata_nosync
  File "/opt/conda/lib/python3.8/site-packages/fsspec/mapping.py", line 132, in __getitem__
  File "/opt/conda/lib/python3.8/site-packages/fsspec/asyn.py", line 227, in cat
  File "/opt/conda/lib/python3.8/site-packages/gcsfs/core.py", line 826, in _cat_file
  File "/opt/conda/lib/python3.8/site-packages/gcsfs/core.py", line 487, in _call
  File "/opt/conda/lib/python3.8/site-packages/aiohttp/client.py", line 1012, in __aenter__
  File "/opt/conda/lib/python3.8/site-packages/aiohttp/client.py", line 504, in _request
  File "/opt/conda/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 847, in start
  File "/opt/conda/lib/python3.8/site-packages/aiohttp/streams.py", line 591, in read
aiohttp.client_exceptions.ServerDisconnectedError
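ServerDisconnectedError from gcsfs/aiohttp is typically transient, and since these scripts succeed on a second attempt, wrapping the flaky call in a small retry helper may be enough. A generic sketch (function names and backoff policy are arbitrary):

```python
import time

def retry(fn, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call fn(), retrying with exponential backoff on transient errors."""
    for i in range(attempts):
        try:
            return fn()
        except exceptions:
            if i == attempts - 1:
                raise  # out of attempts; surface the original error
            time.sleep(delay * (2 ** i))

# Usage (hypothetical): retry(lambda: ds.to_zarr(store, mode="w"),
#                             exceptions=(OSError, RuntimeError))
```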
  • Had a job fail w/ an OOM (I think) even though it does nothing but filter and rechunk a dataset. This also worked on a second attempt.
Full Log
2020-10-21 21:39:06,490 | __main__ | INFO | Initialized script with dask client:

2020-10-21 21:39:06,490 | __main__ | INFO | Running stage 1 QC (input_path=rs-ukb/prep/gt-imputation-qc/ukb_chr22.zarr, output_path=rs-ukb/pipe/nealelab-gwas-uni-ancestry-v3/input/gt-imputation/ukb_chr22.zarr)
2020-10-21 21:39:06,748 | __main__ | INFO | Loaded dataset:

Dimensions:                         (alleles: 2, genotypes: 3, samples: 487409, variants: 365260)
Dimensions without coordinates: alleles, genotypes, samples, variants
Data variables:
    call_dosage                     (variants, samples) float16 dask.array
    call_dosage_mask                (variants, samples) bool dask.array
    call_genotype_probability       (variants, samples, genotypes) float16 dask.array
    call_genotype_probability_mask  (variants, samples, genotypes) bool dask.array
    sample_id                       (samples) int32 dask.array
    sample_sex                      (samples) uint8 dask.array
    variant_allele                  (variants, alleles) |S414 dask.array
    variant_contig                  (variants) int64 dask.array
    variant_contig_name             (variants) |S2 dask.array
    variant_dosage_std              (variants) float32 dask.array
    variant_id                      (variants) |S428 dask.array
    variant_info                    (variants) float32 dask.array
    variant_maf                     (variants) float32 dask.array
    variant_minor_allele            (variants) |S414 dask.array
    variant_position                (variants) int32 dask.array
    variant_rsid                    (variants) |S304 dask.array
Attributes:
    contig_index:  21
    contig_name:   22
    contigs:       ['22']
2020-10-21 21:39:06,748 | __main__ | INFO | Applying variant QC filters
2020-10-21 21:39:08,630 | __main__ | INFO | Filter summary (True ==> kept):
2020-10-21 21:39:09,108 | __main__ | INFO |     high_maf: {True: 213203, False: 152057}
2020-10-21 21:39:09,435 | __main__ | INFO | Filter summary (True ==> kept):
2020-10-21 21:45:10,070 | __main__ | INFO |     in_hwe: {True: 142510, False: 70693}
2020-10-21 21:45:10,326 | __main__ | INFO | Applying sample QC filters (sample_qc_path=rs-ukb/prep/main/ukb_sample_qc.zarr)
2020-10-21 21:45:10,905 | __main__ | INFO | Filter summary (True ==> kept):
2020-10-21 21:45:11,098 | __main__ | INFO |     no_aneuploidy: {True: 501854, False: 651}
2020-10-21 21:45:11,220 | __main__ | INFO |     in_pca: {True: 407127, False: 95378}
2020-10-21 21:45:11,305 | __main__ | INFO |     in_ethnic_groups: {True: 455793, False: 46712}
2020-10-21 21:45:11,322 | __main__ | INFO |     overall: {True: 365941, False: 136564}
2020-10-21 21:45:15,710 | __main__ | INFO | Saving dataset to rs-ukb/pipe/nealelab-gwas-uni-ancestry-v3/input/gt-imputation/ukb_chr22.zarr:

Dimensions:                            (alleles: 2, genotypes: 3, principal_components: 40, samples: 365941, variants: 142510)
Dimensions without coordinates: alleles, genotypes, principal_components, samples, variants
Data variables:
    variant_hwe_p_value                (variants) float64 dask.array
    call_dosage                        (variants, samples) float16 dask.array
    call_dosage_mask                   (variants, samples) bool dask.array
    call_genotype_probability          (variants, samples, genotypes) float16 dask.array
    call_genotype_probability_mask     (variants, samples, genotypes) bool dask.array
    sample_id                          (samples) int32 dask.array
    sample_sex                         (samples) uint8 dask.array
    variant_allele                     (variants, alleles) |S414 dask.array
    variant_contig                     (variants) int64 dask.array
    variant_contig_name                (variants) |S2 dask.array
    variant_dosage_std                 (variants) float32 dask.array
    variant_id                         (variants) |S428 dask.array
    variant_info                       (variants) float32 dask.array
    variant_maf                        (variants) float32 dask.array
    variant_minor_allele               (variants) |S414 dask.array
    variant_position                   (variants) int32 dask.array
    variant_rsid                       (variants) |S304 dask.array
    variant_genotype_counts            (variants, genotypes) int32 dask.array
    sample_qc_sex                      (samples) float64 dask.array
    sample_genetic_sex                 (samples) float64 dask.array
    sample_age_at_recruitment          (samples) float64 dask.array
    sample_principal_component         (samples, principal_components) float64 dask.array
    sample_ethnic_background           (samples) float64 dask.array
    sample_genotype_measurement_batch  (samples) float64 dask.array
    sample_genotype_measurement_plate  (samples) |S13 dask.array
    sample_genotype_measurement_well   (samples) |S3 dask.array
Traceback (most recent call last):
  File "scripts/gwas_dev.py", line 304, in <module>
    #     logger.info(f"Saving p-values to {output_path}")
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "scripts/gwas_dev.py", line 247, in run_qc_2
  File "scripts/gwas_dev.py", line 65, in save_dataset
    for v in ds:
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/xarray/core/dataset.py", line 1652, in to_zarr
    return to_zarr(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/xarray/backends/api.py", line 1369, in to_zarr
    writes = writer.sync(compute=compute)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/xarray/backends/common.py", line 155, in sync
    delayed_store = da.store(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/dask/array/core.py", line 981, in store
    result.compute(**kwargs)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/dask/base.py", line 167, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/distributed/client.py", line 1986, in gather
    return self.sync(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/distributed/client.py", line 832, in sync
    return sync(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/distributed/utils.py", line 340, in sync
    raise exc.with_traceback(tb)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/distributed/utils.py", line 324, in f
    result[0] = yield future
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/0522a0a1/lib/python3.8/site-packages/distributed/client.py", line 1851, in _gather
    raise exception.with_traceback(traceback)
distributed.scheduler.KilledWorker: ("('zarr-ba752cee0ca50d62669b188be8515903', 51, 69, 0)", <Worker 'tcp://10.0.15.3:41669', name: tcp://10.0.15.3:41669, memory: 0, processing: 745>)
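A KilledWorker with "memory: 0" and hundreds of tasks still processing is usually the nanny terminating a worker that exceeded its memory budget. One quick sanity check is estimating per-chunk memory for the arrays being rechunked; a sketch (the chunk shape below is hypothetical):

```python
from math import prod

def chunk_nbytes(shape, itemsize):
    """Memory held by a single in-flight chunk of the given shape and dtype size."""
    return prod(shape) * itemsize

# Hypothetical chunk of the float16 call_genotype_probability array (itemsize 2
# bytes); a rechunk can hold several such chunks per task at once, so per-worker
# memory needs to cover a multiple of this figure.
gb = chunk_nbytes((5216, 5792, 3), 2) / 1e9
print(f"{gb:.2f} GB per chunk")
```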

  • At one point, I downsized a cluster for a while and then tried to resize it back to 20 nodes. Something must have gone wrong with the helm deployment, because I kept getting the error ModuleNotFoundError: No module named 'gcsfs'. This never happened on a fresh cluster.
Full Trace
Traceback (most recent call last):
  File "scripts/gwas_dev.py", line 304, in <module>
    fire.Fire()
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "scripts/gwas_dev.py", line 227, in run_qc_1
    save_dataset(ds, output_path)
  File "scripts/gwas_dev.py", line 65, in save_dataset
    ds.to_zarr(store, mode="w", consolidated=True)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/xarray/core/dataset.py", line 1652, in to_zarr
    return to_zarr(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/xarray/backends/api.py", line 1369, in to_zarr
    writes = writer.sync(compute=compute)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/xarray/backends/common.py", line 155, in sync
    delayed_store = da.store(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/dask/array/core.py", line 981, in store
    result.compute(**kwargs)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/dask/base.py", line 167, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/client.py", line 1986, in gather
    return self.sync(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/client.py", line 832, in sync
    return sync(
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/utils.py", line 340, in sync
    raise exc.with_traceback(tb)
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/utils.py", line 324, in f
    result[0] = yield future
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/home/eczech/miniconda3/envs/gwas-dev/lib/python3.8/site-packages/distributed/client.py", line 1851, in _gather
    raise exception.with_traceback(traceback)
  File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 75, in loads
    pass
ModuleNotFoundError: No module named 'gcsfs'
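A cheap way to catch this class of problem before submitting real work is to verify that required modules resolve in each environment (locally, and on a cluster inside each worker container). A stdlib sketch:

```python
import importlib.util

def missing_modules(names):
    """Names that cannot be resolved to an importable module in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# e.g. run this inside each dask worker before kicking off a pipeline stage;
# a non-empty result means the worker image/deployment is stale or broken.
print(missing_modules(["gcsfs", "fsspec", "zarr"]))
```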

@eric-czech

On several occasions when all work seems to bottleneck at a single node, I see an Info page like this:

[Screenshot: Screen Shot 2020-11-17 at 10.33.20 AM]

Clicking on that individual worker with all the pending tasks results in "500: Internal Server Error".

This workload (running GWAS regressions) eventually failed, and it appears that all the workers were reset; at least the "logs" link above shows output as if they were all restarted after the failure. I'll see if the individual VMs have logs retained somewhere.

@eric-czech

Nope, looks like all worker logs get cleared for some reason.


eric-czech commented Nov 19, 2020

For some reason this step failed on chr22 but not chr21 today. The error was consistent with OOM, yet this step does nothing but filter and rechunk. The rechunking is probably what is problematic, though it is odd that it is not consistently so (the chr21 and chr22 datasets are very similar: they have the same number of samples and ~365k variants +/- 1k).

@eric-czech

Allowing fsspec to overwrite zarr archives is a very bad idea. On more than one occasion, I have been puzzled by a client script that appeared to idle with little to no CPU usage for ~15 minutes, only to find that it was deleting a previously written zarr archive very slowly. These archives should always be deleted first with "gsutil -m" instead of relying on to_zarr(..., mode='w').

Perhaps fsspec deletes files synchronously instead of in parallel as gsutil does.
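Concretely, deleting the target up front with gsutil's parallel mode looks like this (the path shown is one of the archives from this thread):

```shell
# Remove the old archive in parallel up front, rather than letting
# to_zarr(..., mode="w") delete keys one at a time through fsspec
gsutil -m rm -r gs://rs-ukb/prep/gt-imputation-qc/ukb_chr21.zarr
```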


eric-czech commented Nov 23, 2020

Today I tried to run more rechunking operations but hit an issue I have not seen before. I ran a GKE cluster with five 64-vCPU nodes (262144 MB RAM, 200 GB disk each) and a rechunking process on each one, for different contigs, but all jobs continuously failed with an error like "Exit code 255":

[Screenshot: Screen Shot 2020-11-23 at 11.52.15 AM]

I believe this is somehow being thrown from the Python process run as part of the snakemake CLI command. There is no information in the logs though, e.g.:

[Screenshot: Screen Shot 2020-11-23 at 12.00.42 PM]

This is the first time an error has occurred (other than OOM) without any useful logging. In all cases it happens right after the bgen file is downloaded and before the conda environment is activated for the rechunking code. So far I have failed to reproduce it by running the same jobs, on the same VMs, in the same cluster, using smaller bgen files (chr XY runs as usual). I also tried running a job for a larger contig (e.g. chr 18) on a local Debian buster system (in the same conda env), but that also worked as expected.

Changes in use for this work: c7318b5

UPDATE

I tried this again on a cluster of 32-vCPU nodes instead. I still see the same failure for chromosomes 19 and 20 (and presumably anything larger) but not 21 or 22, so the problem likely has something to do with file size. I'm not sure what code snakemake uses to do the file download, but the next logical step is probably to get onto a k8s cluster node, run the docker image manually, and then run the same command that snakemake issues. This might surface a system message or something else that is easier to catch in a terminal before digging into the code.

Command used to run snakemake script:

# image: snakemake/snakemake:v5.30.1
-c
cp -rf /source/. . && snakemake rs-ukb/prep/gt-imputation/ukb_chr21.2.ckpt --snakefile Snakefile --force -j --keep-target-files --keep-remote --latency-wait 5 --scheduler ilp --attempt 1 --force-use-threads --max-inventory-time 0 --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --allowed-rules bgen_to_zarr --nocolor --notemp --no-hooks --nolock --use-conda --default-remote-provider GS --default-remote-prefix rs-ukb

Note: I updated snakemake from 5.22 to 5.30 and google-cloud-sdk from 315.0 to 319.0 for these runs (the latest versions available).

UPDATE

See #25. The issue was disk space, though it is still unclear why snakemake appears to duplicate files on download.
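Since disk space was the culprit, checking free space before the download step would have surfaced this much sooner. A stdlib sketch:

```python
import shutil

def free_gb(path="."):
    """Free disk space, in GB, on the filesystem containing path."""
    return shutil.disk_usage(path).free / 1e9

# e.g. assert enough headroom before downloading a large bgen file
print(f"{free_gb('.'):.1f} GB free")
```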

@eric-czech eric-czech changed the title Document dask issues at scale Document issues at scale Nov 23, 2020
@eric-czech

In the last couple of days, while rechunking bgen files, I encountered two jobs that failed for these reasons (after running for ~6 hrs):

# This job never made it past the file download:
Downloading from remote: rs-ukb/raw/gt-imputation/ukb_imp_chr9_v3.bgen
Traceback (most recent call last):
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/__init__.py", line 687, in snakemake
    success = workflow.execute(
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/workflow.py", line 1005, in execute
    success = scheduler.schedule()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 489, in schedule
    self.run(runjobs)
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 500, in run
    executor.run_jobs(
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 131, in run_jobs
    self.run(
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 447, in run
    future = self.run_single_job(job)
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 491, in run_single_job
    self.cached_or_run, job, run_wrapper, *self.job_args_and_prepare(job)
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 452, in job_args_and_prepare
    job.prepare()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/jobs.py", line 710, in prepare
    self.download_remote_input()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/jobs.py", line 682, in download_remote_input
    f.download_from_remote()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/io.py", line 584, in download_from_remote
    self.remote_object.download()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
    return retry_target(
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/remote/GS.py", line 226, in download
    return download_blob(self.blob, self.local_file())
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
    return retry_target(
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/snakemake/remote/GS.py", line 69, in download_blob
    blob.download_to_file(parser)
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 1041, in download_to_file
    self._do_download(
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 900, in _do_download
    response = download.consume(transport, timeout=timeout)
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/resumable_media/requests/download.py", line 171, in consume
    self._write_to_stream(result)
  File "/opt/conda/envs/snakemake/lib/python3.9/site-packages/google/resumable_media/requests/download.py", line 120, in _write_to_stream
    raise common.DataCorruption(response, msg)
google.resumable_media.common.DataCorruption: Checksum mismatch while downloading:

  https://storage.googleapis.com/download/storage/v1/b/rs-ukb/o/raw%2Fgt-imputation%2Fukb_imp_chr9_v3.bgen?generation=1602861266282729&alt=media

The X-Goog-Hash header indicated an MD5 checksum of:

  J3RmHIDzGmBKkklx/ImWtg==

but the actual MD5 checksum of the downloaded contents was:

  XHqx7gai/Eij53rF8bMrKg==

# This job downloaded the file successfully but failed on GCS write:
<xarray.Dataset>
Dimensions:                         (alleles: 2, genotypes: 3, samples: 487409, variants: 4628348)
Dimensions without coordinates: alleles, genotypes, samples, variants
Data variables:
    variant_id                      (variants) |S151 dask.array<chunksize=(888859,), meta=np.ndarray>
    variant_rsid                    (variants) |S151 dask.array<chunksize=(888859,), meta=np.ndarray>
    variant_position                (variants) int32 dask.array<chunksize=(4628348,), meta=np.ndarray>
    variant_maf                     (variants) float32 dask.array<chunksize=(4628348,), meta=np.ndarray>
    variant_minor_allele            (variants) |S136 dask.array<chunksize=(986895,), meta=np.ndarray>
    variant_info                    (variants) float32 dask.array<chunksize=(4628348,), meta=np.ndarray>
    variant_allele                  (variants, alleles) |S136 dask.array<chunksize=(493447, 2), meta=np.ndarray>
    sample_id                       (samples) int32 4476413 3205773 ... 4315851
    sample_sex                      (samples) uint8 1 2 2 2 2 1 ... 1 2 2 1 2 1
    variant_contig                  (variants) int64 10 10 10 10 ... 10 10 10 10
    variant_contig_name             (variants) |S2 b'11' b'11' ... b'11' b'11'
    call_genotype_probability       (variants, samples, genotypes) float16 dask.array<chunksize=(250, 487409, 3), meta=np.ndarray>
    call_genotype_probability_mask  (variants, samples, genotypes) bool dask.array<chunksize=(250, 487409, 3), meta=np.ndarray>
Attributes:
    contigs:       ['11']
    contig_name:   11
    contig_index:  10
[##############                          ] | 35% Completed |  2hr 54min 11.3s
Traceback (most recent call last):
  File "scripts/convert_genetic_data.py", line 312, in <module>
    fire.Fire()
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "scripts/convert_genetic_data.py", line 296, in bgen_to_zarr
    ds = rechunk_dataset(
  File "scripts/convert_genetic_data.py", line 217, in rechunk_dataset
    res = fn(
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/sgkit/io/bgen/bgen_reader.py", line 519, in rechunk_bgen
    rechunked.execute()
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/rechunker/api.py", line 76, in execute
    self._executor.execute_plan(self._plan, **kwargs)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/rechunker/executors/dask.py", line 24, in execute_plan
    return plan.compute(**kwargs)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/dask/base.py", line 167, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/dask/threaded.py", line 76, in get
    results = get_async(
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/dask/local.py", line 316, in reraise
    raise exc
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/dask/array/core.py", line 3724, in store_chunk
    return load_store_chunk(x, out, index, lock, return_stored, False)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/dask/array/core.py", line 3713, in load_store_chunk
    out[index] = np.asanyarray(x)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/zarr/core.py", line 1115, in __setitem__
    self.set_basic_selection(selection, value, fields=fields)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/zarr/core.py", line 1210, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/zarr/core.py", line 1501, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/zarr/core.py", line 1550, in _set_selection
    self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/zarr/core.py", line 1664, in _chunk_setitem
    self._chunk_setitem_nosync(chunk_coords, chunk_selection, value,
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/zarr/core.py", line 1729, in _chunk_setitem_nosync
    self.chunk_store[ckey] = cdata
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/fsspec/mapping.py", line 154, in __setitem__
    self.fs.pipe_file(key, value)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/fsspec/asyn.py", line 121, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in f
    result[0] = await future
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/gcsfs/core.py", line 1007, in _pipe_file
    return await simple_upload(
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/gcsfs/core.py", line 1523, in simple_upload
    j = await fs._call(
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/gcsfs/core.py", line 487, in _call
    async with self.session.request(
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/aiohttp/client.py", line 1117, in __aenter__
    self._resp = await self._coro
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/aiohttp/client.py", line 544, in _request
    await resp.start(conn)
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 890, in start
    message, payload = await self._protocol.read()  # type: ignore
  File "/workdir/.snakemake/conda/3c6331d1/lib/python3.8/site-packages/aiohttp/streams.py", line 604, in read
    await self._waiter
aiohttp.client_exceptions.ClientOSError: [Errno 32] Broken pipe
[Tue Dec  1 23:36:26 2020]
Error in rule bgen_to_zarr:
    jobid: 0
    output: rs-ukb/prep/gt-imputation/ukb_chr11.ckpt
    conda-env: /workdir/.snakemake/conda/3c6331d1
    shell:
        python scripts/convert_genetic_data.py bgen_to_zarr --input-path-bgen=rs-ukb/raw/gt-imputation/ukb_imp_chr11_v3.bgen --input-path-variants=rs-ukb/raw/gt-imputation/ukb_mfi_chr11_v3.txt --input-path-samples=rs-ukb/raw/gt-imputation/ukb59384_imp_chr11_v3_s487296.sample --output-path=rs-ukb/prep/gt-imputation/ukb_chr11.zarr --contig-name=11 --contig-index=10 --remote=True && touch rs-ukb/prep/gt-imputation/ukb_chr11.ckpt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /workdir/.snakemake/log/2020-12-01T201731.396448.snakemake.log
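The checksum mismatch above is a comparison of two base64-encoded MD5 digests. To rule out local disk corruption independently of snakemake, the downloaded file's digest can be recomputed and compared against the value GCS reports in the `X-Goog-Hash` header (or `blob.md5_hash` metadata). A minimal sketch (the file path and usage below are illustrative, not taken from an actual run):

```python
import base64
import hashlib

def gcs_md5_b64(path, chunk_size=8 * 1024 * 1024):
    """Compute the base64-encoded MD5 digest in the same form that
    GCS reports it (X-Goog-Hash header / blob.md5_hash metadata)."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so multi-GB bgen files don't need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")

# Hypothetical usage, comparing against the header value from the error above:
# assert gcs_md5_b64("ukb_imp_chr9_v3.bgen") == "J3RmHIDzGmBKkklx/ImWtg=="
```

Running this on the partially downloaded file after a failure would distinguish a bad download from a bad checksum in the metadata.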

@eric-czech

I have seen the first error (the checksum mismatch) at least 4 or 5 times now, and it appears to be a snakemake bug. I logged it as snakemake/snakemake#785.

The second error (the broken pipe on GCS write) was logged as fsspec/gcsfs#315.
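Both GCS write failures kill a multi-hour rechunk job over a single transient chunk upload. Until the upstream fixes land, one defensive option is to wrap the zarr chunk store in a mapping that retries `__setitem__` with exponential backoff. A rough sketch (the retry parameters and exception list are illustrative and have not been tested against gcsfs):

```python
import time
from collections.abc import MutableMapping

class RetryingStore(MutableMapping):
    """Wrap a zarr chunk store so transient write failures
    (e.g. broken pipes on GCS uploads) are retried with backoff."""

    def __init__(self, store, retries=5, backoff=2.0, exceptions=(OSError,)):
        self.store = store
        self.retries = retries
        self.backoff = backoff
        self.exceptions = exceptions

    def __setitem__(self, key, value):
        for attempt in range(self.retries):
            try:
                self.store[key] = value
                return
            except self.exceptions:
                if attempt == self.retries - 1:
                    raise  # exhausted retries; surface the original error
                time.sleep(self.backoff * (2 ** attempt))

    def __getitem__(self, key):
        return self.store[key]

    def __delitem__(self, key):
        del self.store[key]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)
```

A store like the `FSMap` returned by `fsspec.get_mapper` could be wrapped this way before being handed to the rechunker, though this only papers over the transient failures rather than fixing their cause.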

@eric-czech

And here is another error that came out of nowhere today:

Downloading from remote: rs-ukb/raw/gt-imputation/ukb59384_imp_chr15_v3_s487296.sample
Finished download.
Downloading from remote: rs-ukb/raw/gt-imputation/ukb_imp_chr15_v3.bgen
Finished download.
Downloading from remote: rs-ukb/raw/gt-imputation/ukb_mfi_chr15_v3.txt
Finished download.
Activating conda environment: /workdir/.snakemake/conda/0a479a2e
2020-12-04 12:51:13,013 | __main__ | INFO | Loading BGEN dataset for contig Contig(name=15, index=14) from rs-ukb/raw/gt-imputation/ukb_imp_chr15_v3.bgen (chunks = (250, -1))
2020-12-04 12:52:04,036 | __main__ | INFO | Rechunking dataset for contig Contig(name=15, index=14) to gs://rs-ukb/prep/gt-imputation/ukb_chr15.zarr (chunks = (5216, 5792)):
<xarray.Dataset>
Dimensions:                         (alleles: 2, genotypes: 3, samples: 487409, variants: 2767971)
Dimensions without coordinates: alleles, genotypes, samples, variants
Data variables:
    variant_id                      (variants) |S186 dask.array<chunksize=(721600,), meta=np.ndarray>
    variant_rsid                    (variants) |S132 dask.array<chunksize=(922657,), meta=np.ndarray>
    variant_position                (variants) int32 dask.array<chunksize=(2767971,), meta=np.ndarray>
    variant_maf                     (variants) float32 dask.array<chunksize=(2767971,), meta=np.ndarray>
    variant_minor_allele            (variants) |S118 dask.array<chunksize=(922657,), meta=np.ndarray>
    variant_info                    (variants) float32 dask.array<chunksize=(2767971,), meta=np.ndarray>
    variant_allele                  (variants, alleles) |S172 dask.array<chunksize=(390167, 2), meta=np.ndarray>
    sample_id                       (samples) int32 4476413 3205773 ... 4315851
    sample_sex                      (samples) uint8 1 2 2 2 2 1 ... 1 2 2 1 2 1
    variant_contig                  (variants) int64 14 14 14 14 ... 14 14 14 14
    variant_contig_name             (variants) |S2 b'15' b'15' ... b'15' b'15'
    call_genotype_probability       (variants, samples, genotypes) float16 dask.array<chunksize=(250, 487409, 3), meta=np.ndarray>
    call_genotype_probability_mask  (variants, samples, genotypes) bool dask.array<chunksize=(250, 487409, 3), meta=np.ndarray>
Attributes:
    contigs:       ['15']
    contig_name:   15
    contig_index:  14
[####################                    ] | 51% Completed |  2hr  8min  8.3s
Traceback (most recent call last):
  File "scripts/convert_genetic_data.py", line 312, in <module>
    fire.Fire()
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "scripts/convert_genetic_data.py", line 296, in bgen_to_zarr
    ds = rechunk_dataset(
  File "scripts/convert_genetic_data.py", line 217, in rechunk_dataset
    res = fn(
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/sgkit/io/bgen/bgen_reader.py", line 519, in rechunk_bgen
    rechunked.execute()
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/rechunker/api.py", line 76, in execute
    self._executor.execute_plan(self._plan, **kwargs)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/rechunker/executors/dask.py", line 24, in execute_plan
    return plan.compute(**kwargs)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/base.py", line 167, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/threaded.py", line 76, in get
    results = get_async(
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/local.py", line 316, in reraise
    raise exc
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/array/core.py", line 3724, in store_chunk
    return load_store_chunk(x, out, index, lock, return_stored, False)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/array/core.py", line 3713, in load_store_chunk
    out[index] = np.asanyarray(x)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/zarr/core.py", line 1115, in __setitem__
    self.set_basic_selection(selection, value, fields=fields)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/zarr/core.py", line 1210, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/zarr/core.py", line 1501, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/zarr/core.py", line 1550, in _set_selection
    self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/zarr/core.py", line 1664, in _chunk_setitem
    self._chunk_setitem_nosync(chunk_coords, chunk_selection, value,
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/zarr/core.py", line 1729, in _chunk_setitem_nosync
    self.chunk_store[ckey] = cdata
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fsspec/mapping.py", line 154, in __setitem__
    self.fs.pipe_file(key, value)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fsspec/asyn.py", line 121, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in f
    result[0] = await future
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/gcsfs/core.py", line 1007, in _pipe_file
    return await simple_upload(
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/gcsfs/core.py", line 1523, in simple_upload
    j = await fs._call(
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/gcsfs/core.py", line 525, in _call
    raise e
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/gcsfs/core.py", line 507, in _call
    self.validate_response(status, contents, json, path, headers)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/gcsfs/core.py", line 1228, in validate_response
    raise HttpError(error)
gcsfs.utils.HttpError: Required

Filed as fsspec/gcsfs#316.
