This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Job management error: Converting circular structure to JSON #3656

Closed
ipreetinder opened this issue May 19, 2021 · 4 comments

Comments

@ipreetinder

Describe the issue:

Hi,

I am running NNI on AML, and after some number of trials I start getting this error:

"Job management error: Converting circular structure to JSON"

nnimanager.log:
[2021-05-18 18:44:19] ERROR [ 'TypeError: Converting circular structure to JSON\n at JSON.stringify ()\n at NNIDataStore.storeTrialJobEvent (/home//miniconda3/lib/python3.8/site-packages/nni_node/core/nniDataStore.js:59:105)\n at NNIManager.requestTrialJobsStatus (/home//miniconda3/lib/python3.8/site-packages/nni_node/core/nnimanager.js:443:38)' ]

nnictl_stderr.log:
{ TypeError: Converting circular structure to JSON
    at JSON.stringify ()
    at NNIDataStore.storeTrialJobEvent (/home//miniconda3/lib/python3.8/site-packages/nni_node/core/nniDataStore.js:59:105)
    at NNIManager.requestTrialJobsStatus (/home//miniconda3/lib/python3.8/site-packages/nni_node/core/nnimanager.js:443:38)
  name: '',
  cause:
   TypeError: Converting circular structure to JSON
       at JSON.stringify ()
       at NNIDataStore.storeTrialJobEvent (/home//miniconda3/lib/python3.8/site-packages/nni_node/core/nniDataStore.js:59:105)
       at NNIManager.requestTrialJobsStatus (/home//miniconda3/lib/python3.8/site-packages/nni_node/core/nnimanager.js:443:38) }
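For context on the error itself: `JSON.stringify` throws this `TypeError` whenever the value it serializes contains a reference cycle, i.e. an object that points back to itself directly or through one of its children. The TypeScript sketch below reproduces the error and shows one common guard, a replacer that substitutes a marker for any object already on the current serialization path. `JobEventLike`, `makeCircularEvent`, and `safeStringify` are hypothetical names made up for this illustration; they are not NNI's `storeTrialJobEvent` code, and this is not necessarily what the fix in #3705 does.

```typescript
// Hypothetical event shape for illustration only; NOT NNI's real type.
interface JobEventLike {
  trialJobId: string;
  event: string;
  detail?: unknown;
}

// Build an event whose `detail` points back at the event itself,
// which is the kind of cycle JSON.stringify refuses to serialize.
function makeCircularEvent(): JobEventLike {
  const event: JobEventLike = { trialJobId: "abc", event: "FAILED" };
  const detail: { owner?: JobEventLike } = {};
  detail.owner = event; // detail -> event
  event.detail = detail; // event -> detail, closing the cycle
  return event;
}

// Serialize a value, replacing any object that is already one of its own
// ancestors on the current path with the string "[Circular]".
function safeStringify(value: unknown): string {
  const ancestors: object[] = [];
  return JSON.stringify(value, function (this: unknown, _key: string, val: unknown): unknown {
    if (typeof val !== "object" || val === null) {
      return val; // primitives can never form a cycle
    }
    // `this` is the object that holds `val`; drop ancestors that are not on its path.
    while (ancestors.length > 0 && ancestors[ancestors.length - 1] !== this) {
      ancestors.pop();
    }
    if (ancestors.indexOf(val) !== -1) {
      return "[Circular]"; // val is its own ancestor: a reference cycle
    }
    ancestors.push(val);
    return val;
  });
}

const ev = makeCircularEvent();
try {
  JSON.stringify(ev); // throws: the plain call cannot handle the cycle
} catch (err) {
  console.log((err as Error).message.split("\n")[0]);
}
console.log(safeStringify(ev));
```

Running it prints `Converting circular structure to JSON` and then `{"trialJobId":"abc","event":"FAILED","detail":{"owner":"[Circular]"}}`. Tracking only the ancestors on the current path, rather than every object seen so far, means an object that is shared but not circular is still serialized in full each time it appears.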

The NNI job stops after that and does not run any more trials. Individual trials failing is expected, but this happens to all my jobs. Can you please help narrow down why?

I am trying to run around 10,000 trials. I can share the config file if needed.

Thanks

Environment:

  • NNI version: v2.2
  • Training service (local|remote|pai|aml|etc): AML
  • Client OS: WSL ubuntu
  • Server OS (for remote mode only): Linux
  • Python version: 3.8
  • PyTorch/TensorFlow version:
  • Is conda/virtualenv/venv used?: Conda
  • Is running in Docker?: yes

Configuration:

  • Experiment config (remember to remove secrets!):
    authorName: default
    experimentName: mainz_ner_gpu_using_trial_next
    trialConcurrency: 1
    maxTrialNum: 10000
    #choice: local, remote, pai
    trainingServicePlatform: aml
    searchSpacePath: search_space.json
    #choice: true, false
    useAnnotation: false
    advisor:
      #choice: Hyperband, BOHB
      #(BOHB should be installed through nnictl)
      builtinAdvisorName: BOHB
      classArgs:
        max_budget: 100
        min_budget: 10
        eta: 3
        optimize_mode: maximize
    trial:
      command: ./run_mainz.sh
      codeDir: .
      gpuNum: 1
      image: ipreetinder/alps-nni:latest
    amlConfig:
      subscriptionId: <>
      resourceGroup: <>
      workspaceName: <>
      computeTarget: <>
      useActiveGpu: true

  • Search space:
    {
      "DROPOUT": {"_type": "uniform", "_value": [0.0, 0.5]},
      "CLASSIF_DROPOUT": {"_type": "uniform", "_value": [0.0, 0.5]},
      "GRADIENT_ACCUMULATE_STEP": {"_type": "choice", "_value": [1, 2, 4, 8, 16, 32]},
      "NUM_WARMUP_STEPS": {"_type": "choice", "_value": [0, 100, 500, 1000, 5000, 10000]},
      "START_LEARNING_RATE": {"_type": "loguniform", "_value": [0.000001, 0.005]},
      "WEIGHT_DECAY": {"_type": "choice", "_value": [0.0, 0.001, 0.0001, 0.00001]}
    }
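To make the search space above concrete, here is a TypeScript sketch of what each `_type` it uses means, assuming NNI's documented search-space semantics: `uniform` draws a value between the two bounds, `loguniform` draws it so that its logarithm is uniform between the logs of the bounds, and `choice` picks one of the listed options. This only illustrates the value ranges; it is not how BOHB chooses configurations (BOHB is model-based), and `Param`, `sampleParam`, and `sampleConfig` are hypothetical helpers.

```typescript
// The three search-space entry kinds used in search_space.json above.
type Param =
  | { _type: "uniform"; _value: [number, number] }
  | { _type: "loguniform"; _value: [number, number] }
  | { _type: "choice"; _value: number[] };

// Same content as the search space in this issue.
const searchSpace: Record<string, Param> = {
  DROPOUT: { _type: "uniform", _value: [0.0, 0.5] },
  CLASSIF_DROPOUT: { _type: "uniform", _value: [0.0, 0.5] },
  GRADIENT_ACCUMULATE_STEP: { _type: "choice", _value: [1, 2, 4, 8, 16, 32] },
  NUM_WARMUP_STEPS: { _type: "choice", _value: [0, 100, 500, 1000, 5000, 10000] },
  START_LEARNING_RATE: { _type: "loguniform", _value: [0.000001, 0.005] },
  WEIGHT_DECAY: { _type: "choice", _value: [0.0, 0.001, 0.0001, 0.00001] },
};

// Draw one value for a single parameter; `rand` must return a number in [0, 1).
function sampleParam(p: Param, rand: () => number = Math.random): number {
  if (p._type === "choice") {
    // Each listed option is equally likely.
    return p._value[Math.floor(rand() * p._value.length)];
  }
  const [low, high] = p._value;
  if (p._type === "uniform") {
    return low + (high - low) * rand();
  }
  // loguniform: the logarithm of the value is uniform in [log(low), log(high)).
  return Math.exp(Math.log(low) + (Math.log(high) - Math.log(low)) * rand());
}

// Draw one full (random) configuration from the whole space.
function sampleConfig(space: Record<string, Param>, rand: () => number = Math.random): Record<string, number> {
  const config: Record<string, number> = {};
  for (const name of Object.keys(space)) {
    config[name] = sampleParam(space[name], rand);
  }
  return config;
}

console.log(sampleConfig(searchSpace));
```

Sampling `START_LEARNING_RATE` on a log scale gives each order of magnitude inside the range the same share of samples, whereas a plain `uniform` over [0.000001, 0.005] would put about 90% of samples above 0.0005.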

Log message:

  • nnimanager.log:
  • dispatcher.log:
  • nnictl stdout and stderr:
    log.zip

How to reproduce it?:

@kvartet
Contributor

kvartet commented May 26, 2021

This will be fixed in NNI v2.3; please stay tuned.

@ipreetinder
Author

Thanks for the update. When will NNI 2.3 be released?

@kvartet
Contributor

kvartet commented Jun 3, 2021

The target date is June 9.

@kvartet
Contributor

kvartet commented Jun 10, 2021

This has been fixed in #3705, so I am closing this issue. Feel free to reopen it.

@kvartet kvartet closed this as completed Jun 10, 2021