This repository has been archived by the owner on Sep 18, 2024. It is now read-only.
Describe the issue:
Hi,
I am running NNI on AML (Azure Machine Learning), and after some number of trials I start getting this error:
"Job management error: Converting circular structure to JSON"
NNI Manager.log
[2021-05-18 18:44:19] ERROR [ 'TypeError: Converting circular structure to JSON\n at JSON.stringify ()\n at NNIDataStore.storeTrialJobEvent (/home//miniconda3/lib/python3.8/site-packages/nni_node/core/nniDataStore.js:59:105)\n at NNIManager.requestTrialJobsStatus (/home//miniconda3/lib/python3.8/site-packages/nni_node/core/nnimanager.js:443:38)' ]
nnictl_stderr.log
{ TypeError: Converting circular structure to JSON
at JSON.stringify ()
at NNIDataStore.storeTrialJobEvent (/home//miniconda3/lib/python3.8/site-packages/nni_node/core/nniDataStore.js:59:105)
at NNIManager.requestTrialJobsStatus (/home//miniconda3/lib/python3.8/site-packages/nni_node/core/nnimanager.js:443:38)
name: '',
cause:
TypeError: Converting circular structure to JSON
at JSON.stringify ()
at NNIDataStore.storeTrialJobEvent (/home//miniconda3/lib/python3.8/site-packages/nni_node/core/nniDataStore.js:59:105)
at NNIManager.requestTrialJobsStatus (/home//miniconda3/lib/python3.8/site-packages/nni_node/core/nnimanager.js:443:38) }
The NNI experiment stops after that and does not run any more trials. It is expected that one or more trials may fail, but the experiment should not stop. Can you please help narrow down why this is happening to all my jobs?
I am trying to run around 10,000 trials. I can share the config file if needed.
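For context, `JSON.stringify` throws this TypeError whenever the value it is given contains a reference back to itself. The sketch below reproduces the error in plain Node.js and shows one defensive way to serialize such an object. The `trialJob` object and the `safeStringify` helper are hypothetical illustrations; the logs do not show what NNI actually passes to `storeTrialJobEvent`, and this is not NNI's code.

```javascript
// Hypothetical stand-in for a trial job record; the real NNI object is not shown in the logs.
const trialJob = { id: "trial-1", status: "FAILED" };
trialJob.client = { owner: trialJob }; // back-reference creates a cycle

// Plain JSON.stringify throws a TypeError on cyclic input.
let plainError = null;
try {
  JSON.stringify(trialJob);
} catch (err) {
  plainError = err;
}
console.log(plainError instanceof TypeError); // true

// Defensive serializer (illustrative): any object that was already visited is
// replaced by a marker string. Note that this also replaces repeated
// non-cyclic references, not only true cycles.
function safeStringify(value) {
  const seen = new WeakSet();
  return JSON.stringify(value, (key, val) => {
    if (typeof val === "object" && val !== null) {
      if (seen.has(val)) return "[Circular]";
      seen.add(val);
    }
    return val;
  });
}

console.log(safeStringify(trialJob));
// {"id":"trial-1","status":"FAILED","client":{"owner":"[Circular]"}}
```

If the object being stored wraps something like an HTTP response or an SDK error (objects of that kind often hold back-references), a serializer along these lines would log it instead of throwing. Whether that is the case here is an assumption.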
Thanks
Environment:
Configuration:
Experiment config (remember to remove secrets!):
authorName: default
experimentName: mainz_ner_gpu_using_trial_next
trialConcurrency: 1
maxTrialNum: 10000
#choice: local, remote, pai
trainingServicePlatform: aml
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
advisor:
  #choice: Hyperband, BOHB
  #(BOHB should be installed through nnictl)
  builtinAdvisorName: BOHB
  classArgs:
    max_budget: 100
    min_budget: 10
    eta: 3
    optimize_mode: maximize
trial:
  command: ./run_mainz.sh
  codeDir: .
  gpuNum: 1
  image: ipreetinder/alps-nni:latest
amlConfig:
  subscriptionId: <>
  resourceGroup: <>
  workspaceName: <>
  computeTarget: <>
  useActiveGpu: true
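To make the `classArgs` values above concrete, here is a rough sketch of the bracket schedule that the standard Hyperband formulas give for `max_budget: 100`, `min_budget: 10`, `eta: 3`. It assumes the BOHB advisor follows the textbook Hyperband schedule; the function name `hyperbandBrackets` and the small `epsilon` guard are illustrative choices, not taken from NNI.

```javascript
// Values copied from classArgs in the experiment config.
const maxBudget = 100;
const minBudget = 10;
const eta = 3;

// Standard Hyperband bracket schedule (assumed, not verified against NNI's BOHB code).
function hyperbandBrackets(maxBudget, minBudget, eta) {
  const epsilon = 1e-6; // guards against floating-point error in log/ceil
  const sMax = Math.floor(Math.log(maxBudget / minBudget) / Math.log(eta) + epsilon);
  const brackets = [];
  for (let s = sMax; s >= 0; s--) {
    // n: configurations started in this bracket; r: budget each one gets at first.
    const n = Math.ceil(((sMax + 1) * Math.pow(eta, s)) / (s + 1) - epsilon);
    const r = maxBudget / Math.pow(eta, s);
    brackets.push({ s, n, r });
  }
  return brackets;
}

for (const { s, n, r } of hyperbandBrackets(maxBudget, minBudget, eta)) {
  console.log(`bracket s=${s}: ${n} configs, initial budget ${r.toFixed(2)}`);
}
```

Under these assumptions there are three brackets, starting 9, 5, and 3 configurations at initial budgets of roughly 11.11, 33.33, and 100. With `maxTrialNum: 10000` the schedule would repeat many times, so the experiment submits a large number of short trials to AML.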
Search space:
{
"DROPOUT":{"_type":"uniform","_value":[0.0, 0.5]},
"CLASSIF_DROPOUT":{"_type":"uniform","_value":[0.0, 0.5]},
"GRADIENT_ACCUMULATE_STEP": {"_type":"choice","_value":[1, 2, 4, 8, 16, 32]},
"NUM_WARMUP_STEPS":{"_type":"choice","_value":[0, 100, 500, 1000, 5000, 10000]},
"START_LEARNING_RATE":{"_type":"loguniform","_value":[0.000001, 0.005]},
"WEIGHT_DECAY":{"_type":"choice","_value":[0.0, 0.001, 0.0001, 0.00001]}
}
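As a quick illustration of what the three `_type` values in this search space mean, the sketch below draws one configuration at random. It only illustrates the semantics of `uniform`, `choice`, and `loguniform` (the last taken as uniform in log space); BOHB itself picks configurations with a model rather than purely at random, and `sampleParam` and `sampleConfig` are hypothetical helper names.

```javascript
// Search space copied from the issue.
const searchSpace = {
  DROPOUT: { _type: "uniform", _value: [0.0, 0.5] },
  CLASSIF_DROPOUT: { _type: "uniform", _value: [0.0, 0.5] },
  GRADIENT_ACCUMULATE_STEP: { _type: "choice", _value: [1, 2, 4, 8, 16, 32] },
  NUM_WARMUP_STEPS: { _type: "choice", _value: [0, 100, 500, 1000, 5000, 10000] },
  START_LEARNING_RATE: { _type: "loguniform", _value: [0.000001, 0.005] },
  WEIGHT_DECAY: { _type: "choice", _value: [0.0, 0.001, 0.0001, 0.00001] },
};

// Draw one value for a single parameter spec; rng must return a number in [0, 1).
function sampleParam(spec, rng = Math.random) {
  switch (spec._type) {
    case "choice":
      return spec._value[Math.floor(rng() * spec._value.length)];
    case "uniform": {
      const [low, high] = spec._value;
      return low + rng() * (high - low);
    }
    case "loguniform": {
      // Uniform in log space, then mapped back with exp.
      const [low, high] = spec._value;
      return Math.exp(Math.log(low) + rng() * (Math.log(high) - Math.log(low)));
    }
    default:
      throw new Error(`unsupported _type: ${spec._type}`);
  }
}

// Draw one full configuration from the search space.
function sampleConfig(space, rng) {
  return Object.fromEntries(
    Object.entries(space).map(([name, spec]) => [name, sampleParam(spec, rng)])
  );
}

console.log(sampleConfig(searchSpace));
```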
Log message:
log.zip
How to reproduce it?: