Trouble running the repository #4

Open · sankalpachowdhury opened this issue Jul 17, 2021 · 1 comment

@sankalpachowdhury
I'm trying to run the repository in Google Colab with TensorFlow 2.5.0, and I have also mounted the dataset.

I've also tried using TF v1, which does not work either.

I've also followed the TF v2 conversion done by https://github.com/ajenningsfrankston/graph_kt.git, which throws multiple errors as well.
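For context, this is a minimal sketch of the tf.compat.v1 shim I mean by "tried using TF v1" on TF 2.5; the placeholder/session lines are only an illustration, not code from the repository:

```python
# Minimal sketch: run TF1-style graph/session code on TF 2.x via the compat module.
# Disabling v2 behavior restores graph mode, so tf.placeholder / tf.Session style
# code can still be imported and run without a full port.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# TF1-era APIs are available again after the shim, e.g.:
x = tf.placeholder(tf.float32, shape=[None, 100])  # illustrative placeholder only
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
```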
Here are the exact errors I get, running with:

!python /content/graph_kt/main.py --dataset /content/data/assist09_3/assist09_3 \
  --n_hop 3 \
  --log_dir /content/logs \
  --checkpoint_dir /content/checkpoint \
  --skill_neighbor_num 4 \
  --question_neighbor_num 4 \
  --hist_neighbor_num 3 \
  --next_neighbor_num 4 \
  --model hsei \
  --lr 0.001 \
  --att_bound 0.7 \
  --sim_emb question_emb \
  --dropout_keep_probs [0.8,0.8,1]

output:

2021-07-17 18:06:30.791642: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
hsei
{'data_dir': 'data', 'log_dir': '/content/logs', 'train': 1, 'hidden_neurons': [200, 100], 'lr': 0.001, 'lr_decay': 0.92, 'checkpoint_dir': '/content/checkpoint', 'dropout_keep_probs': '[0.8,0.8,1]', 'aggregator': 'sum', 'model': 'hsei', 'l2_weight': 1e-08, 'limit_max_len': 200, 'limit_min_len': 3, 'dataset': '/content/data/assist09_3/assist09_3', 'field_size': 3, 'embedding_size': 100, 'max_step': 200, 'input_trans_size': 100, 'batch_size': 32, 'select_index': [0, 1, 2], 'num_epochs': 150, 'n_hop': 3, 'skill_neighbor_num': 4, 'question_neighbor_num': 4, 'hist_neighbor_num': 3, 'next_neighbor_num': 4, 'att_bound': 0.7, 'sim_emb': 'question_emb', 'tag': 1626545192.6224916}
original test seqs num:893
167
17737
2021-07-17 18:06:40.247804: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-17 18:06:40.248722: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-17 18:06:40.277228: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-17 18:06:40.277817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2021-07-17 18:06:40.277864: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-17 18:06:40.280330: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-17 18:06:40.280429: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-17 18:06:40.282030: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-17 18:06:40.282367: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-17 18:06:40.284136: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.10
2021-07-17 18:06:40.284943: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-17 18:06:40.285182: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-17 18:06:40.285302: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-17 18:06:40.285894: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-17 18:06:40.286410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-17 18:06:40.286467: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-17 18:06:40.773872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-17 18:06:40.773926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-07-17 18:06:40.773942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-07-17 18:06:40.774118: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-17 18:06:40.774763: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-17 18:06:40.775317: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-17 18:06:40.775848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13837 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
hsei
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
Instructions for updating:
The validate_indices argument has no effect. Indices are always validated on CPU and never validated on GPU.
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/legacy_tf_layers/core.py:171: UserWarning: tf.layers.dense is deprecated and will be removed in a future version. Please use tf.keras.layers.Dense instead.
warnings.warn('tf.layers.dense is deprecated and '
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer_v1.py:1692: UserWarning: layer.apply is deprecated and will be removed in a future version. Please use layer.call method instead.
warnings.warn('layer.apply is deprecated and '
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:708: UserWarning: tf.nn.rnn_cell.BasicLSTMCell is deprecated and will be removed in a future version. This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
warnings.warn("tf.nn.rnn_cell.BasicLSTMCell is deprecated and will be "
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:909: UserWarning: tf.nn.rnn_cell.LSTMCell is deprecated and will be removed in a future version. This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
warnings.warn("tf.nn.rnn_cell.LSTMCell is deprecated and will be "
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer_v1.py:1700: UserWarning: layer.add_variable is deprecated and will be removed in a future version. Please use layer.add_weight method instead.
warnings.warn('layer.add_variable is deprecated and '
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:987: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_8_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_8_grad/Reshape:0", shape=(None, None, 100), dtype=float32), dense_shape=Tensor("gradients/GatherV2_8_grad/Cast:0", shape=(3,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
"shape. This may consume a large amount of memory." % value)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_7_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/concat_2_grad/Slice_1:0", shape=(None, None, None), dtype=float32), dense_shape=Tensor("gradients/concat_2_grad/Shape:0", shape=(3,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
"shape. This may consume a large amount of memory." % value)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients_1/GatherV2_8_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients_1/GatherV2_8_grad/Reshape:0", shape=(None, None, 100), dtype=float32), dense_shape=Tensor("gradients_1/GatherV2_8_grad/Cast:0", shape=(3,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
"shape. This may consume a large amount of memory." % value)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients_1/GatherV2_7_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients_1/concat_2_grad/tuple/control_dependency:0", shape=(None, None, None), dtype=float32), dense_shape=Tensor("gradients_1/concat_2_grad/Shape:0", shape=(3,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
"shape. This may consume a large amount of memory." % value)
initialize complete
2021-07-17 18:07:00.410532: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2000189999 Hz
0% 0/150 [00:00<?, ?it/s]epoch: 0
/content/graph_kt/data_process.py:253: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
target_answers = pad_sequences(np.array([[j[-1] - feature_size for j in i[1:]] for i in seqs]), maxlen=max_step - 1,
2021-07-17 18:07:08.694893: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-17 18:07:09.217026: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
0% 0/150 [00:08<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1375, in _do_call
return fn(*args)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1360, in _run_fn
target_list, run_metadata)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1453, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [32,199,1,100] vs. shape[1] = [6400,199,3,100]
[[{{node concat_4}}]]
[[Sum_2/_201]]
(1) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [32,199,1,100] vs. shape[1] = [6400,199,3,100]
[[{{node concat_4}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/content/graph_kt/main.py", line 76, in
main()
File "/content/graph_kt/main.py", line 69, in main
train(args,train_dkt)
File "/content/graph_kt/train.py", line 50, in train
binary_pred, pred, loss = model.train(sess,features_answer_index,target_answers,seq_lens,hist_neighbor_index)
File "/content/graph_kt/model.py", line 393, in train
[self.binary_pred, self.pred, self.loss, self.train_op, self.flat_target_correctness], input_feed)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 968, in run
run_metadata_ptr)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1191, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1369, in _do_run
run_metadata)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1394, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [32,199,1,100] vs. shape[1] = [6400,199,3,100]
[[node concat_4 (defined at content/graph_kt/model.py:157) ]]
[[Sum_2/_201]]
(1) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [32,199,1,100] vs. shape[1] = [6400,199,3,100]
[[node concat_4 (defined at content/graph_kt/model.py:157) ]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node concat_4:
Reshape_72 (defined at content/graph_kt/model.py:240)

Input Source operations connected to node concat_4:
Reshape_72 (defined at content/graph_kt/model.py:240)

Original stack trace for 'concat_4':
File "content/graph_kt/main.py", line 76, in
main()
File "content/graph_kt/main.py", line 69, in main
train(args,train_dkt)
File "content/graph_kt/train.py", line 18, in train
model = GIKT(args)
File "content/graph_kt/model.py", line 45, in init
self.build_model()
File "content/graph_kt/model.py", line 157, in build_model
Nh = tf.concat([tf.expand_dims(output_series, 2), self.hist_neighbors_features], 2) # [self.batch_size,max_step,M+1,feature_trans_size]]
File "usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
File "usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1768, in concat
return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
File "usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1228, in concat_v2
"ConcatV2", values=values, axis=axis, name=name)
File "usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py", line 3565, in _create_op_internal
op_def=op_def)
File "usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py", line 2045, in init
self._traceback = tf_stack.extract_stack_for_node(self._c_op)
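
From the shapes in the error, the concat at model.py:157 gets a per-batch tensor of shape [32, 199, 1, 100] but a history-neighbor tensor whose leading dimension is 6400 (= 32 × 200), so the non-concat dimensions no longer line up. A minimal sketch that reproduces the mismatch and shows the shape the second input would apparently need (the "per-batch" layout below is only my assumption, not a verified fix for model.py):

```python
# Sketch of the shape mismatch reported above. Shapes are taken from the log:
# batch_size=32, max_step=200, embedding size 100, hist_neighbor_num=3.
import tensorflow as tf

batch_size, max_step, dim, n_hist = 32, 200, 100, 3

output_series = tf.zeros([batch_size, max_step - 1, dim])                       # [32, 199, 100]
hist_neighbors = tf.zeros([batch_size * max_step, max_step - 1, n_hist, dim])   # [6400, 199, 3, 100]

# This line reproduces the InvalidArgumentError: all non-concat dimensions must
# match, but the leading dimensions are 32 vs 6400.
# Nh = tf.concat([tf.expand_dims(output_series, 2), hist_neighbors], 2)

# The concat only succeeds once the neighbor tensor is laid out per batch element,
# e.g. [32, 199, 3, 100] (assumed intended layout, not a verified fix):
hist_neighbors_per_batch = tf.zeros([batch_size, max_step - 1, n_hist, dim])
Nh = tf.concat([tf.expand_dims(output_series, 2), hist_neighbors_per_batch], 2)
print(Nh.shape)  # (32, 199, 4, 100)
```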

Please help me out.

@1191000814

Hello, have you implemented it with TF2? Can you share a link?
