libdevice not found during training using default conda environment on Ubuntu 22.04.2 with an RTX A4000 #61

Closed
phcavelar opened this issue Jun 21, 2023 · 4 comments · Fixed by #62
Labels: dependencies (pull requests related to dependencies)


phcavelar commented Jun 21, 2023

Hello, just to let you know: running molecule_generation train as described in the Readme.md, using the default conda environment, on Ubuntu 22.04.2 with an RTX A4000, fails because libdevice cannot be found (log below).

I've found that pinning TensorFlow to version 2.10 instead of 2.11 (the latest version at the time of writing, which is installed automatically) fixes it, as suggested in this Stack Overflow question.

If you wish, I can open a PR to pin TensorFlow to 2.10 or lower until this is fixed upstream, since pinning was also cited as a solution for #56. Otherwise, I'm at least posting this here so that other people can find this error and its solution more easily.
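A minimal sketch for checking whether an environment already satisfies the proposed pin (TensorFlow 2.10 or lower). The helper names are hypothetical and not part of molecule-generation, and reading the installed version through importlib.metadata assumes the package is registered under the distribution name "tensorflow", which may differ between conda and pip builds.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Hypothetical helpers (not part of molecule-generation): compare a TensorFlow
# version string against the pin proposed in this issue, "2.10 or lower".
PROPOSED_MAX = (2, 10)


def parse_major_minor(ver: str) -> tuple:
    """Extract (major, minor) from strings such as '2.11.1' or '2.10.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", ver.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def satisfies_proposed_pin(ver: str) -> bool:
    """True for 2.10.x and older; False for 2.11.x and newer."""
    return parse_major_minor(ver) <= PROPOSED_MAX


if __name__ == "__main__":
    try:
        # Assumption: the installed distribution is named "tensorflow".
        installed = version("tensorflow")
    except PackageNotFoundError:
        installed = None
    if installed is None:
        print("tensorflow distribution not found")
    else:
        print(installed, "satisfies proposed pin:", satisfies_proposed_pin(installed))
```

Comparing (major, minor) tuples avoids the string-comparison trap where "2.10" sorts before "2.9".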

Error Log
Avg weighted sum. of graph losses:  291.5334
Avg weighted sum. of prop losses:   0.5965
Avg node class. loss:                 71.0492
Avg first node class. loss:           40.7059
Avg edge selection loss:              1.7546
Avg edge type loss:                   4.0202
Avg attachment point selection loss:  1.1500
Avg KL divergence:                    6981316.0000
Property results: sa_score: MAE 10.77, MSE 3818.02 (norm MAE: 13.31) | clogp: MAE 23.54, MSE 13726.24 (norm MAE: 12.95) | mol_weight: MAE 393.53, MSE 168733.92 (norm MAE: 3.57).
   (Stored model metadata and weights to ~/data/moler/saved/GNN_Edge_MLP_MoLeR__2023-06-21_09-51-05_best.pkl).
2023-06-21 09:52:54.588760: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.595713: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.612998: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.618529: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.647620: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.663816: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.683986: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.702780: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.723474: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-06-21 09:52:54.741439: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
  File "~/miniconda3/envs/moler-env/bin/molecule_generation", line 8, in <module>
    sys.exit(main())
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in main
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
    func()
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in <lambda>
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/train.py", line 179, in run_from_args
    trained_model_path = train(
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/train.py", line 274, in train
    train_loss, train_speed, train_results = model.run_on_data_iterator(
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/models/moler_base_model.py", line 244, in run_on_data_iterator
    task_metrics = self._run_step(batch_features, batch_labels, training)
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/models/graph_task_model.py", line 336, in _run_step
    return self._fast_run_step(batch_features_tuple, batch_labels_tuple, training)
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Detected at node 'cond/StatefulPartitionedCall_122' defined at (most recent call last):
    File "~/miniconda3/envs/moler-env/bin/molecule_generation", line 8, in <module>
      sys.exit(main())
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in main
      run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
      func()
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in <lambda>
      run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/train.py", line 179, in run_from_args
      trained_model_path = train(
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/train.py", line 252, in train
      _, _, initial_valid_results = model.run_on_data_iterator(
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/models/moler_base_model.py", line 244, in run_on_data_iterator
      task_metrics = self._run_step(batch_features, batch_labels, training)
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/models/graph_task_model.py", line 336, in _run_step
      return self._fast_run_step(batch_features_tuple, batch_labels_tuple, training)
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/models/graph_task_model.py", line 363, in _fast_run_step
      tf.cond(training, true_fn=_training_update, false_fn=_no_op)
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/models/graph_task_model.py", line 357, in _training_update
      self._apply_gradients(zip(gradients, self.trainable_variables))
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/models/graph_task_model.py", line 324, in _apply_gradients
      self._optimizer.apply_gradients(gradient_variable_pairs)
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "~/miniconda3/envs/moler-env/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'cond/StatefulPartitionedCall_122'
libdevice not found at ./libdevice.10.bc
	 [[{{node cond/StatefulPartitionedCall_122}}]] [Op:__inference__fast_run_step_84892]
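The warnings show XLA looking for libdevice.10.bc relative to the working directory (./libdevice.10.bc), and the traceback ends in the optimizer's _update_step_xla, so the failure comes from the XLA-compiled optimizer update step. As a diagnostic, here is a minimal sketch that lists every libdevice.10.bc inside the active environment. It assumes CONDA_PREFIX points at that environment (falling back to sys.prefix); whether the conda cudatoolkit package ships this file, and in which directory, is not established by this report.

```python
import os
import sys
from pathlib import Path


def find_libdevice(root: Path, filename: str = "libdevice.10.bc") -> list:
    """Return every regular file named `filename` below `root`, sorted."""
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(filename) if p.is_file())


if __name__ == "__main__":
    # Assumption: the active conda env is CONDA_PREFIX; otherwise use the interpreter prefix.
    env_root = Path(os.environ.get("CONDA_PREFIX", sys.prefix))
    hits = find_libdevice(env_root)
    if hits:
        for path in hits:
            print(path)
    else:
        print(f"libdevice.10.bc not found under {env_root}")
```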
Conda environment before pip install

When I re-created the environment without the version restriction, this is the dependency list shown before installing molecule-generation:

# packages in environment at ~/miniconda3/envs/moler-env:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   1.4.0              pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.4           py310h2372a71_1    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     23.1.0             pyh71513ae_1    conda-forge
blinker                   1.6.2              pyhd8ed1ab_0    conda-forge
boost                     1.78.0          py310hc4a4660_4    conda-forge
boost-cpp                 1.78.0               h6582d0a_3    conda-forge
brotli                    1.0.9                h166bdaf_8    conda-forge
brotli-bin                1.0.9                h166bdaf_8    conda-forge
brotlipy                  0.7.0           py310h5764c6d_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.19.1               hd590300_0    conda-forge
ca-certificates           2023.5.7             hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.3.0              pyhd8ed1ab_0    conda-forge
cairo                     1.16.0            hbbf8b49_1016    conda-forge
certifi                   2023.5.7           pyhd8ed1ab_0    conda-forge
cffi                      1.15.1          py310h255011f_3    conda-forge
charset-normalizer        3.1.0              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
contourpy                 1.1.0           py310hd41b1e2_0    conda-forge
cryptography              41.0.1          py310h75e40e8_0    conda-forge
cuda-version              11.8                 h70ddcb2_2    conda-forge
cudatoolkit               11.8.0              h37601d7_11    conda-forge
cudnn                     8.8.0.121            h0800d71_1    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
expat                     2.5.0                hcb278e6_1    conda-forge
flatbuffers               23.3.3               hcb278e6_1    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.40.0          py310h2372a71_0    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
frozenlist                1.3.3           py310h5764c6d_0    conda-forge
gast                      0.4.0              pyh9f0ad1d_0    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
giflib                    5.2.1                h0b41bf4_3    conda-forge
google-auth               2.20.0             pyh1a96a4e_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-pasta              0.2.0              pyh8c360ce_0    conda-forge
greenlet                  2.0.2           py310hc6cd4ac_1    conda-forge
grpcio                    1.51.1          py310h4a5735c_1    conda-forge
h5py                      3.9.0           nompi_py310h367e799_100    conda-forge
hdf5                      1.14.0          nompi_hb72d44e_103    conda-forge
icu                       72.1                 hcb278e6_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        6.7.0              pyha770c72_0    conda-forge
keras                     2.11.0             pyhd8ed1ab_0    conda-forge
keras-preprocessing       1.1.2              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4           py310hbf28c38_1    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
lcms2                     2.15                 haa2dc70_1    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20220623.0      cxx17_h05df665_6    conda-forge
libaec                    1.0.6                hcb278e6_1    conda-forge
libblas                   3.9.0           17_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcblas                  3.9.0           17_linux64_openblas    conda-forge
libcurl                   8.1.2                h409715c_0    conda-forge
libdeflate                1.18                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.1.0               he5830b7_0    conda-forge
libgfortran-ng            13.1.0               h69a702a_0    conda-forge
libgfortran5              13.1.0               h15d22d2_0    conda-forge
libglib                   2.76.3               hebfc3b9_0    conda-forge
libgomp                   13.1.0               he5830b7_0    conda-forge
libgrpc                   1.51.1               h4fad500_1    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjpeg-turbo             2.1.5.1              h0b41bf4_0    conda-forge
liblapack                 3.9.0           17_linux64_openblas    conda-forge
libnghttp2                1.52.0               h61bc06f_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.23          pthreads_h80387f5_0    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libprotobuf               3.21.12              h3eb15da_0    conda-forge
libsqlite                 3.42.0               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.1.0               hfd8a6a1_0    conda-forge
libtiff                   4.5.1                h8b53f26_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.3.0                h0b41bf4_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
markdown                  3.4.3              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.3           py310h2372a71_0    conda-forge
matplotlib-base           3.7.1           py310he60537e_0    conda-forge
multidict                 6.0.4           py310h1fa729e_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
nccl                      2.18.3.1             h12f7317_0    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
numpy                     1.25.0          py310ha4c1d20_0    conda-forge
oauthlib                  3.2.2              pyhd8ed1ab_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.1.1                hd590300_1    conda-forge
opt_einsum                3.3.0              pyhd8ed1ab_1    conda-forge
packaging                 23.1               pyhd8ed1ab_0    conda-forge
pandas                    2.0.2           py310h7cbd5c2_0    conda-forge
pcre2                     10.40                hc3806b6_0    conda-forge
pillow                    9.5.0           py310h582fbeb_1    conda-forge
pip                       23.1.2             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
platformdirs              3.6.0              pyhd8ed1ab_0    conda-forge
pooch                     1.7.0              pyha770c72_3    conda-forge
protobuf                  4.21.12         py310heca2aa9_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycairo                   1.24.0          py310hda9f760_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyjwt                     2.7.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 23.2.0             pyhd8ed1ab_1    conda-forge
pyparsing                 3.1.0              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.10.11         he550d4f_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-flatbuffers        23.5.26            pyhd8ed1ab_0    conda-forge
python-tzdata             2023.3             pyhd8ed1ab_0    conda-forge
python_abi                3.10                    3_cp310    conda-forge
pytz                      2023.3             pyhd8ed1ab_0    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
rdkit                     2023.03.2       py310h399bcf7_0    conda-forge
re2                       2023.02.01           hcb278e6_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
reportlab                 3.6.13          py310h1a56a1c_0    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
requests-oauthlib         1.3.1              pyhd8ed1ab_0    conda-forge
rsa                       4.9                pyhd8ed1ab_0    conda-forge
scipy                     1.10.1          py310ha4c1d20_3    conda-forge
setuptools                67.7.2             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
sqlalchemy                2.0.16          py310h2372a71_0    conda-forge
tensorboard               2.11.2             pyhd8ed1ab_0    conda-forge
tensorboard-data-server   0.6.1           py310h600f1e7_4    conda-forge
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tensorflow                2.11.1          cuda112py310he87a039_0    conda-forge
tensorflow-base           2.11.1          cuda112py310h4c92a00_0    conda-forge
tensorflow-estimator      2.11.1          cuda112py310h37add04_0    conda-forge
termcolor                 2.3.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
typing-extensions         4.6.3                hd8ed1ab_0    conda-forge
typing_extensions         4.6.3              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
unicodedata2              15.0.0          py310h5764c6d_0    conda-forge
urllib3                   1.26.15            pyhd8ed1ab_0    conda-forge
werkzeug                  2.3.6              pyhd8ed1ab_0    conda-forge
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
wrapt                     1.15.0          py310h1fa729e_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.6                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yarl                      1.9.2           py310h2372a71_0    conda-forge
zipp                      3.15.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge
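To see exactly what pip install changed, here is a minimal sketch that parses two conda list dumps (such as the one above and the one below) and reports added, removed and changed packages. The function names are hypothetical, and the parser assumes the whitespace-separated Name, Version, Build, Channel columns shown here. On these two lists it would report, for example, protobuf moving from 4.21.12 (conda-forge) to 3.20.3 (pypi), plus pip-only additions such as molecule-generation and dpu-utils.

```python
from typing import Dict, Tuple

# (version, channel) for one package row of `conda list` output.
Entry = Tuple[str, str]


def parse_conda_list(text: str) -> Dict[str, Entry]:
    """Parse `conda list` output into {name: (version, channel)}, skipping '#' lines."""
    packages: Dict[str, Entry] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # not a package row
        # Columns: Name, Version, Build, Channel; the channel column may be absent.
        name, ver = fields[0], fields[1]
        channel = fields[3] if len(fields) > 3 else ""
        packages[name] = (ver, channel)
    return packages


def diff_envs(before: Dict[str, Entry], after: Dict[str, Entry]):
    """Return (added, removed, changed) dicts between two parsed environments."""
    added = {n: after[n] for n in sorted(after.keys() - before.keys())}
    removed = {n: before[n] for n in sorted(before.keys() - after.keys())}
    changed = {
        n: (before[n], after[n])
        for n in sorted(before.keys() & after.keys())
        if before[n] != after[n]
    }
    return added, removed, changed


if __name__ == "__main__":
    # Rows copied from the two listings in this issue.
    before_txt = """
# Name                    Version                   Build  Channel
numpy                     1.25.0          py310ha4c1d20_0    conda-forge
protobuf                  4.21.12         py310heca2aa9_0    conda-forge
"""
    after_txt = """
dpu-utils                 0.6.1                    pypi_0    pypi
numpy                     1.25.0          py310ha4c1d20_0    conda-forge
protobuf                  3.20.3                   pypi_0    pypi
"""
    added, removed, changed = diff_envs(parse_conda_list(before_txt), parse_conda_list(after_txt))
    print("added:", added)
    print("removed:", removed)
    print("changed:", changed)
```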
Conda environment after pip install

And after running pip install molecule-generation:

# packages in environment at ~/miniconda3/envs/moler-env:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   1.4.0              pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.4           py310h2372a71_1    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     23.1.0             pyh71513ae_1    conda-forge
azure-core                1.27.1                   pypi_0    pypi
azure-identity            1.13.0                   pypi_0    pypi
azure-storage-blob        12.16.0                  pypi_0    pypi
blinker                   1.6.2              pyhd8ed1ab_0    conda-forge
boost                     1.78.0          py310hc4a4660_4    conda-forge
boost-cpp                 1.78.0               h6582d0a_3    conda-forge
brotli                    1.0.9                h166bdaf_8    conda-forge
brotli-bin                1.0.9                h166bdaf_8    conda-forge
brotlipy                  0.7.0           py310h5764c6d_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.19.1               hd590300_0    conda-forge
ca-certificates           2023.5.7             hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.3.0              pyhd8ed1ab_0    conda-forge
cairo                     1.16.0            hbbf8b49_1016    conda-forge
certifi                   2023.5.7           pyhd8ed1ab_0    conda-forge
cffi                      1.15.1          py310h255011f_3    conda-forge
charset-normalizer        3.1.0              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
contourpy                 1.1.0           py310hd41b1e2_0    conda-forge
cryptography              41.0.1          py310h75e40e8_0    conda-forge
cuda-version              11.8                 h70ddcb2_2    conda-forge
cudatoolkit               11.8.0              h37601d7_11    conda-forge
cudnn                     8.8.0.121            h0800d71_1    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
docopt                    0.6.2                    pypi_0    pypi
dpu-utils                 0.6.1                    pypi_0    pypi
expat                     2.5.0                hcb278e6_1    conda-forge
flatbuffers               23.3.3               hcb278e6_1    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.40.0          py310h2372a71_0    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
frozenlist                1.3.3           py310h5764c6d_0    conda-forge
gast                      0.4.0              pyh9f0ad1d_0    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
giflib                    5.2.1                h0b41bf4_3    conda-forge
google-auth               2.20.0             pyh1a96a4e_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-pasta              0.2.0              pyh8c360ce_0    conda-forge
greenlet                  2.0.2           py310hc6cd4ac_1    conda-forge
grpcio                    1.51.1          py310h4a5735c_1    conda-forge
h5py                      3.9.0           nompi_py310h367e799_100    conda-forge
hdf5                      1.14.0          nompi_hb72d44e_103    conda-forge
icu                       72.1                 hcb278e6_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        6.7.0              pyha770c72_0    conda-forge
isodate                   0.6.1                    pypi_0    pypi
joblib                    1.2.0                    pypi_0    pypi
keras                     2.11.0             pyhd8ed1ab_0    conda-forge
keras-preprocessing       1.1.2              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4           py310hbf28c38_1    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
lcms2                     2.15                 haa2dc70_1    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20220623.0      cxx17_h05df665_6    conda-forge
libaec                    1.0.6                hcb278e6_1    conda-forge
libblas                   3.9.0           17_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcblas                  3.9.0           17_linux64_openblas    conda-forge
libcurl                   8.1.2                h409715c_0    conda-forge
libdeflate                1.18                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.1.0               he5830b7_0    conda-forge
libgfortran-ng            13.1.0               h69a702a_0    conda-forge
libgfortran5              13.1.0               h15d22d2_0    conda-forge
libglib                   2.76.3               hebfc3b9_0    conda-forge
libgomp                   13.1.0               he5830b7_0    conda-forge
libgrpc                   1.51.1               h4fad500_1    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjpeg-turbo             2.1.5.1              h0b41bf4_0    conda-forge
liblapack                 3.9.0           17_linux64_openblas    conda-forge
libnghttp2                1.52.0               h61bc06f_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.23          pthreads_h80387f5_0    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libprotobuf               3.21.12              h3eb15da_0    conda-forge
libsqlite                 3.42.0               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.1.0               hfd8a6a1_0    conda-forge
libtiff                   4.5.1                h8b53f26_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.3.0                h0b41bf4_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
markdown                  3.4.3              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.3           py310h2372a71_0    conda-forge
matplotlib-base           3.7.1           py310he60537e_0    conda-forge
molecule-generation       0.4.0                    pypi_0    pypi
more-itertools            9.1.0                    pypi_0    pypi
msal                      1.22.0                   pypi_0    pypi
msal-extensions           1.0.0                    pypi_0    pypi
multidict                 6.0.4           py310h1fa729e_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
nccl                      2.18.3.1             h12f7317_0    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
numpy                     1.25.0          py310ha4c1d20_0    conda-forge
oauthlib                  3.2.2              pyhd8ed1ab_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.1.1                hd590300_1    conda-forge
opt_einsum                3.3.0              pyhd8ed1ab_1    conda-forge
packaging                 23.1               pyhd8ed1ab_0    conda-forge
pandas                    2.0.2           py310h7cbd5c2_0    conda-forge
pcre2                     10.40                hc3806b6_0    conda-forge
pillow                    9.5.0           py310h582fbeb_1    conda-forge
pip                       23.1.2             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
platformdirs              3.6.0              pyhd8ed1ab_0    conda-forge
pooch                     1.7.0              pyha770c72_3    conda-forge
portalocker               2.7.0                    pypi_0    pypi
protobuf                  3.20.3                   pypi_0    pypi
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycairo                   1.24.0          py310hda9f760_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyjwt                     2.7.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 23.2.0             pyhd8ed1ab_1    conda-forge
pyparsing                 3.1.0              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.10.11         he550d4f_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-flatbuffers        23.5.26            pyhd8ed1ab_0    conda-forge
python-tzdata             2023.3             pyhd8ed1ab_0    conda-forge
python_abi                3.10                    3_cp310    conda-forge
pytz                      2023.3             pyhd8ed1ab_0    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
rdkit                     2023.03.2       py310h399bcf7_0    conda-forge
re2                       2023.02.01           hcb278e6_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
regex                     2023.6.3                 pypi_0    pypi
reportlab                 3.6.13          py310h1a56a1c_0    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
requests-oauthlib         1.3.1              pyhd8ed1ab_0    conda-forge
rsa                       4.9                pyhd8ed1ab_0    conda-forge
scikit-learn              1.2.2                    pypi_0    pypi
scipy                     1.10.1          py310ha4c1d20_3    conda-forge
sentencepiece             0.1.99                   pypi_0    pypi
setsimilaritysearch       1.0.1                    pypi_0    pypi
setuptools                67.7.2             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
sqlalchemy                2.0.16          py310h2372a71_0    conda-forge
tensorboard               2.11.2             pyhd8ed1ab_0    conda-forge
tensorboard-data-server   0.6.1           py310h600f1e7_4    conda-forge
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tensorflow                2.11.1          cuda112py310he87a039_0    conda-forge
tensorflow-base           2.11.1          cuda112py310h4c92a00_0    conda-forge
tensorflow-estimator      2.11.1          cuda112py310h37add04_0    conda-forge
termcolor                 2.3.0              pyhd8ed1ab_0    conda-forge
tf2-gnn                   2.13.0                   pypi_0    pypi
threadpoolctl             3.1.0                    pypi_0    pypi
tk                        8.6.12               h27826a3_0    conda-forge
tqdm                      4.65.0                   pypi_0    pypi
typing-extensions         4.6.3                hd8ed1ab_0    conda-forge
typing_extensions         4.6.3              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
unicodedata2              15.0.0          py310h5764c6d_0    conda-forge
urllib3                   1.26.15            pyhd8ed1ab_0    conda-forge
werkzeug                  2.3.6              pyhd8ed1ab_0    conda-forge
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
wrapt                     1.15.0          py310h1fa729e_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.6                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yarl                      1.9.2           py310h2372a71_0    conda-forge
zipp                      3.15.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge


@kmaziarz kmaziarz self-assigned this Jun 25, 2023
@kmaziarz kmaziarz added the dependencies Pull requests related to dependencies label Jun 25, 2023
@kmaziarz
Collaborator

Does this happen only on very specific machines? Is it considered a bug in Tensorflow that is expected to be fixed in a future version? I'm wondering what the right course of action is for us, given that environment.yml is also used in CI (and the versions are unpinned there so that we notice when the newest Tensorflow breaks our code).

@phcavelar
Author

I don't know how specific this is to Ubuntu or to the GPU I was running on, since the Stack Overflow question doesn't say which GPU they had when the problem started.

However, I saw that the only CI job that uses TF 2.11 installs the CPU version of Tensorflow:

     + tensorflow                       2.11.1  cpu_py310hd1aba9c_0      conda-forge       31kB

That might be why the CI pipeline isn't catching the problem. I understand that it might not be practical to run a CI job on a machine with a GPU, but if you have the resources for it, it might be worth considering: the code in this repo will most likely be run on a GPU, and it is exactly my GPU setup that failed. Doing so would let you catch these nasty CUDA-related bugs, or at least let people know which versions might fail in the future.

For now, I think this thread alone might be enough to help anyone who stumbles upon this issue, but it would be even better to add a warning to the README, since some people don't look through past issues before opening a new one.
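To make the CI point concrete: a job could at least log which Tensorflow variant it resolved. The sketch below parses one row of `conda list` output (name, version, build, channel) and guesses the variant from the build string. It assumes the conda-forge naming seen in this thread, where the GPU environment has build `cuda112py310he87a039_0` and the CI job has `cpu_py310hd1aba9c_0`; the function names are hypothetical and not part of molecule-generation or conda.

```python
from typing import NamedTuple, Optional


class CondaPackage(NamedTuple):
    name: str
    version: str
    build: str
    channel: str


def parse_conda_list_row(row: str) -> Optional[CondaPackage]:
    """Parse one row of `conda list` output; return None for headers or blank lines."""
    row = row.strip()
    # The CI log prefixes newly installed packages with "+ "; drop that marker.
    if row.startswith("+ "):
        row = row[2:]
    fields = row.split()
    if not fields or fields[0].startswith("#") or len(fields) < 3:
        return None
    # Columns: name, version, build, optional channel; extra columns (e.g. size) are ignored.
    channel = fields[3] if len(fields) > 3 else ""
    return CondaPackage(fields[0], fields[1], fields[2], channel)


def tensorflow_variant(pkg: CondaPackage) -> str:
    """Heuristic: conda-forge GPU builds start with 'cuda', CPU builds with 'cpu'."""
    if pkg.build.startswith("cuda"):
        return "gpu"
    if pkg.build.startswith("cpu"):
        return "cpu"
    return "unknown"  # e.g. pip-installed packages with build 'pypi_0'


env_row = "tensorflow                2.11.1          cuda112py310he87a039_0    conda-forge"
ci_row = "+ tensorflow                       2.11.1  cpu_py310hd1aba9c_0      conda-forge       31kB"
for row in (env_row, ci_row):
    pkg = parse_conda_list_row(row)
    print(pkg.name, pkg.version, tensorflow_variant(pkg))
# tensorflow 2.11.1 gpu
# tensorflow 2.11.1 cpu
```

A CI step could warn when the variant is `cpu`, making it obvious that GPU-only failures like this one cannot show up in that job.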

@kmaziarz
Collaborator

> I saw that the only CI job that uses TF 2.11 installs the CPU version of Tensorflow

Yes, I guess that's expected; I wouldn't expect the standard CI agents to have a GPU. That being said, I noticed that the Python 3.8 build seems to be installing CUDA libraries and GPU-enabled Tensorflow... I need to take a closer look to understand what's going on here.

Coming back to your issue, the Tensorflow install guide has a section called "Ubuntu 22.04" which seems to describe the exact problem you're having. It mentions a fix that does not involve downgrading Tensorflow. If this is indeed a fix, then maybe I can mention it explicitly in README.md (we already point to the Tensorflow website for installation guidelines, just without specific details).
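As far as we understand it, the "Ubuntu 22.04" steps in the Tensorflow guide amount to (1) making sure `libdevice.10.bc` exists under `$CONDA_PREFIX/lib/nvvm/libdevice/`, which is where XLA looks relative to its CUDA data directory (the error log in the original report shows it falling back to `./libdevice.10.bc`), and (2) pointing XLA at that directory with `XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib`. The guide does this with shell commands in a conda activation script; below is a rough Python equivalent. It assumes `libdevice.10.bc` was already installed into `$CONDA_PREFIX/lib` (for example by a CUDA compiler package such as `cuda-nvcc`), and the helper names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def prepare_xla_cuda_dir(conda_prefix: str) -> str:
    """Mirror libdevice into <prefix>/lib/nvvm/libdevice and return the XLA flag to use.

    Assumption: libdevice.10.bc was installed directly into <prefix>/lib.
    """
    lib_dir = Path(conda_prefix) / "lib"
    src = lib_dir / "libdevice.10.bc"
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; is a CUDA compiler package installed?")
    # XLA searches <cuda_data_dir>/nvvm/libdevice for the libdevice bitcode.
    dst_dir = lib_dir / "nvvm" / "libdevice"
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    if not dst.exists():
        shutil.copy2(src, dst)
    return f"--xla_gpu_cuda_data_dir={lib_dir}"


def apply_xla_flags(flag: str) -> None:
    """Append flag to XLA_FLAGS in this process, keeping any flags already set."""
    existing = os.environ.get("XLA_FLAGS", "")
    if flag not in existing.split():
        os.environ["XLA_FLAGS"] = f"{existing} {flag}".strip()
```

To be safe, both calls would go at the very top of the training entry point, before `import tensorflow`, e.g. `apply_xla_flags(prepare_xla_cuda_dir(os.environ["CONDA_PREFIX"]))`. An environment variable set this way only affects the current process and its children, which is why the guide uses an activation hook instead.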

@kmaziarz
Collaborator

(For me, the instructions from the Tensorflow website resolve the issue, and training under 2.13.0 seems to be working.)

kmaziarz added a commit that referenced this issue Aug 9, 2023
This PR expands on the Tensorflow troubleshooting section in
`README.md`, taking into account how installing the newest versions on
Ubuntu 22.04 requires extra care (fixes #61). On top of this, I also
relaxed the pin on `protobuf` version in `setup.py`; I'm not sure why
the lower bound was introduced, but some of the `tensorflow` versions
actually conflict with it.