Performance issue #146

I am using the graph module to solve an ETA estimation problem. Could anybody tell me how to make training faster by using all the available cores on my computer? I tried many things in TensorFlow 2.5, such as setting the number of threads with `tf.config.threading.set_inter_op_parallelism_threads` and `tf.config.threading.set_intra_op_parallelism_threads`, but nothing works. Training the MLP networks takes a very long time.

Comments
Thank you for your message. Could you confirm that you are using a compiled update step, i.e. wrapping your training step in `tf.function`? If this is not the case, then it may be that the code is running in eager mode, which is expected to be much slower.
Are you talking about training just a regular MLP, or training a GraphNetwork with MLPs inside? Also, could you confirm that you are using your CPU for training rather than a GPU? Beyond the suggestion above, I cannot think of any reason specific to the graph_nets library for this to be unusually slow (beyond what is expected of CPU vs GPU), and I would recommend following up with general TensorFlow recommendations for running code as fast as possible on CPUs.
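Concretely, the pattern looks something like this (a sketch; `update_step`, `inputs_tr` and `targets_tr` are placeholders for your own training-step function and example training data):

```python
import tensorflow as tf

compiled_update_step = tf.function(update_step)

# The first call traces update_step into a graph; subsequent calls reuse the
# compiled graph instead of executing op-by-op in eager mode.
loss = compiled_update_step(inputs_tr, targets_tr)
```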
Thank you for your comments. I do use `compiled_update_step` for training, so the code is running in graph mode. Currently, I am training the model using only the CPU. I found that training executes on only one core, even though my computer has 16 cores, and I am not sure why the code is not executed in parallel. I followed some suggestions from Intel, such as:

```python
num_threads = 8
tf.config.threading.set_inter_op_parallelism_threads(num_threads)
```

This does not work.
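One detail worth noting from the TensorFlow docs: these threading options only take effect if they are set at program startup, before any TensorFlow operation runs. A minimal sketch, with arbitrary thread counts:

```python
import tensorflow as tf

# Must be called before TensorFlow executes any operation; modifying these
# after the runtime has initialised raises a RuntimeError.
tf.config.threading.set_inter_op_parallelism_threads(2)  # independent ops run in parallel
tf.config.threading.set_intra_op_parallelism_threads(8)  # threads used inside a single op
```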
Thanks for your reply, could you confirm if:
I am referring to the graph neural network. My code is based on the example code in demo-tf2. I made some modifications to the `EncodeProcessDecode` class, where I used MLPs as the update functions for the node, edge, and global embeddings. I also passed the signature argument to `tf.function`:

```python
input_signature = [
    ...  # TensorSpecs for the GraphsTuple inputs (elided in the original post)
]
compiled_update_step = tf.function(update_step, input_signature=input_signature)
```

Do you have any suggestions on how to make the code run on multiple CPU cores?
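For reference, the tf2 demo builds this signature with `utils_tf.specs_from_graphs_tuple`; a sketch along those lines, where `example_inputs` / `example_targets` stand in for example GraphsTuples from my data:

```python
from graph_nets import utils_tf
import tensorflow as tf

# Dynamic sizes in the specs avoid retracing the tf.function every time the
# number of nodes, edges, or graphs in the batch changes.
input_signature = [
    utils_tf.specs_from_graphs_tuple(example_inputs, dynamic_num_graphs=True),
    utils_tf.specs_from_graphs_tuple(example_targets, dynamic_num_graphs=True),
]
compiled_update_step = tf.function(update_step, input_signature=input_signature)
```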
Thanks for the clarification, could you confirm if you observe the same low CPU utilisation when running a single MLP in a similar setting, rather than a full Graph Network?
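For example, something like this (a sketch using Sonnet, with arbitrary layer sizes and input shape) should make it easy to compare CPU utilisation for a plain MLP:

```python
import sonnet as snt
import tensorflow as tf

mlp = snt.nets.MLP([256, 256, 1])

@tf.function
def forward(x):
  return mlp(x)

x = tf.random.normal([4096, 512])
for _ in range(1000):
  forward(x)  # watch per-core CPU utilisation (e.g. with htop) while this runs
```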
I guess the reason that I cannot run the GNN on multiple CPU cores is that we cannot distribute the input data (the `GraphsTuple` structure) across different cores. Did you ever run the demo examples on multiple cores or on multi-GPU hardware?
You mean that you have specific code for distributing the data for an MLP on CPU, but otherwise, if you don't distribute it explicitly, it does not use the multiple cores? Could you share an example of MLP code which successfully runs on multiple CPU cores on your machine? I was assuming TensorFlow should be able to use multiple cores for things like matrix multiplication without having to distribute the input data across CPU cores explicitly, but we never run on CPU for any serious training, so it is hard to say for sure.
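For instance, a bare matrix multiplication (a sketch with arbitrary sizes) would be expected to occupy several cores through TensorFlow's intra-op thread pool, without any explicit distribution:

```python
import tensorflow as tf

# A single large matmul is sharded across the intra-op thread pool by the
# CPU backend, so several cores should be busy while this runs.
a = tf.random.normal([4096, 4096])
b = tf.random.normal([4096, 4096])
for _ in range(100):
  c = tf.matmul(a, b)
```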
Here is an example of how to run a CNN on a multi-core CPU (the original example is for training on multiple GPUs; I made some minor changes). This works because it creates a distributed dataset.

```python
import sonnet as snt
import tensorflow as tf
import tensorflow_datasets as tfds

strategy = snt.distribute.Replicator(
    ["/device:CPU:{}".format(i) for i in range(1)],
    tf.distribute.ReductionToOneDevice("CPU:0"))

# NOTE: This is the batch size across all GPUs.
batch_size = 100 * 4

def process_batch(images, labels):
  images = tf.cast(images, dtype=tf.float32)
  images = ((images / 255.) - .5) * 2.
  return images, labels

def cifar10(split):
  dataset = tfds.load("cifar10", split=split, as_supervised=True)
  dataset = dataset.map(process_batch)
  dataset = dataset.batch(batch_size)
  dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
  return dataset

cifar10_train = cifar10("train").shuffle(10)

learning_rate = 0.1

with strategy.scope():
  model = snt.nets.Cifar10ConvNet()
  optimizer = snt.optimizers.Momentum(learning_rate, 0.9)

def step(images, labels):
  # Training the model.
  with tf.GradientTape() as tape:
    logits = model(images, is_training=True)["logits"]
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits))
  grads = tape.gradient(loss, model.trainable_variables)

  # Aggregate the gradients from the full batch.
  replica_ctx = tf.distribute.get_replica_context()
  grads = replica_ctx.all_reduce("mean", grads)

  optimizer.apply(grads, model.trainable_variables)
  return loss

@tf.function
def train_epoch(dataset):
  total_loss = 0.0
  num_batches = 0.0  # float so the division below is well-typed in graph mode
  # Loop over the entire training set.
  for images, labels in dataset:
    per_replica_loss = strategy.run(step, args=(images, labels))
    total_loss += strategy.reduce("sum", per_replica_loss, axis=None)
    num_batches += 1
  return total_loss / num_batches

cifar10_train_dist = strategy.experimental_distribute_dataset(cifar10_train)

for epoch in range(20):
  print("epoch", epoch, "loss:", train_epoch(cifar10_train_dist).numpy())
```
Thanks for the reply. Yes, this indeed is a way to run on multiple devices using Replicator, but in principle it should not be necessary to use this to make use of multiple CPU cores. Unfortunately, it is not immediately obvious how to do this with the library: you would need to build batches of batches, and because each batch has a different dimension, building that second level of batching is not straightforward. I would recommend trying to follow up with TensorFlow directly if the solutions from this thread don't help.