This is a good example of how a neural network transforms space.
The goal is to classify the data with 100% accuracy using the minimum number of nodes and weights.
The data looks simple to the human eye, but it is difficult for the machine to learn.
Your neural network may need many epochs before it starts moving in the right direction, and in fact in most cases it never gets there: it gets stuck in a local minimum. You therefore need to run the code several times until the random initialization lands on a starting point from which training converges.
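
The restart idea can be sketched roughly as follows. This is only an illustration, not the code this example ships with: the data set is not given here, so `make_rings` is a hypothetical stand-in (an inner disc surrounded by a ring), and the function names, the 3 hidden nodes, the learning rate and the epoch count are all assumptions. It uses plain NumPy and full-batch gradient descent, trying one random seed after another and stopping as soon as a run reaches 100% training accuracy.

```python
import numpy as np


def make_rings(n=200, seed=0):
    """Hypothetical toy data: an inner disc (class 0) and an outer ring (class 1)."""
    rng = np.random.default_rng(seed)
    radius = np.concatenate([rng.uniform(0.0, 1.0, n // 2),
                             rng.uniform(2.0, 3.0, n // 2)])
    angle = rng.uniform(0.0, 2.0 * np.pi, n)
    X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
    y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])
    return X, y


def train_once(X, y, hidden=3, epochs=2000, lr=0.5, seed=0):
    """Train a 2 -> hidden -> 1 network once and return its training accuracy."""
    rng = np.random.default_rng(seed)  # the seed fixes the random starting point
    W1 = rng.normal(0.0, 1.0, (2, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1)

    for _ in range(epochs):
        # Forward pass: tanh hidden layer, sigmoid output.
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

        # Backward pass: gradient of the mean binary cross-entropy.
        dz2 = (p - t) / len(X)
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0)

        # Plain gradient-descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return float(np.mean((p[:, 0] > 0.5) == (y == 1)))


def train_with_restarts(X, y, max_restarts=10, **kwargs):
    """Retry with a new seed until one run classifies every point correctly."""
    history = []
    for seed in range(max_restarts):
        acc = train_once(X, y, seed=seed, **kwargs)
        history.append(acc)
        if acc == 1.0:  # this starting point converged, stop retrying
            break
    return max(history), history


if __name__ == "__main__":
    X, y = make_rings()
    best, history = train_with_restarts(X, y)
    print("accuracy per restart:", history)
    print("best accuracy:", best)
```

Printing the accuracy of every restart makes it easy to see how much the outcome depends on the starting point; how many restarts are needed will vary with the data and the settings chosen.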
To learn more, look at the various optimization algorithms: https://towardsdatascience.com/types-of-optimization-algorithms-used-in-neural-networks-and-ways-to-optimize-gradient-95ae5d39529f
and at the loss functions: https://towardsdatascience.com/understanding-different-loss-functions-for-neural-networks-dd1ed0274718