Serializing Time Series Forecasts #268
Replies: 30 comments
-
How are you doing the inference?
-
Thank you for responding, and I hope this is what you're looking for: I have a pandas DataFrame, and I feed 60 time steps and ask for the next 30 for one column. Here is the code:
The predictions are really great, but getting them ready for the road is the challenge I'm having. I've tried different architectures, to no avail.
-
That code is for the training; what about the inference?
-
Ah, yes of course.
-
So, if you do
-
Hi @DonRomaniello,

```python
from tsai.all import *  # re-exports the fastai pieces used below (load_learner, Path, mae, rmse, ShowGraph, etc.)

X, y, splits = get_regression_data('Covid3Month', split_data=False)
y_multistep = y.reshape(-1, 1).repeat(3, 1)  # repeat steps to simulate a 3-step forecast
tfms = [None, TSRegression()]
batch_tfms = TSStandardize(by_sample=True, by_var=True)
dls = get_ts_dls(X, y_multistep, splits=splits, tfms=tfms, batch_tfms=batch_tfms)
learn = ts_learner(dls, arch=TCN, metrics=[mae, rmse], cbs=[ShowGraph()])
learn.fit_one_cycle(2)

p, *_ = learn.get_X_preds(X)
print(p.shape)
# torch.Size([201, 3])  # output

PATH = Path('./models/Regression.pkl')
PATH.parent.mkdir(parents=True, exist_ok=True)
learn.export(PATH)
del learn

PATH = Path('./models/Regression.pkl')
learn = load_learner(PATH, cpu=True)
p2, *_ = learn.get_X_preds(X)
print(p2.shape)
torch.equal(p, p2)
# torch.Size([201, 3])  # output
```

I'm not sure if you are following a different process, but this is working well.
-
@vrodriguezf, correct.
The code you shared works for the example you provided; however, when I applied it to my code it didn't have the same effect.
-
It’d be good if you can find the difference between your code and the one I shared.
-
The only thing I can think of is that I am using SlidingWindow and get_splits, but if the dataloaders stay the same, shouldn't the model have similar predictions?
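For reference, this is roughly what a SlidingWindow + get_splits setup looks like (a minimal sketch on dummy data, not the code from this thread; the 60-step window and 30-step horizon just mirror the sizes mentioned earlier):

```python
import numpy as np
from tsai.all import *  # SlidingWindow, get_splits, get_ts_dls, ts_learner, TCN, TSRegression, mae, rmse

# Dummy univariate series standing in for the real data
t = np.sin(np.arange(2000) / 20.0)

# 60 input steps -> 30-step horizon
X, y = SlidingWindow(60, horizon=30)(t)
splits = get_splits(y, valid_size=0.2, shuffle=False, stratify=False, show_plot=False)

tfms = [None, TSRegression()]
dls = get_ts_dls(X, y, splits=splits, tfms=tfms)
learn = ts_learner(dls, arch=TCN, metrics=[mae, rmse])
learn.fit_one_cycle(1)
```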
-
That shouldn’t have an impact on the saved learner.
-
Sure thing:
-
I don’t see anything strange. I’m sorry but I don’t know how to help.
-
Thank you, and to be honest I am relieved that I wasn't missing something. I'll try manually creating the sliding windows and see if that does anything.
-
Actually, I wonder if this sheds any light: When I load the model and then try to fit_one_cycle, I get this:
-
It seems that the batch size has been lost somehow. Try setting it manually (learn.dls.train.bs and learn.dls.valid.bs) and see if that helps.
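For instance (a sketch; the batch sizes here are placeholder values, not taken from this thread):

```python
learn = load_learner(PATH, cpu=True)
learn.dls.train.bs = 64    # placeholder value: use the batch size you trained with
learn.dls.valid.bs = 128   # placeholder value
```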
-
When you save or export a Learner object, the dataset is not serialized. That's why you can't train it further; to do that you'd need to recreate the dataloaders. I'm curious: when you say the predictions are different, what do you mean? Are they still created, but with different values?
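As a sketch of what recreating the dataloaders could look like (this is not code from the thread; it assumes the original X, y_multistep, splits and transforms from the earlier example are still available, and reassigning learn.dls is one common fastai pattern for continuing training after load_learner):

```python
from tsai.all import *

# Rebuild the dataloaders exactly as they were built for training
tfms = [None, TSRegression()]
batch_tfms = TSStandardize(by_sample=True, by_var=True)
dls = get_ts_dls(X, y_multistep, splits=splits, tfms=tfms, batch_tfms=batch_tfms)

# Attach them to the loaded learner so it can be trained further
learn = load_learner(PATH, cpu=True)
learn.dls = dls
learn.fit_one_cycle(1)
```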
-
> It seems that the batch size has been lost somehow. Try setting it manually (learn.dls.train.bs and learn.dls.valid.bs) and see if that helps

It helped push the problem down the road a little...

@oguiza I've tried recreating the learner and then simply replacing the model with

but the predictions have different values from the same input data. I've rerun the code you provided on my data; here is the output:

torch.Size([479, 30])
torch.Size([479, 30])

Same problem even when manually setting the dataloaders with @vrodriguezf's method. Seriously, thank you for helping with this.
-
I'm afraid I'm unable to help.
-
Same problem with dummy data:

torch.Size([60, 5]) 0.2932287153601242
torch.Size([60, 5]) 0.2932296804365857

Although the MSE is much closer than with my data.
-
Ok, I've tried it, and while it's true that there's a difference between the predictions, it's minor. I ran torch.max(p - p2) and the max diff is tensor(1.2338e-05).
Edit:
-
I've found the root cause. There is a difference because the learner initially creates the predictions on the GPU, whereas when you load the model it creates them on the CPU. If you load it with cpu=False instead, there's no difference.
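In other words (a minimal sketch continuing the earlier example, where from tsai.all import * is already in scope; it assumes a GPU is available and that p was produced before exporting):

```python
learn = load_learner(PATH, cpu=False)   # keep the loaded learner on the GPU
p2, *_ = learn.get_X_preds(X)
print(torch.equal(p, p2))               # no difference when both prediction runs use the same device
```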
-
OK, so it sounds like if I want to deploy this on a CPU, I have to train it on a CPU?
-
Well... I did a little test, and am not sure if the GPU to CPU change is the issue:

torch.Size([2999, 5]) 0.256418886837684
torch.Size([2999, 5]) 0.256418886837684
torch.Size([2999, 5]) 0.2570435682650654

Obviously the difference is very small, but it is interesting that the issue seems to happen somewhere in here:

Edit: Hold on, are the dataloaders also on the GPU?
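One quick way to check which devices are involved (a sketch; learn is the learner from the snippets above):

```python
print(learn.dls.device)                        # device the dataloaders place batches on
print(next(learn.model.parameters()).device)   # device the model weights live on
```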
-
I've isolated the issue: it's the dataloaders going from GPU to CPU.

torch.Size([2999, 5]) 0.5691404370332382

p2, *_ = learn.get_X_preds(X)
-
I will try training the model on GPU with the dataloaders on CPU this evening when EC2 capacity is available and will report back. Thank you @oguiza and @vrodriguezf for all the help.
-
Hi @DonRomaniello. An issue that can occur when going between CPU and GPU is ordering sensitivity for floating-point numbers, particularly with respect to summation operations. Below is a simple example:
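(A minimal sketch of the effect, summing the same float32 values in different orders; the specific values are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)

s_forward = np.float32(0.0)
for v in x:                 # left-to-right summation
    s_forward += v

s_reverse = np.float32(0.0)
for v in x[::-1]:           # same values, opposite order
    s_reverse += v

s_pairwise = x.sum()        # NumPy's pairwise summation

print(s_forward, s_reverse, s_pairwise)   # typically differ in the last few digits
print(x.astype(np.float64).sum())         # FP64 accumulation is much less order-sensitive
```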
In theory the associativity principle should yield the same answer in all of the above cases; however, limited mantissa precision can result in differences in the least significant digits depending on the order of evaluation and the disparity of value magnitudes. GPUs and more advanced scalar compilers will reorder operations to enhance parallelism, so some of this variability in the lower-order digits is to be expected. Using increased floating-point precision (e.g. FP64) can increase CPU/GPU agreement, but at the cost of performance and memory consumption.
-
Thank you for the breakdown. So, if you're willing to trade off some speed to remove this artifact: I found that moving the dataloaders onto the CPU before training allows for an export and import without any changes in the predictions. Doing this before training led to the results being the same after exporting and importing.
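Something along these lines (a minimal sketch rather than the actual notebook code; it reuses the names from the earlier example and relies on fastai's DataLoaders.cpu() to move the dataloaders):

```python
dls.cpu()                                # keep batches (and hence training) on the CPU
learn = ts_learner(dls, arch=TCN, metrics=[mae, rmse])
learn.fit_one_cycle(2)

p, *_ = learn.get_X_preds(X)
learn.export(PATH)

learn2 = load_learner(PATH, cpu=True)
p2, *_ = learn2.get_X_preds(X)
print(torch.equal(p, p2))                # identical when everything stays on the CPU end to end
```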
-
Hi @DonRomaniello,

```python
learn = load_learner(PATH, cpu=False)
```

If for any reason you can't do that, you need to understand that there'll be a very minor difference between training and your predictions (the max difference is usually less than 1e-5). I think we have debated this, found the root cause of the difference, and the way to avoid it. This is clearly not a tsai-related issue. Are you ok if I close this issue? Or should I move it to discussions?
-
I agree that the issue is not tsai-related, but could we move it to discussions? Even though the issue is outside of tsai, it might be interesting to keep pursuing this. I'm wondering if I can find a way to do most of the training on the GPU, move it to the CPU, then run a few more cycles to try to tune it better. On the dummy data the differences were pretty small, but on my dataset the differences end up washing out some pretty significant trends that had been spot on when CPU-CPU or GPU-GPU.
-
@DonRomaniello, I think the point that is being missed is that neither the CPU nor the GPU is more accurate; they are just different due to data-dependent rounding effects. Although the model produces results in FP32, no results are accurate to full FP32 precision (@oguiza suggests a 1e-5 tolerance as a good rule of thumb). If you need more accuracy, try FP64, but I suspect other stochastic considerations limit the true accuracy to much less. If you are just looking for consistency, then use np.isclose or np.allclose to compare results using the tolerance @oguiza recommends. Modifying tsai for mixed CPU/GPU is unlikely to yield the results you are trying to achieve.
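For example (a sketch; p and p2 are the two prediction tensors from the earlier snippets, assumed to be on the CPU):

```python
import numpy as np

# "Equal" here means agreeing within the suggested tolerance, not bit-for-bit identical
print(np.allclose(np.asarray(p), np.asarray(p2), atol=1e-5))
```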
-
I've been getting great results on forecasting a multistep horizon for a multivariate time series, but am having a lot of trouble exporting or saving the model to use on other machines or even in the same Jupyter Notebook.
I create the learner with ts_learner and train it, but when I use learner.save or learner.export, the imported model doesn't have the same predictions.
Any help would be appreciated.