-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make FTRL algo to work properly on views #1505
Conversation
|
||
ft.fit(df_train[rows,:], df_target[rows,:]) | ||
predictions = ft.predict(df_train[rows,:]) | ||
model = dt.Frame(ft.model.to_dict()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@st-pasha, this is something I don't understand. If I just do model = dt.Frame(ft.model)
, then model
is still a reference to ft.model
. Why is it so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dt.Frame(DT)
creates a shallow copy of DT
. This is equivalent to DT.copy()
. What makes you think it doesn't work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I don't say it doesn't work. I just didn't expect dt.Frame(DT)
to return a shallow copy. What is the best way to make a deep copy then? I managed to do it as dt.Frame(ft.model.to_dict())
, but it doesn't seem to be an elegant solution. Is there any datatable way to do it, or I better just do copy.deepcopy(DT)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe copy.deepcopy()
just pickles and then unpickles the object. The more important question is why do you need a deep copy? The datatable should work correctly even with shallow copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I need a copy of the model here, otherwise if I simply do
model = dt.Frame(ft.model)
ft.reset()
model
also resets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked into this a little, and I think there could be a problem with pointers z
and n
. Even though they are acquired as column->data_w()
pointers, they are kept for an arbitrary long period of time. During that time the pointers may become invalid (in particular, they may become invalid for writing if you create a copy of the Frame). It might be better to re-acquire these pointers at the beginning of each python call that needs them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've made changes to re-acquire them at each Python call. Didn't solve the problem with this test though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? Here's what I'm getting:
>>> import random
>>> import datatable as dt
>>> from datatable.models import Ftrl
>>> ft = Ftrl(d=4)
>>> df_train = dt.Frame([random.random() for _ in range(ft.d)])
>>> df_target = dt.Frame([bool(random.getrandbits(1)) for _ in range(ft.d)])
>>> ft.fit(df_train, df_target)
>>> model = ft.model
>>> model.to_list()
[[0.9553244374605763, 0.0, 0.0, 0.0], [1.0001815001713965, 0.0, 0.0, 0.0]]
>>> ft.reset()
>>> model.to_list()
[[0.9553244374605763, 0.0, 0.0, 0.0], [1.0001815001713965, 0.0, 0.0, 0.0]]
>>> print(ft.model)
None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've recompiled datatable from scratch and it seems like this problem has been fixed. I will update tests. Thanks!
Actually there was also another problem. When I just did df_train_range = dt.Frame(df_train[rows,:])
, it kind of inherited rowindex and df_train_range
was still a view. That is why I had to do df_train_range = dt.Frame(df_train[rows,:].to_list())
. Is it something we expect due to shallow copying? If yes, then what is the best way to create a real frame based on the view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for any frame you can write frame.materialize()
, and it will be converted from "view" into a "real" data frame, in-place.
|
||
ft.fit(df_train[rows,:], df_target[rows,:]) | ||
predictions = ft.predict(df_train[rows,:]) | ||
model = dt.Frame(ft.model.to_dict()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dt.Frame(DT)
creates a shallow copy of DT
. This is equivalent to DT.copy()
. What makes you think it doesn't work?
Closes #1502