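Hi all,

Today I have a question about slicing a dataframe. I only need a small window of data, and processing the entire dataframe feels like wasted computation, so I want to reduce it to only the last range of datapoints (in this case, three times the roll_period I use for linear regression).

Looking at the docs, the "remove_data_by_idx" function looks like what I want, so I've written the following code:

    unsigned long df_size = df.get_column<float>("last").size();
    cerr << "Size: " << df_size << endl;
    if (df_size > (roll_period * 3)) {
        df = df.get_data_by_idx<std::string, float>(
            Index2D<ULDataFrame::IndexType>{df_size - (roll_period * 3), df_size}
        );
    }

It compiles fine, but when I run it, I get an odd error. Once the conditional is fulfilled I get DataFrame::get_column(): ERROR: Cannot find column 'timestamp'. The "timestamp" is a valid column in the original dataframe, and from the docs remove_data_by_idx should preserve all columns in the dataframe, so what am I doing wrong?

The only difference I can see from the hello_world example is that I am overwriting the original df variable with the new dataframe (in the interest of memory efficiency), but the copied dataframe slice should have all the columns anyway.

In addition, and more as a general question: is this the right way to effectively get a sliding window? I betray my Python roots, as I've used the above pattern with pandas a lot in the past. Perhaps there is a better/more elegant way it can be done using this library and C++?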
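For reference, the window arithmetic described in the question (keep only the last three times roll_period rows) can be sketched independently of the library. This is only an illustration using plain `std::vector`; `Window`, `tail_window`, and `slice` are hypothetical names, not DataFrame functions, and the bounds here are row positions rather than index values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open row range [begin, end), expressed as positions, not index values.
struct Window {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical helper, not part of the DataFrame library: bounds of the
// last `length` rows of a container that holds `size` rows.
inline Window tail_window(std::size_t size, std::size_t length)
{
    // Compare before subtracting. Both values are unsigned, so
    // size - length would wrap around to a huge number if length > size.
    const std::size_t begin = size > length ? size - length : 0;
    return Window{begin, size};
}

// Copies the rows selected by `w` out of a single column.
inline std::vector<float> slice(const std::vector<float>& column, Window w)
{
    assert(w.begin <= w.end && w.end <= column.size());
    const auto first = column.begin() + static_cast<std::ptrdiff_t>(w.begin);
    const auto last = column.begin() + static_cast<std::ptrdiff_t>(w.end);
    return std::vector<float>(first, last);
}
```

With a roll_period of 2 and a column of 8 values, `tail_window(8, 2 * 3)` gives the range [2, 8), and a column shorter than the window is returned whole rather than underflowing.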
Replies: 1 comment 3 replies
I have limited access to computers for the next few days, so I can't tell what's going on. Experiment with different options, like assigning to a new DataFrame…
On Friday, August 25, 2023, 7:41 AM, Oggy16 wrote:
Hi all,
Today I have a question about slicing a dataframe. I only need a small window of data, and processing the entire dataframe feels like wasted computation, so I want to reduce it to only the last range of datapoints (in this case, three times the roll_period I use for linear regression).
Looking at the docs the "remove_data_by_idx" function looks like what I want, so I've written the following code:
    unsigned long df_size = df.get_column<float>("last").size();
    cerr << "Size: " << df_size << endl;
    if (df_size > (roll_period * 3)) {
        df = df.get_data_by_idx<std::string, float>(
            Index2D<ULDataFrame::IndexType>{df_size - (roll_period * 3), df_size}
        );
    }
It compiles fine, but when I run it, I get an odd error. Once the conditional is fulfilled I get DataFrame::get_column(): ERROR: Cannot find column 'timestamp'. The "timestamp" is a valid column in the original dataframe, and from the docs remove_data_by_idx should preserve all columns in the dataframe, so what am I doing wrong?
Only difference I can see from the hello_world example is that I am overwriting the original df variable with the new dataframe (in the interest of memory efficiency) but the copied dataframe slice should have all the columns anyway.
In addition and more as a general question, is this the right way to effectively get a sliding window? I betray my Python roots as I've used the above pattern with pandas a lot in the past. Perhaps there is a better/more elegant way it can be done using this library and C++?
During my tests I was able to reproduce the issue both with a separate dataframe and when overwriting the existing one. While reading further through the docs, I found the "df.get_data_by_loc" function, which accepts Python-style negative indexes.
As that pattern was familiar to me, I re-wrote the section as follows:
Not only does this look more familiar to me, my issue has also gone away. The sliding window works well and as I had expected, my performance has increased a lot after this change.
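For readers unfamiliar with the pattern, here is a rough, library-independent sketch of how Python-style negative locations map onto row positions. It is not the library's implementation: `resolve_loc` and `slice_by_loc` are hypothetical helpers over a plain `std::vector`, and the way get_data_by_loc treats a negative end bound should be checked against its documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps a Python-style location to a row position.
// Negative values count back from the end, so -1 refers to the last row.
inline std::size_t resolve_loc(long loc, std::size_t size)
{
    const long n = static_cast<long>(size);
    const long pos = loc < 0 ? n + loc : loc;
    assert(pos >= 0 && pos <= n);  // pos == n is allowed as an exclusive end
    return static_cast<std::size_t>(pos);
}

// Returns rows [begin, end) of `column`, following Python slice semantics
// for negative bounds (the end bound is exclusive).
inline std::vector<float> slice_by_loc(const std::vector<float>& column,
                                       long begin, long end)
{
    const std::size_t b = resolve_loc(begin, column.size());
    const std::size_t e = resolve_loc(end, column.size());
    assert(b <= e);
    const auto first = column.begin() + static_cast<std::ptrdiff_t>(b);
    const auto last = column.begin() + static_cast<std::ptrdiff_t>(e);
    return std::vector<float>(first, last);
}
```

In this sketch, the last six rows of a column `col` are `slice_by_loc(col, -6, static_cast<long>(col.size()))`, which mirrors `col[-6:]` in Python, and a window size never has to be subtracted from an unsigned row count.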
I am not sure w…