Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MovingWindow: Create Interface for requesting data and handling the data copy behaviour #214

Closed
matthias-wende-frequenz opened this issue Feb 21, 2023 · 14 comments
Assignees
Labels
part:data-pipeline Affects the data pipeline type:enhancement New feature or enhancement visitble to users
Milestone

Comments

@matthias-wende-frequenz
Copy link
Contributor

matthias-wende-frequenz commented Feb 21, 2023

When asking the MovingWindow for a subwindow we might return a slice of the buffer, that means a reference to the underlying numpy based ringbuffer.
There are cases, e.g. when dealing with small window sizes and high sample rates, where the data might get updated from the data stream before the requested subwindow has been processed.

As a solution we want to tell the MovingWindow to return a copy of the data instead of a reference. In order to do that we need to add an interface, such that users can control copy behavior.

We might implement a method for the MovingWindow as follows.

window(
	self,
	start: datetime,
	end: datetime,
	fill_gaps: Optional[Quantity] = None,
	force_copy: Bool = False
)

This method should return a sorted numpy array that holds the resampled data from start to end and fills the gaps with the Value that is provided by fill_gaps.


Summary of the (ongoing) work related to this and what I see essential for v1.0.

Essential for this issue:

Related to moving window:

@leandro-lucarella-frequenz leandro-lucarella-frequenz added this to the post-v1.0 milestone Apr 14, 2023
@matthias-wende-frequenz matthias-wende-frequenz added the part:data-pipeline Affects the data pipeline label Jul 21, 2023
@matthias-wende-frequenz matthias-wende-frequenz changed the title Create Interface for handling the data copy behaviour Create Interface for requesting data and handling the data copy behaviour Jul 21, 2023
@matthias-wende-frequenz
Copy link
Contributor Author

@thomas-nicolai-frequenz

I don't understand why we could not design it in a way where the user passes his numpy array by reference the moving window is writing to. This way we could avoid the need for the fill_gaps parameter. In that case the numpy array could be initialised with whatever values the someone wants.

@cwasicki
Copy link
Collaborator

To me passing in an array feels less like typical behavior of other functions in numpy or pandas.

fill_gaps: Optional[Quantity] = None,

Since we should not have gaps in the moving window but only NaNs or uninitialized in the beginning, I would name this differently.

start: datetime,
end: datetime,

That should also work with index, not only datetime.

@matthias-wende-frequenz
Copy link
Contributor Author

I don't understand why we could not design it in a way where the user passes his numpy array by reference the moving window is writing to. This way we could avoid the need for the fill_gaps parameter. In that case the numpy array could be initialised with whatever values the someone wants.

This is about the gaps that are coming from the resampler, e.g. due to faulty hardware.

Passing the np array is also planned but an independent issue (#530).

@thomas-nicolai-frequenz
Copy link

thomas-nicolai-frequenz commented Jul 24, 2023

This is about the gaps that are coming from the resampler, e.g. due to faulty hardware.

Yes but in this case we would copy the NaN's that are coming from the resampler. This has to be treated differently I think but not 100% sure.

@matthias-wende-frequenz
Copy link
Contributor Author

Yes but in this case we would copy the NaN's that are coming from the resampler. This has to be treated differently I think but not 100% sure.

You are right, we shouldn't copy NaN's. If fill_gaps is set to None and there are gaps then we should raise an exception.

@llucax llucax modified the milestones: v0.23.0, v0.24.0 Aug 3, 2023
@llucax llucax moved this from To do to In progress in Python SDK Roadmap Aug 8, 2023
@llucax llucax moved this from In progress to Review in progress in Python SDK Roadmap Aug 8, 2023
@llucax llucax added the status:blocked Other issues must be resolved before this can be worked on label Aug 8, 2023
@llucax
Copy link
Contributor

llucax commented Aug 8, 2023

Blocked because we need to fix some bugs in the ring buffer for this to be useful.

@llucax llucax moved this from Review in progress to In progress in Python SDK Roadmap Aug 8, 2023
@llucax llucax modified the milestones: v0.24.0, v0.25.0 Aug 8, 2023
@cwasicki
Copy link
Collaborator

cwasicki commented Aug 9, 2023

Blocked because we need to fix some bugs in the ring buffer for this to be useful.

@llucax Which bugs are those?

@cwasicki
Copy link
Collaborator

@llucax Which bugs are those?

Found it: #578

@llucax
Copy link
Contributor

llucax commented Aug 10, 2023

OK, if you need any further info about this, ask @matthias-wende-frequenz

@thomas-nicolai-frequenz

@leandro-lucarella-frequenz but this bug is fixed and merged. Can we unblock the issue?

@llucax llucax removed the status:blocked Other issues must be resolved before this can be worked on label Aug 22, 2023
@llucax
Copy link
Contributor

llucax commented Aug 22, 2023

Done.

@llucax llucax modified the milestones: v0.25.0, v1.0.0-rc Aug 22, 2023
@Marenz Marenz changed the title Create Interface for requesting data and handling the data copy behaviour MovingWindow: Create Interface for requesting data and handling the data copy behaviour Aug 22, 2023
@cwasicki cwasicki modified the milestones: v1.0.0-rc, v0.25.0 Aug 22, 2023
@cwasicki cwasicki self-assigned this Aug 24, 2023
@llucax llucax added the type:enhancement New feature or enhancement visitble to users label Sep 12, 2023
@llucax llucax modified the milestones: v1.0.0-rc1, v1.0.0-rc2 Sep 12, 2023
github-merge-queue bot pushed a commit that referenced this issue Sep 13, 2023
This PR introduces several copy-related updates to the ring buffer
window functionality

Changes:
* Default Copy Behavior: Changed the default behavior to copy data in
the ring buffer window. This is to prioritize data integrity over
performance, which is a minor concern for most expected use-cases. Made
`force_copy` as a keyword argument.
* Moved the logic for extracting wrapped buffers into a separate method
to fix a bug and allow for better testing and future maintainability.
* Handling None Values: Fixed an issue where None values were being
forcibly copied in the ring buffer window. The decision to enforce a
copy is now left to the user, even when the data contains None values.
* Test Coverage: Added tests for the window method in the ring buffer,
including tests for expected copy behavior on missing values.
* Fix to return an empty array in case start = end in ring buffer
window.
* Expose restricted ring buffer window by moving window.

Part of #214
@cwasicki
Copy link
Collaborator

cwasicki commented Oct 2, 2023

Apart from a very special bug (#646) all associated work has been merged.

@cwasicki cwasicki closed this as completed Oct 2, 2023
@github-project-automation github-project-automation bot moved this from In progress to Done in Python SDK Roadmap Oct 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
part:data-pipeline Affects the data pipeline type:enhancement New feature or enhancement visitble to users
Projects
Development

Successfully merging a pull request may close this issue.

6 participants