Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide on data structure for Reporting client #29

Open
flora-hofmann-frequenz opened this issue May 8, 2024 · 7 comments
Open

Decide on data structure for Reporting client #29

flora-hofmann-frequenz opened this issue May 8, 2024 · 7 comments
Assignees
Labels
part:❓ We need to figure out which part is affected priority:❓ We need to figure out how soon this should be addressed type:enhancement New feature or enhancement visitble to users

Comments

@flora-hofmann-frequenz
Copy link
Contributor

flora-hofmann-frequenz commented May 8, 2024

What's needed?

A data structure that can expose the metrics as well as the states per microgrid and component.

Proposed solution

Option to be discussed:

  1. Full (artificial) mapping from protobuf to Python datatype
  2. Helpers (such as namedtuples) to work

Use cases

Not only metrics, but also states, warnings and errors should be exposed.

Alternatives and workarounds

Any alternative ideas are welcome.

Additional context

As discussed in API usergroup meeting on 6th May 2024:
Currently, the ReportingApiClient only exposes MetricSamples using a namedtuple structure.

To additionally expose States:

  • At any given timestamp, a component can have multiple states, such as charging and relay_open
  • Practically, whenever you get a metric you can also get the states information

The example output structure from the protobuf is as follows with the entry point to the GRPC returning those messages.

> // Response containing historical microgrid component metrics in one or multiple microgrids
> //
> // Each microgrid's components are provided as timeseries data structures that encapsulate
> // metrics, bounds, errors and operational state and their associated timestamps for each component
> // within the specified time range.
> //
> // !!! example
> //     Example output structure:
> //     ```
> //     microgrids: [
> //       {
> //         microgrid_id: 1,
> //         components: [
> //           {
> //             component_id: 13,
> //             metric_samples: [
> //               /* list of metrics for multiple timestamps */
> //               { sampled_at: "2023-10-01T00:00:00Z", metric: "DC_VOLTAGE_V", sample: {...}, bounds: {...} },
> //               { sampled_at: "2023-10-01T00:00:00Z", metric: "DC_CURRENT_A", sample: {...}, bounds: {...} }
> //               { sampled_at: "2023-10-01T00:05:00Z", metric: "DC_VOLTAGE_V", sample: {...}, bounds: {...} },
> //               { sampled_at: "2023-10-01T00:05:00Z", metric: "DC_CURRENT_A", sample: {...}, bounds: {...} }
> //             ],
> //             states: [
> //               /* list of states for multiple timestamps */
> //               { sampled_at: "2023-10-01T00:00:13.12Z", states: [...], errors: [...], warnings: [...] },
> //               { sampled_at: "2023-10-01T00:02:22.01Z", states: [...], errors: [...], warnings: [...] },
> //               { sampled_at: "2023-10-01T00:05:02.32Z", states: [...], errors: [...], warnings: [...] },
> //             ]
> //           },
> //           {
> //             component_id: 243,
> //             metric_samples: [ ... ],
> //             states: [ ... ]
> //           },
> //         ]
> //       },
> //       {
> //         microgrid_id: 2,
> //         components: [ ... ]
> //       }
> //     ]
> //     ```

To retrieve MetricSample information, we use the following helper in the ReportingApiClient:

MetricSample = namedtuple(
    "MetricSample", ["timestamp", "microgrid_id", "component_id", "metric", "value"]
)
"""Type for a sample of a time series incl. metric type, microgrid and component ID

A named tuple was chosen to allow safe access to the fields while keeping the
simplicity of a tuple. This data type can be easily used to create a numpy array
or a pandas DataFrame.
"""
@flora-hofmann-frequenz flora-hofmann-frequenz added part:❓ We need to figure out which part is affected priority:❓ We need to figure out how soon this should be addressed type:enhancement New feature or enhancement visitble to users labels May 8, 2024
@flora-hofmann-frequenz flora-hofmann-frequenz self-assigned this May 8, 2024
@llucax
Copy link
Contributor

llucax commented May 8, 2024

One unrelated comment is I recommend using the typing.NamedTuple class, so you can add type hints to it too.

@llucax
Copy link
Contributor

llucax commented May 8, 2024

For me this is very hard to think about without knowing how do we plan to use this data. As long as we don't have a clear picture, I think the safest bet is to make the client retrieve the data as it comes in the API and then adapt it at another level. Without changing the protocol, we'll still have to do this transformation, and limiting what the user can do with the client artificially doesn't sound like a great idea.

@cwasicki
Copy link
Contributor

I think the safest bet is to make the client retrieve the data as it comes in the API and then adapt it at another level.

Agree that there should be a way to retrieve the data as close as possible to the message definition. But I suggest to also keep the current approach as an alternative, since this format is already being used by multiple users and seems to be very useful. @llucax What is your idea of "another level"? A restricted client on top of the slim client? In the same repo?

@llucax
Copy link
Contributor

llucax commented May 17, 2024

What I mean I wouldn't make the adapted retrieval the only way to retrieve the data, maybe it is more clean in pseudo-code form:

adapted_data = await client.retrieve()  # Not for now, as we don't know if this will fit all use cases

adapted_data = adapt_data(await client.retrieve())  # Yes!

So if somebody needs to retrieve the data from the server in a non-adapted way (or adapt it differently) they can still use the client and don't have to get into the mess of using the raw bindings.

adapt_data() can be in the same repo for now, and I don't think it should be a whole new client for the adapted data if a simple function will suffice. We can see if it makes sense to split later, like we have with dispatch.

@flora-hofmann-frequenz
Copy link
Contributor Author

@llucax: We were thinking of reusing some of the dataclasses in here.
What is your take on that?
Could add a _types.py module and then add/update the ListMicrogridComponentsDataResponse and add the adaptor (as mentioned above).

@llucax
Copy link
Contributor

llucax commented May 23, 2024

Sounds good in principle, I'm still not very involved in all the protobuf stuff in common that is shared between microgrid and reporting, so I feel I don't have a full picture to have a strong opinion, so please don't hold me accountable if I change my mind in the future when I get more involved 😆 But again I think we are still at an exploration stage so I guess this is all fine, right?

@flora-hofmann-frequenz
Copy link
Contributor Author

Sounds good to me! 👍 Let me keep looking into it and share as I go along for further discussions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
part:❓ We need to figure out which part is affected priority:❓ We need to figure out how soon this should be addressed type:enhancement New feature or enhancement visitble to users
Projects
None yet
Development

No branches or pull requests

3 participants