Defining window operations in RML #85

Open
s-minoo opened this issue Jun 19, 2023 · 5 comments


s-minoo commented Jun 19, 2023

Issue

Currently, there is no way to define windowing semantics in RML.
Windowing is crucial when evaluating joins between different live streaming
data sources.

Furthermore, windowing could also provide the buffering that aggregation
functions need when processing streaming data sources, for example to
calculate the average of the values received over the last 5 minutes.
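
To make the aggregation use case concrete, here is a minimal Python sketch of a time-based sliding average. It is not part of any RML proposal; the class name `SlidingAverage`, the `span_s` parameter, and the assumption that tuples arrive in timestamp order are all made up for illustration.

```python
from collections import deque


class SlidingAverage:
    """Average of the values seen within the last `span_s` seconds (hypothetical helper)."""

    def __init__(self, span_s):
        self.span_s = span_s
        self.buf = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, ts, value):
        # Assumes timestamps arrive in non-decreasing order.
        self.buf.append((ts, value))
        self.total += value
        # Evict tuples that have fallen out of the time span.
        while self.buf and self.buf[0][0] <= ts - self.span_s:
            _, old = self.buf.popleft()
            self.total -= old
        return self.total / len(self.buf)


avg = SlidingAverage(span_s=300)  # 5 minutes
first = avg.add(0, 10)
second = avg.add(100, 20)
third = avg.add(300, 30)  # the tuple at ts=0 is now older than 5 minutes
```

The buffer is exactly the state a window would have to keep on behalf of the aggregation function.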

Requirements

According to Gedik B.,
a window's behaviour is defined by its type and its policies.

There are 2 main types of windows: tumbling and sliding.
An illustration of how these windows work can be found here.
Note: A session window is a special case of a tumbling window in which the
window is only dropped when an inactivity threshold is exceeded.
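
As a rough illustration of the three window shapes, the following Python sketch cuts a finite list into count-based tumbling and sliding windows, and into session windows based on an inactivity gap. The function names and parameters are hypothetical, and a real engine would work incrementally on an unbounded stream rather than on a list.

```python
def tumbling(items, size):
    """Non-overlapping windows: each tuple belongs to exactly one window."""
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def sliding(items, size, step):
    """Windows overlap whenever step < size."""
    return [items[i:i + size] for i in range(0, len(items) - size + 1, step)]


def sessions(timestamps, gap):
    """A session closes once the time between two tuples exceeds `gap`."""
    windows, current = [], []
    for ts in timestamps:
        if current and ts - current[-1] > gap:
            windows.append(current)
            current = []
        current.append(ts)
    if current:
        windows.append(current)
    return windows


tumbling_result = tumbling([0, 1, 2, 3, 4, 5], size=3)
sliding_result = sliding([0, 1, 2, 3, 4], size=3, step=1)
session_result = sessions([1, 2, 3, 10, 11, 30], gap=5)
```

In this sketch incomplete trailing windows are simply dropped for the tumbling and sliding cases; that is a simplification, not a statement about how RML should behave.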

The policies control when the window evicts the tuples it holds
(eviction policy), and when it triggers the processing of those
tuples using the operator logic defined inside the window (trigger policy).

Policies are further divided into 4 categories:

  1. Count-based
    • Uses the number of incoming tuples to inform when to evict/trigger.
  2. Delta-based
    • Uses a threshold on an attribute of the incoming tuples to
      inform when to evict/trigger, e.g. when the temperature value of a sensor is above 40 °C.
  3. Time-based
    • Uses the timestamp of the incoming tuple.
  4. Punctuation-based
    • Injects punctuations inside the incoming data stream as markers to decide
      when to evict/trigger.
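
One way to think about the four categories is as predicates over the current window contents and the incoming tuple. The Python sketch below is only an illustration under that assumption: the function names, the dict-based tuple layout (`"ts"`, `"kind"`), and the `"PUNCT"` marker are all hypothetical, and the exact semantics are the ones given by Gedik B., not by this code.

```python
def count_policy(n):
    """Fires once the window holds at least n tuples."""
    return lambda window, t: len(window) >= n


def delta_policy(attr, threshold):
    """Fires when an attribute of the incoming tuple exceeds a threshold."""
    return lambda window, t: t[attr] > threshold


def time_policy(span):
    """Fires when the incoming tuple is at least `span` newer than the oldest one."""
    return lambda window, t: bool(window) and t["ts"] - window[0]["ts"] >= span


def punctuation_policy(marker="PUNCT"):
    """Fires when the incoming element is a punctuation marker, not data."""
    return lambda window, t: t.get("kind") == marker


too_hot = delta_policy("temp", 40)
fires = too_hot([], {"temp": 41})
```

The same predicate shape can serve as either an eviction policy or a trigger policy, which is why the two are configured separately.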

Thus, we need a vocabulary to define and configure windows by
describing:

  1. Window Type
  2. Eviction policy
  3. Trigger policy

The exact semantics of the policies and of their combinations are further
explained by Gedik B.

Example

Given the following RML with a join condition:

<#TM1> 
    rml:logicalSource <#STREAM1> ;
    rml:subjectMap <#SM1> ;
    rml:predicateObjectMap [
        rml:predicateMap <#PM1> ;
        rml:objectMap [ 
            rml:parentTriplesMap <#TM2> ;
            rr:joinCondition [
                rr:child "id";
                rr:parent "p_id"; 
            ];

        ];

    ]. 



<#TM2> 
    rml:logicalSource <#STREAM2> ;
    rml:subjectMap <#SM2> ;
    rml:predicateObjectMap [
        rml:predicateMap <#PM2> ;
        rml:objectMap <#OM2> ] .

Windows could be defined in the object map:

<#TM1> 
    rml:logicalSource <#STREAM1> ;
    rml:subjectMap <#SM1> ;
    rml:predicateObjectMap [
        rml:predicateMap <#PM1> ;
        rml:objectMap [
            # Define the window to be used for joining
            rml:window [ 
                # Define window types 
                rml:windowType rml:Tumbling; 

                # Define the trigger policy for the window 
                # Every 5th record will execute the join
                rml:trigger [ a rml:CountPolicy ;
                    rml:countValue  5;

                ]; 

                # Define the eviction policy for the window
                # Clean up window after processing the 15th record
                rml:evict [ a rml:CountPolicy;
                    rml:countValue  15;
                ];

            ];
            rml:parentTriplesMap <#TM2> ;
            rr:joinCondition [
                rr:child "id";
                rr:parent "p_id"; 
            ];
        ];
    ]. 

<#TM2> 
    rml:logicalSource <#STREAM2> ;
    rml:subjectMap <#SM2> ;
    rml:predicateObjectMap [
        rml:predicateMap <#PM2> ;
        rml:objectMap <#OM2> ] .
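
One possible reading of the window configuration above (trigger on every 5th record, evict after the 15th) is sketched in Python below. The function `run_window` and its parameters are hypothetical and only simulate the counting; how a real engine should combine the two count policies is exactly what the vocabulary would need to pin down.

```python
def run_window(records, trigger_every=5, evict_after=15):
    """Return the window contents seen by the operator each time the trigger fires."""
    window, fired = [], []
    for count, rec in enumerate(records, start=1):
        window.append(rec)
        if count % trigger_every == 0:
            # Trigger policy: the operator (here, the join) sees the current contents.
            fired.append(list(window))
        if count % evict_after == 0:
            # Eviction policy: clean up the window after the 15th record.
            window.clear()
    return fired


sizes = [len(w) for w in run_window(range(20))]
```

Under this reading the join runs over 5, 10, and 15 buffered records, the window is then emptied, and the next trigger sees 5 records again.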
@dachafra
Member

@s-minoo is this a specific request for join conditions? If this is the case, please confirm so I can move it to the proper repository.

@s-minoo
Author

s-minoo commented Oct 11, 2023

It is indeed a specific request for joins. So, I think it's more relevant to the rml-join repo.

@dachafra dachafra transferred this issue from kg-construct/rml-core Oct 20, 2023
@dachafra
Member

Transferring it to its corresponding repository then.

@elsdvlee
Collaborator

The rml-jc repo will be closed. Moving this unsolved issue back to rml-core.

@dachafra dachafra transferred this issue from kg-construct/rml-jc Feb 14, 2024
@dachafra
Member

dachafra commented Jul 3, 2024

I would suggest leaving this issue for the working group.
