fix: ensuring deterministic behavior for driftstream #179

Merged 4 commits on Sep 8, 2024


hmgomes (Collaborator) commented Aug 9, 2024

Previously, DriftStream reused its internal MOA ConceptDriftStream object. As a result, a stream defined through the high-level DriftStream API (i.e. with a list of Concepts and Drifts) diverged from the equivalent ConceptDriftStream created in MOA or through the MOA CLI.

## Checking that the instances are the same
from capymoa.stream.drift import DriftStream, Drift, AbruptDrift, GradualDrift
from capymoa.stream.generator import SEA
from capymoa.classifier import HoeffdingTree
from moa.streams import ConceptDriftStream
from capymoa.evaluation import prequential_evaluation
import numpy as np

def _different(x1: np.ndarray, x2: np.ndarray, debug=False):
    if debug and not np.array_equal(x1, x2):
        print("Instance from stream_sea2drift (inst_capy.x):", x1)
        print("Instance from stream_sea2drift_MOA (inst_moa.x):", x2)
        # Optionally, you can add a more detailed comparison output
        diff_indices = np.where(x1 != x2)
        print("Differences at indices:", diff_indices)
        print("Values in inst_capy.x:", x1[diff_indices])
        print("Values in inst_moa.x:", x2[diff_indices])
    
    return not np.array_equal(x1, x2)


stream_sea2drift = DriftStream(stream=[SEA(function=1), 
                                AbruptDrift(position=50), 
                                SEA(function=3), 
                                GradualDrift(position=100, width=20), 
                                SEA(function=2)])

print(f'~~~~~~ DriftStream is accessible through the object ~~~~~~:\n {stream_sea2drift}')

stream_sea2drift_MOA = DriftStream(moa_stream=ConceptDriftStream(), 
                               CLI='-s (ConceptDriftStream -s generators.SEAGenerator -d (generators.SEAGenerator -f 3) -p 50 -w 0) \
                               -d (generators.SEAGenerator -f 2) -w 20 -p 100 -r 1 -a 0.0')

# Compare the first 149 instances of both streams one by one
for i in range(1, 150):
    inst_capy = stream_sea2drift.next_instance()
    inst_moa = stream_sea2drift_MOA.next_instance()

    if _different(inst_capy.x, inst_moa.x, debug=False):
        print(f"Error: Instances do not match. num_instance: {i}")
        raise ValueError("Execution stopped due to mismatch in instances.")
    else:
        print(f"results match. num_instance: {i}")
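
The comparison loop above could be wrapped in a small reusable helper that reports the first diverging instance instead of printing every match. The sketch below is only an illustration: `first_mismatch` and `ToyStream` are hypothetical names, and `ToyStream` is a seeded stand-in so the snippet runs without MOA. The helper only assumes that each stream offers `next_instance()` and that the returned instance has an `x` attribute, as in the script above.

```python
import random
from types import SimpleNamespace
from typing import Optional


class ToyStream:
    """Hypothetical seeded stand-in for a stream; not part of capymoa."""

    def __init__(self, seed: int = 1, n_features: int = 3):
        self._rng = random.Random(seed)
        self._n_features = n_features

    def next_instance(self):
        # Mimic an instance object that exposes its features as `x`.
        features = [self._rng.random() for _ in range(self._n_features)]
        return SimpleNamespace(x=features)


def first_mismatch(stream_a, stream_b, n_instances: int) -> Optional[int]:
    """Return the 1-based index of the first differing instance, or None."""
    for i in range(1, n_instances + 1):
        inst_a = stream_a.next_instance()
        inst_b = stream_b.next_instance()
        # list(...) lets the same comparison work for plain lists
        # and for 1-D numpy arrays.
        if list(inst_a.x) != list(inst_b.x):
            return i
    return None


if __name__ == "__main__":
    print(first_mismatch(ToyStream(seed=1), ToyStream(seed=1), 149))
    print(first_mismatch(ToyStream(seed=1), ToyStream(seed=2), 149))
```

Passing `stream_sea2drift` and `stream_sea2drift_MOA` in place of the toy streams should give the same check as the loop, with `None` meaning that all compared instances matched.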

hmgomes (Collaborator, Author) commented Aug 9, 2024

Might as well use that example code as the basis for an automated test.
We still need to investigate the behaviour when a DriftStream object is restarted. For example, passing the same stream to prequential_evaluation several times causes it to be restarted.
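
One way to probe the restart behaviour would be to read a stream, restart it, read it again, and check that both passes produce the same instances. The sketch below is a rough illustration under assumptions: `replays_identically` and `ToyRestartableStream` are hypothetical names, the toy class stands in for a real stream so the snippet runs without MOA, and the check assumes the stream under test exposes `next_instance()` and `restart()`.

```python
import random
from types import SimpleNamespace


class ToyRestartableStream:
    """Hypothetical stand-in for a restartable stream; not part of capymoa."""

    def __init__(self, seed: int = 1, reset_on_restart: bool = True):
        self._seed = seed
        self._reset_on_restart = reset_on_restart
        self._rng = random.Random(seed)

    def next_instance(self):
        return SimpleNamespace(x=[self._rng.random() for _ in range(3)])

    def restart(self):
        # A deterministic stream rebuilds its generator state here.
        # With reset_on_restart=False the old state is reused, which
        # imitates a stream that keeps stale internal state.
        if self._reset_on_restart:
            self._rng = random.Random(self._seed)


def replays_identically(stream, n_instances: int) -> bool:
    """Read n instances, restart, read again, and compare both passes."""
    first_pass = [list(stream.next_instance().x) for _ in range(n_instances)]
    stream.restart()
    second_pass = [list(stream.next_instance().x) for _ in range(n_instances)]
    return first_pass == second_pass


if __name__ == "__main__":
    print(replays_identically(ToyRestartableStream(seed=7), 50))
    print(replays_identically(
        ToyRestartableStream(seed=7, reset_on_restart=False), 50))
```

The second toy stream shows the failure mode this check is meant to catch: a restart that does not rebuild the underlying generator continues the old sequence instead of replaying it.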

hmgomes merged commit 33ba7ba into main on Sep 8, 2024
3 checks passed