Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test cases for caption_images component and fixed bug in this com… #311

Merged
merged 10 commits into from
Jul 31, 2023

Conversation

satishjasthi
Copy link
Contributor

This the first commit to add test cases to all existing components of fondant. This PR tries to resolve #301

@mrchtr
Copy link
Contributor

mrchtr commented Jul 25, 2023

Thanks @satishjasthi! The test case is already a good starting point to build a unit test fundament!

During a quick offline discussion with @GeorgesLorre about the general test layout we came up with an idea.
It would be great if we would have something like an AbstractComponentTest which can be used for all components.

The basic idea is to make the implementation of unit tests as easy as possible for the users. In the best case scenario, users would only need to define a dataframe containing their input dummy data, the expected output data, and the component arguments. A default class could be created to handle tasks such as invoking the component's transform method and comparing the expected output dataframe with the actual result.

Would you be up for drafting such a class and use it to test the caption_images component?

@satishjasthi
Copy link
Contributor Author

satishjasthi commented Jul 25, 2023

Hey @mrchtr, Ya sure that sounds like a better idea, I would love to create such an abstract class for caption_images component.

PhilippeMoussalli and others added 4 commits July 25, 2023 21:33
PR that introduces the partitioning strategy discussed in ml6team#288 

1) The automatic behavior is as follows for all component types (dask,
pandas)
* The written dataframe is re-partitioned to 250 Mb
* The loaded dataframe is re-partitioned depending on the current number
of partitions and workers

2) The behavior above can be overwritten by the end user in case they
want to implement their own custom logic, this is done on the
ComponentOp level as an additional flag parameters that can be passed.
See added docs with this PR for more details

I will handle adding the diagnostic tools and optimizing the downloader
component in a separate PR.
@satishjasthi
Copy link
Contributor Author

Thanks @satishjasthi! The test case is already a good starting point to build a unit test fundament!

During a quick offline discussion with @GeorgesLorre about the general test layout we came up with an idea. It would be great if we would have something like an AbstractComponentTest which can be used for all components.

The basic idea is to make the implementation of unit tests as easy as possible for the users. In the best case scenario, users would only need to define a dataframe containing their input dummy data, the expected output data, and the component arguments. A default class could be created to handle tasks such as invoking the component's transform method and comparing the expected output dataframe with the actual result.

Would you be up for drafting such a class and use it to test the caption_images component?

Hey @mrchtr, I have pushed commit for AbstractComponentTest class. My idea was to create a pytest class which can validate

  • Whether a new component created has base component as one of the components defined in fondant.component.py
  • And to validate whether this new component has all the methods and its respective attributes that are defined in its base class

This can be used to test all the components, expect the actually functionality of the components which varies from one another, so user can add test cases as methods to this class to incorporate validations for those custom component behaviours.
Please let me know your views on this. Thank you

@mrchtr
Copy link
Contributor

mrchtr commented Jul 26, 2023

Thanks for the update @satishjasthi! I didn't think about your test cases, they are interesting.
I think we can combine our ideas here. In my mind, I had a slightly different approach. I would see it more as a base class, with concrete component tests inheriting from it.

So basically to have something like this for the AbstractComponentTest:

class AbstractComponentTest(ABC):

    @abstractmethod
    def create_component(self):
        """
        This method should be implemented by concrete test classes to create the specific component
        that needs to be tested.
        """
        raise NotImplementedError()

    @abstractmethod
    def input_dataframe(self):
        """
        This method should be implemented by concrete test classes to create the specific input dataframe.
        """
        raise NotImplementedError()

    def setUp(self):
        """
        This method will be run before each test method.
        Add any common setup steps for your components here.
        """
        self.component = self.create_component()
        self.input_dataframe = self.create_input_dataframe()
        self.expected_output_dataframe = self.create_output_dataframe()

    def tearDown(self):
        """
        This method will be run after each test method.
        Add any common cleanup steps for your components here.
        """
        pass
  
    def test_transform(self):
        """
        Default test for the transform method.
        Tests if the transform method executes without errors.
        """
        transformed_data = self.component.transform(self.input_dataframe)
        # compare transformed_data with self.expected_output

Here, we could implement default test methods that would be easy to maintain in the future. If we add new functionality for the base components, we wouldn't have to edit each single unit test. I suggest starting with a test_transform method that calls the transform method of the component and checks if the transformed output matches the expected one.

For the concrete implementation we just have to inherit from the abstract one and implement the create_component, create_input_dataset and create_output_dataset methods.

Based on the class provided, you can modify the caption_image_component test by inheriting from the base class and implementing the initialisation of the component and the datasets.

So something like this:

class CaptionImageComponentTest(AbstractComponentTest):

     def create_component(self):
         # your implementation
     ...

I also would like to hear other opinions on this as well.

@satishjasthi
Copy link
Contributor Author

Hi @mrchtr, Thanks for the detailed explanation, I like this idea as it can easily extend to all components easily. I'll create a base class in this manner and submit new commit.

@satishjasthi
Copy link
Contributor Author

satishjasthi commented Jul 26, 2023

Hey @mrchtr, I have updated the test case for CaptionImage with a new base class and used it for testing this component. Please review it and let me know your views.

Copy link
Contributor

@mrchtr mrchtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot again @satishjasthi! I left some smaller comments. Got the feeling we are on the right track.

self.input_data = self.create_input_data()
self.expected_output_data = self.create_output_data()

def test_transform(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this implementation to the super class, since it should be the same for most components?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can, as this method has pretty much stand method of testing transform method for all classes.

output = self.component.transform(self.input_data)
assert output.equals(self.expected_output_data)

def tearDown(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can remove this method. I have added the tearDown just as en example and I think we don't need it here explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I felt the same but I retained it as its standard template in pytest testing with setup and teardown methods. I'll remove that.

)

@pytest.fixture(autouse=True)
def __setUp(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to remove this method as well? Ideally, the setUp of the super class is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can, as the code within this method is again remains same across components

from caption_images.src.main import CaptionImagesComponent


class AbstractComponentTest(ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later on, this class should be moved to fondant itself, so that users can reuse it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that makes sense

@satishjasthi
Copy link
Contributor Author

@mrchtr, I have updated and pushed new commit with suggested changes, please review it.

Copy link
Contributor

@mrchtr mrchtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I have tested the code locally.
Added on super small nitpick.

I think we can merge it, and see it as a first blueprint for further component test. Thanks again!

Tests if the transform method executes without errors.
"""
output = self.component.transform(self.input_data)
if not output.equals(self.expected_output_data):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick - pandas offers you already an assertion function:
pd.testing.assert_frame_equal(output,self.expected_output_data)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrchtr , I have pushed code with this change. Shall I create test cases for remaining components in the same PR or a new one?

@GeorgesLorre GeorgesLorre merged commit 5b123bf into ml6team:main Jul 31, 2023
@mrchtr mrchtr mentioned this pull request Jul 31, 2023
Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023
#311)

This the first commit to add test cases to all existing components of
fondant. This PR tries to resolve #301

---------

Co-authored-by: Philippe Moussalli <philippe.moussalli95@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add component tests
4 participants