large_product_affinity computes Product Affinity on large transactional datasets without requiring a PySpark environment
Features:
- large_product_affinity has been proven to handle millions of rows of transactional data
- It can also handle tens of thousands of products
- large_product_affinity requires only a dataframe with two columns
- It requires minimal pre-processing
- No post-processing is required
Requirements for Input data:
- Data must be a dataframe with exactly two columns
- The first column must be a transaction id, or any field that identifies a transaction
- The second column must contain the products belonging to each transaction in the first column
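As a minimal sketch of an acceptable input (the column names here are illustrative, not required by the package), one row per (transaction, product) pair:

```python
import pandas as pd

# Illustrative two-column input: one row per (transaction, product) pair.
# Column names are arbitrary; only the column order matters.
transactions = pd.DataFrame({
    "transaction_id": [1001, 1001, 1002, 1002, 1002, 1003],
    "product": ["bread", "butter", "bread", "milk", "butter", "milk"],
})
print(transactions.head())
```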
Input:
- Choose an acceptable minimum Support threshold
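Support is the fraction of transactions that contain a given product (or product pair). A minimum-support sketch in plain pandas follows; the threshold value is purely illustrative, not a package default:

```python
import pandas as pd

transactions = pd.DataFrame({
    "transaction_id": [1, 1, 2, 3, 3, 4],
    "product": ["bread", "butter", "bread", "bread", "milk", "butter"],
})

n_txn = transactions["transaction_id"].nunique()
# Support of each product = share of transactions that contain it.
support = transactions.groupby("product")["transaction_id"].nunique() / n_txn
min_support = 0.5  # illustrative threshold, not a package default
frequent = support[support >= min_support]
print(frequent)
```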
Pre-Processing:
- Remove rows containing null values in either column before passing the dataframe in
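A one-step sketch of that pre-processing with plain pandas:

```python
import pandas as pd

raw = pd.DataFrame({
    "transaction_id": [1001, 1002, None, 1003],
    "product": ["bread", None, "milk", "butter"],
})
# Drop any row with a null in either column before passing the frame in.
clean = raw.dropna(subset=["transaction_id", "product"]).reset_index(drop=True)
print(len(clean))  # → 2
```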
Post-Processing: None
Output:
- The Product Affinity table is sorted by Confidence and then Lift, both in descending order.
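The standard affinity metrics behind such a table can be sketched as follows. This is a minimal pure-pandas illustration of support, confidence, and lift for product pairs, not the package's actual implementation; the column names are assumptions:

```python
from itertools import combinations

import pandas as pd

transactions = pd.DataFrame({
    "transaction_id": [1, 1, 2, 2, 3, 3, 4],
    "product": ["bread", "butter", "bread", "butter", "bread", "milk", "milk"],
})

n_txn = transactions["transaction_id"].nunique()
baskets = transactions.groupby("transaction_id")["product"].apply(set)

# Count transactions containing each product and each ordered pair.
item_count, pair_count = {}, {}
for basket in baskets:
    for p in basket:
        item_count[p] = item_count.get(p, 0) + 1
    for a, b in combinations(sorted(basket), 2):
        for pair in ((a, b), (b, a)):
            pair_count[pair] = pair_count.get(pair, 0) + 1

rows = []
for (a, b), n_ab in pair_count.items():
    support = n_ab / n_txn                       # P(A and B)
    confidence = n_ab / item_count[a]            # P(B | A)
    lift = confidence / (item_count[b] / n_txn)  # P(B | A) / P(B)
    rows.append((a, b, support, confidence, lift))

affinity = pd.DataFrame(
    rows, columns=["antecedent", "consequent", "support", "confidence", "lift"]
)
# The output table is sorted by Confidence, then Lift, both descending.
affinity = affinity.sort_values(
    ["confidence", "lift"], ascending=False
).reset_index(drop=True)
print(affinity)
```

A lift above 1 means the two products co-occur more often than chance would predict; sorting by confidence and lift surfaces the strongest rules first.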
Drawbacks:
- The only noted limitation is the user's system capacity to read the data into memory.
- If Pandas fails to load large volumes of data, use Modin or Dask to read it instead.
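One way to attempt that fallback is sketched below: Modin's `modin.pandas` is a drop-in replacement for pandas, so the read can prefer it when installed and fall back to plain pandas otherwise. The CSV written here is a tiny stand-in for a real data file (Dask's `dask.dataframe.read_csv` would also work, but returns a lazy frame that needs `.compute()`):

```python
import importlib.util

# Prefer Modin when available; fall back to plain pandas otherwise.
if importlib.util.find_spec("modin") is not None:
    import modin.pandas as pd  # drop-in, parallel replacement for pandas
else:
    import pandas as pd

# "transactions.csv" is a stand-in filename used only for this illustration.
sample = "transaction_id,product\n1,bread\n1,butter\n2,milk\n"
with open("transactions.csv", "w") as f:
    f.write(sample)

df = pd.read_csv("transactions.csv")
print(df.shape)  # → (3, 2)
```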