Describe the bug
In v2.2.0 of ClassifAI we added the ability to classify content against your own terms using OpenAI Embeddings. For this to work, embedding data needs to be generated for each term and for the post those terms are compared against.
The post embedding data is generated on the fly when the comparison is triggered, but we don't want to do that for terms since there may be hundreds or thousands of them, so term embeddings are generated in bulk when the feature is first set up. It has always been a known limitation that, on sites with lots of terms, this process will probably run into timeouts, memory issues, or OpenAI rate limits.
In #758, we are making some changes to how OpenAI Embeddings work, but we have not yet fixed this issue. Since those changes require all embeddings to be regenerated, ideally this is fixed in the same release.
There are two issues I'm currently aware of:
We generate these embeddings when the settings are saved, but only for taxonomies that are turned on. The first time you save, the taxonomy settings haven't been persisted yet, so nothing runs; you have to save a second time for things to work.
During that same save process, we generate an embedding for each term that doesn't already have one stored. For sites with 1000+ terms, this will almost certainly lead to timeouts or memory issues; sites with far fewer terms will probably still run into OpenAI rate limits.
Ideally we would introduce some sort of queue management system to address this, making it general enough that it can be reused by other features that may come in the future. There are existing tools we could look to use, like Action Scheduler or Cavalcade, but we may be fine just building a lightweight system on top of WordPress's scheduled-event system.
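For context, a lightweight version built on core's scheduled events could look roughly like the sketch below. This is only an illustration, not ClassifAI's actual code: the hook name `classifai_embed_terms_batch`, the batch size, and the helper `generate_embedding_for_term()` are all hypothetical.

```php
<?php
/**
 * Hypothetical sketch: process term embeddings in small batches via WP-Cron.
 * The hook and helper names here are illustrative, not ClassifAI's real API.
 */
add_action( 'classifai_embed_terms_batch', function ( $taxonomy, $offset ) {
	$batch_size = 50; // Small enough to stay under timeouts and rate limits.

	$term_ids = get_terms(
		array(
			'taxonomy'   => $taxonomy,
			'hide_empty' => false,
			'fields'     => 'ids',
			'number'     => $batch_size,
			'offset'     => $offset,
		)
	);

	foreach ( $term_ids as $term_id ) {
		// Assumed helper that calls the embeddings API and stores term meta.
		generate_embedding_for_term( $term_id );
	}

	// Re-schedule the next batch until all terms are processed.
	if ( count( $term_ids ) === $batch_size ) {
		wp_schedule_single_event(
			time() + MINUTE_IN_SECONDS,
			'classifai_embed_terms_batch',
			array( $taxonomy, $offset + $batch_size )
		);
	}
}, 10, 2 );
```

Because each batch runs in its own cron invocation, memory use stays flat and a slow OpenAI response only delays one batch rather than timing out the whole run.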
Steps to Reproduce
Set up the Classification Feature with OpenAI Embeddings as the Provider
Turn on at least one taxonomy and hit save
Notice that no embeddings are actually generated
Hit save again and notice the embeddings get generated
You can also generate 1000+ terms and run this process again, though note this will cost money since it makes API requests. I've tested locally using an embeddings model run through Ollama, and at around 1000 terms I run into memory issues
Screenshots, screen recording, code snippet
No response
Environment information
No response
WordPress information
No response
Code of Conduct
I agree to follow this project's Code of Conduct
I've investigated both Action Scheduler and Cavalcade and found that the latter requires disabling WP-Cron.
For this reason I think Action Scheduler is a more reasonable candidate.
I have a branch with Action Scheduler implemented; however, I'm facing some intermittent PHP memory exhaustion errors. I suspect it has to do with scheduling jobs inside the for() loop. I'll fix that and push the branch this week.
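One way to avoid scheduling a job per term inside the loop is to enqueue one Action Scheduler action per chunk of term IDs. A rough sketch, with an assumed hook name (`classifai_process_embedding_batch`) and function name that are not the actual implementation:

```php
<?php
/**
 * Hypothetical sketch: enqueue one Action Scheduler action per batch of
 * term IDs instead of one per term, so the enqueue loop stays cheap.
 */
function classifai_queue_term_embeddings( $taxonomy ) {
	$term_ids = get_terms(
		array(
			'taxonomy'   => $taxonomy,
			'hide_empty' => false,
			'fields'     => 'ids',
		)
	);

	// One async action per chunk of 50 terms, not one per term.
	foreach ( array_chunk( $term_ids, 50 ) as $chunk ) {
		as_enqueue_async_action(
			'classifai_process_embedding_batch', // Hypothetical hook.
			array( 'term_ids' => $chunk ),
			'classifai'
		);
	}
}
```

With 1000 terms this enqueues 20 actions instead of 1000, and the per-batch handler can free memory between runs since each action executes in its own request.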
@dkotter Dharmesh and I discussed this last week and concluded that either is a reasonable choice, as both have their pros and cons.
I decided to go ahead with Action Scheduler to align with Woo's decision to migrate all background-process-related jobs to AS. Related: woocommerce/woocommerce#44246