Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not save summaries in the database #343

Closed
powelski opened this issue Mar 16, 2014 · 22 comments
Closed

Do not save summaries in the database #343

powelski opened this issue Mar 16, 2014 · 22 comments
Labels

Comments

@powelski
Copy link

This is huge one. I remember we've had discussion about it alread and some of you found that unnecessary. I must strongly disagree. Stream would be much more powerful if summaries were not saved directly in the database. Just a few arguments why:

  1. It makes translation jobs more stressful. In the current state, translations should be updated with every release (and they are not). It's not a must for anyone to speak English and for some people logs in that language are useless, because they don't understand a word. If we update their language pack some time later, their logs will never be updated to the language they know.

  2. Somebody might use plugins to make Wordpress multi-language on the backend, because they work in international team. The behavior here will turn into a beautiful mix of several languages.

  3. We all make mistakes and those in logs messages can't be fixed.

  4. We could want to make logs exportable one day, that would be something if we let users choose destination's language.

@shadyvb
Copy link
Contributor

shadyvb commented Mar 17, 2014

This seems very reasonable, might be the time for it as well.
+1 for bringing this up now, /five!

@kucrut
Copy link
Contributor

kucrut commented Mar 17, 2014

capital_P_dangit() 😉

@powelski powelski self-assigned this Mar 18, 2014
@powelski
Copy link
Author

@fjarrett Do we have green light for this? Here's my plan for this one:

  1. First we need to figure out which logs have sufficient meta to get rid of summary data completely. For those which don't, I suggest moving summary to meta table. I don't think any summary has more than 256 characters for now, so our meta_value field should be fine with it.

  2. Then we remove summary column on update routine (I can't wait to see that happening 😄 !)

  3. Finally, we would need a method for each connector, which would build summaries messages based on record data, including meta.

@shadyvb
Copy link
Contributor

shadyvb commented Mar 18, 2014

  1. First we need to figure out which logs have sufficient meta to get rid of summary data completely. For those which don't, I suggest moving summary to meta table. I don't think any summary has more than 256 characters for now, so our meta_value field should be fine with it.

All summaries are essentially composed of meta values, so we're safe on that front.

I think we should have some kind of fallback, since not all sites would need this change, and since this change will essentially bring some more processing requirements on multiple ends.

@westonruter
Copy link
Contributor

since this change will essentially bring some more processing requirements on multiple ends

Right, because there are additional JOINs that have to be done to grab the additional meta?

@shadyvb
Copy link
Contributor

shadyvb commented Mar 18, 2014

We'll need to grab ALL metadata, so actually two queries has to be executed, one for the record(s) and one for meta record(s) as well.

@frankiejarrett
Copy link
Contributor

Just a few questions about how this would impact other things:

  1. The search input on the Records screen really just searches summaries for keywords, how would this be affected?
  2. How would this affect Notifications and Data Exporting (separate extensions)?
  3. The summary field is arguably the most important field in the private RSS/JSON feeds as well. How will these be affected with Summaries built on the fly instead of being fetched from the DB?

@shadyvb
Copy link
Contributor

shadyvb commented Mar 18, 2014

  1. This will affect the search on Records screen, a summary won't be searchable, the only fix for this is to introduce meta-value search, which will be a true hassle with all the dynamics in place.
    2a. Notifications will have to adopt the new changes, nothing major would happen i suppose since it is a one time thing, however an extra step would be needed to actually compile the summary.
    2b. Data exporting will have to adropt the new changes as well, nothing core will chance, just changing backed-up columns and stuff.
  2. That'll have to be done on the fly yes, more processing.

For all of that, i do think we should make the language switching stuff opt-in by users, so all things are stored in English, and they're only processed in real-time if the user has switched the option on.

@powelski
Copy link
Author

@fjarrett @shadyvb Storing summaries in English is not the greatest solution here as well. One the main advantages of this idea is being consistent. We should be able to change any summary text on development. To satisfy all those points, I think the best idea would be to save summaries in separate table, with language code. That would make records fully searchable in any available language case. We'd then need smart updater to change saved summaries when needed. Anyway, I would use summaries field only on search engine and display results dynamically.

@westonruter
Copy link
Contributor

Wait, are summaries currently being stored in English or are they being stored in the current WPLANG? Is it even a common use case that WPLANG would be changed later? It seems like an edge case.

If they do want to switch WPLANG then there could be a method for regenerating the summary for each item, though this would be quite an expensive operation. Would it be that bad if old Stream records have one language and newer ones have another when the WPLANG is switched?

I realize this has implications for multisite, though, as each site could have a different WPLANG.

@shadyvb
Copy link
Contributor

shadyvb commented Mar 19, 2014

@powelski did explain some use cases where the current implementation, or any regenerate-whole-data-on-demand solution, would simply create an inconsistent experience with data in multiple languages. And i can vouch for those use cases being a non-native English speaker myself.

And yes summaries are stored in the current WPLANG, but what i was proposing my last comment was to actually store them in English, disregarding the current value for WPLANG, only as a fallback if users doesn't opt-in for language-specific summaries.
I know that might not make sense, just thinking out loud.

Update: Not necessarily English, might be whatever language they have, on condition that this be the only language used ( if no users opt-in for the real-time generation of translated summaries ).

@powelski
Copy link
Author

@westonruter Yes, it would be bad 😄 Imagine you have part of your logs in Polish.

@frankiejarrett
Copy link
Contributor

@powelski @westonruter @shadyvb

  1. It should be noted that once Issue 313: Create new connector for Theme Editor #341 is merged its unlikely that new connectors will be introduced any time soon. We've now covered just about every area and action that can happen in WordPress.

New summaries and changes to existing summaries are becoming more and more rare. Of course this happened a lot during these first few months because we were maturing the plugin and adding a lot of new things, but this is going to be happening less and less now, so the notion that translations for summaries have to be updated after every release is not exactly the case.

  1. Even if summaries are built on-the-fly it doesn't change the fact that translations must still be provided for every summary, otherwise English is used. If I understand correctly, the biggest advantage to writing summaries on-the-fly is really that old records will be corrected, right?

So an alternative solution to this problem would be to offer an option to just reprocess the summaries when translations are updated so all past entries are updated appropriately. Or if a language pack suddenly becomes available that matches your WPLANG then you are given the option to reprocess your records.

@frankiejarrett
Copy link
Contributor

Just to piggy-back on my alternative solution above, if we build a more robust API for handling DB update routines (as suggested in #379) then we can reprocess summaries with ease when language packs are added/updated that match the site's WP_LANG.

We could even have a separate routine just for language pack updates, with it's own custom message:

"Aktualizacje dla danego języka (pl_PL) są dostępne, niech przerabiają bazy danych." 😄

@powelski
Copy link
Author

@fjarrett You speak Polish and you didn't tell me! 😿

What I'd really love to have here is also being able to translate Stream on multi-lingual backends. That could be done with the builder as well, of course, but each language would need to be built separately with language code attached.

@lukecarbis
Copy link
Contributor

@fjarrett @powelski Another important usage case to consider is multisite. I currently have my multisite configured with the Network Administration in English, and some sites in Italian.

Currently, in #154 the Network Admin Stream displays summaries in whichever language they were stored in - so if Stream had an Italian translation then some entries would be in Italian and some would be in English.

Ideally, the Network Admin should be able to view all summaries in whichever language they choose, regardless of the language used by the sites.

@westonruter
Copy link
Contributor

What about this:

  1. Store the template of the summary, e.g. “{title}” ({name}) widget updated in {sidebar}
  2. Store a JSON-serialized blob of the template variables: {"title":"Hello World","name":"Text","sidebar":"Primary Sidebar"}
  3. Searching the database would search both the summary text and the serialized template vars.
  4. Listing entries would load up a localized template (via gettext __() etc) and supply the template vars on the fly, e.g. “Hello World” (Text) widget updated in Primary Sidebar

The caveat about this is that a theme could have multiple localizations for a widget's name and a sidebar's name, but the localized strings wouldn't be used here in the summaries, since the stored template vars would be whatever language was active at the time that the entries were stored. If you tried storing the widget ID (base) and the sidebar ID instead, and looked up the localized strings on the fly when listing the records, then this seems like it would then break the ability to search.

@lukecarbis
Copy link
Contributor

@westonruter This seems like a good solution to me. I see a few issues.

I think it's acceptable to store things like Texte (Text) or Barre Generale (Primary Sidebar) in the database if that's how they appeared in the French admin interface when the action occurred.

However, I don't think we should be searching keys like title or name. If I wrote a blog post called 'My name is Luke', and then tried to find it by searching for 'name', I would get lots of totally unrelated results (possibly nearly every result).

The primary problem here is that this solution assumes a base language of English. Let's assume that my network admin interface is in French and the database summary is “{title}” ({name}) widget updated in {sidebar}. This would be translated into French when it is output, but it's not searchable in French, i.e. I can't search for mise á jour I must search for updated even though my interface says mise á jour.

Further, foreign search terms could return unrelated results if they match an English word that has a different meaning. e.g. gift (German for poison) would match gift (English for present). Fine means 'slim' in French. Will is the German word for 'want'. Attend in French means 'wait'. Share in French means 'flesh'.

This is why the summaries should be primarily generated, not stored.

Here's another alternative: store template variables as a JSON array in the database (but not associative), and have connector classes generate the summaries.

_connectors/widgets.php_

$summary_data = $record->summary_data;// array( 'Hello World', 'Text', 'Primary Sidebar' );
$summary = sprintf(
    _x(
        '"%1$s” (%2$s) widget updated in %3$s',
        '1. Widget title, 2. Widget type, 3. Sidebar name',
        'stream'
    ),
    esc_html( $summary_data[0] ),
    esc_html( $summary_data[1] ),
    esc_html( $summary_data[2] )
);

This would produce an acceptable result in any language. It also keeps the terms "Hello World", "Text" and "Primary Sidebar" searchable.

note: I purposely avoided translating the summary data.

@westonruter
Copy link
Contributor

Yeah, I can see that my use of template variables like {name} and {title} in #391 would be problematic for searching if the template was stored in the Stream record.

The problem with always generating summaries on the fly is that they cannot be searched.

@shadyvb
Copy link
Contributor

shadyvb commented Apr 12, 2014

Lot's of discussion on the topic, it is starting to feel like gold-plating.
Let me sum up the problems, concerns we're having:

  1. Summaries aren't translated for different users with different back-end locales
  2. Summaries are added to DB in the arbitrary language the current user has set.
  3. Summaries need to be searchable
  4. Stream needs to be as light-weight as it can possibly be

From my perspective, those four points are reversed in priority, eg: #4 is the most important concern, ordered by the user-base affected by the concern/problem.

Reading through discussion:

  1. There is so little chance where admins of a single site do not share a single language they can communicate with. So having a multi-lingual summaries is not that big of a concern.
  2. However, having a Stream with some entries in French and some in English is not something we want to happen. ( To me, this is the real problem here, more on that below ).
  3. It is not good to store summary templates in DB for EACH record, because they'll eat up un-needed space, and will probably change over-time as well ( though it'll be easier to change it via update routines, though now that we're moving to cloud, we're leaning towards less content updates as possible ).
  4. Generating summaries on the fly seems good, but it voids the searchability of them. I cannot see this working for multi-lingual setups at all, ie: you cannot search for French summaries in a template based approach, or in a generate-on-the-fly approach.
  5. I'm not sure how you guys see this, but i think some of our most important users are people with large networks, adding more conditions and extra translation layers to Stream will only hurt performance, maybe not in a big way, but still.

So..
The only thing I see needs to be addressed here, is having the ability to specify the language in which Stream will save summaries and meta. I'm not really sure how to do this yet, but it'd be great to unify the language Stream uses to interact with data.

/cc @westonruter @fjarrett @powelski @lukecarbis

@frankiejarrett
Copy link
Contributor

@shadyvb I agree with all of your points, and YES, performance is not a feature it's the number one requirement for Stream and the massive amounts of data that can result from using it.

I thought Stream is already saving summaries and meta values according to whatever WPLANG is? I tested by changing the constant and all summaries and meta values were localized (if the translations were provided).

screen shot 2014-04-12 at 8 26 49 pm

To me, this is perfectly acceptable, as @westonruter said, changing your site to a different language later is going to be a rare event.

The more probable event would be if someone was using Stream for a while but their native language was not supported until later. My argument would be that if you decided to use Stream without your language officially being available, or if the translators don't keep up fast enough, then your old records will just have to be in the other language. I don't think there should be an expectation that we must always retroactively support a language either by an update routine or doing everything on-the-fly.

I'm glad we had this discussion, but I think we've exhausted the conversation for now. We can safely close this and deal with any localization issues for meta values on a case-by-case basis.

@lukecarbis
Copy link
Contributor

@ThemeBoy This is a discussion which I suspect doesn't have any sort of resolution. The idea here is that we're saving a "Summary" of each record. Of course, that summary gets saved using the language pack that's currently being used.

This means that the summary isn't truly translatable. For example, if a site changes it's language, saved records will retain the old language. Also, if you are viewing the records for more than one site at once (think multisite) you may be seeing records in multiple languages.

Any thoughts on how to achieve a language independent summary?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants