rfc: updated file-structure for multi-locale .lg and .lu authoring #1922
Closed
11 commits
f44ed7c Add multi-locale RFC (cwhitten)
0fbade1 Self edits (cwhitten)
c3cccdc Self edit (cwhitten)
26c8149 Merge branch 'master' into cwhitten/multi-locale (cwhitten)
75b4bed Adds rfc to acceptable pull request prefix (cwhitten)
8c646ca Feedback/consolidation (cwhitten)
2367d95 Merge branch 'cwhitten/multi-locale' of https://github.com/Microsoft/… (cwhitten)
ed68504 Updates (cwhitten)
577580d More edits (cwhitten)
e0e705a Merge branch 'master' into cwhitten/multi-locale (cwhitten)
1618d22 Merge branch 'master' into cwhitten/multi-locale (cwhitten)
@@ -0,0 +1,152 @@
### Multi-locale LG & LU authoring in Composer

---

This RFC is guided by the following scenarios, which Composer should support:

1. I can create, modify, or delete a .lg or .lu file for a dialog, or create a common.lg file, in a language of my choosing.
2. I can set the language in which assets are rendered in the authoring surface and forms.
3. If the Shell cannot find the configured language, the original authored language of the asset will be used.
4. I can create a full set of files in a language (base -> target(s)) that copies base as the initial implementation of the target(s).
5. I can copy all Bot directory assets to a location of my choosing.
6. I can load Bot assets to replace the current assets, overwriting identical files with new or modified versions.

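
The fallback described in scenario 3 can be sketched as follows. This is a minimal illustration, not Composer's implementation: the function name `pick_locale_file`, its parameters, and the case-insensitive comparison are all assumptions, and the `<name>.<locale>.<ext>` file-name pattern is borrowed from the examples later in this RFC.

```python
from typing import Iterable, Optional


def pick_locale_file(
    available: Iterable[str],
    base_name: str,
    extension: str,
    configured_locale: str,
    authored_locale: str,
) -> Optional[str]:
    """Return the file to render for `base_name`, or None if no candidate exists.

    Hypothetical helper: tries the configured locale first, then falls back
    to the locale the asset was originally authored in.
    """
    # Assumption: locale codes are compared case-insensitively ("en-US" == "en-us").
    by_lower = {name.lower(): name for name in available}
    for locale in (configured_locale, authored_locale):
        candidate = f"{base_name}.{locale}.{extension}".lower()
        if candidate in by_lower:
            return by_lower[candidate]
    return None
```

For example, with only `Main.en-us.lg` and `Main.fr.lg` on disk, asking for `de` with an authored language of `en-us` would yield `Main.en-us.lg`.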
##### Implementation:

Providing a good experience for translating these files can be complex. In considering the UX for language-specific .lg and .lu files, we should take the opportunity to reconsider what has become the convention for how Composer bot assets are represented on the filesystem. This RFC lays out different options for logically writing files to disk and proposes an update to the current convention.

The distribution of .lg and .lu files in a set of Composer assets currently looks like the following:

```
/ComposerDialogs
    /common
        common.lg
    /Main
        Main.dialog
        Main.lg
        Main.lu
    /DialogFoo
        DialogFoo.dialog
        DialogFoo.lg
        DialogFoo.lu
    /DialogBar
        DialogBar.dialog
        DialogBar.lg
        DialogBar.lu
    /DialogBaz
        DialogBaz.dialog
        DialogBaz.lg
        DialogBaz.lu
```

##### Problem

We want to provide an editing experience for these files, allow a user to add .lg and .lu files in different languages, and make sensible choices on the user's behalf in how we structure the asset directory.

What this would look like in today's file representation:

```
/ComposerDialogs
    /common
        common.en-us.lg
        common.fr.lg
        common.de.lg
    /Main
        Main.dialog
        Main.en-us.lg
        Main.fr.lg
        Main.de.lg
        Main.lu
    /DialogFoo
        DialogFoo.dialog
        DialogFoo.en-us.lg
        DialogFoo.fr.lg
        DialogFoo.de.lg
        DialogFoo.lu
    /DialogBar
        DialogBar.dialog
        DialogBar.en-us.lg
        DialogBar.fr.lg
        DialogBar.de.lg
        DialogBar.lu
    /DialogBaz
        DialogBaz.dialog
        DialogBaz.en-us.lg
        DialogBaz.fr.lg
        DialogBaz.de.lg
        DialogBaz.lu
```

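
To make the `<name>.<locale>.<ext>` naming in the listing above concrete, here is a small sketch that splits such names into their parts and plans the "copy base as target" step from scenario 4. The names `AssetName`, `parse_asset_name`, and `plan_locale_copies` are hypothetical, and the sketch assumes base names contain no dots.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class AssetName(NamedTuple):
    base: str               # e.g. "Main"
    locale: Optional[str]   # e.g. "en-us", or None when the name has no locale part
    extension: str          # e.g. "lg"


def parse_asset_name(file_name: str) -> AssetName:
    """Split "Main.en-us.lg" or "Main.lu" into its parts (assumed convention)."""
    parts = file_name.split(".")
    if len(parts) == 2:
        return AssetName(parts[0], None, parts[1])
    if len(parts) == 3:
        return AssetName(parts[0], parts[1].lower(), parts[2])
    raise ValueError(f"unexpected asset name: {file_name!r}")


def plan_locale_copies(
    file_names: Iterable[str], base_locale: str, target_locale: str
) -> List[Tuple[str, str]]:
    """List (source, destination) pairs that seed `target_locale` from `base_locale`."""
    copies = []
    for file_name in file_names:
        asset = parse_asset_name(file_name)
        if asset.locale == base_locale.lower():
            destination = f"{asset.base}.{target_locale}.{asset.extension}"
            copies.append((file_name, destination))
    return copies
```

The planner only returns pairs; whether existing target files are overwritten or skipped is a separate decision this sketch leaves open.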
##### Issues with current file structure

"Main" became the convention for denoting the entry dialog, but this is a heavy constraint. We can reconsider it in favor of something more expressive. Instead of generating a `/BotName/Main.dialog`, why can't we generate a `/BotName/<BotName>.dialog` as the entry point?

Keeping the .lu and .lg files alongside the .dialog file is logical in that it places the files where they are used. This makes a Dialog directory more portable in a world where Dialogs are not used in only a single bot. This file structure is a natural starting point from which to graduate to a system where Dialogs hold their own dependencies (.lu, .lg) and can be published or shared outside of the current bot.

One downside of this approach is that this distribution of files may not suit domain-specific work in one of the file formats. One could prefer that all the .lu files live in their own directory and all the .lg files live in their own directory, or that all "en-us" files live in an "en-us" directory, and so on. Because we anticipate that a Dialog and its associated content files (.lu, .lg) will be shared via mechanisms that are planned but not yet built, a structure that implies a tighter binding between .dialog, .lg, and .lu is currently the preferred approach.

##### Note

1. This proposal only applies to a filesystem-based storage plugin, and has little bearing on a database-backed store plugin implementation.
2. This is ideally the final time we make a significant naming or serialization decision before Composer hits GA. If we wanted to, for example, lowercase files and/or directories, this would be the time to do it.

##### Alternative structures

1. Assets partitioned based on dialog and dependent assets

Benefit: Dependency encapsulation. The convention is recursive and can be applied to scenarios like publishing local dialogs and their associated dependencies, or pulling down dialogs and their associated dependencies from an external/third-party source.

```
/coolbot
    coolbot.dialog
    /language-generation
        /en-us
            common.en-us.lg
            coolbot.en-us.lg
    /language-understanding
        main.en-us.lu
    /dialogs
        /foo
            foo.dialog
            /language-generation
                /en-us
                    foo.en-us.lg
            /language-understanding
                foo.en-us.lu
```

2. Assets partitioned by asset type

Benefit: Physically maps to a content editing scenario (.lu, .lg)

```
/coolbot
    /dialogs
        coolbot.dialog
        foo.dialog
        bar.dialog
        baz.dialog
    /language-generation
        /en-us
            common.en-us.lg
            coolbot.en-us.lg
            foo.en-us.lg
            bar.en-us.lg
            baz.en-us.lg
    /language-understanding
        /en-us
            main.en-us.lu
            foo.en-us.lu
            bar.en-us.lu
            baz.en-us.lu
```

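
The difference between the two alternatives can be made concrete by computing where one asset would live under each. The sketch below is illustrative only: the function names are hypothetical, and it assumes a locale sub-folder for both .lg and .lu in both layouts, which is a simplification of the listings above.

```python
from pathlib import PurePosixPath

# Folder names taken from the alternative structures above.
ASSET_FOLDERS = {"lg": "language-generation", "lu": "language-understanding"}


def layout_by_dialog(bot: str, dialog: str, locale: str, extension: str) -> PurePosixPath:
    """Alternative 1: each dialog folder carries its own .lg/.lu folders."""
    root = PurePosixPath("/" + bot)
    # The entry dialog lives at the bot root; other dialogs live under /dialogs/<name>.
    home = root if dialog == bot else root / "dialogs" / dialog
    return home / ASSET_FOLDERS[extension] / locale / f"{dialog}.{locale}.{extension}"


def layout_by_type(bot: str, dialog: str, locale: str, extension: str) -> PurePosixPath:
    """Alternative 2: one folder per asset type, shared by every dialog."""
    root = PurePosixPath("/" + bot)
    return root / ASSET_FOLDERS[extension] / locale / f"{dialog}.{locale}.{extension}"
```

Under these assumptions the file name is identical in both layouts and only the containing folders differ, which is why a lookup keyed on the file name rather than on the path would not need to care which layout is chosen.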
##### Proposal

1. Adopt a lower-case naming convention for files and directories.
2. Remove the hard-coded "Main" entry-point requirement and key off of the bot name: `<botname>.dialog`.
3. Adopt alternative structure #1 for the physical layout of .dialog, .lu, and .lg files.

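
As a rough illustration of proposal items 1 and 2, the renaming of an existing asset could be computed as below. This is a sketch under stated assumptions, not a migration tool: `migrated_name` is a hypothetical helper, and it assumes every file whose base name is "Main" (not only `Main.dialog`) is renamed after the bot.

```python
def migrated_name(file_name: str, bot_name: str) -> str:
    """Return the proposed name for an existing asset file.

    Assumptions: the text before the first dot is the base name, a base name
    of "Main" is replaced by the bot name, and the result is lower-cased.
    """
    base, dot, rest = file_name.partition(".")
    if base == "Main":
        base = bot_name
    return f"{base}{dot}{rest}".lower()
```

For a bot named `CoolBot`, `Main.dialog` would become `coolbot.dialog` and `DialogFoo.en-us.lg` would become `dialogfoo.en-us.lg`.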
##### Important consideration:

When attempting file lookups, we should be as agnostic to the file structure as possible, to support the scenario where one authors these assets outside of Composer. We shouldn't rule out the realistic scenario in which users author files in a different text editor or IDE and load them into Composer expecting a full experience. To fully support this, we aim to use the Adaptive Dialog ResourceManager and supporting modules so that the runtime and the authoring surface have near-exact parity in how they do file lookups and resolution. Whatever we choose for a directory convention, we should not hardcode it into the resolution logic.
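
One way to keep lookups independent of the directory convention is to index assets by file name and resolve by that id alone. The sketch below illustrates the idea only; it is not the Adaptive Dialog ResourceManager API, `build_resource_index` is a hypothetical name, and treating the lower-cased file name as the id is an assumption.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable


def build_resource_index(paths: Iterable[str]) -> Dict[str, str]:
    """Map each resource id (the lower-cased file name) to its full path.

    Because only the file name is used as the key, the same lookup works
    whichever directory layout the files were written in.
    """
    index: Dict[str, str] = {}
    for path in paths:
        resource_id = PurePosixPath(path).name.lower()
        if resource_id in index:
            # Ids must be unique for location-independent lookup to be unambiguous.
            raise ValueError(
                f"duplicate resource id {resource_id!r}: {index[resource_id]} and {path}"
            )
        index[resource_id] = path
    return index
```

With such an index, `foo.en-us.lg` resolves the same way whether it sits under `/coolbot/dialogs/foo/...` or under `/coolbot/language-generation/...`, at the cost of requiring file names to be unique across the bot.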
I actually like this alternative more, because in the workflow I know, users tend to group assets by locale (VA is an example).
If our folder structure is like this
I can imagine that adding a new language, fr-fr, would be as simple as copying the en-us folder into an fr-fr folder and editing in place.
In my opinion, this would also help team collaboration because it separates the concerns of conversation designers, content writers, and even model trainers.

---
I wouldn't over-index on a physical layout of the files to align well with collaboration scenarios, though it was mentioned in the preface of the RFC and should be considered to an extent. An abstraction over the file metaphor will need to exist regardless, to provide an appropriate experience for content writers.
That said, partitioning on dialog/lu/lg is a valid alternative, but I'd like to discuss it a bit more. I agree that in a content editing scenario there is an advantage to physically laying out the files this way. For a dialog "clone", or for sharing/publishing to some central location for re-use, physically laying out the dialog/lu/lg in a way that encapsulates the dialog's dependencies would have the advantage.
Additionally, I'd like to propose we hoist the main.dialog to the root of the bot. I see the following layouts as reasonable adjustments.
While #2 looks clean physically, I tend to prefer the encapsulation and recursive nature of #1, which sets us up nicely to move/share dialogs between bots in the future.
cc @vishwacsena @benbrown

---
I've updated the RFC to reflect this.

---
#2 does look clean physically; it would be even cleaner if it also covered the "settings" and "schemas" folders.
#1 is recursive and reflects the dialog structure to some degree, and it is better than #2 in certain scenarios such as sharing. But my biggest concern with a recursive representation is that it enforces a tree structure while the dialog structure is actually a graph. If two dialogs, A and B, both refer to C, which one should encapsulate C? Symbolic links might help here, but AFAIK symbolic links on Windows are a mess, and relying on them would make our solution more complex and more tightly coupled to a very specific fs concept.
From another perspective, I agree that we are not designing the physical layout for collaboration, but I would argue that we should not end up with a structure that restricts or limits collaboration on physical files.
A recursive structure, in my opinion, can easily go wrong if people touch the files themselves without knowing the rules. It is also hard to reason over the structure, for example to figure out how many language models are being used. If we don't want users to touch or reason over physical files manually, then why should we lay out the physical files recursively? Why not use a layout that is friendlier to both Composer and users with other tools? What do you guys think, @vishwacsena @benbrown @christopheranderson?

---
It's easy to jump into the mental gymnastics of what are really full-blown package manager and dependency resolution challenges, like tree/graph and local module linking, etc. My hope is we can table that discussion but keep it in mind when we make a decision here. But this is my answer to your question:
While #1 is physically more nested, it is still sensible to reason about and edit with some education and clarity. Can you expand on how #1 restricts and limits collaboration scenarios? I don't immediately see that.
#2 feels limiting, and I'm concerned it suits a point in time (now) and won't work in the future. What if a dialog/sub-tree of dialogs wants its own settings file? What if a dialog/sub-tree of dialogs wants its own schema definition? We're not nearly as boxed in with #1.

---
I'm late (or early to the party depending on your perspective.)
The early design decision for ResourceExplorer was to make resource ids unique and location independent. The reasoning for this was that:
a. It maps to flat storage more easily.
b. It allows people to organize files in any manner that makes sense to them.
c. It means references to resources are less brittle, because they continue to be correct even if you move files around.
That said, having a convention for how Composer represents them, or for how we decide to have templates organize things, seems like a good idea and will end up being something that people copy.
Some questions/comments I have are:
a. I don't get why making everything lowercase is a good idea. What is driving that? If it's to make mistakes in references less likely, we can make references case insensitive, but case is super useful for readability.
b. I am definitely biased towards assets being co-located so that LU/Dialog/LG can be worked with locally, but I also believe there will be "global" shared assets which will be consumed by the local ones. It feels like we aren't talking about that. For example, a bunch of LG templates defined at the root which are imported into the local LG files.

---
What's implied in this thread is the `common.<locale-code>.lg` convention, which exists in the proposal, and we do this today as well. The local .lg files will be able to import from this asset regardless of how Composer lays the files out. We should use a local .lg template that imports the shared asset automatically, which users can then extend to their needs.
You bring up a good point that I mention in the RFC. I posit that, as soon as it is ready, Composer should take a hard dependency on the JS Adaptive ResourceExplorer in its storage plugin so the asset resolution mechanisms are exactly the same.

---
By "limiting or restricting collaboration" in #1, my assumption was that "people will collaborate, to some extent, on raw files, no matter what UI we provide".
Based on that assumption, my feeling is that a recursive structure is a little harder to collaborate on in some scenarios I was imagining, like
`dialogA\dialogs\dialogC\dialogs\dialogsD\language-generation\a.lg`
or things like `[import](../../dialogA/LG/b.lg)`, where moving the dialog will cause the reference to break. (Using an id and a customized importResolver built on resourceExplorer can solve this.)
Those are not big issues on their own; it's just that thinking through some scenarios (not all of them may be valid) gives me a general feeling that a recursive structure is not friendly to physically copying, moving, and manipulating the files. So I hope we take this into consideration.
Regarding the flexibility you talked about
If we organize the dialogs as a tree\sub-tree, we definitely gain some extra room to configure\customize per tree\sub-tree, but at the same time, the cost is that we organize the dialogs as a tree.
If this is our last chance to change the folder structure, a structure without flavor is perhaps more likely to last than a structure with more flavor.
And in the end, we should use the JS resourceExplorer to identify and load resources anyway. What's missing in resourceExplorer today is creating resources following some pattern\layout; that's a gap Composer needs to fill.

---
I'm even later than Tom. A few things:
Part of the reason that I care about breaking things up into smaller files is that there is important information in the structure of the filename which should allow being intelligent about merging regenerated assets. We don't have to have separate files if we support id as a first class thing when defining things inline. An id is either explicitly specified inline or cannot be specified and comes from the filename.

---
I like #1 but I have a few questions (see below).
Also, I was sharing some of my personal experience working on internationalization/localization with @cwhitten the other day. In my prior experience on both Internet Explorer and Cortana, localization is more effective when the person localizing is enabled with three things: a) being able to see the full context of what's going on rather than mere strings in a text file, b) being able to readily test their changes, and c) always being able to see the base-language version of the exact same string (so they are not just overwriting the base-language string and then losing context on what the old string was).
So my 2c: we should not gate our decision on enabling purely file-based localization. Instead, just have the localization team use Composer and use source control to reject any changes to .dialog files. In fact, this is how IE was able to ship 60+ languages on the same day as English.
With that said, @cwhitten, a few questions for option #1