feat: add down sampling before luis publish #2629

Merged
cwhitten merged 21 commits into microsoft:master from lei9444:downsampling
Apr 23, 2020

Conversation

Contributor

@lei9444 lei9444 commented Apr 13, 2020

Description

If an intent has more than 10 times (configurable) as many utterances as the intent with the fewest utterances, run a resampling step to bring it back within that proportion.
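
A minimal sketch of the ratio rule described above, not the code from this PR. The `Intent` shape and the function names are hypothetical. It samples without replacement and ignores empty intents when finding the smallest one; both choices are assumptions, and the PR's commits mention bootstrap sampling, which may behave differently.

```typescript
// Hypothetical shape for illustration; not taken from the PR's code.
type Intent = { name: string; utterances: string[] };

// Pick k items uniformly at random without replacement
// (partial Fisher-Yates shuffle on a copy of the input).
function sampleWithoutReplacement<T>(items: T[], k: number, random: () => number = Math.random): T[] {
  const pool = items.slice();
  const n = Math.min(k, pool.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, n);
}

// Cap every intent at maxImbalanceRatio times the size of the smallest
// non-empty intent. Intents already within the ratio are returned untouched.
function balanceIntents(intents: Intent[], maxImbalanceRatio: number = 10): Intent[] {
  const counts = intents.map((i) => i.utterances.length).filter((n) => n > 0);
  if (counts.length === 0) return intents; // nothing to balance against
  const cap = Math.min(...counts) * maxImbalanceRatio;
  return intents.map((intent) =>
    intent.utterances.length > cap
      ? { name: intent.name, utterances: sampleWithoutReplacement(intent.utterances, cap) }
      : intent
  );
}
```

With a ratio of 10 and a smallest intent of 2 utterances, an intent with 16 utterances (1:8) is left as is, while an intent with 50 utterances is reduced to 20.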

If more than 15000 utterances are detected, downsize the utterance set to that limit.
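
The commit list for this PR mentions reservoir sampling for the 15000 limit. The sketch below shows the standard technique (Algorithm R) under that assumption; `reservoirSample` and `limitUtterances` are hypothetical names, not the PR's functions.

```typescript
// Reservoir sampling (Algorithm R): keep a uniform random sample of size k
// while visiting each item exactly once.
function reservoirSample<T>(items: T[], k: number, random: () => number = Math.random): T[] {
  const reservoir: T[] = [];
  for (let seen = 1; seen <= items.length; seen++) {
    const item = items[seen - 1];
    if (reservoir.length < k) {
      // Fill the reservoir with the first k items.
      reservoir.push(item);
    } else {
      // Replace a random slot with probability k / seen.
      const j = Math.floor(random() * seen); // j is in [0, seen)
      if (j < k) reservoir[j] = item;
    }
  }
  return reservoir;
}

// Hypothetical helper: only downsize when the total exceeds the limit.
function limitUtterances(utterances: string[], maxUtterancesAllowed: number = 15000): string[] {
  return utterances.length > maxUtterancesAllowed
    ? reservoirSample(utterances, maxUtterancesAllowed)
    : utterances;
}
```

Reservoir sampling needs only one pass and `k` slots of memory, which suits a large utterance list whose size is not known in advance.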

Add the downsampling config to the bot settings.

Task Item

refs #2610

Screenshots

@lei9444 lei9444 marked this pull request as ready for review April 15, 2020 09:31
@boydc2014
Contributor

boydc2014 commented Apr 17, 2020

In terms of putting it into samples, we should give this a meaningful name:

downsampling: {
  max_imbalance_ratio: 10,
  max_utterances_allowed: 15000
}

@lei9444
Contributor Author

lei9444 commented Apr 17, 2020

In terms of putting it into samples, we should give this a meaningful name

downsampling: {
max_imbalance_ratio: 10,
max_utterance_allowed: 15000
}

Hi @boydc2014, I have added the config to the client settings now. Users can update it on the settings page.
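
As an illustration of how such user-editable settings could be read with safe defaults, here is a sketch. The key names follow the suggestion earlier in this thread; the names in the shipped settings may differ, and `resolveDownsampling` is a hypothetical helper.

```typescript
// Key names follow the suggestion in this thread (an assumption about the final shape).
interface DownsamplingConfig {
  max_imbalance_ratio: number;
  max_utterances_allowed: number;
}

// Defaults taken from the values discussed in this PR.
const DEFAULT_DOWNSAMPLING: DownsamplingConfig = {
  max_imbalance_ratio: 10,
  max_utterances_allowed: 15000,
};

// Use the override only if it is a finite, positive number.
function pickPositive(value: number | undefined, fallback: number): number {
  return typeof value === 'number' && isFinite(value) && value > 0 ? value : fallback;
}

// Hypothetical helper: merge user overrides from the settings page onto the defaults.
function resolveDownsampling(overrides: Partial<DownsamplingConfig> = {}): DownsamplingConfig {
  return {
    max_imbalance_ratio: pickPositive(overrides.max_imbalance_ratio, DEFAULT_DOWNSAMPLING.max_imbalance_ratio),
    max_utterances_allowed: pickPositive(overrides.max_utterances_allowed, DEFAULT_DOWNSAMPLING.max_utterances_allowed),
  };
}
```

Falling back to the defaults on missing or invalid values keeps publishing predictable even if the settings are edited by hand.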

@boydc2014
Contributor

Looks good to me. Can you ask @hibrenda or someone from the skill team to give it a try?

boydc2014 previously approved these changes Apr 17, 2020
@github-actions

Coverage Status

Coverage increased (+0.09%) to 41.033% when pulling 248cebe on lei9444:downsampling into 723785b on microsoft:master.

@hibrenda
Contributor

I tried this with the todo skill. Two findings:

  • Downsampling resizes not only the intents that exceed the ratio (e.g. more than 1:10) but also intents within the ratio (e.g. 1:8), which should not happen.
  • For the bot developer's awareness, it would help to show a notification after downsampling has been applied to some intents.

@boydc2014 boydc2014 self-assigned this Apr 21, 2020
@lei9444
Contributor Author

lei9444 commented Apr 21, 2020

@hibrenda @boydc2014 I have fixed the first issue. For the second, I think we should discuss it further.

@hibrenda
Contributor

Verified the fix. Resizing now applies only to intents whose utterance count exceeds the defined ratio.

@cwhitten
Member

thank you @hibrenda

@boydc2014 boydc2014 added the "Approved to merge" (approved, waiting to be merged) label Apr 23, 2020
@cwhitten cwhitten merged commit 1b0c06c into microsoft:master Apr 23, 2020
@lei9444 lei9444 deleted the downsampling branch May 9, 2020 00:38
lei9444 added a commit to lei9444/BotFramework-Composer-1 that referenced this pull request Jun 15, 2021
* add bootstrap sampling before publish

* add reservoir sample if the utterances' number > 15000

* update the sample logic

* add unit test for sampler

* add downsampling config to bot

* update the type

* fix unit test

* don't do sample for the ratio is ok

Co-authored-by: Chris Whitten <christopher.whitten@microsoft.com>
Co-authored-by: Dong Lei <donglei@microsoft.com>
Labels
"Approved to merge" (approved, waiting to be merged)
4 participants