Do binomial check in label_encode functions only when adding new labels (#1892)

For a binomial problem, the number of labels must be less than or equal to two. This check should only be performed when a new label is encountered in the target column. When running label encoding in parallel, we now do the check only after making sure the label was not already added by another thread.

Closes #1891
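The fix relies on a double-checked pattern: look the value up under a shared lock, and only if it is missing take the exclusive lock, look again, and then (for a boolean target) enforce the two-label limit. Below is a minimal, self-contained C++17 sketch of that pattern under stated assumptions; it is not datatable's code. `LabelEncoder` and `encode_parallel` are hypothetical names, `std::shared_mutex` stands in for datatable's own lock (the `lock.exclusive_start()` in the diff), `std::invalid_argument` stands in for `ValueError`, and the `binomial` flag plays the role of `stype_to == SType::BOOL`.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <mutex>
#include <shared_mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical encoder (not datatable's API): maps each distinct value to a
// dense integer code. `binomial` stands in for `stype_to == SType::BOOL`.
template <typename K>
class LabelEncoder {
 public:
  explicit LabelEncoder(bool binomial) : binomial_(binomial) {}

  int encode(const K& v) {
    {
      // Fast path: known labels only need a shared (read) lock.
      std::shared_lock<std::shared_mutex> rlock(mutex_);
      auto it = labels_.find(v);
      if (it != labels_.end()) return it->second;
    }
    // Slow path: take the exclusive lock and look again, because another
    // thread may have added `v` after the shared lock was released.
    std::unique_lock<std::shared_mutex> wlock(mutex_);
    auto it = labels_.find(v);
    if (it != labels_.end()) return it->second;
    // Only a genuinely new label can exceed the binomial limit.
    if (binomial_ && labels_.size() == 2) {
      throw std::invalid_argument("Target column for binomial problem cannot "
                                  "contain more than two labels");
    }
    int code = static_cast<int>(labels_.size());
    labels_.emplace(v, code);
    return code;
  }

 private:
  bool binomial_;
  std::shared_mutex mutex_;
  std::unordered_map<K, int> labels_;
};

// Encodes `col` with `nthreads` workers over strided row ranges, then
// rethrows the exception of the lowest-numbered failing worker, if any.
template <typename K>
std::vector<int> encode_parallel(const std::vector<K>& col, bool binomial,
                                 unsigned nthreads) {
  assert(nthreads > 0);
  LabelEncoder<K> enc(binomial);
  std::vector<int> out(col.size());
  std::vector<std::exception_ptr> errors(nthreads);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nthreads; ++t) {
    workers.emplace_back([&, t] {
      try {
        for (std::size_t i = t; i < col.size(); i += nthreads) {
          out[i] = enc.encode(col[i]);  // distinct i per thread: no data race
        }
      } catch (...) {
        errors[t] = std::current_exception();
      }
    });
  }
  for (auto& w : workers) w.join();
  for (const auto& e : errors) {
    if (e) std::rethrow_exception(e);
  }
  return out;
}
```

Re-checking under the exclusive lock is what makes the placement of the binomial check safe: by the time it runs, `v` is known to be absent, so `labels_.size() == 2` really means a third label is about to be added. Which value receives code 0 depends on thread scheduling, but equal values always receive equal codes.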
oleksiyskononenko authored and st-pasha committed Jun 28, 2019
1 parent 3222eb8 commit 43ded51
Showing 2 changed files with 9 additions and 11 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
```diff
@@ -209,7 +209,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
 - Thanks to everyone who helped make `datatable` more stable by discovering
   and reporting bugs that were fixed in this release:
 
-  - [Arno Candel][] (#1619, #1730, #1738, #1800, #1803, #1846, #1857),
+  - [Arno Candel][] (#1619, #1730, #1738, #1800, #1803, #1846, #1857, #1891),
   - [Antorsae][] (#1639),
   - [Olivier][] (#1872),
   - [Hawk Berry][] (#1834),
```
18 changes: 8 additions & 10 deletions c/models/label_encode.h
```diff
@@ -124,12 +124,11 @@ void label_encode_fw(const Column* col, dtptr& dt_labels, dtptr& dt_encoded) {
         outdata[irow] = labels_map[v];
       } else {
         lock.exclusive_start();
-        if (stype_to == SType::BOOL && labels_map.size() == 2) {
-          throw ValueError() << "Target column for binomial problem cannot "
-                                "contain more than two labels";
-        }
-
         if (labels_map.count(v) == 0) {
+          if (stype_to == SType::BOOL && labels_map.size() == 2) {
+            throw ValueError() << "Target column for binomial problem cannot "
+                                  "contain more than two labels";
+          }
           size_t nlabels = labels_map.size();
           labels_map[v] = static_cast<T_to>(nlabels);
           outdata[irow] = labels_map[v];
```
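The change moves the binomial check inside `if (labels_map.count(v) == 0)`. Why that matters can be seen by replaying, single-threaded, the interleaving the commit message describes: thread A misses a value under the shared lock, thread B inserts that same value, and only then does A acquire the exclusive lock. With the old ordering, A sees two labels and throws even though no third label exists, so a valid two-label target column could be rejected. This is a simplified sketch under stated assumptions: `LabelMap`, `exclusive_section_old`, `exclusive_section_new` and `replay_race` are hypothetical names, `std::invalid_argument` replaces datatable's `ValueError`, and because the code after the `if` is not shown in the diff, both versions here simply return the stored code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

using LabelMap = std::unordered_map<std::string, int>;

// Simplified exclusive section BEFORE the fix: the binomial check runs as
// soon as the lock is held, even if `v` is already a known label.
int exclusive_section_old(LabelMap& labels, const std::string& v, bool binomial) {
  if (binomial && labels.size() == 2) {
    throw std::invalid_argument("Target column for binomial problem cannot "
                                "contain more than two labels");
  }
  if (labels.count(v) == 0) {
    int code = static_cast<int>(labels.size());
    labels.emplace(v, code);
  }
  return labels.at(v);
}

// Simplified exclusive section AFTER the fix: the check only guards the
// insertion of a label that is still missing.
int exclusive_section_new(LabelMap& labels, const std::string& v, bool binomial) {
  if (labels.count(v) == 0) {
    if (binomial && labels.size() == 2) {
      throw std::invalid_argument("Target column for binomial problem cannot "
                                  "contain more than two labels");
    }
    int code = static_cast<int>(labels.size());
    labels.emplace(v, code);
  }
  return labels.at(v);
}

// Replays one interleaving by hand: thread A misses "no" under the shared
// lock, thread B inserts "no" first, then A enters its exclusive section.
// Returns true if A's exclusive section throws.
template <typename Section>
bool replay_race(Section section) {
  LabelMap labels{{"yes", 0}};
  bool a_missed = labels.count("no") == 0;  // A: shared-lock lookup fails
  labels.emplace("no", 1);                  // B: adds "no" in the meantime
  assert(a_missed && labels.size() == 2);
  (void)a_missed;
  try {
    section(labels, "no", true);            // A: exclusive section for "no"
  } catch (const std::invalid_argument&) {
    return true;
  }
  return false;
}
```

With this replay, the old ordering throws for a column that only ever contains `"yes"` and `"no"`, while the new ordering returns the existing code and still rejects a genuine third label.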
```diff
@@ -192,12 +191,11 @@ void label_encode_str(const Column* col, dtptr& dt_labels, dtptr& dt_encoded) {
         outdata[irow] = labels_map[v];
       } else {
         lock.exclusive_start();
-        if (stype_to == SType::BOOL && labels_map.size() == 2) {
-          throw ValueError() << "Target column for binomial problem cannot "
-                                "contain more than two labels";
-        }
-
         if (labels_map.count(v) == 0) {
+          if (stype_to == SType::BOOL && labels_map.size() == 2) {
+            throw ValueError() << "Target column for binomial problem cannot "
+                                  "contain more than two labels";
+          }
           size_t nlabels = labels_map.size();
           labels_map[v] = static_cast<T_to>(nlabels);
           outdata[irow] = labels_map[v];
```