Data Race when Predictor::Predictor is invoked for a shared Booster between multiple threads #6142

Open
stonebrakert6 opened this issue Oct 13, 2023 · 1 comment


stonebrakert6 commented Oct 13, 2023

Description

When Predictor::Predictor is invoked concurrently from multiple threads for a shared Booster, it causes data races on shared data in the Boosting object (GBDT).

One such API is LGBM_BoosterPredictForMatSingleRowFastInit(). If it is called concurrently from two separate threads using the same booster, it causes a data race (see the fully reproducible code below). The call chain is:

LGBM_BoosterPredictForMatSingleRowFastInit() -> Booster::SetSingleRowPredictor -> SingleRowPredictor::SingleRowPredictor() -> Predictor::Predictor() -> GBDT::InitPredict()

GBDT::InitPredict() then writes concurrently (i.e. a data race) to at least the variables num_iteration_for_pred_ and start_iteration_for_pred_ of the Boosting object (concretely GBDT), at src/boosting/gbdt.h:422.

Another API, LGBM_BoosterPredictForMat(), also causes a data race when invoked concurrently for the same Booster:

LGBM_BoosterPredictForMat() -> Booster::Predict() -> Booster::CreatePredictor() -> Predictor::Predictor()

See the related comments on #6024.

Below is code that, when run under ThreadSanitizer, should reproduce the race.
I am trying to share a BoosterHandle between multiple threads for inference/prediction only. I intend to use the API LGBM_BoosterPredictForMatSingleRowFast, and hence need to call LGBM_BoosterPredictForMatSingleRowFastInit to create/initialize a FastConfigHandle (one per thread in the code below).

Reproducible example

#include <array>
#include <fstream>
#include <iostream>
#include <sstream>
#include <thread>
#include <vector>

#include "LightGBM/c_api.h"

const int kFeatures = 13;

std::vector<std::array<double, kFeatures>> readFile(const std::string& file) {
  std::vector<std::array<double, kFeatures>> ans;
  std::ifstream f(file);
  if (!f.is_open()) {
    std::cout << "Could not open file " << file << '\n';
    return ans;
  }
  bool is_header = true;
  std::string temp;
  int nline = 0;
  while (std::getline(f, temp)) {
    ++nline;
    if (is_header) {
      is_header = false;
      continue;
    }
    std::istringstream s(temp);
    std::string field;
    int idx = 0;
    std::array<double, kFeatures> row;
    while (std::getline(s, field, ',')) {
      if (idx++ < kFeatures) row[idx - 1] = std::stod(field);  // never write past row
    }
    if (idx != kFeatures) {
      ans.clear();
      std::cout << "Incorrect # of cols in line " << nline << '\n';
      return ans;
    }
    ans.emplace_back(row);
  }
  return ans;
}

// shared booster handle for all threads
BoosterHandle handle;
// Input data
std::vector<std::array<double, kFeatures>> data;
// Final result or all predictions
std::vector<double> result;

void predict(ssize_t beg, ssize_t end) {
  FastConfigHandle config;
  int rc = LGBM_BoosterPredictForMatSingleRowFastInit(
      handle, C_API_PREDICT_NORMAL, 0, 0, C_API_DTYPE_FLOAT64, kFeatures, "",
      &config);
  if (rc != 0) {
    abort();
  }
  for (ssize_t i = beg; i < end; i++) {
    int64_t len = 0;
    rc = LGBM_BoosterPredictForMatSingleRowFast(config, &data[i], &len,
                                                &result[i]);
    if (rc != 0) {
      abort();
    }
  }
  rc = LGBM_FastConfigFree(config);
  if (rc != 0) {
    abort();
  }
}

int main(int argc, char* argv[]) {
  if (argc != 4) {
    std::cout
        << "Usage a.out <model_file> <input_file> <nworkers> ...exiting\n";
    return 1;
  }
  int nworkers = std::stoi(argv[3]);
  int num_iterations;
  std::cout << "Loading the Model from file\n";
  int rc = LGBM_BoosterCreateFromModelfile(argv[1], &num_iterations, &handle);
  if (rc != 0) {
    std::cout << "LGBM_BoosterCreateFromModelfile() returned " << rc << '\n';
    return 1;
  }
  data = readFile(argv[2]);
  ssize_t nrows = ssize(data);
  result.resize(nrows);
  std::vector<std::thread> workers(nworkers);
  ssize_t rows_per_thread = nrows / nworkers;
  for (ssize_t i = 0; i < nworkers; i++) {
    if (i != nworkers - 1) {
      workers[i] =
          std::thread(predict, rows_per_thread * i, rows_per_thread * (i + 1));
    } else {
      workers[i] = std::thread(predict, rows_per_thread * i, nrows);
    }
  }
  for (std::thread& t : workers) {
    t.join();
  }
  rc = LGBM_BoosterFree(handle);
  if (rc != 0) {
    abort();
  }
  return 0;
}

Environment info

LightGBM version or commit hash:

git log --oneline

8ed371c (HEAD -> master, origin/master, origin/HEAD) set explicit number of threads in every OpenMP parallel region (#6135)

Command(s) you used to install LightGBM

# this is part of a Makefile
mkdir -p LightGBM/build
env CC=$(CC) CXX=$(CXX) cmake -DUSE_DEBUG=ON -DUSE_SANITIZER=ON -DENABLED_SANITIZERS="thread" -DUSE_OPENMP=OFF -S LightGBM -B LightGBM/build
env CC=$(CC) CXX=$(CXX) VERBOSE=1 $(MAKE) -C LightGBM/build

Additional Comments

TSAN_OPTIONS="halt_on_error=1" ./builds/debug/d.out ~/Downloads/model.txt ~/Downloads/input_1k.txt 2
Loading the Model from file
==================
WARNING: ThreadSanitizer: data race (pid=15113)
  Read of size 4 at 0x7b5400000164 by thread T2:
    #0 int const& std::min<int>(int const&, int const&) /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algobase.h:235:11 (lib_lightgbm.so+0x58dcd5)
    #1 LightGBM::GBDT::InitPredict(int, int, bool) /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/boosting/gbdt.h:424:23 (lib_lightgbm.so+0x587276)
    #2 LightGBM::Predictor::Predictor(LightGBM::Boosting*, int, int, bool, bool, bool, bool, int, double) /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/application/predictor.hpp:61:15 (lib_lightgbm.so+0x54125e)
    #3 LightGBM::SingleRowPredictorInner::SingleRowPredictorInner(int, LightGBM::Boosting*, LightGBM::Config const&, int, int) /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/c_api.cpp:81:26 (lib_lightgbm.so+0x564367)
    #4 LightGBM::SingleRowPredictor::SingleRowPredictor(yamc::alternate::basic_shared_mutex<yamc::rwlock::ReaderPrefer>*, char const*, int, int, int, LightGBM::Boosting*, int, int) /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/c_api.cpp:124:109 (lib_lightgbm.so+0x564d12)
    #5 LightGBM::Booster::InitSingleRowPredictor(int, int, int, int, int, char const*) /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/c_api.cpp:440:52 (lib_lightgbm.so+0x51f55f)
    #6 LGBM_BoosterPredictForMatSingleRowFastInit /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/c_api.cpp:2443:18 (lib_lightgbm.so+0x50bb41)
    #7 predict(long, long) /home/kartik/codeberg/bug_lightgbm/main.cc:54:12 (d.out+0x12c76a)
    #8 decltype(std::declval<void (*)(long, long)>()(std::declval<long>(), std::declval<long>())) std::__1::__invoke[abi:v170000]<void (*)(long, long), long, long>(void (*&&)(long, long), long&&, long&&) /home/kartik/build/git_llvm/bin/../include/c++/v1/__type_traits/invoke.h:340:25 (d.out+0x13e602)
    #9 void std::__1::__thread_execute[abi:v170000]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(long, long), long, long, 2ul, 3ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(long, long), long, long>&, std::__1::__tuple_indices<2ul, 3ul>) /home/kartik/build/git_llvm/bin/../include/c++/v1/__thread/thread.h:221:5 (d.out+0x13e4ef)
    #10 void* std::__1::__thread_proxy[abi:v170000]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(long, long), long, long>>(void*) /home/kartik/build/git_llvm/bin/../include/c++/v1/__thread/thread.h:232:5 (d.out+0x13dbb2)

  Previous write of size 4 at 0x7b5400000164 by thread T1:
    #0 LightGBM::GBDT::InitPredict(int, int, bool) /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/boosting/gbdt.h:422:29 (lib_lightgbm.so+0x587214)
    #1 LightGBM::Predictor::Predictor(LightGBM::Boosting*, int, int, bool, bool, bool, bool, int, double) /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/application/predictor.hpp:61:15 (lib_lightgbm.so+0x54125e)
    #2 LightGBM::SingleRowPredictorInner::SingleRowPredictorInner(int, LightGBM::Boosting*, LightGBM::Config const&, int, int) /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/c_api.cpp:81:26 (lib_lightgbm.so+0x564367)
    #3 LightGBM::SingleRowPredictor::SingleRowPredictor(yamc::alternate::basic_shared_mutex<yamc::rwlock::ReaderPrefer>*, char const*, int, int, int, LightGBM::Boosting*, int, int) /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/c_api.cpp:124:109 (lib_lightgbm.so+0x564d12)
    #4 LightGBM::Booster::InitSingleRowPredictor(int, int, int, int, int, char const*) /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/c_api.cpp:440:52 (lib_lightgbm.so+0x51f55f)
    #5 LGBM_BoosterPredictForMatSingleRowFastInit /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/c_api.cpp:2443:18 (lib_lightgbm.so+0x50bb41)
    #6 predict(long, long) /home/kartik/codeberg/bug_lightgbm/main.cc:54:12 (d.out+0x12c76a)
    #7 decltype(std::declval<void (*)(long, long)>()(std::declval<long>(), std::declval<long>())) std::__1::__invoke[abi:v170000]<void (*)(long, long), long, long>(void (*&&)(long, long), long&&, long&&) /home/kartik/build/git_llvm/bin/../include/c++/v1/__type_traits/invoke.h:340:25 (d.out+0x13e602)
    #8 void std::__1::__thread_execute[abi:v170000]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(long, long), long, long, 2ul, 3ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(long, long), long, long>&, std::__1::__tuple_indices<2ul, 3ul>) /home/kartik/build/git_llvm/bin/../include/c++/v1/__thread/thread.h:221:5 (d.out+0x13e4ef)
    #9 void* std::__1::__thread_proxy[abi:v170000]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(long, long), long, long>>(void*) /home/kartik/build/git_llvm/bin/../include/c++/v1/__thread/thread.h:232:5 (d.out+0x13dbb2)

  Location is heap block of size 584 at 0x7b5400000000 allocated by main thread:
    #0 operator new(unsigned long) /home/kartik/llvm-project/compiler-rt/lib/tsan/rtl/tsan_new_delete.cpp:64:3 (d.out+0x12b377)
    #1 LightGBM::Boosting::CreateBoosting(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, char const*) /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/boosting/boosting.cpp:51:19 (lib_lightgbm.so+0x582aa4)
    #2 LightGBM::Booster::Booster(char const*) /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/c_api.cpp:164:21 (lib_lightgbm.so+0x519834)
    #3 LGBM_BoosterCreateFromModelfile /home/kartik/codeberg/lightgbm_bm/third_party/LightGBM/src/c_api.cpp:1843:43 (lib_lightgbm.so+0x503843)
    #4 main /home/kartik/codeberg/bug_lightgbm/main.cc:83:12 (d.out+0x12c98c)

  Thread T2 (tid=15117, running) created by main thread at:
    #0 pthread_create /home/kartik/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1020:3 (d.out+0xa43db)
    #1 std::__1::__libcpp_thread_create[abi:v170000](unsigned long*, void* (*)(void*), void*) /home/kartik/build/git_llvm/bin/../include/c++/v1/__threading_support:371:10 (d.out+0x13db29)
    #2 std::__1::thread::thread<void (&)(long, long), long, long&, void>(void (&)(long, long), long&&, long&) /home/kartik/build/git_llvm/bin/../include/c++/v1/__thread/thread.h:248:16 (d.out+0x12e35e)
    #3 main /home/kartik/codeberg/bug_lightgbm/main.cc:98:20 (d.out+0x12ccaa)

  Thread T1 (tid=15116, running) created by main thread at:
    #0 pthread_create /home/kartik/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1020:3 (d.out+0xa43db)
    #1 std::__1::__libcpp_thread_create[abi:v170000](unsigned long*, void* (*)(void*), void*) /home/kartik/build/git_llvm/bin/../include/c++/v1/__threading_support:371:10 (d.out+0x13db29)
    #2 std::__1::thread::thread<void (&)(long, long), long, long, void>(void (&)(long, long), long&&, long&&) /home/kartik/build/git_llvm/bin/../include/c++/v1/__thread/thread.h:248:16 (d.out+0x12e088)
    #3 main /home/kartik/codeberg/bug_lightgbm/main.cc:96:11 (d.out+0x12cbe3)

SUMMARY: ThreadSanitizer: data race /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algobase.h:235:11 in int const& std::min<int>(int const&, int const&)
==================

jameslamb added the bug label Oct 13, 2023
Ten0 (Contributor) commented Oct 23, 2023

LGBM_BoosterPredictForMatSingleRowFastInit()-> Booster::SetSingleRowPredictor

I think the second example can indeed race, but I'm not sure how the first one races currently (i.e. before #6024), because there's a unique lock here:

UNIQUE_LOCK(mutex_)

Ten0 added a commit to Ten0/lightgbm-rs that referenced this issue Oct 23, 2023
Ten0 added a commit to Ten0/LightGBM that referenced this issue Jan 12, 2024
Ten0 added a commit to Ten0/LightGBM that referenced this issue Jan 12, 2024