diff --git a/demo/README.md b/demo/README.md
index 26deb453bb58..df53b05bb568 100644
--- a/demo/README.md
+++ b/demo/README.md
@@ -145,7 +145,7 @@ Send a PR to add a one sentence description:)
 ## Tools using XGBoost
 
 - [BayesBoost](https://github.com/mpearmain/BayesBoost) - Bayesian Optimization using xgboost and sklearn API
-- [FLAML](https://github.com/microsoft/FLAML) - An open source AutoML library
+- [FLAML](https://github.com/microsoft/FLAML) - An open source AutoML library designed to automatically produce accurate machine learning models with low computational cost. FLAML includes [XGBoost as one of the default learners](https://github.com/microsoft/FLAML/blob/main/flaml/model.py) and can also be used as a fast hyperparameter tuning tool for XGBoost ([code example](https://microsoft.github.io/FLAML/docs/Examples/AutoML-for-XGBoost)).
 - [gp_xgboost_gridsearch](https://github.com/vatsan/gp_xgboost_gridsearch) - In-database parallel grid-search for XGBoost on [Greenplum](https://github.com/greenplum-db/gpdb) using PL/Python
 - [tpot](https://github.com/rhiever/tpot) - A Python tool that automatically creates and optimizes machine learning pipelines using genetic programming.
 
diff --git a/jvm-packages/README.md b/jvm-packages/README.md
index 451a0d981b08..78f9a5e0f9a1 100644
--- a/jvm-packages/README.md
+++ b/jvm-packages/README.md
@@ -3,161 +3,15 @@
 [![Documentation Status](https://readthedocs.org/projects/xgboost/badge/?version=latest)](https://xgboost.readthedocs.org/en/latest/jvm/index.html)
 [![GitHub license](http://dmlc.github.io/img/apache2.svg)](../LICENSE)
 
-[Documentation](https://xgboost.readthedocs.org/en/latest/jvm/index.html) |
+[Documentation](https://xgboost.readthedocs.org/en/stable/jvm/index.html) |
 [Resources](../demo/README.md) |
 [Release Notes](../NEWS.md)
 
-XGBoost4J is the JVM package of xgboost. It brings all the optimizations
-and power xgboost into JVM ecosystem.
+XGBoost4J is the JVM package of xgboost. It brings all the optimizations and power xgboost
+into JVM ecosystem.
 
-- Train XGBoost models in scala and java with easy customizations.
-- Run distributed xgboost natively on jvm frameworks such as
-Apache Flink and Apache Spark.
+- Train XGBoost models in scala and java with easy customization.
+- Run distributed xgboost natively on jvm frameworks such as Apache Flink and Apache
+Spark.
 
-You can find more about XGBoost on [Documentation](https://xgboost.readthedocs.org/en/latest/jvm/index.html) and [Resource Page](../demo/README.md).
-
-## Add Maven Dependency
-
-XGBoost4J, XGBoost4J-Spark, etc. in maven repository is compiled with g++-4.8.5.
-
-### Access release version
-
-Maven
-
-```
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j_2.12</artifactId>
-    <version>latest_version_num</version>
-</dependency>
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j-spark_2.12</artifactId>
-    <version>latest_version_num</version>
-</dependency>
-```
-or
-```
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j_2.13</artifactId>
-    <version>latest_version_num</version>
-</dependency>
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j-spark_2.13</artifactId>
-    <version>latest_version_num</version>
-</dependency>
-```
-
-sbt
-```sbt
-libraryDependencies ++= Seq(
-  "ml.dmlc" %% "xgboost4j" % "latest_version_num",
-  "ml.dmlc" %% "xgboost4j-spark" % "latest_version_num"
-)
-```
-
-For the latest release version number, please check [here](https://github.com/dmlc/xgboost/releases).
-
-
-### Access SNAPSHOT version
-
-First add the following Maven repository hosted by the XGBoost project:
-
-Maven:
-
-```xml
-<repository>
-  <id>XGBoost4J Snapshot Repo</id>
-  <name>XGBoost4J Snapshot Repo</name>
-  <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/</url>
-</repository>
-```
-
-sbt:
-
-```sbt
-resolvers += "XGBoost4J Snapshot Repo" at "https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/"
-```
-
-Then add XGBoost4J as a dependency:
-
-Maven
-
-```
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j_2.12</artifactId>
-    <version>latest_version_num-SNAPSHOT</version>
-</dependency>
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j-spark_2.12</artifactId>
-    <version>latest_version_num-SNAPSHOT</version>
-</dependency>
-```
-or with scala 2.13
-```
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j_2.13</artifactId>
-    <version>latest_version_num-SNAPSHOT</version>
-</dependency>
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j-spark_2.13</artifactId>
-    <version>latest_version_num-SNAPSHOT</version>
-</dependency>
-```
-
-sbt
-```sbt
-libraryDependencies ++= Seq(
-  "ml.dmlc" %% "xgboost4j" % "latest_version_num-SNAPSHOT",
-  "ml.dmlc" %% "xgboost4j-spark" % "latest_version_num-SNAPSHOT"
-)
-```
-
-For the latest release version number, please check [the repository listing](https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html).
-
-### GPU algorithm
-To enable the GPU algorithm (`tree_method='gpu_hist'`), use artifacts `xgboost4j-gpu_2.12` and `xgboost4j-spark-gpu_2.12` instead.
-Note that scala 2.13 is not supported by the [NVIDIA/spark-rapids#1525](https://github.com/NVIDIA/spark-rapids/issues/1525) yet, so the GPU algorithm can only be used with scala 2.12.
-
-## Examples
-
-Full code examples for Scala, Java, Apache Spark, and Apache Flink can
-be found in the [examples package](https://github.com/dmlc/xgboost/tree/master/jvm-packages/xgboost4j-example).
-
-**NOTE on LIBSVM Format**:
-
-There is an inconsistent issue between XGBoost4J-Spark and other language bindings of XGBoost.
-
-When users use Spark to load trainingset/testset in LIBSVM format with the following code snippet:
-
-```scala
-spark.read.format("libsvm").load("trainingset_libsvm")
-```
-
-Spark assumes that the dataset is 1-based indexed. However, when you do prediction with other bindings of XGBoost (e.g. Python API of XGBoost), XGBoost assumes that the dataset is 0-based indexed. It creates a pitfall for the users who train model with Spark but predict with the dataset in the same format in other bindings of XGBoost.
-
-## Development
-
-You can build/package xgboost4j locally with the following steps:
-
-**Linux:**
-1. Ensure [Docker for Linux](https://docs.docker.com/install/) is installed.
-2. Clone this repo: `git clone --recursive https://github.com/dmlc/xgboost.git`
-3. Run the following command:
-  - With Tests: `./xgboost/jvm-packages/dev/build-linux.sh`
-  - Skip Tests: `./xgboost/jvm-packages/dev/build-linux.sh --skip-tests`
-
-**Windows:**
-1. Ensure [Docker for Windows](https://docs.docker.com/docker-for-windows/install/) is installed.
-2. Clone this repo: `git clone --recursive https://github.com/dmlc/xgboost.git`
-3. Run the following command:
-  - With Tests: `.\xgboost\jvm-packages\dev\build-linux.cmd`
-  - Skip Tests: `.\xgboost\jvm-packages\dev\build-linux.cmd --skip-tests`
-
-*Note: this will create jars for deployment on Linux machines.*
+You can find more about XGBoost on [Documentation](https://xgboost.readthedocs.org/en/stable/jvm/index.html) and [Resource Page](../demo/README.md).
\ No newline at end of file
diff --git a/jvm-packages/dev/.gitattributes b/jvm-packages/dev/.gitattributes
deleted file mode 100644
index ed670ecedb5e..000000000000
--- a/jvm-packages/dev/.gitattributes
+++ /dev/null
@@ -1,3 +0,0 @@
-# Set line endings to LF, even on Windows. Otherwise, execution within Docker fails.
-# See https://help.github.com/articles/dealing-with-line-endings/
-*.sh text eol=lf
diff --git a/jvm-packages/dev/.gitignore b/jvm-packages/dev/.gitignore
deleted file mode 100644
index eb713db19674..000000000000
--- a/jvm-packages/dev/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-.m2
diff --git a/jvm-packages/dev/Dockerfile b/jvm-packages/dev/Dockerfile
deleted file mode 100644
index 72ccdeba0825..000000000000
--- a/jvm-packages/dev/Dockerfile
+++ /dev/null
@@ -1,58 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-FROM centos:7
-
-# Install all basic requirements
-RUN \
-    yum -y update && \
-    yum install -y bzip2 make tar unzip wget xz git centos-release-scl yum-utils java-1.8.0-openjdk-devel && \
-    yum-config-manager --enable centos-sclo-rh-testing && \
-    yum -y update && \
-    yum install -y devtoolset-7-gcc devtoolset-7-binutils devtoolset-7-gcc-c++ && \
-    # Python
-    wget https://repo.continuum.io/miniconda/Miniconda3-4.5.12-Linux-x86_64.sh && \
-    bash Miniconda3-4.5.12-Linux-x86_64.sh -b -p /opt/python && \
-    # CMake
-    wget -nv -nc https://cmake.org/files/v3.18/cmake-3.18.3-Linux-x86_64.sh --no-check-certificate && \
-    bash cmake-3.18.3-Linux-x86_64.sh --skip-license --prefix=/usr && \
-    # Maven
-    wget https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz && \
-    tar xvf apache-maven-3.6.1-bin.tar.gz -C /opt && \
-    ln -s /opt/apache-maven-3.6.1/ /opt/maven
-
-# Set the required environment variables
-ENV PATH=/opt/python/bin:/opt/maven/bin:$PATH
-ENV CC=/opt/rh/devtoolset-7/root/usr/bin/gcc
-ENV CXX=/opt/rh/devtoolset-7/root/usr/bin/c++
-ENV CPP=/opt/rh/devtoolset-7/root/usr/bin/cpp
-ENV JAVA_HOME=/usr/lib/jvm/java
-
-# Install Python packages
-RUN \
-    pip install numpy pytest scipy scikit-learn wheel kubernetes urllib3==1.22 awscli
-
-ENV GOSU_VERSION 1.10
-
-# Install lightweight sudo (not bound to TTY)
-RUN set -ex; \
-    wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
-    chmod +x /usr/local/bin/gosu && \
-    gosu nobody true
-
-WORKDIR /xgboost
diff --git a/jvm-packages/dev/build-linux.cmd b/jvm-packages/dev/build-linux.cmd
deleted file mode 100644
index a5d962f5fe52..000000000000
--- a/jvm-packages/dev/build-linux.cmd
+++ /dev/null
@@ -1,44 +0,0 @@
-@echo off
-
-rem
-rem Licensed to the Apache Software Foundation (ASF) under one
-rem or more contributor license agreements. See the NOTICE file
-rem distributed with this work for additional information
-rem regarding copyright ownership. The ASF licenses this file
-rem to you under the Apache License, Version 2.0 (the
-rem "License"); you may not use this file except in compliance
-rem with the License. You may obtain a copy of the License at
-rem
-rem   http://www.apache.org/licenses/LICENSE-2.0
-rem
-rem Unless required by applicable law or agreed to in writing,
-rem software distributed under the License is distributed on an
-rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-rem KIND, either express or implied. See the License for the
-rem specific language governing permissions and limitations
-rem under the License.
-rem
-
-rem The the local path of this file
-set "BASEDIR=%~dp0"
-
-rem The local path of .m2 directory for maven
-set "M2DIR=%BASEDIR%\.m2\"
-
-rem Create a local .m2 directory if needed
-if not exist "%M2DIR%" mkdir "%M2DIR%"
-
-rem Build and tag the Dockerfile
-docker build -t dmlc/xgboost4j-build %BASEDIR%
-
-docker run^
-    -it^
-    --rm^
-    --memory 12g^
-    --env JAVA_OPTS="-Xmx9g"^
-    --env MAVEN_OPTS="-Xmx3g"^
-    --ulimit core=-1^
-    --volume %BASEDIR%\..\..:/xgboost^
-    --volume %M2DIR%:/root/.m2^
-    dmlc/xgboost4j-build^
-    /xgboost/jvm-packages/dev/package-linux.sh "%*"
diff --git a/jvm-packages/dev/build-linux.sh b/jvm-packages/dev/build-linux.sh
deleted file mode 100755
index 1509a375236c..000000000000
--- a/jvm-packages/dev/build-linux.sh
+++ /dev/null
@@ -1,41 +0,0 @@
-#!/usr/bin/env bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-BASEDIR="$( cd "$( dirname "$0" )" && pwd )" # the directory of this file
-
-docker build -t dmlc/xgboost4j-build "${BASEDIR}" # build and tag the Dockerfile
-
-exec docker run \
-  -it \
-  --rm \
-  --memory 12g \
-  --env JAVA_OPTS="-Xmx9g" \
-  --env MAVEN_OPTS="-Xmx3g -Dmaven.repo.local=/xgboost/jvm-packages/dev/.m2" \
-  --env CI_BUILD_UID=`id -u` \
-  --env CI_BUILD_GID=`id -g` \
-  --env CI_BUILD_USER=`id -un` \
-  --env CI_BUILD_GROUP=`id -gn` \
-  --ulimit core=-1 \
-  --volume "${BASEDIR}/../..":/xgboost \
-  dmlc/xgboost4j-build \
-  /xgboost/tests/ci_build/entrypoint.sh jvm-packages/dev/package-linux.sh "$@"
-
-# CI_BUILD_UID, CI_BUILD_GID, CI_BUILD_USER, CI_BUILD_GROUP
-# are used by entrypoint.sh to create the user with the same uid in a container
-# so all produced artifacts would be owned by your host user
\ No newline at end of file
diff --git a/jvm-packages/dev/package-linux.sh b/jvm-packages/dev/package-linux.sh
deleted file mode 100755
index 1fd777d9b90b..000000000000
--- a/jvm-packages/dev/package-linux.sh
+++ /dev/null
@@ -1,36 +0,0 @@
-#!/usr/bin/env bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-cd jvm-packages
-
-case "$1" in
-  --skip-tests) SKIP_TESTS=true ;;
-  "") SKIP_TESTS=false ;;
-esac
-
-if [[ -n ${SKIP_TESTS} ]]; then
-  if [[ ${SKIP_TESTS} == "true" ]]; then
-    mvn --batch-mode clean package -DskipTests
-  elif [[ ${SKIP_TESTS} == "false" ]]; then
-    mvn --batch-mode clean package
-  fi
-else
-  echo "Usage: $0 [--skip-tests]"
-  exit 1
-fi