Commit

Remove unmaintained jvm readme and dev scripts. (#9395)
trivialfis authored Jul 18, 2023
1 parent e082718 commit 0897477
Showing 8 changed files with 8 additions and 337 deletions.
2 changes: 1 addition & 1 deletion demo/README.md
@@ -145,7 +145,7 @@ Send a PR to add a one sentence description:)
## Tools using XGBoost

- [BayesBoost](https://github.com/mpearmain/BayesBoost) - Bayesian Optimization using xgboost and sklearn API
- [FLAML](https://github.com/microsoft/FLAML) - An open source AutoML library
- [FLAML](https://github.com/microsoft/FLAML) - An open source AutoML library
designed to automatically produce accurate machine learning models with low computational cost. FLAML includes [XGBoost as one of the default learners](https://github.com/microsoft/FLAML/blob/main/flaml/model.py) and can also be used as a fast hyperparameter tuning tool for XGBoost ([code example](https://microsoft.github.io/FLAML/docs/Examples/AutoML-for-XGBoost)).
- [gp_xgboost_gridsearch](https://github.com/vatsan/gp_xgboost_gridsearch) - In-database parallel grid-search for XGBoost on [Greenplum](https://github.com/greenplum-db/gpdb) using PL/Python
- [tpot](https://github.com/rhiever/tpot) - A Python tool that automatically creates and optimizes machine learning pipelines using genetic programming.
160 changes: 7 additions & 153 deletions jvm-packages/README.md
@@ -3,161 +3,15 @@
[![Documentation Status](https://readthedocs.org/projects/xgboost/badge/?version=latest)](https://xgboost.readthedocs.org/en/latest/jvm/index.html)
[![GitHub license](http://dmlc.github.io/img/apache2.svg)](../LICENSE)

[Documentation](https://xgboost.readthedocs.org/en/latest/jvm/index.html) |
[Documentation](https://xgboost.readthedocs.org/en/stable/jvm/index.html) |
[Resources](../demo/README.md) |
[Release Notes](../NEWS.md)

XGBoost4J is the JVM package of xgboost. It brings all the optimizations
and power xgboost into JVM ecosystem.
XGBoost4J is the JVM package of xgboost. It brings all the optimizations and power xgboost
into JVM ecosystem.

- Train XGBoost models in scala and java with easy customizations.
- Run distributed xgboost natively on jvm frameworks such as
Apache Flink and Apache Spark.
- Train XGBoost models in scala and java with easy customization.
- Run distributed xgboost natively on jvm frameworks such as Apache Flink and Apache
Spark.

You can find more about XGBoost on [Documentation](https://xgboost.readthedocs.org/en/latest/jvm/index.html) and [Resource Page](../demo/README.md).

## Add Maven Dependency

XGBoost4J, XGBoost4J-Spark, etc. in maven repository is compiled with g++-4.8.5.

### Access release version

<b>Maven</b>

```
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_2.12</artifactId>
<version>latest_version_num</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_2.12</artifactId>
<version>latest_version_num</version>
</dependency>
```
or
```
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_2.13</artifactId>
<version>latest_version_num</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_2.13</artifactId>
<version>latest_version_num</version>
</dependency>
```

<b>sbt</b>
```sbt
libraryDependencies ++= Seq(
"ml.dmlc" %% "xgboost4j" % "latest_version_num",
"ml.dmlc" %% "xgboost4j-spark" % "latest_version_num"
)
```

For the latest release version number, please check [here](https://github.com/dmlc/xgboost/releases).


### Access SNAPSHOT version

First add the following Maven repository hosted by the XGBoost project:

<b>Maven</b>:

```xml
<repository>
<id>XGBoost4J Snapshot Repo</id>
<name>XGBoost4J Snapshot Repo</name>
<url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/</url>
</repository>
```

<b>sbt</b>:

```sbt
resolvers += "XGBoost4J Snapshot Repo" at "https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/"
```

Then add XGBoost4J as a dependency:

<b>Maven</b>

```
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_2.12</artifactId>
<version>latest_version_num-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_2.12</artifactId>
<version>latest_version_num-SNAPSHOT</version>
</dependency>
```
or with scala 2.13
```
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_2.13</artifactId>
<version>latest_version_num-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_2.13</artifactId>
<version>latest_version_num-SNAPSHOT</version>
</dependency>
```

<b>sbt</b>
```sbt
libraryDependencies ++= Seq(
"ml.dmlc" %% "xgboost4j" % "latest_version_num-SNAPSHOT",
"ml.dmlc" %% "xgboost4j-spark" % "latest_version_num-SNAPSHOT"
)
```

For the latest release version number, please check [the repository listing](https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html).

### GPU algorithm
To enable the GPU algorithm (`tree_method='gpu_hist'`), use artifacts `xgboost4j-gpu_2.12` and `xgboost4j-spark-gpu_2.12` instead.
Note that scala 2.13 is not supported by the [NVIDIA/spark-rapids#1525](https://github.com/NVIDIA/spark-rapids/issues/1525) yet, so the GPU algorithm can only be used with scala 2.12.

## Examples

Full code examples for Scala, Java, Apache Spark, and Apache Flink can
be found in the [examples package](https://github.com/dmlc/xgboost/tree/master/jvm-packages/xgboost4j-example).

**NOTE on LIBSVM Format**:

There is an inconsistent issue between XGBoost4J-Spark and other language bindings of XGBoost.

When users use Spark to load trainingset/testset in LIBSVM format with the following code snippet:

```scala
spark.read.format("libsvm").load("trainingset_libsvm")
```

Spark assumes that the dataset is 1-based indexed. However, when you do prediction with other bindings of XGBoost (e.g. Python API of XGBoost), XGBoost assumes that the dataset is 0-based indexed. It creates a pitfall for the users who train model with Spark but predict with the dataset in the same format in other bindings of XGBoost.
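
One way to sidestep the indexing mismatch is to rewrite a 1-based LIBSVM file to 0-based indices before loading it with a binding that expects 0-based input. The following is a minimal sketch, not part of XGBoost4J: `LibsvmIndexShift` and `toZeroBased` are hypothetical names, and the code assumes each line is a label followed by `index:value` pairs with no `qid:` tokens or comments.

```java
public class LibsvmIndexShift {
    // Hypothetical helper: rewrites one LIBSVM line from 1-based to 0-based feature indices.
    // Assumes the form "<label> <index>:<value> ..." with integer indices.
    static String toZeroBased(String line) {
        String[] tokens = line.trim().split("\\s+");
        StringBuilder out = new StringBuilder(tokens[0]); // the label is left unchanged
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            if (index < 1) {
                throw new IllegalArgumentException("expected a 1-based index, got " + index);
            }
            // Shift the index down by one and keep ":<value>" as written.
            out.append(' ').append(index - 1).append(tokens[i].substring(colon));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toZeroBased("1 1:0.5 3:2.0")); // prints "1 0:0.5 2:2.0"
    }
}
```

Applying the helper line by line yields a file whose feature columns line up with what a 0-based reader expects. Whether the shift is needed depends on how the original file was produced, so it is worth checking the smallest index in the data first.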

## Development

You can build/package xgboost4j locally with the following steps:

**Linux:**
1. Ensure [Docker for Linux](https://docs.docker.com/install/) is installed.
2. Clone this repo: `git clone --recursive https://github.com/dmlc/xgboost.git`
3. Run the following command:
- With Tests: `./xgboost/jvm-packages/dev/build-linux.sh`
- Skip Tests: `./xgboost/jvm-packages/dev/build-linux.sh --skip-tests`

**Windows:**
1. Ensure [Docker for Windows](https://docs.docker.com/docker-for-windows/install/) is installed.
2. Clone this repo: `git clone --recursive https://github.com/dmlc/xgboost.git`
3. Run the following command:
- With Tests: `.\xgboost\jvm-packages\dev\build-linux.cmd`
- Skip Tests: `.\xgboost\jvm-packages\dev\build-linux.cmd --skip-tests`

*Note: this will create jars for deployment on Linux machines.*
You can find more about XGBoost on [Documentation](https://xgboost.readthedocs.org/en/stable/jvm/index.html) and [Resource Page](../demo/README.md).
3 changes: 0 additions & 3 deletions jvm-packages/dev/.gitattributes

This file was deleted.

1 change: 0 additions & 1 deletion jvm-packages/dev/.gitignore

This file was deleted.

58 changes: 0 additions & 58 deletions jvm-packages/dev/Dockerfile

This file was deleted.

44 changes: 0 additions & 44 deletions jvm-packages/dev/build-linux.cmd

This file was deleted.

41 changes: 0 additions & 41 deletions jvm-packages/dev/build-linux.sh

This file was deleted.

36 changes: 0 additions & 36 deletions jvm-packages/dev/package-linux.sh

This file was deleted.
