DataVec

DataVec is an open-source tool, released under the Apache 2.0 license, for machine learning ETL (Extract, Transform, Load) operations. The goal of DataVec is to transform raw data into vector formats that can be used across machine learning tools.

Why Would I Use DataVec?

DataVec allows a practitioner to take raw data and quickly produce vectorized data in open standard formats (svmLight, etc.). The following input data types are supported out of the box:

CSV Data
Raw Text Data (tweets, text documents, etc.)
Image Data
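
To make the target format concrete, here is a minimal sketch of what turning a CSV row into an svmLight line involves. It is hand-written in plain Java and does not use the DataVec API; the class name `SvmLightSketch`, the method `toSvmLightLine`, and the `labelIndex` parameter are hypothetical names chosen for illustration. It assumes every field is numeric and that fields contain no quoted commas.

```java
public class SvmLightSketch {

    // Converts one CSV row of numeric fields into one svmLight line:
    //   <label> <index>:<value> <index>:<value> ...
    // labelIndex (hypothetical parameter) is the 0-based column holding the label.
    public static String toSvmLightLine(String csvRow, int labelIndex) {
        String[] fields = csvRow.split(",");
        StringBuilder sb = new StringBuilder(fields[labelIndex].trim());
        int featureIndex = 1; // svmLight feature indices start at 1
        for (int i = 0; i < fields.length; i++) {
            if (i == labelIndex) {
                continue; // the label is written first, not as a feature
            }
            double value = Double.parseDouble(fields[i].trim());
            if (value != 0.0) { // the format is sparse, so zero-valued features are omitted
                sb.append(' ').append(featureIndex).append(':').append(value);
            }
            featureIndex++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] rows = {"5.1,3.5,0.0,0", "4.9,0.0,1.4,1"};
        for (String row : rows) {
            System.out.println(toSvmLightLine(row, 3));
        }
    }
}
```

Running `main` prints `0 1:5.1 2:3.5` and `1 1:4.9 3:1.4`. A real pipeline would also need to handle quoting, missing values, and non-numeric columns, which this sketch leaves out.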

DataVec also includes functionality for feature engineering, data cleaning, and data normalization, both for static data and for sequences (time series). These operations can be executed on Spark using DataVec-Spark.
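
As an illustration of what normalization of a static numeric column means, the sketch below rescales values to the range [0, 1] using min-max scaling. It is plain Java and is not DataVec's own implementation or API; the class `MinMaxSketch` and the method `minMaxScale` are hypothetical names. It assumes a non-empty input array and maps a constant column to all zeros.

```java
public class MinMaxSketch {

    // Rescales each value to (value - min) / (max - min), so the smallest
    // value maps to 0.0 and the largest to 1.0.
    public static double[] minMaxScale(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column has zero range; map it to 0.0 to avoid dividing by zero.
            scaled[i] = (range == 0.0) ? 0.0 : (values[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] scaled = minMaxScale(new double[] {2.0, 4.0, 6.0});
        System.out.println(java.util.Arrays.toString(scaled));
    }
}
```

In practice the minimum and maximum would be computed once on training data and reused for new data, rather than recomputed per batch as in this sketch.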

Examples

Examples for using DataVec are available here: https://github.com/deeplearning4j/dl4j-0.4-examples/tree/master/datavec-examples/src/main

Contribute

Check for open issues, or open a new issue to start a discussion around a feature idea or a bug.
If you feel uncomfortable or uncertain about an issue or your changes, feel free to contact us on Gitter using the link above.
Fork the repository on GitHub to start making your changes.
Write a test that shows the bug was fixed or that the feature works as expected.
Note that the repository follows the Google Java style with two modifications: a 120-character column wrap and 4-space indentation. You can apply this format by running mvn formatter:format in the subproject you are working on, by using contrib/formatter.xml at the root of the repository to configure the Eclipse formatter, or by using the IntelliJ plugin.
Send a pull request, and bug us on Gitter until it gets merged and published.
