Merge pull request #36 from sanketsarang/main
THIS PR CONTAINS ENHANCEMENTS
Changed parameter of `train` function from `file_path` to `file`. Corresponding documentation enhanced.
sanketsarang authored Oct 3, 2021
2 parents 8ff6f45 + c5d33a2 commit 85b0d98
Showing 2 changed files with 24 additions and 5 deletions.
23 changes: 21 additions & 2 deletions README.md
@@ -10,7 +10,7 @@ pip install blobcity

``` Python
import blobcity as bc
bc.train("data.csv", "Y_column")
bc.train(file="data.csv", target="Y_column")
bc.spill("my_code.ipynb")
```
`Y_column` is the name of the target column. The column must be present within the data provided.
@@ -21,6 +21,20 @@ Support input data formats are `.csv` and `.xlsx`. Extension for other file form

The `spill` function generates the model code with exhaustive documentation. Training code is also included for basic scikit-learn models. TensorFlow and other DNN models produce only the test / final use code.

## Use a Pandas Data Frame
``` Python
bc.train(df=my_df, target="Y_column")
```

If you are loading data from a database or another external system, create a DataFrame from your data source and pass it directly to the `train` function.

## From a URL
``` Python
bc.train(file="https://example.com/data.csv", target="Y_column")
```

The `file` parameter can be a local file path or a URL. When a URL is given, the function loads the data from it. The file at the URL must be in CSV or XLSX format, and the URL must be publicly accessible without authentication.
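
One way a loader could tell a URL apart from a local path is to inspect the scheme of the string. The sketch below is an illustration only: `is_url` is a hypothetical helper, not part of blobcity, and this commit does not show how the library actually makes the distinction.

```python
from urllib.parse import urlparse


def is_url(file: str) -> bool:
    """Return True when `file` looks like an http(s) URL rather than a local path.

    Hypothetical helper for illustration; not part of the blobcity API.
    """
    parsed = urlparse(file)
    # Require both a web scheme and a host, so "data.csv" and
    # Windows paths such as "C:\\data.csv" are treated as local files.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_url("https://example.com/data.csv"))  # True
print(is_url("data.csv"))                      # False
```

Checking for a host as well as a scheme keeps drive-letter paths from being mistaken for URLs.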

# Code Generation
Multiple code generation formats are supported by the framework. The `spill` function can generate both `ipynb` and `py` files. The desired type is inferred from the name of the output file. The code file is created at the path specified. Both relative and absolute file paths are supported.
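
Inferring the output type from the file name can be pictured as a lookup on the file extension. The following is a minimal sketch under that assumption; `infer_output_type` and the labels it returns are hypothetical and are not taken from the blobcity source.

```python
from pathlib import Path


def infer_output_type(path: str) -> str:
    """Map an output file name to a code-generation format by its extension.

    Hypothetical helper for illustration; not part of the blobcity API.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".ipynb":
        return "notebook"
    if suffix == ".py":
        return "script"
    # Anything else is rejected rather than silently guessed.
    raise ValueError(f"Unsupported output file type: {suffix!r}")


print(infer_output_type("my_code.ipynb"))      # notebook
print(infer_output_type("/tmp/out/my_code.py"))  # script
```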

@@ -41,7 +55,7 @@ bc.spill("my_code.py", docs=True)
```
Pass the optional `docs` parameter to generate Python code along with full code documentation.

# Specifying `X_Values`
# Specifying Features
The framework performs feature selection automatically. By default, all features are considered as candidates for selection.
The framework is smart enough to remove ID / primary key columns.

@@ -51,3 +65,8 @@ Use the below code if you would like to manually specify the features to be used
bc.train("data.csv", target="Y_value", features=["col1", "col2", "col3"])
```

# Printing Model Stats
``` Python
model = bc.train(file='./test.csv')
model.stats()
```
6 changes: 3 additions & 3 deletions blobcity/main/driver.py
@@ -25,16 +25,16 @@
from blobcity.utils import AutoFeatureSelection as AFS
from blobcity.utils import writeYml
from blobcity.main.modelSelection import modelSearch
def train(file_path=None, df=None, target=None,features=None):
def train(file=None, df=None, target=None,features=None):
# This should internally create a yml file. The yml file is used for generating the code in the future.
# This should also store a pickle / tensorflow file based on the model used.
# Data read
# The function below reads tabular/structured/semi-structured data based on file type and returns a dataframe object.
dc=DictClass()
dc.resetVar()
#data read
if file_path!=None:
dataframe= getDataFrameType(file_path, dc)
if file!=None:
dataframe= getDataFrameType(file, dc)
else:
dataframe = df
dc.addKeyValue('data_read',{"class":"df"})
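
The hunk above picks the data source: `file` when it is given, otherwise `df`. Below is a standalone sketch of that dispatch, not the library's code. `load_table` is a hypothetical stand-in for `getDataFrameType`, `select_source` is a hypothetical name, and the explicit error when neither argument is supplied is an addition that the commit does not contain. The sketch also shows one consequence of the rename: positional callers are unaffected, while callers that pass the old `file_path=` keyword get a `TypeError`.

```python
def load_table(file):
    """Stand-in for blobcity's loader; returns a marker tuple, not a DataFrame."""
    return ("loaded", file)


def select_source(file=None, df=None):
    """Choose the data source the way the renamed `train` signature does."""
    if file is not None:  # identity check, the idiomatic form of `file != None`
        return load_table(file)
    if df is not None:
        return df
    # Assumption: fail loudly instead of continuing with no data.
    raise ValueError("Provide either `file` or `df`.")


print(select_source(file="data.csv"))
print(select_source("data.csv"))  # positional callers are unaffected by the rename

try:
    select_source(file_path="data.csv")  # old keyword name
except TypeError as exc:
    print("TypeError:", exc)
```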