Add categorical variables section to data pre-processing in machine learning course
Showing 1 changed file with 40 additions and 0 deletions.
...e-learning-and-data-science-course/data-pre-processing/categorical-variables.md
@@ -0,0 +1,40 @@
---
id: "categorical-variables"
title: "Categorical Variables"
---

To use categorical variables in a model, we usually need to transform them into numerical values first. This process is known as encoding, or feature encoding.

### Label Encoding

Label encoding converts a categorical variable into numerical values by assigning a unique integer to each category. Because integers carry an implicit order, some models may read a ranking into the codes that the categories do not actually have.

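```python
from sklearn.preprocessing import LabelEncoder

# Create a label encoder object
label_encoder = LabelEncoder()

# Fit the encoder and replace the column with its integer codes
# (`data` is assumed to be a pandas DataFrame with a 'category' column)
# Note: scikit-learn documents LabelEncoder for target labels (y);
# OrdinalEncoder is its counterpart for feature columns.
data['category'] = label_encoder.fit_transform(data['category'])
```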
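
To make the mapping concrete, here is a minimal sketch in plain Python of what label encoding produces. The `colors` values are hypothetical sample data, not part of the course material, and the sorted-order assignment is chosen to mirror how `LabelEncoder` orders its classes.

```python
# Hypothetical sample values; not taken from the course material.
colors = ["red", "green", "blue", "green", "red"]

# Sort the distinct categories so the mapping is deterministic
# (LabelEncoder also assigns codes in sorted order).
categories = sorted(set(colors))
category_to_code = {category: code for code, category in enumerate(categories)}

# Encode: replace each value with its integer code.
encoded = [category_to_code[value] for value in colors]

# Decode: invert the mapping to recover the original values.
code_to_category = {code: category for category, code in category_to_code.items()}
decoded = [code_to_category[code] for code in encoded]

print(category_to_code)  # {'blue': 0, 'green': 1, 'red': 2}
print(encoded)           # [2, 1, 0, 1, 2]
```

The codes 0, 1, 2 suggest that blue < green < red, an ordering the colors do not have. This is the main reason to consider one-hot encoding for nominal categories.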

### One-Hot Encoding

One-hot encoding converts a categorical variable into binary vectors. It creates one new binary column per category, and each row has a 1 in the column for its own category and a 0 in all the others.

```python
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

# Create a column transformer that one-hot encodes the categorical columns
column_transformer = ColumnTransformer(
    transformers=[
        ('encoder', OneHotEncoder(), ['category', 'category2'])
    ],
    remainder='passthrough'  # Keep the remaining columns
)

# Fit the transformer and encode the data
# (the result is a NumPy array or SciPy sparse matrix, not a DataFrame)
data = column_transformer.fit_transform(data)
```
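
As a rough illustration of the output, the sketch below builds the binary vectors by hand in plain Python. The `colors` values and the `one_hot` helper are hypothetical and exist only for this example. The column order is sorted here for determinism.

```python
# Hypothetical sample values; not taken from the course material.
colors = ["red", "green", "blue", "green"]

# One output column per distinct category, in sorted order.
categories = sorted(set(colors))


def one_hot(value, categories):
    """Return a binary vector with a single 1 at the position of `value`."""
    return [1 if value == category else 0 for category in categories]


encoded = [one_hot(value, categories) for value in colors]

print(categories)  # ['blue', 'green', 'red']
for value, row in zip(colors, encoded):
    print(value, row)
# red [0, 0, 1]
# green [0, 1, 0]
# blue [1, 0, 0]
# green [0, 1, 0]
```

Each row contains exactly one 1, so no ordering between categories is implied. The trade-off is that the number of columns grows with the number of distinct categories.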