Add categorical variables section to data pre-processing in machine l…
…earning course
PauloHFS committed Oct 1, 2024
1 parent af36c40 commit 94fc939
Showing 1 changed file with 40 additions and 0 deletions.
@@ -0,0 +1,40 @@
---
id: "categorical-variables"
title: "Categorical Variables"
---

Many machine learning models expect numerical input, so categorical variables must be converted into numerical values before they can be used. This process is known as encoding, or feature encoding.

### Label Encoding

Label encoding assigns a unique integer to each category in the variable. Because integers are ordered, a model may treat the encoded categories as ranked, so this technique fits ordinal data and target labels best.

```python
from sklearn.preprocessing import LabelEncoder

# `data` is assumed to be a pandas DataFrame with a 'category' column
label_encoder = LabelEncoder()

# Fit the encoder and replace each category with its integer code
data['category'] = label_encoder.fit_transform(data['category'])
```
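To make the mapping concrete, here is a minimal sketch on a small, made-up list of colors (the `colors` sample and variable names are assumptions for illustration, not part of the course data). It shows that `LabelEncoder` stores the sorted categories in `classes_` and that `inverse_transform` recovers the original values.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data, used only for illustration
colors = ["red", "green", "blue", "green"]

encoder = LabelEncoder()
codes = encoder.fit_transform(colors)

# Categories are stored in sorted order; each code is an index into classes_
print(list(encoder.classes_))  # ['blue', 'green', 'red']
print(list(codes))             # [2, 1, 0, 1]

# The integer codes can be mapped back to the original categories
decoded = encoder.inverse_transform(codes)
print(list(decoded))           # ['red', 'green', 'blue', 'green']
```

Keeping the fitted encoder around is useful when predictions need to be translated back into readable category names.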

### One-Hot Encoding

One-hot encoding converts a categorical variable into binary vectors. It creates a new binary column for each category in the variable, and each row holds a 1 only in the column of its own category.

```python
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

# One-hot encode the two categorical columns of the `data` DataFrame
column_transformer = ColumnTransformer(
    transformers=[
        ('encoder', OneHotEncoder(), ['category', 'category2'])
    ],
    remainder='passthrough'  # Keep the remaining columns unchanged
)

# Fit and transform; the result is an array (possibly sparse), not a DataFrame
data = column_transformer.fit_transform(data)
```
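To show what the resulting binary vectors look like, the following is a plain-Python sketch of the idea, not the scikit-learn implementation. The `one_hot` helper and the `sizes` sample are hypothetical names introduced only for this illustration.

```python
def one_hot(values):
    """Return (categories, rows); each row has a single 1 marking its category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical sample data, used only for illustration
sizes = ["small", "large", "medium", "small"]
categories, encoded = one_hot(sizes)

print(categories)  # ['large', 'medium', 'small']
for row in encoded:
    print(row)
# [0, 0, 1]
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
```

Unlike label encoding, no column is numerically "larger" than another here, at the cost of one extra column per category.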
