This code snippet uses label encoding to convert the categorical values in the 'label' column of the train and test datasets into integer codes. The conversion is done with the transform() method, which assumes that a LabelEncoder instance named le has already been created and fitted elsewhere in the code.
y_train = le.transform(train['label'])
y_test = le.transform(test['label'])
The same step as a self-contained function that also fits the encoder:

from sklearn.preprocessing import LabelEncoder
import pandas as pd

def transform_labels(train: pd.DataFrame, test: pd.DataFrame) -> tuple:
    """
    Transform categorical labels into numerical labels.

    Args:
    - train (pd.DataFrame): Training dataset with categorical labels.
    - test (pd.DataFrame): Testing dataset with categorical labels.

    Returns:
    - train_labels (np.ndarray): Numerical labels for the training dataset.
    - test_labels (np.ndarray): Numerical labels for the testing dataset.
    """
    # Initialize LabelEncoder
    label_encoder = LabelEncoder()
    # Fit on the training labels only, then encode them.
    # LabelEncoder does not modify its input, so no copies are needed.
    train_labels = label_encoder.fit_transform(train['label'])
    # Reuse the fitted mapping for the test labels
    # (raises ValueError if test contains a label not seen in train)
    test_labels = label_encoder.transform(test['label'])
    # Return the transformed labels
    return train_labels, test_labels

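
One caveat with the function above: LabelEncoder.transform() raises a ValueError when it meets a label that was not present during fitting. A minimal sketch of a pre-check, assuming a hypothetical helper named find_unseen_labels and toy DataFrames that are not part of the original code:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def find_unseen_labels(train: pd.DataFrame, test: pd.DataFrame) -> set:
    """Return labels that occur in test['label'] but not in train['label'].

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    return set(test['label'].unique()) - set(train['label'].unique())


# Toy data (assumed for the example)
train = pd.DataFrame({'label': ['cat', 'dog', 'cat']})
test = pd.DataFrame({'label': ['dog', 'fish']})

unseen = find_unseen_labels(train, test)

le = LabelEncoder().fit(train['label'])
try:
    le.transform(test['label'])
    raised = False
except ValueError:
    # 'fish' was never seen during fit, so transform() refuses to encode it
    raised = True
```

Running such a check before encoding makes it possible to decide what to do with unseen labels (drop the rows, map them to a placeholder class, or refit) instead of failing midway through a pipeline.
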
This code snippet transforms the categorical data in the 'label' column of the train and test datasets into numerical representations using a label encoder (le).

y_train = le.transform(train['label']):
- le is an instance of LabelEncoder from the scikit-learn library, used for encoding categorical values as integers.
- train['label'] is the column of the train dataset that needs to be encoded.
- The transform() method is called on le to perform the encoding. The resulting encoded values are stored in y_train.

y_test = le.transform(test['label']):
- The same encoding is applied to the 'label' column of the test dataset, and the result is stored in y_test.

This code assumes that the le variable has been created elsewhere in the code, that it is an instance of LabelEncoder, and that it has already been fitted, since transform() only works on a fitted encoder.
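
The explanation assumes that le already exists. A minimal sketch of how such an encoder might be prepared and then used, assuming toy train and test DataFrames that are not part of the original code:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data (assumed for the example)
train = pd.DataFrame({'label': ['cat', 'dog', 'bird', 'dog']})
test = pd.DataFrame({'label': ['dog', 'bird']})

# Create the encoder and learn the label-to-integer mapping from train only
le = LabelEncoder()
le.fit(train['label'])

# classes_ holds the learned labels in sorted order; each index is the code
print(le.classes_.tolist())  # ['bird', 'cat', 'dog']

# The two lines from the snippet now work because le has been fitted
y_train = le.transform(train['label'])
y_test = le.transform(test['label'])
print(y_train.tolist())  # [1, 2, 0, 2]
print(y_test.tolist())   # [2, 0]

# inverse_transform maps the integer codes back to the original labels
restored = le.inverse_transform(y_test)
print(restored.tolist())  # ['dog', 'bird']
```

Fitting on the training labels only keeps the mapping identical for both datasets, so the same integer always refers to the same class.
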