Easy Way to Convert Categorical Variables in PySpark

Converting Categorical Data using OneHotEncoding

M. Masum, PhD
3 min read · Apr 23, 2022

Most Machine Learning (ML) algorithms cannot consume categorical data directly, so the categorical features of a dataset must first be converted to a numerical representation. In plain Python, we generally do this with scikit-learn's LabelEncoder and OneHotEncoder or with pandas' get_dummies. PySpark offers its own ways to perform the conversion, and this post walks through them, with a focus on one-hot encoding.
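To make the idea of a "numerical representation" concrete, here is a minimal, library-free sketch of the two steps that such encoders typically perform: label indexing followed by one-hot encoding. The toy `colors` data and the alphabetical index order are assumptions chosen for illustration; real encoders may assign indices differently (for example, by frequency).

```python
# Toy data: a single categorical feature (illustrative values, not from the post).
colors = ["red", "green", "blue", "green"]

# Step 1: label indexing. Map each distinct category to an integer.
# Alphabetical order is an arbitrary choice made here for determinism.
categories = sorted(set(colors))
index_of = {category: i for i, category in enumerate(categories)}
indexed = [index_of[c] for c in colors]

# Step 2: one-hot encoding. Build one 0/1 column per category,
# with a single 1 marking the category of each row.
one_hot = [
    [1 if i == idx else 0 for i in range(len(categories))]
    for idx in indexed
]

print(categories)  # ['blue', 'green', 'red']
print(indexed)     # [2, 1, 0, 1]
print(one_hot)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

The integer indices alone imply an ordering (blue < green < red) that the data does not have, which is why the second step spreads each category across its own column.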
