Easy Way to Convert Categorical Variables in PySpark
Converting Categorical Data using OneHotEncoding
Apr 23, 2022
Most Machine Learning (ML) algorithms cannot consume categorical data directly, so the categorical features of a dataset must first be given a numerical representation. In plain Python, we typically use scikit-learn's LabelEncoder and OneHotEncoder or pandas' get_dummies for this conversion. PySpark offers its own ways to perform the conversion, and those are the focus of this post.
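To make "numerical representation" concrete, here is a minimal pure-Python sketch of the two ideas behind these tools: label encoding (one integer per category) and one-hot encoding (one 0/1 column per category). The helper names `label_encode` and `one_hot_encode` and the sample `colors` list are made up for illustration and are not part of any library; real encoders may order categories differently or drop a column.

```python
def label_encode(values):
    """Map each distinct category to an integer index (alphabetical order)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Turn each value into a 0/1 vector with a single 1 at its category index."""
    codes, categories = label_encode(values)
    return [
        [1 if code == i else 0 for i in range(len(categories))]
        for code in codes
    ]


# Hypothetical categorical column
colors = ["red", "green", "blue", "green"]

codes, categories = label_encode(colors)
one_hot = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
print(codes)       # [2, 1, 0, 1]
print(one_hot)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Label encoding implies an order between categories (here blue < green < red) that usually does not exist in the data, which is why one-hot encoding is often preferred for nominal features.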