Mohammad Masum, PhD
Published in MLearning.ai

·Pinned

Avoid Data Leakage in Data Preprocessing Steps

The experimental setting is a critical component of the Machine Learning (ML) ecosystem. This, I believe, is the most crucial (and I cannot emphasize this enough) parameter that supports the outcome of your models. I’m a reviewer for a couple of machine learning-related conference publications. After reviewing a few scientific…

Data Leak

5 min read


Apr 23

Easy Way to Convert Categorical Variables in PySpark

Converting Categorical Data using OneHotEncoding: We cannot feed categorical data directly into Machine Learning (ML) algorithms. We must give ML models a numerical representation of a dataset's categorical features. When working in Python, we generally use LabelEncoder, get_dummies, and OneHotEncoder to convert categorical features. However, in PySpark, we can…

PySpark

3 min read


Published in Towards Data Science

·Apr 19

3 Fundamental Reasons why Quantization is important for tinyML

Achieving size, latency and portability for tinyML: Machine Learning (ML) has been the talk of the decade, with successful real-world applications in every aspect of our lives, ranging from health informatics to business to cybersecurity. Embedding ML directly on edge devices revolutionizes the research and applications of ML on the Internet of Things (IoT), where thousands of thousands…

TinyML

4 min read


Apr 16

Quantum Machine Learning and available Simulators

Data is stored with Boolean bits at the lowest level in classical computing, where each bit can take only one of two possible values (0 or 1) depending on the existence of electron charge: the existence of electron charge indicates 1, otherwise 0. On the other hand, the basic unit…

Quantum Computing

5 min read


Published in Towards Data Science

·Apr 6

4 Challenges of Reproducibility in the Machine Learning Model Deployment

In the pipeline of a Machine Learning model, there are a few environments that we consider: the research environment, the development environment, and the production environment. In the research environment, different integral parts of model development are performed, such as exploratory data analysis (EDA), model building, model evaluation, and result analysis. …

ML Model Deployment

4 min read


Mar 15

How Statistics & Numbers can Mislead You: Simpson’s Paradox

Recently, I came to know about one of the most interesting experiments conducted in the last century. It was quite hard to swallow! The experiment was conducted in 1996 to study the effect of smoking on a sample of over 1300 English women. The experiment was conducted for…

Simpson's Paradox

7 min read


Published in Towards Data Science

·Feb 28

Bayesian Hyperparameter Optimization for a Deep Neural Network in Cybersecurity

Bayesian optimization with Gaussian processes and Random Search Optimization methods for Optimal DNN in application to Network Intrusion…: Deep neural networks (DNN) have been successfully applied to many real-world problems, ranging from disease classification to cybersecurity. The optimal use of DNN-based classifiers requires careful tuning of the hyperparameters. Manually tuning the hyperparameters is tedious, time-consuming, and computationally expensive. Hence, there is a need for an automatic technique to…

Bayesian Optimization

8 min read


Published in MLearning.ai

·Feb 26

Effect of Sample Size in Central Limit Theorem

What Central Limit Theorem Tells You? Fundamentals of Statistics. The Central Limit Theorem (CLT) is a fundamental concept in statistics and probability theory. In this article, we will discuss: (1) the Central Limit Theorem; (2) understanding the CLT visually; (3) why the sample size should be a minimum of 30; (4) the effect of sample size in the Central Limit Theorem …

Central Limit Theorem

5 min read


Published in DataDrivenInvestor

·Feb 21

Fundamentals of Spark DataFrame: PySpark on Google CoLab

Introduction to PySpark for Big Data Analytics: After you become comfortable with applying machine learning methods on your local machine, the next thing you can do is use a cloud server like AWS to build your skills. This will introduce you to a new set of challenges, ranging from pretty simple to hard problems. Sometimes a simple…

PySpark

6 min read


Published in DataDrivenInvestor

·Feb 18

Greedy Algorithms for Computer Network Topology

Application of Prim's and Kruskal's algorithms towards Network Topology: A network topology describes how the devices (nodes) of a network are interconnected using network lines such as Ethernet. The topology also describes how data is transferred across the connections of the network. Thus, the…

Greedy Algorithms

5 min read

Machine Learning Enthusiast | Thought Partner for Data | https://www.linkedin.com/in/mohammadmasumds/

