Photo by Neil Thomas on Unsplash

Support Vector Machine (SVM) is a supervised machine learning (ML) algorithm that was initially developed in the 1960s and later refined in the 1990s. Only recently has SVM become popular in the ML field, owing to some of its characteristics. SVM can be used in many ways:

  • SVM can perform classification, regression, and even outlier detection
  • SVM can perform linear and nonlinear classification
  • SVM can perform binary and multi-class classification

In this post, I will primarily discuss SVM for linear binary classification. I divide this article into three parts:

  1. Intuition behind SVM
  2. Mathematics for…
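To make the idea of linear binary classification concrete, here is a minimal sketch in plain Python. It is not the method from the post: it assumes a soft-margin linear SVM trained by per-sample subgradient descent on the regularized hinge loss, and the function names, toy data, learning rate `lr`, and penalty `lam` are all illustrative choices.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=100):
    """Fit w, b by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 penalty shrinks w slightly on every step.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:
                # The point is misclassified or inside the margin:
                # move the boundary away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (hypothetical): labels are +1 and -1.
points = [(2, 2), (3, 3), (3, 2), (2, 3), (-2, -2), (-3, -3), (-3, -2), (-2, -3)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
print(w, b, predictions)
```

In practice a library implementation would normally be preferred; the loop above only shows where the margin condition `y * (w·x + b) >= 1` enters the training.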

Self-Organizing Maps for Dimension Reduction, Data Visualization, and Clustering

Photo by Andrew Stutesman on Unsplash

Self-Organizing Map (SOM) is one of the common unsupervised neural network models. SOM has been widely used for clustering, dimension reduction, and feature detection. SOM was first introduced by Professor Kohonen. For this reason, SOM is also called a Kohonen map. It has many real-world applications, including machine state monitoring, fault identification, satellite remote sensing, and robot control [1]. Moreover, SOM is used for data visualization by projecting higher-dimensional data into a lower-dimensional space while leveraging topological similarity properties. In the process of model development, SOM utilizes competitive learning techniques, whereas a conventional neural network applies an error reduction process through…
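The competitive learning step can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the map is a 1-D chain of nodes, the neighborhood is Gaussian, and the names `best_matching_unit` and `som_step`, along with the values of `lr` and `radius`, are hypothetical.

```python
import math


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to the input x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def som_step(weights, x, lr=0.5, radius=1.0):
    """One competitive-learning update on a 1-D chain of nodes."""
    bmu = best_matching_unit(weights, x)
    updated = []
    for i, w in enumerate(weights):
        # Gaussian neighborhood: nodes close to the winner on the map move more.
        h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
        updated.append([wj + lr * h * (xj - wj) for wj, xj in zip(w, x)])
    return bmu, updated


weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
x = [2.2, 1.8]
bmu, new_weights = som_step(weights, x)
print(bmu, new_weights)
```

The nodes compete for each input: only the winner and its map neighbors are pulled toward the sample, which is what preserves the topology of the data on the map.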

A demonstration of transfer learning to classify the MNIST digit data using a feature extraction process

Photo by T. Q. on Unsplash

Transfer learning is one of the state-of-the-art techniques in machine learning and has been widely used in image classification. In this article, I will discuss transfer learning, the VGG model, and feature extraction. In the last section, I will demonstrate an interesting example in which the transfer learning technique displays unexpectedly poor performance in classifying the MNIST digit dataset.

VGG is a convolutional neural network with a specific architecture that was proposed in the paper Very Deep Convolutional Networks for Large-Scale Image Recognition by a group of researchers (the Visual Geometry Group) from the University of Oxford…

Photo by Moritz Kindler on Unsplash

Building a deep network using original digital images requires learning many parameters, which may reduce accuracy. The images can be compressed using dimension reduction methods, and the extracted reduced features can be fed into a deep network for classification. Hence, the number of parameters in the training phase of the network decreases. Principal Component Analysis (PCA) is a well-known dimension reduction technique that applies an orthogonal linear transformation to the original data. In this article, we demonstrate a neural network-based framework, named Fusion-Net, which implements PCA on an image dataset (CIFAR-10), and then a neural network applies…
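As a rough illustration of the dimension reduction step only, the sketch below finds the first principal component of a tiny dataset in plain Python and projects the data onto it. It is not the Fusion-Net code: it assumes toy 2-D points instead of CIFAR-10 images and uses power iteration on the covariance matrix, and the function name is hypothetical.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return the top principal direction and the 1-D projected scores."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration from an arbitrary start vector. This assumes the start
    # vector is not orthogonal to the top eigenvector.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Project each centered row onto the component: d features become 1.
    scores = [sum(ci * vi for ci, vi in zip(c, v)) for c in centered]
    return v, scores


rows = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
component, scores = first_principal_component(rows)
print(component, scores)
```

The reduced `scores`, rather than the original features, are what a downstream classifier would receive, which is how the parameter count of the network goes down.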

Photo by Javier Allegue Barros on Unsplash

Explore-Exploit Dilemma

Decision and dilemma are two sides of the same coin. Imagine a student looking forward to learning data science. He searches online for data science courses, and the search returns a number of courses from Harvard, MIT, Coursera, Udemy, Udacity, etc. Now, here is the dilemma: how does he figure out which course is best for him at the initial stage, given all the courses and information? Going through all of the course outlines one by one and then deciding on the best course might be the ideal solution for him. In reinforcement learning, this is an exploration where one…
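One common way to balance the two sides of the dilemma is an epsilon-greedy rule; the sketch below assumes that rule purely for illustration and may differ from the approach in the full article. Each "arm" stands for one course with a hidden success probability, and the names, probabilities, `epsilon`, and `seed` are all hypothetical.

```python
import random


def epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate a Bernoulli bandit; return pull counts and value estimates."""
    rng = random.Random(seed)
    n = len(true_probs)
    counts = [0] * n
    values = [0.0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random option to learn more about it.
            arm = int(rng.random() * n)
        else:
            # Exploit: pick the option that looks best so far.
            arm = max(range(n), key=lambda i: values[i])
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


counts, values = epsilon_greedy([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda i: values[i])
print(counts, values, best_arm)
```

A small `epsilon` spends most pulls on the current favorite while still sampling the alternatives often enough to correct an early wrong guess.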

A comprehensive study of Multivariate Time Series Analysis and Forecasting

Photo by Victor Rodriguez on Unsplash

Multivariate Time Series Analysis

A univariate time series contains a single time-dependent variable, while a multivariate time series consists of multiple time-dependent variables. We generally use multivariate time series analysis to model and explain the interesting interdependencies and co-movements among the variables. In multivariate analysis, the assumption is that the time-dependent variables not only depend on their own past values but also depend on one another. Multivariate time series models leverage these dependencies to provide more reliable and accurate forecasts for a specific given dataset, though univariate analysis outperforms multivariate analysis in general [1]. …

Photo by Nick Chong on Unsplash

When selecting candidate Auto Regressive Moving Average (ARMA) models for time series analysis and forecasting, understanding the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of the series is necessary to determine the order of the AR and/or MA terms. Though the ACF and PACF do not directly dictate the order of the ARMA model, the plots can facilitate understanding the order and provide an idea of which model can be a good fit for the time series data. In this article, I primarily share my experience in understanding the ACF and PACF plots and their significance in selecting the order of ARMA models.
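To show what an ACF plot actually displays, here is a minimal sketch of the sample autocorrelation in plain Python. The function names and the toy series are illustrative, and the significance band uses the common large-sample approximation of ±1.96/√n; the PACF is not computed here.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation of the series for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def significance_bound(n):
    """Approximate 95% band for the ACF of white noise with n observations."""
    return 1.96 / math.sqrt(n)


series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
correlations = acf(series, max_lag=2)
print(correlations, significance_bound(len(series)))
```

Lags whose autocorrelation falls outside the band are the ones worth considering when choosing the order. As a rule of thumb, the ACF of a pure MA(q) process cuts off after lag q, while the PACF of a pure AR(p) process cuts off after lag p.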

Nature of data, the notion of personal data, ownership of data, consent, and purpose, trustworthiness, privacy, and confidentiality of data.

Photo by Markus Spiske on Unsplash

Data science, from big data analytics to artificial intelligence, provides immense opportunities to improve our private and public lives by optimizing the decision-making process. These huge opportunities are sadly also associated with major ethical issues [1]. The issues are becoming increasingly relevant in today's digital world with the growing use of personal, and in some cases sensitive, data. The algorithms (both traditional and state-of-the-art) that are extensively used for analyzing the data, along with the exponential reduction of human participation in the decision-making process, raise pressing issues of data fairness, accountability, and respect for human rights [1]. In this…

Applying the ARIMA model to forecast time-series data

Photo by NeONBRAND on Unsplash

The stationarity of a time series means that statistical properties of the series, such as mean, variance, and autocorrelation, do not change over time. The notion of stationarity is important for applying statistical forecasting models since:

  1. most of the statistical methods like ARIMA are based on the assumption that the process is stationary or approximately stationary [1].
  2. a stationary time series can provide meaningful sample statistics like mean, variance, correlation with other variables [1].

The stationarity of the process can be verified by visually checking the time series plot or the variogram of the series. Statistical tests…
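Alongside a visual check, a crude numerical check is to compare the mean and variance of the first and second halves of the series. The sketch below assumes that heuristic for illustration only; it is not a formal statistical test, and the function names are hypothetical. It also shows first-order differencing, the transformation behind the "I" in ARIMA, removing a linear trend.

```python
from statistics import mean, pvariance


def half_stats(series):
    """Return (mean, variance) for the first and second halves of the series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


def difference(series, lag=1):
    """Differenced series: x[t] - x[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


trend = list(range(20))              # a series with a clear upward trend
before = half_stats(trend)           # the two halves have very different means
after = half_stats(difference(trend))  # differencing removes the trend
print(before, after)
```

If the two halves differ markedly, as they do for `trend`, the series is unlikely to be stationary and differencing is a natural first transformation to try.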

Download and Upload files in Colab from Local system and Google Drive

Photo by Pat Whelen on Unsplash

Google Colaboratory, known as Colab, is a free Jupyter Notebook environment with many pre-installed libraries such as TensorFlow, PyTorch, Keras, OpenCV, and many more. It is one of the cloud services that support GPU and TPU for free. Importing a dataset and training models on the data directly in Colab makes for a smoother coding experience. There are different ways to import and download data in Colab. In this tutorial, I will discuss my experience in:

  1. Importing data from Google Drive
  2. Importing data from, and downloading data to, the local system

Mounting Google Drive

We can access files in Drive by mounting Google Drive. Mounting…
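A minimal sketch of the mounting step is shown here. It uses `drive.mount` from the `google.colab` package, which is available only inside a Colab runtime; the wrapper function `mount_drive`, the fallback for other environments, and the `/content/drive` mount point are assumptions made for this example.

```python
try:
    # Available only inside a Colab runtime.
    from google.colab import drive
except ImportError:
    drive = None  # running outside Colab


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive in Colab; return the mount point, or None elsewhere."""
    if drive is None:
        return None
    # In Colab this asks for authorization before the files become visible.
    drive.mount(mount_point)
    return mount_point


mounted_at = mount_drive()
print(mounted_at)
```

Once mounted, files in Drive can be read with ordinary file paths under the mount point.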

Mohammad Masum

Ph.D. Student in Analytics and Data Science, Kennesaw State University. Research focus: Improving and understanding model performance for Imbalanced datasets.
