Support Vector Machine (SVM) is a supervised Machine Learning (ML) algorithm that was initially developed in the 1960s and later refined in the 1990s. It has only recently become popular in the ML area, owing to certain characteristics. SVM can be used in many ways:

- SVM can perform classification, regression, and even outlier detection
- SVM can perform linear and nonlinear classification
- SVM can perform binary and multi-class classification

In this post, I will primarily discuss **SVM for linear binary classification**. I divide this article into three parts:

- Intuition behind SVM
- Mathematics for…
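As a rough illustration of linear binary classification with an SVM, the sketch below minimizes the L2-regularized hinge loss (the soft-margin linear SVM objective) by plain subgradient descent. The function names, the toy data, and the values of `lam`, `lr`, and `epochs` are assumptions made for this example; a production solver would normally come from a library.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit weights w and bias b by subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified:
                # the hinge term contributes to the subgradient.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regularization term is active: shrink w slightly.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (illustrative only).
points = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
print("weights:", w, "bias:", b)
print("predictions:", [predict(w, b, x) for x in points])
```

The condition `margin < 1` is what distinguishes this from a perceptron: points that are classified correctly but lie too close to the boundary still pull on the hyperplane.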

**Self-Organizing Map (SOM)** is one of the common unsupervised neural network models. SOM has been widely used for clustering, dimension reduction, and feature detection. SOM was first introduced by Professor Kohonen, which is why it is also called the Kohonen Map. It has many real-world applications, including machine state monitoring, fault identification, satellite remote sensing, and robot control [1]. Moreover, SOM is used for data visualization by projecting higher-dimensional data into a lower-dimensional space while leveraging topological similarity properties. During model development, SOM uses competitive learning techniques, whereas a conventional neural network applies an error reduction process through…

Transfer learning is one of the state-of-the-art techniques in machine learning and has been widely used in image classification. In this article, I will discuss transfer learning, the VGG model, and feature extraction. In the last section, I will demonstrate an interesting example in which transfer learning performs unexpectedly poorly when classifying the MNIST digit dataset.

**VGG** is a convolutional neural network with a specific architecture that was proposed in the paper *Very Deep Convolutional Networks for Large-Scale Image Recognition* by a group of researchers (the Visual Geometry Group) from the University of Oxford…

Building a deep network on the original digital images requires learning many parameters, which may reduce accuracy. The images can be compressed with dimension reduction methods, and the extracted reduced features can be fed into a deep network for classification. Hence, the number of parameters in the training phase of the network decreases. Principal Component Analysis (PCA) is a well-known dimension reduction technique that leverages an orthogonal linear transformation of the original data. In this article, we demonstrate a neural network-based framework, named Fusion-Net, which implements PCA on an image dataset (CIFAR-10), and then a neural network applies…

**Explore-Exploit Dilemma**

Decision and dilemma are two sides of the same coin. Imagine a student looking forward to learning data science. He searches online for data science courses, and the search returns a number of courses from Harvard, MIT, Coursera, Udemy, Udacity, etc. Now, here is the dilemma: given all these courses and the information about them, how does he figure out at the initial stage which course is best for him? Deciding on the best course after going through all of the course outlines one by one might be the ideal solution for him. In reinforcement learning, this is an **exploration** where one…
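A minimal sketch of this trade-off, assuming an epsilon-greedy rule on a multi-armed bandit: with probability `epsilon` a random option is tried (exploration), otherwise the option with the best estimated reward so far is chosen (exploitation). The function name, the Gaussian reward model, and all parameter values are illustrative assumptions.

```python
import random


def epsilon_greedy(true_means, epsilon=0.1, steps=1000, seed=0):
    """Simulate an epsilon-greedy agent on a bandit with Gaussian rewards.

    Returns the number of pulls per arm and the estimated mean reward per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick any arm at random
        else:
            # exploit: pick the arm with the highest current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


# Three hypothetical "courses" with different average payoffs.
counts, estimates = epsilon_greedy([0.2, 0.5, 0.9])
print("pulls per arm:", counts)
print("estimated means:", [round(e, 2) for e in estimates])
```

Setting `epsilon` to 0 gives a purely greedy agent that can lock onto a mediocre option early, while `epsilon` equal to 1 never uses what it has learned.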

**Multivariate Time Series Analysis**

Univariate time series data contain a single time-dependent variable, while multivariate time series data consist of multiple time-dependent variables. We generally use multivariate time series analysis to model and explain the interesting interdependencies and co-movements among the variables. In multivariate analysis, the assumption is that the time-dependent variables depend not only on their own past values but also on each other. Multivariate time series models leverage these dependencies to provide more reliable and accurate forecasts for a specific dataset, though univariate analysis outperforms multivariate analysis in general [1]. …

When selecting candidate **Auto Regressive Moving Average** (ARMA) models for time series analysis and forecasting, understanding the **Autocorrelation function** (ACF) and **Partial autocorrelation function** (PACF) plots of the series is necessary to determine the order of the AR and/or MA terms. Though ACF and PACF do not directly dictate the order of the ARMA model, the plots can help in understanding the order and give an idea of which model may fit the time series data well. In this article, I primarily share my experience in understanding the ACF and PACF plots and their significance in selecting the order of ARMA models.
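To make the ACF concrete, the sketch below computes sample autocorrelations directly from their definition and flags lags outside the approximate 95% band of ±1.96/√n that applies to white noise. The function name `sample_acf` and the simulated AR(1) series with coefficient 0.7 are assumptions for illustration; the PACF is not computed here.

```python
import math
import random


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    # Sum of squared deviations; the 1/n factors cancel in the ratio below.
    denom = sum((x - mean) ** 2 for x in series)
    acf = []
    for k in range(max_lag + 1):
        num = sum((series[t] - mean) * (series[t + k] - mean)
                  for t in range(n - k))
        acf.append(num / denom)
    return acf


# Illustrative data: an AR(1) process x_t = 0.7 * x_{t-1} + e_t.
random.seed(0)
x, ar1 = 0.0, []
for _ in range(500):
    x = 0.7 * x + random.gauss(0, 1)
    ar1.append(x)

band = 1.96 / math.sqrt(len(ar1))  # approximate 95% band for white noise
for lag, r in enumerate(sample_acf(ar1, 5)):
    flag = "significant" if abs(r) > band else ""
    print(f"lag {lag}: {r:+.3f} {flag}")
```

For an autoregressive series like this one, the ACF typically decays gradually across lags rather than cutting off, which is one of the patterns used when reading these plots.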

**…**

Data science, from big data analytics to artificial intelligence, provides immense opportunities to improve our private and public lives by optimizing the decision-making process. These huge opportunities are sadly also associated with major ethical issues [1]. The issues are becoming increasingly relevant in today's digital world with the growing use of personal, and in some cases sensitive, data. The algorithms (both traditional and state-of-the-art) that are extensively used for analyzing the data, along with the exponential reduction of human participation in the decision-making process, raise pressing issues of data fairness, accountability, and respect for human rights [1]. In this…

The **stationarity** of a time series means that statistical properties such as the mean, variance, and autocorrelation of the series do not change over time. The notion of stationarity is important for applying statistical forecasting models since:

- most statistical methods, such as ARIMA, are based on the assumption that the process is stationary or approximately stationary [1].
- a stationary time series can provide meaningful sample statistics such as the mean, variance, and correlation with other variables [1].
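One informal way to probe this definition is to split a series into two halves and compare their mean and variance; large differences hint at non-stationarity. This is only a rough heuristic, not a formal statistical test, and the function name `split_stats` and the simulated series are assumptions for illustration.

```python
import random
import statistics


def split_stats(series):
    """Return ((mean, variance) of first half, (mean, variance) of second half)."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (
        (statistics.mean(first), statistics.pvariance(first)),
        (statistics.mean(second), statistics.pvariance(second)),
    )


random.seed(0)
# White noise: mean and variance stay roughly constant over time.
noise = [random.gauss(0, 1) for _ in range(400)]
# The same noise plus a linear trend: the mean drifts, so it is not stationary.
trend = [0.05 * t + e for t, e in enumerate(noise)]

print("white noise:", split_stats(noise))
print("with trend: ", split_stats(trend))
```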

The stationarity of the process can be verified by visually checking the **time series plot** or the **variogram of the series**. Statistical tests…

Google Colaboratory, known as Colab, is a free Jupyter Notebook environment with many pre-installed libraries such as TensorFlow, PyTorch, Keras, OpenCV, and many more. It is one of the cloud services that support GPU and TPU for free. Being able to import a dataset and train models on it directly in Colab makes for a smoother coding experience. There are different ways to import and download data in Colab. In this tutorial, I will discuss my experience in:

- Importing data from Google Drive
- Importing data from and downloading data to the local system

**Mounting Google Drive**

We can access files in Drive by mounting Google Drive. Mounting…

Ph.D. Student in Analytics and Data Science, Kennesaw State University. Research focus: Improving and understanding model performance for imbalanced datasets.