
Fast Classification and Clustering via Image Convolution Filters


Subtitled “Alternative to Generative Mixture Models”, the full version, in PDF format, is available in the “Free Books and Articles” section, here. The methodology is also described in detail in my book “Stochastic Processes and Simulations: A Machine Learning Perspective”, available here.

I explain, with Python code and numerous illustrations, how to turn traditional tabular data into images, in order to perform both clustering and supervised classification using simple image filtering techniques. I also explain how to generalize the methodology to higher dimensions, using tensors rather than images. After all, image bitmaps are 2D arrays (matrices), that is, 2D tensors. Because the entire space is classified upfront (in low dimensions), the resulting classification rule is very fast. I also discuss the convergence of the algorithm, and how to further improve its speed.
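To make the encoding step concrete, here is a minimal sketch, assuming 2D observations; the helper name to_bitmap and the grid size are hypothetical, not taken from the article:

    import numpy as np

    def to_bitmap(points, labels, size=200):
        # Hypothetical helper: map 2D observations onto a size x size grid
        # where each pixel stores a class label (0 means unclassified).
        bitmap = np.zeros((size, size), dtype=int)
        mins, maxs = points.min(axis=0), points.max(axis=0)
        # Rescale each coordinate to pixel indices in [0, size - 1].
        idx = ((points - mins) / (maxs - mins) * (size - 1)).astype(int)
        for (i, j), label in zip(idx, labels):
            bitmap[i, j] = label + 1  # reserve 0 for empty pixels
        return bitmap

Once the data lives in the bitmap, classification reduces to filling in the empty pixels.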

Figure: classification (top) and clustering (bottom); first loop (left), third loop (right)

This short article covers many topics and can be used as a first introduction to synthetic data generation, mixture models, boundary effects, explainable AI, fractal classification, stochastic convergence, GPU machine learning, deep neural networks, and model-free Bayesian classification. I use very little math, making it accessible to the layman, and certainly to non-mathematicians. Introducing an original, intuitive approach to general classification problems, I explain in simple English how it relates to deep and very deep neural networks. In the process, I make connections to image segmentation, histogram equalization, hierarchical clustering, convolution filters, and stochastic processes. I also compare standard neural networks with very deep but sparse ones, in terms of speed and performance. The fractal classifier, an example of a very deep neural network, is illustrated with a Python-generated video (see video here). It is useful when dealing with massively overlapping clusters and a large number of observations. Hyperparameters allow you to fine-tune the level of cluster overlap in the synthetic data, as well as the shape of the clusters.

Abstract

I generate synthetic data using a superimposition of stochastic processes, and compare this approach to Bayesian generative mixture models (Gaussian mixtures), explaining the benefits and differences. The actual classification and clustering algorithms are model-free, and performed on the GPU as image filters, after transforming the raw data into an image. I then discuss the generalization to 3D or 4D, and to higher dimensions with sparse tensors. The technique is particularly suitable when the number of observations is large and the overlap between clusters is substantial.
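As an illustration of what such a generator might look like (a sketch only: the function name, seed, and hyperparameter values below are my own, not the article's), clusters can be simulated by superimposing logistic noise on random centers:

    import numpy as np

    rng = np.random.default_rng(42)

    def simulate_clusters(n_clusters=4, n_points=500, scale=0.8):
        # Mixture-like synthetic data: each cluster is a logistic cloud
        # around a random center; 'scale' tunes the cluster overlap.
        centers = rng.uniform(0, 10, size=(n_clusters, 2))
        data, labels = [], []
        for k, center in enumerate(centers):
            pts = center + rng.logistic(loc=0.0, scale=scale, size=(n_points, 2))
            data.append(pts)
            labels += [k] * n_points
        return np.vstack(data), np.array(labels)

Unlike a Gaussian, the logistic distribution has heavier tails, which increases cluster overlap for a given scale.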

The method can be implemented with few iterations and a large filter window, making it comparable to a standard neural network: the pixels in the local window are the nodes, and their distance to the window center is the weight function. Alternatively, you can use a large number of iterations (the equivalent of hundreds of layers in a deep neural network) and a tiny window. This latter case corresponds to a sparse network with zero or one connection per node. It is used to implement fractal classification, where point labeling changes at each iteration around highly non-linear cluster boundaries. This is equivalent to putting a prior on class assignment probabilities in a Bayesian framework; yet the classification is performed without an underlying model. Finally, the clustering (unsupervised) part of the algorithm relies on the same filtering techniques, combined with a color equalizer. The latter can also be used to perform hierarchical clustering.
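Below is a minimal sketch of one filtering pass under these assumptions; the inverse-distance weighting is an illustrative choice, and the label encoding (0 = unclassified, k + 1 = class k) follows the hypothetical to_bitmap helper above:

    def filter_pass(bitmap, radius=1):
        # One pass of the classifier (a sketch, not the article's exact
        # filter): each pixel adopts the dominant label in its local
        # window, with neighbors weighted by closeness to the center.
        size_x, size_y = bitmap.shape
        out = bitmap.copy()
        for x in range(size_x):
            for y in range(size_y):
                votes = {}
                for dx in range(-radius, radius + 1):
                    for dy in range(-radius, radius + 1):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < size_x and 0 <= ny < size_y and bitmap[nx, ny] > 0:
                            weight = 1.0 / (1.0 + dx * dx + dy * dy)
                            votes[bitmap[nx, ny]] = votes.get(bitmap[nx, ny], 0.0) + weight
                if votes:
                    out[x, y] = max(votes, key=votes.get)
        return out

With radius = 1, each pixel sees at most eight neighbors, the tiny-window regime described above; a large radius reproduces the wide, shallow regime.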

The Python code, included in this document, is also on my GitHub repository. A data animation illustrates how simple the methodology is: each frame in the video represents one iteration, that is, a single application of the filter to all the data points. While the classifier can be used as a black-box system, its mechanics are fully transparent: it follows the modern trend of interpretable machine learning, also called explainable AI. The video shows how the algorithm converges to an optimum, producing a classification of the entire observation space. Classifying a new point is then immediate: read its color. The whole system is time-efficient, as it does not require computing all pairwise distances between training set points. However, it is memory-intensive, and large filters can be slow, though they require very few iterations. I discuss a simple technique to make them a lot faster.
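Tying the hypothetical helpers sketched above together, the whole procedure is just repeated filtering, with each pass corresponding to one frame of the video:

    points, labels = simulate_clusters()   # synthetic data (sketch above)
    bitmap = to_bitmap(points, labels)     # encode observations as an image
    for frame in range(20):                # each pass = one video frame
        bitmap = filter_pass(bitmap)
    # Classifying a new point is immediate: rescale it to pixel
    # coordinates and read the label stored at that pixel.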

Table of Contents

Introduction

Generating the synthetic data

  • Simulations with logistic distribution
  • Mapping the raw observations onto an image bitmap

Classification and unsupervised clustering

  • Supervised classification based on convolution filters
  • Clustering based on histogram equalization
  • Fractal classification: deep neural network analogy
  • Generalization to higher dimensions
  • Towards a very fast implementation

Python code

  • Fractal classification
  • GPU classification and clustering
  • Home-made graphic library

Download the Article

The technical article, entitled Fast Classification and Clustering via Image Convolution Filters, is accessible in the “Free Books and Articles” section, here. The text highlighted in orange in this PDF document consists of keywords that will be incorporated into the index when I aggregate all my related articles into a single book about innovative machine learning techniques. The text highlighted in blue corresponds to external clickable links, mostly references. Red is used for internal links, pointing to a section, bibliography entry, equation, and so on.

To make sure you do not miss future articles, sign up for our newsletter, here.
