The KAN network was proposed in the paper KAN: Kolmogorov-Arnold Networks. It has attracted a significant amount of interest since the paper was uploaded to arXiv on April 30, 2024: the associated repository (github) had already received more than 8,000 stars by May 6, 2024!
The paper proposes a novel neural network architecture called Kolmogorov-Arnold Networks (KANs). The key idea is to replace the fixed activation functions of neural networks with learnable ones. This is inspired by the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function can be expressed as a composition of continuous univariate functions and addition. The authors argue that KANs are more interpretable and accurate than traditional MLPs, especially for scientific tasks. They also propose a specific KAN architecture with learnable spline activation functions and demonstrate how to improve accuracy by extending the grid resolution of these splines.
Kolmogorov-Arnold Representation Theorem
The key idea of the paper is the Kolmogorov-Arnold representation theorem.
In essence, the Kolmogorov-Arnold representation theorem states that any continuous function of multiple variables can be decomposed into a finite combination of continuous functions of a single variable and addition.
More formally, it states that for every continuous function $f$ on the $n$-dimensional unit cube $[0,1]^n$, there exist $2n+1$ continuous univariate functions $\Phi_q$, together with continuous univariate functions $\phi_{q,p}$, such that:

$$f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$$
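To make the structure of this decomposition concrete, here is a minimal sketch that evaluates the right-hand side for arbitrary, hand-picked univariate functions. The theorem only guarantees that suitable functions exist; the particular choices below are placeholders for illustration.

```python
import numpy as np

def ka_representation(x, inner_fns, outer_fns):
    """Evaluate sum_{q=1}^{2n+1} Phi_q( sum_{p=1}^{n} phi_{q,p}(x_p) ).

    x         : array of shape (n,)
    inner_fns : list of 2n+1 lists, each holding n univariate functions phi_{q,p}
    outer_fns : list of 2n+1 univariate functions Phi_q
    """
    total = 0.0
    for Phi_q, phis_q in zip(outer_fns, inner_fns):
        s = sum(phi(x_p) for phi, x_p in zip(phis_q, x))
        total += Phi_q(s)
    return float(total)

# n = 2 inputs -> 2n + 1 = 5 outer functions, each paired with n inner functions.
n = 2
inner_fns = [[np.sin, np.cos] for _ in range(2 * n + 1)]   # arbitrary stand-ins
outer_fns = [np.tanh] * (2 * n + 1)                        # arbitrary stand-ins
print(ka_representation(np.array([0.3, 0.7]), inner_fns, outer_fns))
```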
Significance
- Hilbert’s 13th Problem: This theorem effectively resolved a restricted version of Hilbert’s 13th problem, which questioned whether general multivariate functions could be built from simpler functions.
- Foundation for Neural Networks: It provides a theoretical base suggesting that complex multivariate behaviors can be built from compositions of simpler operations.
Methodology
It defines the model as $\mathrm{KAN}(x) = (\Phi_L \circ \Phi_{L-1} \circ \cdots \circ \Phi_1)(x)$, where
- $\circ$ represents function composition
- $\Phi_l$ is the Kolmogorov-Arnold layer, a matrix of learnable univariate functions $\phi_{l,i,j}$ that are applied to the layer inputs and then summed
- As a comparison, a Multi-Layer Perceptron interleaves linear layers $W_l$ with fixed activations $\sigma$: $\mathrm{MLP}(x) = (W_L \circ \sigma \circ W_{L-1} \circ \sigma \circ \cdots \circ W_1)(x)$
- The paper uses B-splines for the $\phi_{l,i,j}$ (a minimal sketch of such a layer follows this list).
- This approach provides flexibility, as each learnable function is a weighted sum of B-spline basis functions defined on a grid over the input range.
- However, it can be computationally expensive, as these basis functions must be recomputed for every input.
- The paper also uses L1 regularization to encourage sparse representations.
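Below is a minimal sketch of a single KAN layer with B-spline-parameterized edge functions. It is an illustrative reimplementation under simplifying assumptions, not the official code: the class name, argument names, and initialization are invented, and it omits details such as the base (residual) activation and grid extension used in the paper's implementation. The L1 sparsity regularization mentioned above would be added to the training loss on top of this.

```python
import torch
import torch.nn as nn

def bspline_basis(x, grid, k=3):
    """Cox-de Boor recursion: evaluate order-k B-spline basis functions
    at points x (shape [batch, in_dim]) on a per-dimension knot grid."""
    x = x.unsqueeze(-1)                                          # [batch, in_dim, 1]
    bases = ((x >= grid[:, :-1]) & (x < grid[:, 1:])).float()    # degree-0 bases
    for d in range(1, k + 1):
        left = (x - grid[:, :-(d + 1)]) / (grid[:, d:-1] - grid[:, :-(d + 1)]) * bases[..., :-1]
        right = (grid[:, d + 1:] - x) / (grid[:, d + 1:] - grid[:, 1:-d]) * bases[..., 1:]
        bases = left + right
    return bases                                                 # [batch, in_dim, n_basis]

class KANLayer(nn.Module):
    """One Kolmogorov-Arnold layer: every edge (input i, output j) carries a
    learnable univariate function, parameterized by B-spline coefficients."""
    def __init__(self, in_dim, out_dim, grid_size=5, k=3):
        super().__init__()
        n_basis = grid_size + k
        # Uniform knot grid on [-1, 1], extended by k knots on each side.
        grid = torch.linspace(-1, 1, grid_size + 1)
        h = grid[1] - grid[0]
        grid = torch.cat([grid[:1] - h * torch.arange(k, 0, -1), grid,
                          grid[-1:] + h * torch.arange(1, k + 1)])
        self.register_buffer("grid", grid.expand(in_dim, -1).contiguous())
        self.k = k
        # One coefficient vector per (output, input) edge.
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, n_basis) * 0.1)

    def forward(self, x):
        bases = bspline_basis(x, self.grid, self.k)              # [batch, in_dim, n_basis]
        # Each output sums the learnable univariate functions over input dims.
        return torch.einsum("bip,oip->bo", bases, self.coef)

layer = KANLayer(in_dim=3, out_dim=4)
print(layer(torch.rand(8, 3) * 2 - 1).shape)                     # torch.Size([8, 4])
```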
FourierKAN
FourierKAN utilizes a Fourier basis instead of B-splines. This offers several potential advantages:
- Faster computation: Fourier representations have fixed basis functions, unlike B-splines, which must calculate their basis functions based on input data. This leads to faster processing.
- Easier optimization: Fourier representations are global, meaning they influence the entire output. B-splines are local, affecting only limited regions. This global nature makes FourierKANs potentially easier to optimize during training.
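A minimal sketch of such a layer is given below, assuming each edge function is a truncated Fourier series; the class and parameter names are illustrative rather than taken from any particular FourierKAN repository.

```python
import torch
import torch.nn as nn

class FourierKANLayer(nn.Module):
    """KAN-style layer whose per-edge univariate functions are truncated
    Fourier series: fixed cos/sin basis, learnable coefficients."""
    def __init__(self, in_dim, out_dim, num_frequencies=5):
        super().__init__()
        self.register_buffer("freqs", torch.arange(1, num_frequencies + 1).float())
        # coef[0] holds cosine coefficients, coef[1] sine coefficients,
        # one per (output, input, frequency) triple.
        self.coef = nn.Parameter(
            torch.randn(2, out_dim, in_dim, num_frequencies)
            / (in_dim * num_frequencies) ** 0.5
        )
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        # The basis is fixed and global: cos(kx), sin(kx) for k = 1..K,
        # so no data-dependent basis construction is needed (unlike B-splines).
        angles = x.unsqueeze(-1) * self.freqs                    # [batch, in_dim, K]
        cos_b, sin_b = torch.cos(angles), torch.sin(angles)
        y = torch.einsum("bik,oik->bo", cos_b, self.coef[0])
        y = y + torch.einsum("bik,oik->bo", sin_b, self.coef[1])
        return y + self.bias
```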
Comments
If we denote the Fourier (or B-spline) basis transform as $\mathcal{F}$, then the KAN network can be expressed as $(W_L \circ \mathcal{F} \circ W_{L-1} \circ \mathcal{F} \circ \cdots \circ W_1 \circ \mathcal{F})(x)$, as opposed to $(W_L \circ \sigma \circ W_{L-1} \circ \sigma \circ \cdots \circ W_1)(x)$ for an MLP. This shows both the similarity and the key difference to Multilayer Perceptrons (MLPs). The difference lies in the activation function: in MLPs it is typically a fixed function $\sigma$ such as ReLU, whereas KANs replace it with a family of B-spline or Fourier basis transforms. In other words, the linear layer in a KAN is applied to a transformed basis (e.g. B-spline or Fourier) of the inputs, in place of the activation function. A separate activation function is not needed in KANs, as the basis transform itself introduces non-linearity into the model.
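To make the contrast concrete, here is a toy side-by-side comparison of one layer of each type; the shapes and the choice of Fourier features for the basis are illustrative assumptions.

```python
import torch

in_dim, out_dim, K = 3, 4, 5
x = torch.randn(in_dim)

# MLP layer: fixed non-linearity applied after a linear map.
W = torch.randn(out_dim, in_dim)
mlp_out = torch.relu(W @ x)

# KAN-style layer: a linear map applied to a basis expansion of each input
# coordinate; the basis (here Fourier features) supplies the non-linearity.
k = torch.arange(1, K + 1).float()
basis = torch.cat([torch.cos(x[:, None] * k),
                   torch.sin(x[:, None] * k)], dim=1)       # [in_dim, 2K]
W_kan = torch.randn(out_dim, in_dim * 2 * K)
kan_out = W_kan @ basis.reshape(-1)
```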