The KAN network was proposed in the paper KAN: Kolmogorov-Arnold Networks. It has attracted a significant amount of interest since the paper was uploaded to arXiv on April 30, 2024: the associated repository (github) had already received more than 8,000 stars by May 6, 2024!
The paper proposes a novel neural network architecture called Kolmogorov-Arnold Networks (KANs). The key idea is to replace the fixed activation functions of neural networks with learnable ones. This is inspired by the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function can be expressed as a composition of continuous univariate functions and addition. The authors argue that KANs are more interpretable and accurate than traditional MLPs, especially for scientific tasks. They also propose a specific KAN architecture with learnable spline activation functions and demonstrate how to improve accuracy by extending the grid resolution of these splines.
Kolmogorov-Arnold Representation Theorem
The key idea of the paper is the Kolmogorov-Arnold representation theorem.
In essence, the Kolmogorov-Arnold representation theorem states that any continuous function of multiple variables can be decomposed into a finite combination of continuous functions of a single variable and addition.
More formally, it states that for every continuous function $f$ on the $n$-dimensional unit cube $[0,1]^n$, there exist $2n+1$ continuous univariate functions $\Phi_q$, together with continuous univariate functions $\phi_{q,p}$, such that:

$$f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$$
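To make the structure of this decomposition concrete, here is a minimal sketch that evaluates the right-hand side for arbitrary, hand-picked univariate functions. The theorem only guarantees that suitable functions exist; the particular choices below are placeholders for illustration.

```python
import numpy as np

def ka_representation(x, inner_fns, outer_fns):
    """Evaluate sum_{q=1}^{2n+1} Phi_q( sum_{p=1}^{n} phi_{q,p}(x_p) ).

    x         : array of shape (n,)
    inner_fns : list of 2n+1 lists, each holding n univariate functions phi_{q,p}
    outer_fns : list of 2n+1 univariate functions Phi_q
    """
    total = 0.0
    for Phi_q, phis_q in zip(outer_fns, inner_fns):
        s = sum(phi(x_p) for phi, x_p in zip(phis_q, x))
        total += Phi_q(s)
    return float(total)

# n = 2 inputs -> 2n + 1 = 5 outer functions, each paired with n inner functions.
n = 2
inner_fns = [[np.sin, np.cos] for _ in range(2 * n + 1)]   # arbitrary stand-ins
outer_fns = [np.tanh] * (2 * n + 1)                        # arbitrary stand-ins
print(ka_representation(np.array([0.3, 0.7]), inner_fns, outer_fns))
```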
Significance
- Hilbert’s 13th Problem: This theorem effectively resolved a restricted version of Hilbert’s 13th problem, which questioned whether general multivariate functions could be built from simpler functions.
- Foundation for Neural Networks: It provides a theoretical base suggesting that complex multivariate behaviors can be built from compositions of simpler operations.
Methodology
It defines the model as $\mathrm{KAN}(x) = (\Phi_L \circ \Phi_{L-1} \circ \cdots \circ \Phi_1)(x)$, where
- $\circ$ represents function composition
- $\Phi_l$ is the Kolmogorov-Arnold layer, a matrix of learnable univariate functions $\phi_{l,i,j}$ that are applied to the layer inputs and then summed
- As a comparison, a Multi-Layer Perceptron interleaves linear layers $W_l$ with fixed activations $\sigma$: $\mathrm{MLP}(x) = (W_L \circ \sigma \circ W_{L-1} \circ \sigma \circ \cdots \circ W_1)(x)$
- The paper uses B-splines for the $\phi_{l,i,j}$ (a minimal sketch of such a layer follows this list).
- This approach provides flexibility, as each learnable function is a weighted sum of B-spline basis functions defined on a grid over the input range.
- However, it can be computationally expensive, as these basis functions must be recomputed for every input.
- The paper also uses L1 regularization to encourage sparse representations.
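Below is a minimal sketch of a single KAN layer with B-spline-parameterized edge functions. It is an illustrative reimplementation under simplifying assumptions, not the official code: the class name, argument names, and initialization are invented, and it omits details such as the base (residual) activation and grid extension used in the paper's implementation. The L1 sparsity regularization mentioned above would be added to the training loss on top of this.

```python
import torch
import torch.nn as nn

def bspline_basis(x, grid, k=3):
    """Cox-de Boor recursion: evaluate order-k B-spline basis functions
    at points x (shape [batch, in_dim]) on a per-dimension knot grid."""
    x = x.unsqueeze(-1)                                          # [batch, in_dim, 1]
    bases = ((x >= grid[:, :-1]) & (x < grid[:, 1:])).float()    # degree-0 bases
    for d in range(1, k + 1):
        left = (x - grid[:, :-(d + 1)]) / (grid[:, d:-1] - grid[:, :-(d + 1)]) * bases[..., :-1]
        right = (grid[:, d + 1:] - x) / (grid[:, d + 1:] - grid[:, 1:-d]) * bases[..., 1:]
        bases = left + right
    return bases                                                 # [batch, in_dim, n_basis]

class KANLayer(nn.Module):
    """One Kolmogorov-Arnold layer: every edge (input i, output j) carries a
    learnable univariate function, parameterized by B-spline coefficients."""
    def __init__(self, in_dim, out_dim, grid_size=5, k=3):
        super().__init__()
        n_basis = grid_size + k
        # Uniform knot grid on [-1, 1], extended by k knots on each side.
        grid = torch.linspace(-1, 1, grid_size + 1)
        h = grid[1] - grid[0]
        grid = torch.cat([grid[:1] - h * torch.arange(k, 0, -1), grid,
                          grid[-1:] + h * torch.arange(1, k + 1)])
        self.register_buffer("grid", grid.expand(in_dim, -1).contiguous())
        self.k = k
        # One coefficient vector per (output, input) edge.
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, n_basis) * 0.1)

    def forward(self, x):
        bases = bspline_basis(x, self.grid, self.k)              # [batch, in_dim, n_basis]
        # Each output sums the learnable univariate functions over input dims.
        return torch.einsum("bip,oip->bo", bases, self.coef)

layer = KANLayer(in_dim=3, out_dim=4)
print(layer(torch.rand(8, 3) * 2 - 1).shape)                     # torch.Size([8, 4])
```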
FourierKAN
FourierKAN utilizes a Fourier basis instead of B-splines. This offers several potential advantages:
- Faster computation: Fourier representations have fixed basis functions, unlike B-splines, which must calculate their basis functions based on input data. This leads to faster processing.
- Easier optimization: Fourier representations are global, meaning they influence the entire output. B-splines are local, affecting only limited regions. This global nature makes FourierKANs potentially easier to optimize during training.
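A minimal sketch of such a layer is given below, assuming each edge function is a truncated Fourier series; the class and parameter names are illustrative rather than taken from any particular FourierKAN repository.

```python
import torch
import torch.nn as nn

class FourierKANLayer(nn.Module):
    """KAN-style layer whose per-edge univariate functions are truncated
    Fourier series: fixed cos/sin basis, learnable coefficients."""
    def __init__(self, in_dim, out_dim, num_frequencies=5):
        super().__init__()
        self.register_buffer("freqs", torch.arange(1, num_frequencies + 1).float())
        # coef[0] holds cosine coefficients, coef[1] sine coefficients,
        # one per (output, input, frequency) triple.
        self.coef = nn.Parameter(
            torch.randn(2, out_dim, in_dim, num_frequencies)
            / (in_dim * num_frequencies) ** 0.5
        )
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        # The basis is fixed and global: cos(kx), sin(kx) for k = 1..K,
        # so no data-dependent basis construction is needed (unlike B-splines).
        angles = x.unsqueeze(-1) * self.freqs                    # [batch, in_dim, K]
        cos_b, sin_b = torch.cos(angles), torch.sin(angles)
        y = torch.einsum("bik,oik->bo", cos_b, self.coef[0])
        y = y + torch.einsum("bik,oik->bo", sin_b, self.coef[1])
        return y + self.bias
```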
Comments
If we denote the Fourier (or B-spline) basis transform as $\mathcal{F}$, then the KAN network can be expressed as $(W_L \circ \mathcal{F} \circ W_{L-1} \circ \mathcal{F} \circ \cdots \circ W_1 \circ \mathcal{F})(x)$, as opposed to $(W_L \circ \sigma \circ W_{L-1} \circ \sigma \circ \cdots \circ W_1)(x)$ for an MLP. This shows both the similarity and the key difference to Multilayer Perceptrons (MLPs). The difference lies in the activation function: in MLPs it is typically a fixed function $\sigma$ such as ReLU, whereas KANs replace it with a family of B-spline or Fourier basis transforms. In other words, the linear layer in a KAN is applied to a transformed basis (e.g. B-spline or Fourier) of the inputs, in place of the activation function. A separate activation function is not needed in KANs, as the basis transform itself introduces non-linearity into the model.
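To make the contrast concrete, here is a toy side-by-side comparison of one layer of each type; the shapes and the choice of Fourier features for the basis are illustrative assumptions.

```python
import torch

in_dim, out_dim, K = 3, 4, 5
x = torch.randn(in_dim)

# MLP layer: fixed non-linearity applied after a linear map.
W = torch.randn(out_dim, in_dim)
mlp_out = torch.relu(W @ x)

# KAN-style layer: a linear map applied to a basis expansion of each input
# coordinate; the basis (here Fourier features) supplies the non-linearity.
k = torch.arange(1, K + 1).float()
basis = torch.cat([torch.cos(x[:, None] * k),
                   torch.sin(x[:, None] * k)], dim=1)       # [in_dim, 2K]
W_kan = torch.randn(out_dim, in_dim * 2 * K)
kan_out = W_kan @ basis.reshape(-1)
```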