The KAN network was proposed in the paper KAN: Kolmogorov-Arnold Networks. It has attracted a significant amount of interest since the paper was uploaded to arXiv on April 30, 2024. The associated repository (github) had already received more than 8,000 stars by May 6, 2024!

The paper proposes a novel neural network architecture called Kolmogorov-Arnold Networks (KANs). The key idea is to replace fixed activation functions in neural networks with learnable ones. This is inspired by the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function can be expressed as a combination of continuous univariate functions and addition. The authors argue that KANs are more interpretable and accurate than traditional MLPs, especially for scientific tasks. They also propose a specific KAN architecture with learnable spline activation functions and demonstrate how to improve accuracy by extending the grid resolution of these splines.

Kolmogorov-Arnold Representation Theorem

The key idea of the paper is the Kolmogorov-Arnold representation theorem.

In essence, the Kolmogorov-Arnold representation theorem states that any continuous function of multiple variables can be decomposed into a finite combination of continuous functions of a single variable and addition.

More formally, it states that for every continuous function $f$ on the $n$-dimensional unit cube $[0, 1]^n$, there exist $2n + 1$ continuous outer functions $\Phi_q$ and continuous inner functions $\phi_{q,p}$, each of a single variable, such that:

$$f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$$
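As a quick illustration (my own example, not from the paper), the product of two inputs can already be written in exactly this form:

$$xy = \Phi_1\big(\phi_{1,1}(x) + \phi_{1,2}(y)\big) + \Phi_2\big(\phi_{2,1}(x) + \phi_{2,2}(y)\big), \quad \Phi_1(u) = \tfrac{u^2}{4},\ \Phi_2(u) = -\tfrac{u^2}{4},$$

with $\phi_{1,1}(x) = \phi_{2,1}(x) = x$, $\phi_{1,2}(y) = y$, $\phi_{2,2}(y) = -y$, since $\tfrac{1}{4}\big((x+y)^2 - (x-y)^2\big) = xy$. The theorem guarantees that a similar decomposition with at most $2n + 1$ outer functions exists for any continuous $f$ on $[0,1]^n$.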

Significance

  • Hilbert’s 13th Problem: This theorem effectively resolved a restricted version of Hilbert’s 13th problem, which questioned whether general multivariate functions could be built from simpler functions.
  • Foundation for Neural Networks: It provides a theoretical base suggesting that complex multivariate behaviors can be built from compositions of simpler operations.

Methodology

It defines the model as

$$\mathrm{KAN}(x) = (\Phi_L \circ \Phi_{L-1} \circ \cdots \circ \Phi_1)(x)$$

where

  • $\circ$ represents function composition

  • $\Phi_l$ is the Kolmogorov-Arnold layer: a matrix of learnable univariate functions $\phi$, one for each input-output pair of the layer

  • As a comparison, a Multi-Layer Perceptron is interleaved by linear layers $W_l$ and a fixed activation $\sigma$, i.e. $\mathrm{MLP}(x) = (W_L \circ \sigma \circ W_{L-1} \circ \cdots \circ \sigma \circ W_1)(x)$

  • The paper uses B-splines for $\phi$ (a minimal sketch of such a layer follows this list).

    • This approach provides flexibility: each learned function $\phi$ is a weighted sum of B-spline basis functions evaluated at the input.
    • However, it can be computationally expensive, as these basis functions must be recomputed for every input point.
  • The paper also uses L1 regularization to encourage sparse representations.
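To make the layer structure concrete, here is a minimal sketch of a single KAN layer with learnable B-spline activations. This is my own simplified illustration, not the paper's official implementation; the names `KANLayer`, `b_spline_basis`, `grid_size`, and `spline_order`, and the uniform-knot setup, are assumptions made for the sketch.

```python
# Minimal KAN-layer sketch (illustrative, not the official pykan code).
import torch
import torch.nn as nn


def b_spline_basis(x, grid, k):
    """Evaluate B-spline basis functions of degree k at x via the Cox-de Boor recursion.

    x:    (batch, in_dim) inputs
    grid: (in_dim, grid_size + 2k + 1) knot positions per input dimension
    returns: (batch, in_dim, grid_size + k) basis values
    """
    x = x.unsqueeze(-1)                                   # (batch, in_dim, 1)
    # Degree-0 (piecewise-constant) basis: indicator of each knot interval.
    bases = ((x >= grid[:, :-1]) & (x < grid[:, 1:])).to(x.dtype)
    for j in range(1, k + 1):
        left = (x - grid[:, : -(j + 1)]) / (grid[:, j:-1] - grid[:, : -(j + 1)])
        right = (grid[:, j + 1:] - x) / (grid[:, j + 1:] - grid[:, 1:-j])
        bases = left * bases[..., :-1] + right * bases[..., 1:]
    return bases


class KANLayer(nn.Module):
    """Maps in_dim -> out_dim; every input-output edge carries a learnable spline phi."""

    def __init__(self, in_dim, out_dim, grid_size=5, spline_order=3, grid_range=(-1.0, 1.0)):
        super().__init__()
        k = spline_order
        h = (grid_range[1] - grid_range[0]) / grid_size
        # Uniform knots, extended by k on each side so the basis covers grid_range;
        # inputs are assumed to lie inside grid_range in this sketch.
        knots = torch.arange(-k, grid_size + k + 1) * h + grid_range[0]
        self.register_buffer("grid", knots.expand(in_dim, -1).contiguous())
        self.spline_order = k
        # One coefficient per (output, input, basis function): the learnable phi's.
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, grid_size + k) * 0.1)

    def forward(self, x):                                  # x: (batch, in_dim)
        basis = b_spline_basis(x, self.grid, self.spline_order)  # (batch, in_dim, n_basis)
        # Each output sums phi_{out,in}(x_in) over the inputs (the KAN layer rule).
        return torch.einsum("bik,oik->bo", basis, self.coef)


# Usage: stack layers to get the composition Phi_L o ... o Phi_1 described above.
layer = KANLayer(in_dim=2, out_dim=3)
y = layer(torch.rand(8, 2) * 2 - 1)                        # (8, 3)
```

Stacking several such layers gives the composition $\Phi_L \circ \cdots \circ \Phi_1$ above; the paper additionally adds a residual base activation to each $\phi$ and applies the L1 penalty mentioned above to encourage sparsity.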

FourierKAN

FourierKAN utilizes a Fourier basis instead of B-splines. This offers several potential advantages (a minimal sketch of such a layer follows the list below):

  • Faster computation: Fourier representations have fixed basis functions, unlike B-splines, which must calculate their basis functions based on input data. This leads to faster processing.
  • Easier optimization: Fourier representations are global, meaning they influence the entire output. B-splines are local, affecting only limited regions. This global nature makes FourierKANs potentially easier to optimize during training.
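Below is an analogous minimal sketch of a FourierKAN-style layer, assuming the variant described above in which each learnable univariate function is a truncated Fourier series; the class and parameter names (`FourierKANLayer`, `num_frequencies`) are illustrative and not taken from any specific implementation.

```python
# Minimal FourierKAN-style layer sketch (illustrative).
import math
import torch
import torch.nn as nn


class FourierKANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_frequencies=8):
        super().__init__()
        # Fixed frequencies 1..K; the basis functions cos(kx), sin(kx) need no
        # data-dependent setup, unlike B-spline bases (the speed argument above).
        self.register_buffer("freqs", torch.arange(1, num_frequencies + 1).float())
        scale = 1.0 / math.sqrt(in_dim * num_frequencies)
        # Learnable Fourier coefficients per (output, input, frequency).
        self.cos_coef = nn.Parameter(torch.randn(out_dim, in_dim, num_frequencies) * scale)
        self.sin_coef = nn.Parameter(torch.randn(out_dim, in_dim, num_frequencies) * scale)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):                       # x: (batch, in_dim)
        angles = x.unsqueeze(-1) * self.freqs   # (batch, in_dim, num_frequencies)
        out = torch.einsum("bif,oif->bo", torch.cos(angles), self.cos_coef)
        out = out + torch.einsum("bif,oif->bo", torch.sin(angles), self.sin_coef)
        return out + self.bias
```

Because each sine and cosine term has global support, updating one coefficient changes the learned function everywhere, which is the global-versus-local trade-off noted above.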

Comments

If we denote the basis expansion (e.g. the Fourier feature map) as $F$, then the KAN network can be expressed as $(W_L \circ F \circ \cdots \circ W_1 \circ F)(x)$, as opposed to $(W_L \circ \sigma \circ \cdots \circ \sigma \circ W_1)(x)$ for an MLP. This shows both the similarity to and the key difference from Multilayer Perceptrons (MLPs). The key difference lies in the activation function: in MLPs it is typically a fixed function $\sigma$ such as ReLU, whereas KANs replace this with a family of learnable B-spline or Fourier basis transforms. In other words, the linear layer in a KAN is applied to a transformed basis (e.g. B-spline or Fourier) of the inputs, in place of the activation function. A separate activation function is not needed in KANs, as the basis transform introduces the non-linearity into the model.