kindxiaoming / pykan

Kolmogorov Arnold Networks

License: MIT License

Python 2.26% Jupyter Notebook 97.74%

pykan's Issues

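(not issue) (question) Approximation Ability

I will not give examples with more variables; a single variable is enough to illustrate, and the multivariate, high-dimensional case is presumably more complex.
sin(x*333-cos(999x+2))-cos(22x+1)/exp(x)/sin(6x*x+1)/55*x

For -1<x<+1, -2<y<+2, the graph of the function is:

If we raise the frequency of the trigonometric functions above to a very high value and compose a few more layers (monotonic + periodic combinations), the data points look almost like highly discrete samples with no pattern, and it becomes very hard to fit this "God function". Composing periodic functions dozens of times is more complex still, even though every one of these functions is continuously differentiable and no harder class of functions is involved yet.
The real world also has a certain amount of noise.
(One could carefully design combinations whose behavior is far more complex and varies greatly across different x intervals. This is only an illustration.)

To strengthen approximation ability, the things that can be varied are: layer_deepth, layer_width (the main one?), spline_grid_number, spline_order_k
First, the most sensitive: order_k certainly cannot be too large. Is k=3 already large enough? If it is too large, does it cause overfitting on simple problems, as grid does? Is it a hyperparameter that needs careful thought?
Next most sensitive: grid_number, which also cannot be refined indefinitely.
Depth has a relatively limited range: for this problem, say, from a few to a few dozen?
For an accurate approximation, how much does a reasonable increase in width help precision and generalization? It seems that only increasing width is reasonable, and that it can also cope with this kind of rapid, large, high-frequency variation.

I have also not compared this in depth against the fitting ability of MLPs and other basic network building blocks.

For a task with high-frequency periodic functions like this, fitting with a combination of spline + SiLU + Linear looks very challenging: even if a solution "exists" mathematically, actually finding those functions (the basic functions to compose) is another matter.

My knowledge is limited; corrections are welcome.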

Embedded image URL is time limited.

I think you should embed images using the raw URL versions. The current links use a time-based token and return 404 after some time.

About some concerns for integration of KAN into regular NN

Hi, Ziming!
While trying your very detailed tutorial, I’ve found a severe issue which may undermine the effectiveness of integrating KAN into other regular neural networks.

For instance, in tutorial/Example_4_symbolic_regression.ipynb, you’ve shown how to do symbolic regression with KAN. But if we alter the range of dataset input from [-1, 1] to [-1.25, 1.25], the model would fail to even be sparse. That’s because the spline functions are not defined outside [-1, 1] in the first place.
True, with a given dataset we can force the grid range to match the range of the data exactly. But what if we use KAN as a hidden layer of another model? The ranges of latent variables are not bounded, even after normalization (unless we use an unstable normalization such as mapping the maximum and minimum of a batch to 1 and -1).
So, how can we resolve that?
Say, after a batch normalization (std=1), then set the range of KAN grids to be [-3, 3] (following the 3σ principle)?
Or can we use some weird techniques like expanding the left 2k splines beyond the left bound, and the same for the right side?
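To put a number on the 3σ suggestion, here is a minimal sketch that assumes the normalized activations are roughly standard normal (an assumption; hidden activations need not be Gaussian). `fraction_outside` is a hypothetical helper, not part of pykan.

```python
import math

def fraction_outside(n_sigma: float) -> float:
    """Two-sided tail mass of a standard normal beyond +/- n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))

# Share of inputs that would fall outside a grid of [-n, n].
for n in (1, 2, 3):
    print(f"|z| > {n}: {fraction_outside(n):.4%}")
```

Under that assumption only a small fraction of inputs would leave a [-3, 3] grid, but that fraction is never exactly zero, so the out-of-range behavior still has to be defined.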

model.train(device='cuda') is not working: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Hi,

I tried this:

https://github.com/KindXiaoming/pykan/blob/master/tutorials/API_10_device.ipynb

But nvidia-smi shows no GPU usage at all, and top shows high CPU usage, looks like it's not training on GPU.

Then I added device=device:

model.train(dataset, opt="LBFGS", steps=50, lamb=5e-5, lamb_entropy=2., device=device);

it then errors out:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

How to train own dataset for regression?

Hello, how can I train on my own dataset for a regression task?
I created the dataset this way to try out regression.

dataset = {
    'train_input':torch.from_numpy(X_train[:3000]),
    'test_input': torch.from_numpy(X_test[:2000]),
    'train_label':torch.from_numpy(y_train[:3000]),
    'test_label':torch.from_numpy(y_test[:2000]),
}

but when I train the model

model.train(dataset, opt="LBFGS", steps=20, lamb=0.01, lamb_entropy=10.);

it gave me an error:

`File /opt/conda/lib/python3.10/site-packages/kan/LBFGS.py:319, in LBFGS.step(self, closure)
316 state.setdefault('n_iter', 0)
318 # evaluate initial f(x) and df/dx
--> 319 orig_loss = closure()
320 loss = float(orig_loss)
321 current_evals = 1

File /opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/kan/KAN.py:897, in KAN.train.<locals>.closure()
895 train_loss = loss_fn(pred[id_], dataset['train_label'][train_id][id_].to(device))
896 else:
--> 897 train_loss = loss_fn(pred, dataset['train_label'][train_id].to(device))
898 reg_ = reg(self.acts_scale)
899 objective = train_loss + lamb*reg_

IndexError: index 2941 is out of bounds for dimension 0 with size 2000`
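The traceback shows an index that is valid for 3000 rows being applied to a tensor with only 2000, which suggests that an input/label pair disagrees on its number of rows (for example, `y_train` may hold fewer than 3000 samples; this is a guess, not something the report confirms). A minimal sketch of a check to run before training, where `check_dataset` is a hypothetical helper and not part of pykan:

```python
def check_dataset(dataset):
    """Raise ValueError if an input/label pair disagrees on row count.

    Works with anything that supports len(): lists, NumPy arrays, torch tensors.
    """
    for split in ("train", "test"):
        n_in = len(dataset[f"{split}_input"])
        n_lab = len(dataset[f"{split}_label"])
        if n_in != n_lab:
            raise ValueError(
                f"{split}_input has {n_in} rows but {split}_label has {n_lab}"
            )

# Toy stand-in for the situation in the traceback: 3000 inputs, 2000 labels.
bad = {
    "train_input": [[0.0]] * 3000,
    "train_label": [[0.0]] * 2000,
    "test_input": [[0.0]] * 2000,
    "test_label": [[0.0]] * 2000,
}
try:
    check_dataset(bad)
except ValueError as err:
    print(err)
```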

FourierSynthKan?

So the use of splines is cool, but computationally expensive, and for large models storage of the parameters might become extremely large.

Instead of a spline for each edge, what about a sine wave?

Functions can be approximated to varying levels of precision by sine waves

For each edge, we store the amplitude, phase and frequency.

We could get even simpler, and each level would increase the frequency of the previous by one, and each column could shift the phase by a set amount.

Of course we need more edges, but each edge is simpler.

Finally, another simplification is to simply use a sampled lookup table for the sine.

This could be done, I think fairly quickly in hardware.

This might not be as useful for precise function approximation, but it would be similar to the quantized models currently being used elsewhere.
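The idea above could be sketched roughly as follows. This is plain Python for illustration, not a pykan feature; `TABLE_SIZE`, `table_sin`, `sine_edge` and `node_output` are hypothetical names, and the lookup uses the nearest table entry without interpolation.

```python
import math

TABLE_SIZE = 1024  # resolution of the sampled sine table (illustrative choice)
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def table_sin(theta: float) -> float:
    """Nearest-sample lookup of sin(theta); no interpolation."""
    index = round(theta / (2.0 * math.pi) * TABLE_SIZE) % TABLE_SIZE
    return SINE_TABLE[index]

def sine_edge(x: float, amplitude: float, frequency: float, phase: float) -> float:
    """One edge: amplitude * sin(frequency * x + phase), via the lookup table."""
    return amplitude * table_sin(frequency * x + phase)

def node_output(x: float, edges) -> float:
    """Sum several sine edges that share the same input."""
    return sum(sine_edge(x, a, f, p) for a, f, p in edges)

# First three harmonics of a square wave: (amplitude, frequency, phase) per edge.
edges = [(4 / math.pi / k, float(k), 0.0) for k in (1, 3, 5)]
print(node_output(1.0, edges))
```

Each edge stores three numbers instead of a vector of spline coefficients, and the table trades accuracy (at most about π / `TABLE_SIZE` per lookup) for speed.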

Data type error

Thank you very much for your excellent work! I want to try to replace the MLP in my model with KAN. When I import KAN via "from kan import KAN", I find that some variables are changed from float32 to float64.

Therefore, I get this error:
File "/home/shuaizhang/anaconda3/envs/4DGS-KAN/lib/python3.8/site-packages/diff_gaussian_rasterization-0.0.0-py3.8-linux-x86_64.egg/diff_gaussian_rasterization/__init__.py", line 92, in forward
num_rendered, color, depth, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: expected scalar type Float but found Double

How can I solve this problem?

loss_fn by default is Mean Squared Error (MSE), not Root Mean Squared Error (RMSE)?

pykan/kan/KAN.py

Line 844 in 915726a

loss_fn = loss_fn_eval = lambda x, y: torch.mean((x - y) ** 2)

Also can you add the doc for this loss_fn param?
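For readers comparing the two quantities, a small plain-Python sketch of the difference. It only illustrates the definitions and makes no claim about what `train()` prints.

```python
import math

def mse(pred, target):
    """Mean squared error, matching the lambda quoted above."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def rmse(pred, target):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(pred, target))

pred = [1.0, 2.0, 3.0]
target = [1.0, 2.0, 5.0]
print(mse(pred, target), rmse(pred, target))
```

The two share the same minimizer, but their values and gradients differ, so it matters which one a reported loss refers to.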

What does George-Lorentz Varian mean to KAN?

simpler architecture?

ConvKan or LinearKan needed !

How can I modify it so that it can operate on the last dimension like a real MLP?

I think "hence beating curse of dimensionality" P8 a little unconvincing

Obviously, the constant C in Theorem 2.1 depends on the dimension. There is a notable contradiction between the red box and the blue box.

PDE Modelling, more complex benchmarks

Hi! Amazing work, the PDEs modelling with Physics Informed Neural Network seems to be a breakthrough by looking at the root mean square error.

I wanted to ask if you have tried more complex PDEs, e.g. Burgers, Navier-Stokes, KdV?

If you are interested, we could collaborate to put a KAN layer in PINA, which is a versatile software package for equation learning, and try more complex benchmarks! Thanks for the work🚀🚀

Freely defining aggregation over nodes

It would be interesting to allow multiplication effect from one node to another.
Example: in time series analysis, if a store is always closed on Sundays, sales are always 0, and it is not possible to produce a clean 0 prediction from purely additive rules. If a node could affect the final result through multiplier logic, it would be possible to learn the closed-store cases.

set_aggregate_effect(0, [1,1], 'x_3*(x_0+x_1+x_2)')
where the first argument is the input layer, second argument is the target layer and node, third argument is the equation for the aggregation of the input nodes.
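A plain-Python sketch of why the multiplicative form helps. The numbers are made up, and `additive` and `gated` are hypothetical stand-ins for the node aggregation, not pykan functions.

```python
def additive(x0, x1, x2, x3):
    """Purely additive aggregation: every input shifts the sum."""
    return x0 + x1 + x2 + x3

def gated(x0, x1, x2, x3):
    """x_3 * (x_0 + x_1 + x_2), the expression from the proposal."""
    return x3 * (x0 + x1 + x2)

weekday = dict(x0=120.0, x1=15.0, x2=-5.0, x3=1.0)  # store open
sunday = dict(x0=120.0, x1=15.0, x2=-5.0, x3=0.0)   # store closed
print(gated(**weekday), gated(**sunday))
```

With `x3` acting as a gate, the closed-store case is exactly 0 regardless of the other inputs, whereas the additive form would need `x3` to cancel the other three terms for every combination of their values.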

I am pessimistic about 'extracting symbolic representations through regression'.

I am pessimistic about 'extracting symbolic representations through regression'.
The feasible path as I understand it is: (abstraction + (search and/or (heuristic-guessing + validation)))

model can't move to cuda

Replicating tutorials/API_10_device.ipynb, I see no load on the GPU, just the CPU. VRAM gets occupied, however.
Checking the device of the dataset returns "cuda", but the model parameters return "cpu" as their device. This can be fixed by calling .to(device) on the model, but that breaks the training, leading to the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[19], line 8
      4 dataset = create_dataset(f, n_var=4, train_num=3000, device=device)
      6 # train the model
      7 #model.train(dataset, opt="LBFGS", steps=20, lamb=1e-3, lamb_entropy=2.)
----> 8 model.train(dataset, opt="LBFGS", steps=10, lamb=5e-5, lamb_entropy=2.)

File ~/.conda/envs/pykan/lib/python3.9/site-packages/kan/KAN.py:913, in KAN.train(self, dataset, opt, steps, log, lamb, lamb_l1, lamb_entropy, lamb_coef, lamb_coefdiff, update_grid, grid_update_num, loss_fn, lr, stop_grid_update_step, batch, small_mag_threshold, small_reg_factor, metrics, sglr_avoid, save_fig, in_vars, out_vars, beta, save_fig_freq, img_folder, device)
    910 test_id = np.random.choice(dataset['test_input'].shape[0], batch_size_test, replace=False)
    912 if _ % grid_update_freq == 0 and _ < stop_grid_update_step and update_grid:
--> 913     self.update_grid_from_samples(dataset['train_input'][train_id].to(device))
    916 if opt == "LBFGS":
    917     optimizer.step(closure)

File ~/.conda/envs/pykan/lib/python3.9/site-packages/kan/KAN.py:242, in KAN.update_grid_from_samples(self, x)
    219 '''
    220 update grid from samples
    221 
   (...)
    239 tensor([0.0128, 1.0064, 2.0000, 2.9937, 3.9873, 4.9809])
    240 '''
    241 for l in range(self.depth):
--> 242     self.forward(x)
    243     self.act_fun[l].update_grid_from_samples(self.acts[l])

File ~/.conda/envs/pykan/lib/python3.9/site-packages/kan/KAN.py:313, in KAN.forward(self, x)
    308 self.acts.append(x) # acts shape: (batch, width[l])
    311 for l in range(self.depth):
--> 313     x_numerical, preacts, postacts_numerical, postspline = self.act_fun[l](x)
    315     if self.symbolic_enabled == True:
    316         x_symbolic, postacts_symbolic = self.symbolic_fun[l](x)

File ~/.conda/envs/pykan/lib/python3.9/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/.conda/envs/pykan/lib/python3.9/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File ~/.conda/envs/pykan/lib/python3.9/site-packages/kan/KANLayer.py:175, in KANLayer.forward(self, x)
    173 preacts = x.permute(1,0).clone().reshape(batch, self.out_dim, self.in_dim)
    174 base = self.base_fun(x).permute(1,0) # shape (batch, size)
--> 175 y = coef2curve(x_eval=x, grid=self.grid[self.weight_sharing], coef=self.coef[self.weight_sharing], k=self.k) # shape (size, batch)
    176 y = y.permute(1,0) # shape (batch, size)
    177 postspline = y.clone().reshape(batch, self.out_dim, self.in_dim)

File ~/.conda/envs/pykan/lib/python3.9/site-packages/kan/spline.py:99, in coef2curve(x_eval, grid, coef, k, device)
     64 '''
     65 converting B-spline coefficients to B-spline curves. Evaluate x on B-spline curves (summing up B_batch results over B-spline basis).
     66 
   (...)
     95 torch.Size([5, 100])
     96 '''
     97 # x_eval: (size, batch), grid: (size, grid), coef: (size, coef)
     98 # coef: (size, coef), B_batch: (size, coef, batch), summer over coef
---> 99 y_eval = torch.einsum('ij,ijk->ik', coef, B_batch(x_eval, grid, k, device=device))
    100 return y_eval

File ~/.conda/envs/pykan/lib/python3.9/site-packages/torch/functional.py:380, in einsum(*args)
    375     return einsum(equation, *_operands)
    377 if len(operands) <= 2 or not opt_einsum.enabled:
    378     # the path for contracting 0 or 1 time(s) is already optimized
    379     # or the user has disabled using opt_einsum
--> 380     return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
    382 path = None
    383 if opt_einsum.is_available():

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)

environment: fresh conda venv with the requirements.txt installed
cuda version: 12.2

any ideas which parameter could be left behind on the CPU?

MNIST Example

Thank you for the excellent work! I wanted to ask if it's possible to conduct toy experiments using the MNIST dataset. As you know, MNIST is a standard dataset used in image classification tasks.

Thanks!

Is it only suitable for small-scale models?

Hi, thanks for your great work. I am thinking about implementing a KAN with 3072 input dims and 2000 output dims. Do you think a GPU is capable of running it? I have tried FourierKANs, but the process always got killed.

[DOCS] Improve installation process

Hello,

While working with the pykan project, I noticed some opportunities to enhance the installation guidelines, which could benefit users, particularly those new to Python development. I would like to propose a few changes aimed at simplifying the setup process and promoting the use of best practices for managing Python environments.

Current Installation Process

The current guidelines in the README involve:

Cloning the repository
Navigating to the directory
Installing the package with pip

Proposed Enhancements

Streamline Installation Commands: Combine the cloning and installation steps into a single command to simplify the process:

pip install git+https://github.com/KindXiaoming/pykan.git

Guide on Virtual Environment Setup: Add instructions for setting up a Python virtual environment, which helps in isolating and managing dependencies:

python -m venv pykan-env
source pykan-env/bin/activate  # On Windows, use `pykan-env\Scripts\activate`

Optional Conda Environment Setup: Provide guidelines for users who prefer Conda:

conda create --name pykan-env python=3.9.7
conda activate pykan-env

These changes aim to make the installation process more straightforward and robust, minimizing potential dependency conflicts and enhancing the user experience.

I am eager to hear your thoughts on these suggestions and would be happy to collaborate on refining them further.

How to apply KAN on Computer Vision

Hi Author,

Thank you for your great work. I am wondering if we can apply this network on Vision based task such as classification/detection/segmentation, etc.

Thank you for your help.

KAN.train doesn't allow to use KAN inside other torch models

KAN implementation overrides https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train
If I embed KAN in a PyTorch module and call train() on it, this causes an error, because KAN's train expects a dict.
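The conflict can be reproduced without PyTorch. In the sketch below, `Module` is a tiny stand-in that copies only one behavior of `torch.nn.Module.train`: it forwards the boolean `mode` to every child. `FitOverridesTrain`, `FitRenamed` and `Wrapper` are hypothetical classes, and renaming the method to `fit` is one possible workaround, not necessarily what the library does.

```python
class Module:
    """Tiny stand-in for torch.nn.Module's train/eval mode switching."""
    def __init__(self):
        self.training = True
        self._children = []

    def children(self):
        return self._children

    def train(self, mode=True):
        self.training = mode
        for child in self.children():
            child.train(mode)  # parents forward the boolean to every child
        return self

class FitOverridesTrain(Module):
    """Mimics the conflict: train() is repurposed as a fitting loop."""
    def train(self, dataset):
        return dataset["train_input"]  # fails when handed a bool

class FitRenamed(Module):
    """Same fitting entry point under a different name; train() stays intact."""
    def fit(self, dataset):
        return dataset["train_input"]

class Wrapper(Module):
    def __init__(self, inner):
        super().__init__()
        self._children.append(inner)

try:
    Wrapper(FitOverridesTrain()).train()
except TypeError as err:
    print("TypeError:", err)

print(Wrapper(FitRenamed()).train(False).training)
```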

An improvement for unsupervised learning

I noticed that changing the seed in the example of unsupervised learning can lead to inaccurate results, and I found that the last layer function has become a quadratic function (Figure 4). I think the last layer function should remain a Gaussian function, so I added a feature called keep_fit

The effect is as follows

Simple change:

Please provide a description of the basis functions B_i

Unless I missed something, the paper does not include one. It appears that the B_i are computed using the "Cox–de Boor recursion formula", or something similar (https://en.wikipedia.org/wiki/B-spline#Definition):

pykan/kan/spline.py

Lines 56 to 60 in 6a5b99f

    if k == 0:
        value = (x >= grid[:, :-1]) * (x < grid[:, 1:])
    else:
        B_km1 = B_batch(x[:, 0], grid=grid[:, :, 0], k=k - 1, extend=False, device=device)
        value = (x - grid[:, :-(k + 1)]) / (grid[:, k:-1] - grid[:, :-(k + 1)]) * B_km1[:, :-1] + (grid[:, k + 1:] - x) / (grid[:, k + 1:] - grid[:, 1:(-k)]) * B_km1[:, 1:]
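For readers who want the recursion without the tensor slicing, here is a scalar, pure-Python sketch of the Cox-de Boor formula that mirrors the indices in the snippet above. `bspline_basis` is a hypothetical helper, and the knot vector (five intervals on [-1, 1], extended by k knots on each side) is chosen only for illustration.

```python
def bspline_basis(x, grid, k):
    """Values of all order-k B-spline basis functions at scalar x.

    grid is a strictly increasing list of knots; returns len(grid) - k - 1 values.
    """
    if k == 0:
        # Order 0: indicator of the half-open interval [grid[i], grid[i + 1]).
        return [1.0 if grid[i] <= x < grid[i + 1] else 0.0
                for i in range(len(grid) - 1)]
    lower = bspline_basis(x, grid, k - 1)
    values = []
    for i in range(len(grid) - k - 1):
        left = (x - grid[i]) / (grid[i + k] - grid[i]) * lower[i]
        right = (grid[i + k + 1] - x) / (grid[i + k + 1] - grid[i + 1]) * lower[i + 1]
        values.append(left + right)
    return values

k = 3
grid = [-1.0 + 0.4 * i for i in range(-k, 5 + k + 1)]  # 12 uniform knots
basis = bspline_basis(0.3, grid, k)
print(len(basis), sum(basis))
```

Inside the un-extended interval the basis values sum to 1 (partition of unity), which is a quick sanity check for any implementation.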

Eh, damn

How do I run it?

model2.initialize_from_another_model() does not honor device='cuda'

    # model is trained with device='cuda', but when we do:
    model2.initialize_from_another_model(model, dataset['train_input']);

it errors out:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Can you give me some tips about the "2n + 1" in Eq. 2.1

Hi,

I recently came across your fascinating paper and thoroughly enjoyed reading it. The insights you presented were truly thought-provoking.

However, I am hoping to gain some clarity. In Equation 2.1, the upper limit of the sum over q (in other words, the layer width) is set to 2n + 1. I tried to find more information about this in the references you provided, but unfortunately, I couldn't locate any specific details.

I was wondering if you could kindly provide me with some tips or guidance on understanding the reasoning behind this particular choice of the upper limit. Any additional context or resources you could share would be greatly appreciated.

Thank you in advance for your time and consideration.
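For reference, the standard statement of the Kolmogorov-Arnold representation theorem, which is what the question is about (written here from the usual textbook form, so please check the notation against Eq. 2.1 in the paper): for a continuous function f on [0,1]^n,

```latex
f(x_1, \dots, x_n) \;=\; \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
\qquad \phi_{q,p} : [0,1] \to \mathbb{R}, \quad \Phi_q : \mathbb{R} \to \mathbb{R}.
```

Read as a network, this is a two-layer structure of shape [n, 2n+1, 1]. The count 2n + 1 of outer functions comes from the theorem itself rather than from a tuning choice.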

out of memory if I use kanlayer replacing mlp in nano-gpt

I replaced an MLP with KANLayer like this:
`class MLP_KAN(nn.Module):

    def __init__(self, config):
        super().__init__()
        self.in_dim = config.n_embd
        self.first_dim = config.n_embd // 4
        self.out_dim = config.n_embd // 4
        self.kan_1 = KANLayer(self.in_dim, self.first_dim, device="cuda")
        self.kan_2 = KANLayer(self.first_dim, self.out_dim, device="cuda")

    def forward(self, x):
        n, l, c = x.shape
        # flatten (batch, sequence) into one axis so the layer sees 2-D input
        x = x.reshape([n * l, c])
        x = self.kan_1(x)
        x = self.kan_2(x)
        # the output has out_dim features, not c, so reshape with out_dim
        x = x.view([n, l, self.out_dim])
        return x`

I used the layer above instead of the original MLP:
`class MLP(nn.Module):

    def __init__(self, config):
        super().__init__()
        self.c_fc    = nn.Linear(config.n_embd, 4 * config.n_embd, bias=config.bias)
        self.gelu    = nn.GELU()
        self.c_proj  = nn.Linear(4 * config.n_embd, config.n_embd, bias=config.bias)
        self.dropout = nn.Dropout(config.dropout)

    def forward(self, x):
        x = self.c_fc(x)
        x = self.gelu(x)
        x = self.c_proj(x)
        x = self.dropout(x)
        return x`

and it runs out of memory:
File "/home/alex/Projects/pykan-master/kan/spline.py", line 60, in B_batch
value = (x - grid[:, :-(k + 1)]) / (grid[:, k:-1] - grid[:, :-(k + 1)]) * B_km1[:, :-1] + (grid[:, k + 1:] - x) / (grid[:, k + 1:] - grid[:, 1:(-k)]) * B_km1[:, 1:]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 22.50 GiB. GPU 0 has a total capacty of 23.64 GiB of which 13.85 GiB is free. Including non-PyTorch memory, this process has 9.05 GiB memory in use. Of the allocated memory 8.58 GiB is allocated by PyTorch, and 25.99 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
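The allocation fails inside `B_batch`, whose result is described in the source comments quoted elsewhere on this page as having shape (size, coef, batch) with size = out_dim * in_dim. A rough estimate of that one tensor is sketched below; `b_batch_gib` is a hypothetical helper, and the embedding width, coefficient count and batch size are made-up numbers rather than values from this report.

```python
def b_batch_gib(in_dim, out_dim, n_coef, batch, bytes_per_value=4):
    """Size in GiB of one (in_dim * out_dim, n_coef, batch) tensor."""
    return in_dim * out_dim * n_coef * batch * bytes_per_value / 2**30

# Hypothetical numbers, not taken from the report:
# n_embd = 768, 8 spline coefficients per edge, 12 sequences of 1024 tokens.
print(b_batch_gib(in_dim=768, out_dim=192, n_coef=8, batch=12 * 1024))
```

Because every token is evaluated on every edge, memory grows with in_dim * out_dim * batch, unlike `nn.Linear`, whose activations grow with batch * out_dim. Smaller batches or narrower layers reduce this intermediate directly.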

Let's face real world dataset

So far, all the examples I have found are generated from some known function. I want to test KAN on real-world data.
I chose the well-known Boston house prices dataset from https://www.kaggle.com/datasets/vikrishnan/boston-house-prices

Here is my test code; the test loss is very bad.
Maybe my settings are wrong.
Please let me know if anybody can run it successfully.

from kan import KAN, create_dataset
import torch
import pandas as pd
from sklearn import preprocessing
# Let's scale the columns before plotting them against MEDV
scaler = preprocessing.StandardScaler()

def create_boston_house_data(train_num=450):

    from pandas import read_csv
    #Lets load the dataset and sample some
    column_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
    data = read_csv('./housing.csv', header=None, delimiter=r"\s+", names=column_names)
    print(data.head(5))
    #data = data.sample(frac=1.0)
    data = data.sample(frac=1).reset_index(drop=True)
    column_sels = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
    x = data.loc[:,column_sels]
    x = pd.DataFrame(data=scaler.fit_transform(x), columns=column_sels)
    print(x)
    x_tain = x.loc[0:450,column_sels]
    y_train = data.loc[0:450,'MEDV']
    x_test = x.loc[450:,column_sels]
    y_test = data.loc[450:,'MEDV']
        
    dataset = {}
    dataset['train_input'] = torch.from_numpy(x_tain.values).float()
    dataset['test_input'] = torch.from_numpy(x_test.values).float()

    dataset['train_label'] = torch.from_numpy(y_train.values).float() 
    dataset['test_label'] = torch.from_numpy(y_test.values).float()

    return dataset
  
dataset = create_boston_house_data()
model = KAN(width=[13,13,13,6,6,2,1], grid=10, k=3, seed=0)
model.train(dataset, opt="Adam", steps=250, lamb=0.001, lamb_entropy=2.);
print(model)

output

train loss: 9.34e+00 | test loss: 7.98e+00 | reg: 1.32e+03 : 100%|█| 250/250 [00:45<00:00,  5.50it/s
KAN(
  (biases): ModuleList(
    (0-1): 2 x Linear(in_features=13, out_features=1, bias=False)
    (2-3): 2 x Linear(in_features=6, out_features=1, bias=False)
    (4): Linear(in_features=2, out_features=1, bias=False)
    (5): Linear(in_features=1, out_features=1, bias=False)
  )
  (act_fun): ModuleList(
    (0-5): 6 x KANLayer(
      (base_fun): SiLU()
    )
  )
  (base_fun): SiLU()
  (symbolic_fun): ModuleList(
    (0-5): 6 x Symbolic_KANLayer()
  )
)
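One possible contributor, offered as a guess rather than a confirmed diagnosis: the labels above are 1-D tensors of shape (N,), while a KAN whose width ends in 1 produces predictions of shape (N, 1). With a loss of the form `torch.mean((x - y) ** 2)`, PyTorch broadcasting turns the difference into an (N, N) matrix, so every prediction is compared with every label. The effect is emulated below in plain Python; `mse_matched` and `mse_broadcast` are hypothetical helpers.

```python
def mse_matched(pred, label):
    """Mean of (pred_i - label_i)^2: what a regression loss should measure."""
    return sum((p - y) ** 2 for p, y in zip(pred, label)) / len(pred)

def mse_broadcast(pred, label):
    """Mean over all pairs (pred_i - label_j)^2, as (N, 1) - (N,) broadcasting gives."""
    n = len(pred)
    return sum((p - y) ** 2 for p in pred for y in label) / (n * n)

label = [10.0, 20.0, 30.0, 40.0]
perfect = list(label)  # predictions exactly equal to the labels
print(mse_matched(perfect, label), mse_broadcast(perfect, label))
```

Even perfect predictions get a large loss in the broadcast case. Giving the labels shape (N, 1), for example with `.unsqueeze(1)`, removes the mismatch.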

RuntimeError. Problems with dimensions.

Hello! We are trying to plot KAN at initialization:

and get an error (where 88 is the length of my dataset):

RuntimeError                              Traceback (most recent call last)
<ipython-input-43-31a261174e74> in <cell line: 1>()
----> 1 model(dataset['train_input'])
      2 model.plot(beta=100)

5 frames
/usr/local/lib/python3.10/dist-packages/kan/KANLayer.py in forward(self, x)
    170         batch = x.shape[0]
    171         # x: shape (batch, in_dim) => shape (size, batch) (size = out_dim * in_dim)
--> 172         x = torch.einsum('ij,k->ikj', x, torch.ones(self.out_dim,).to(self.device)).reshape(batch, self.size).permute(1,0)
    173         preacts = x.permute(1,0).clone().reshape(batch, self.out_dim, self.in_dim)
    174         base = self.base_fun(x).permute(1,0) # shape (batch, size)

RuntimeError: shape '[88, 10]' is invalid for input of size 1320

What could be the problem?
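The numbers in the message can be unpacked. The einsum on line 172 produces batch * out_dim * (actual input columns) values, while the reshape expects batch * size with size = out_dim * in_dim. The sketch below lists the combinations consistent with 88, 1320 and 10; `consistent_shapes` is a hypothetical helper.

```python
def consistent_shapes(batch, numel, layer_size):
    """List (out_dim, in_dim, actual_features) triples matching the error message."""
    per_row = numel // batch  # equals out_dim * actual_features
    triples = []
    for out_dim in range(1, layer_size + 1):
        if layer_size % out_dim == 0 and per_row % out_dim == 0:
            triples.append((out_dim, layer_size // out_dim, per_row // out_dim))
    return triples

print(consistent_shapes(batch=88, numel=1320, layer_size=10))
```

In every consistent case the number of columns in `dataset['train_input']` differs from the `in_dim` the first layer was built with, so checking that `width[0]` equals the number of input features is a reasonable first step.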

GPU Acceleration

Hello,

I tried running hellokan.ipynb on cpu and gpu, and the speed is similar. What are some places where parallelization can be introduced to improve performance?

kan/tutorials/API_10_device.ipynb errors out, tensors are not on the same device

the tutorial example API_10_device errors out with the below stack trace

description:   0%|                                                           | 0/50 [00:01<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-3f2157aed2aa> in <cell line: 7>()
      5 # train the model
      6 #model.train(dataset, opt="LBFGS", steps=20, lamb=1e-3, lamb_entropy=2.);
----> 7 model.train(dataset, opt="LBFGS", steps=50, lamb=5e-5, lamb_entropy=2.);

6 frames
/usr/local/lib/python3.10/dist-packages/torch/functional.py in einsum(*args)
    378         # the path for contracting 0 or 1 time(s) is already optimized
    379         # or the user has disabled using opt_einsum
--> 380         return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
    381 
    382     path = None

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

reproduced on a fresh install on colab and local system

Error installing via pip

Hi folks,

When I clone and install using pip install -e ., I get an error reading the long description from the README.md file.

Error Message

        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\K\Documents\Research\GitHub\pykan\setup.py", line 5, in <module>
          long_description = fh.read()
        File "C:\Users\K\Anaconda3\envs\tkc\lib\encodings\cp1252.py", line 23, in decode
          return codecs.charmap_decode(input,self.errors,decoding_table)[0]
      UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 4711: character maps to <undefined>

If you change line 4 in setup.py from with open("README.md", "r") as fh: to with open("README.md", "r", encoding="utf8") as fh:, it solves the error for me.
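The failure and the fix can be reproduced in isolation. The sketch below writes a small UTF-8 file containing the byte 0x8d (the file content is made up for illustration, it is not the real README) and reads it back with both encodings.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.md")
    with open(path, "wb") as fh:
        # U+014D encodes to the bytes C5 8D in UTF-8; 0x8d has no cp1252 mapping.
        fh.write("README with \u014d".encode("utf-8"))

    try:
        # What open(..., "r") does on a Windows machine whose default is cp1252.
        with open(path, "r", encoding="cp1252") as fh:
            fh.read()
    except UnicodeDecodeError as err:
        print("cp1252 failed:", err.reason)

    # The proposed fix: state the encoding explicitly.
    with open(path, "r", encoding="utf8") as fh:
        long_description = fh.read()

print(long_description)
```

Passing `encoding` explicitly makes the result independent of the platform's default codec.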

pykan for Julia (Jukan)?

Is there a plan to implement the network in Julia? Maybe provide an interface with Lux.jl?

Test ai

The results are different from hellokan.ipynb

Hi. I followed the instructions exactly. When I ran hellokan.ipynb it worked, but the result is different from the one in the original version. For example, the pruned network had more nodes than the original one:
The original one:

My version was:

For the symbolic formula, the original one is:

My version was:

What's wrong? Please help me to fix this problem.

Not issue, to KAN-er, a minimum-kan.py share with you guys. just 60+ lines. for study...

The working memory length of you guys is 7 while mine is only 3,
and my math is too shit.

So I try my best to keep the original code of the Great-Master, without maximizing optimization.
It's convenient for everyone (especially me) to understand the algorithm by just looking at the few dozen key lines of code.

I love you all soooooooo much, sharing here:
https://github.com/yuedajiong/minimum-kan/blob/main/kan.py

Fixing learned weight for continuous learning

I'm building a model by iteratively adding new input variables. It would be brilliant to continue training a model by inheriting from another one (by simply creating a copy of it, like model2 = model) and adding new nodes and/or edges. Another useful feature would be fixing an edge's weight function.
Something like this:

model = KAN(width=[1,1], grid=6, k=3)
model.train(dataset, opt="Adam", lr=0.01, steps=1000)

model2 = model.copy()
model2.add_node(0,1) # this would automatically add edges between the new node and all the nodes in the next layer
model2.fix_edge(0,0,0) # to keep the previously learned weight function
model2.train(dataset2, opt="Adam", lr=0.01, steps=200)

use case:
in time series analysis the DayOfWeek is a very strong variable, would be great to first just find the function that describes it and then force the model to find some additive change necessary on top of the found DayOfWeek pattern - like adding holiday effect, or adding trend.

ImportError: cannot import name 'KAN' from partially initialized module 'kan' (most likely due to a circular import)

Hi. May I ask what causes this error? I'm just testing the code in API_10_device.ipynb.

from kan import KAN, create_dataset
ImportError: cannot import name 'KAN' from partially initialized module 'kan' (most likely due to a circular import)

Graph Neural Network version of Kolmogorov Arnold Networks (GraphKAN)

I have opened a GraphKan (https://github.com/WillHua127/GraphKAN), a Graph Neural Network version of Kolmogorov Arnold Networks (GraphKAN). Feel free to discuss and explore!!!

A question about model.plot() method

When I run example code (for instance from hellokan), model.plot() does not display any figures (although parts of them are saved inside .figures), whereas matplotlib running in the same environment produces figures correctly.

Training on GPU doesn't work

Hi, when training on GPU, the data is on GPU, while the layers are on CPU. One of the reasons is that the constructor of KAN takes in an arg called 'device' but doesn't use it. For example, it should be passed into KANLayer in this line but is not.

Another thing is that setting model.to(device) doesn't do anything, as far as I can tell.

Attempting to solve the Time-Independent Schrodinger's Equation in 1D for a particle in a box problem using a KAN model

Hello!

First off, I would like to say, excellent work on KANs and this library. I read your thoughts on the recent explosion in attention that KANs have been receiving and wholeheartedly agree that while KANs may not serve the best purpose in trying to be frontrunners to replace current traditional neural network architectures like MLPs, there is immense potential for their applicability in Physics, especially for symbolic regression tasks and (re)discovering formulae from data.

Recently, my advisor and I had been investigating novel symbolic series expansions of neutrino oscillation probabilities up to 1st and 2nd order in oscillation parameters such as the mass-hierarchy parameter and the mixing angles, but we ran into a wall due to the limitations of Mathematica and current symbolic CAS systems. Although we do not immediately see how KANs could help us overcome these limitations, we have been considering giving it another shot, this time with the power of KANs by our side.

Coming to my question, I've been playing around with the PDE solving example in the library's documentation and trying to write a similar example for solving the TI Schrodinger's equation in 1D for a classic particle in a box problem and I ran into a roadblock.

$$-\frac{\hbar^2}{2m}\frac{d^2 \psi(x)}{dx^2} = E \psi(x)$$

Generally when we attempt to solve Schrodinger's equation for a particle in a box problem in wave mechanics, we obtain both the energy eigenvalues as well as the solution to the wavefunction in the position basis. If I am to implement a KAN model for this task, would I have to have energy E as a parameter/variable similar to the position x and have the KAN model infer the energy eigenvalue's symbolic relationships by itself, or would I have to specify E by fixing n (the energy level) ahead of time, use the energy eigenvalue equation for this specific system, and utilise that info in the process?
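For reference, the textbook closed-form solution for an infinite well on 0 < x < L with ψ(0) = ψ(L) = 0 is given below. The box width L and the placement of the walls are assumptions made here, since the box is not specified above.

```latex
\psi_n(x) = \sqrt{\frac{2}{L}}\,\sin\!\left(\frac{n\pi x}{L}\right),
\qquad
E_n = \frac{n^2 \pi^2 \hbar^2}{2 m L^2},
\qquad n = 1, 2, 3, \dots
```

Substituting ψ_n into the equation above gives -(ħ²/2m) ψ_n'' = (ħ²/2m)(nπ/L)² ψ_n, which reproduces E_n. This pair is the target any learned solution would need to recover.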

I would really not like to solve for E ahead of time because that makes the approach less general, and in a setting where one is trying to derive symbolic expressions from data, it is not immediately clear how the energy eigenvalues arise for a given system if there is no further information that can be grokked. Furthermore, if I am to treat energy as a variable, the energy eigenvalues are discrete, so I assume this is something I have to encode into the sampling process. Or would a KAN model be able to identify the discrete nature of this parameter by itself if I sample it randomly (which implicitly assumes a continuous nature)?

Does pykan currently support incorporating and working with parameters that are not input variables?
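One possible way to avoid fixing E ahead of time is to make it a trainable scalar that is optimized together with the wavefunction. The sketch below is not pykan API: it uses plain PyTorch, a small MLP as a stand-in for the network, units where hbar^2/(2m) = 1, and a box on [0, 1]. The names `schrodinger_residual` and `BoxSolver` are hypothetical.

```python
import math
import torch
import torch.nn as nn

def schrodinger_residual(psi_fn, E, x):
    """Residual of -psi'' = E * psi, in units where hbar^2 / (2m) = 1."""
    x = x.clone().requires_grad_(True)
    psi = psi_fn(x)
    # Each psi_i depends only on x_i, so the gradient of the sum is d psi_i / d x_i.
    dpsi = torch.autograd.grad(psi.sum(), x, create_graph=True)[0]
    d2psi = torch.autograd.grad(dpsi.sum(), x, create_graph=True)[0]
    return -d2psi - E * psi

class BoxSolver(nn.Module):
    """Hypothetical solver: a network for psi(x) plus a trainable energy E."""
    def __init__(self, hidden=32, E_init=5.0):
        super().__init__()
        # Stand-in network (assumption); any differentiable module mapping (N, 1) -> (N, 1) works.
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )
        self.E = nn.Parameter(torch.tensor(E_init))  # a parameter, not an input variable

    def psi(self, x):
        # The factor x * (1 - x) enforces psi(0) = psi(1) = 0 exactly.
        return x * (1 - x) * self.net(x)

    def loss(self, x):
        res = schrodinger_residual(self.psi, self.E, x)
        norm = (self.psi(x) ** 2).mean()  # approximates the integral of psi^2 over [0, 1]
        # The normalization term rules out the trivial solution psi = 0.
        return (res ** 2).mean() + (norm - 1.0) ** 2
```

Passing `solver.parameters()` to an optimizer updates the network weights and `E` together. Which eigenvalue the optimization settles on would depend on `E_init` and the network initialization, so the discreteness of the spectrum shows up as distinct minima rather than through the sampling of x.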

Any help would be greatly appreciated. Cheers!

Is KAN 10X slower per step of training, or does it need 10X many steps to converge?

Hi Ziming,

In Section 6 of your paper, you mentioned that KANs are practically 10X slower than MLPs. I am curious what you meant by it. Did you mean a KAN takes 10X as many steps to converge in comparison to an MLP with the same number of parameters, or that a KAN takes 10X as much wall-clock time to run a single step of training (forward, backward, and gradient update) in comparison with a same-parameter MLP?

One more question: if I understand correctly, a KAN has more parameters per neuron than an MLP. In your speed (or slowness) claim, you controlled for parameter count, not neuron count, correct?
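The two readings of "10X slower" can be separated empirically by measuring wall-clock time per step and steps to convergence independently, while matching parameter counts. A minimal sketch of the per-step measurement in plain PyTorch follows; the helper names `count_params` and `time_per_step` are hypothetical, and Adam with an MSE loss is an arbitrary choice for illustration.

```python
import time
import torch
import torch.nn as nn

def count_params(model):
    """Number of trainable parameters, used to match models by size."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def time_per_step(model, x, y, steps=20, lr=1e-3):
    """Average wall-clock seconds for one forward + backward + optimizer update."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()

    def step():
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

    step()  # warm-up step, excluded from the timing
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        step()
    if x.is_cuda:
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / steps
```

Calling `time_per_step` on an MLP and on a KAN with similar `count_params` values would isolate the per-step cost; the number of steps needed to reach a target loss has to be measured separately.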

How to train on my own data for a multivariate time series forecasting task?

MLPs are already used in time-series forecasting.
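One common way to cast forecasting as supervised regression is to slice the series into sliding windows. The sketch below builds a dictionary with the `train_input` / `train_label` / `test_input` / `test_label` keys that the training code elsewhere on this page indexes; the helper name `make_forecast_dataset` and its arguments are hypothetical.

```python
import torch

def make_forecast_dataset(series, lookback, horizon=1, test_frac=0.2):
    """series: float tensor of shape (T, C) with T time steps and C channels.

    Each input is a flattened window of `lookback` steps and each label is the
    flattened next `horizon` steps.
    """
    T, C = series.shape
    n = T - lookback - horizon + 1  # number of complete (window, target) pairs
    X = torch.stack([series[i:i + lookback].reshape(-1) for i in range(n)])
    Y = torch.stack([series[i + lookback:i + lookback + horizon].reshape(-1)
                     for i in range(n)])
    split = int(n * (1 - test_frac))  # chronological split, no shuffling
    return {
        "train_input": X[:split], "train_label": Y[:split],
        "test_input": X[split:], "test_label": Y[split:],
    }
```

With this layout the model's input width would be `lookback * C` and its output width `horizon * C`. Whether a KAN is competitive with an MLP on such data is an open question here, not something this sketch demonstrates.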

The purpose of KAN class and pykan is unclear.

The KAN class is an implementation of an nn.Module. Conventionally in ML, that implies it should be modular, reusable, and extensible, yet it carries many utility methods that try to make it something more. The whole implementation of KAN clearly leans toward the application side, trying to make it easier for people to interface with KANs (most likely targeted at scientists, letting them run quick experiments). For ML use cases, however, it would be more beneficial to provide a clean and modular implementation of KANs, which leaves more room for PyTorch-style plug-and-play usage.

Given this dichotomy of expectations between ML R&D and scientific R&D, would it not be more feasible to have a minimal implementation of the KAN class that leans toward existing torch/ML expectations, and then a wrapper library that uses the KAN class for further scientific usage?

KAN overrides `nn.Module` `train()`

Hi,

KAN overrides the base method train and uses it for a different purpose. Someone using KAN, assuming that it behaves like any other nn.Module, will encounter an error when calling model.train() as they usually would.

This leads to problems when KAN is combined in a larger module with batch norm and dropout layers, which need the model.train() call.

For example:

import torch.nn as nn
from kan import KAN

device = "cuda"

model = nn.Sequential(
    nn.Conv1d(3, 8, 3),
    nn.BatchNorm1d(8),
    nn.Dropout1d(),
    # ...
    KAN(width=[8, 8, 3], grid=5, k=3, device=device)
)
model.to(device)

print(model)

for epoch in range(10):
    model.train()
    # train loop
    # ...

    model.eval()
    # eval loop
    # ...

This throws an error:

File "C:\Users\todos\PycharmProjects\kan-examples\train_error.py", line 18, in <module>
    model.train()
  File "C:\Users\todos\anaconda3\envs\torch_opf_clone\lib\site-packages\torch\nn\modules\module.py", line 2394, in train
    module.train(mode)
  File "C:\Users\todos\anaconda3\envs\torch_opf_clone\lib\site-packages\kan\KAN.py", line 881, in train
    batch_size = dataset['train_input'].shape[0]
TypeError: 'bool' object is not subscriptable
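Until the method is renamed upstream, one possible workaround is a subclass that exposes the fitting loop under another name and restores the standard `nn.Module.train(mode)` behavior. The sketch below uses a hypothetical stand-in class, `LegacyModel`, rather than KAN itself so that it runs without pykan; `TorchFriendly` and `fit` are also hypothetical names. Applying the same pattern to KAN assumes that KAN does not call `self.train(...)` internally.

```python
import torch
import torch.nn as nn

class LegacyModel(nn.Module):
    """Hypothetical stand-in for a module whose train() runs a fitting loop."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 3)

    def forward(self, x):
        return self.linear(x)

    def train(self, dataset, steps=10):
        # Mirrors the failing line in the traceback: a bool passed here is not subscriptable.
        batch_size = dataset["train_input"].shape[0]
        return {"steps": steps, "batch_size": batch_size}

class TorchFriendly(LegacyModel):
    # Keep the fitting loop reachable under a different name.
    fit = LegacyModel.train

    def train(self, mode=True):
        # Restore the standard nn.Module semantics: toggle training mode recursively.
        return nn.Module.train(self, mode)
```

With the subclass, `nn.Sequential(nn.Dropout(), TorchFriendly()).train()` toggles the mode as usual, and the fitting loop is called as `model.fit(dataset, steps=...)`.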

Not issue: code optimization

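I see that people have started comparing the speed of KANs with MLPs and other models on larger tasks.

I also skimmed the code, and it looks like the code in KANLayer and related modules could be optimized. Only then would such comparisons be fair; otherwise KAN is at a disadvantage.

For example, there are many calls to clone (the main one), reshape, permute, and unsqueeze, plus ones (creating a new tensor and moving it to the GPU), and the B_batch operation. This part deserves careful analysis, especially the execution graph after PyTorch compilation, to check whether it is written optimally: there are tensor.to() transfers between CPU and GPU, and tensors are repeatedly constructed inside loops and then sent to the GPU. If PyTorch's compiler cannot optimize this away, it will certainly hurt performance.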
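As an illustration of the pattern described above (not a copy of the pykan code), the sketch below contrasts allocating a tensor on the CPU and then transferring it with allocating it directly on the target device. Both function names are hypothetical.

```python
import torch

def slow_ones_column(x):
    # Allocates on the CPU first, then copies to x.device on every call.
    ones = torch.ones(x.shape[0], 1).to(x.device)
    # The clone is redundant: torch.cat already allocates a new output tensor.
    return torch.cat([x.clone(), ones], dim=1)

def fast_ones_column(x):
    # Allocates directly on the right device with the right dtype; no host-to-device copy.
    ones = torch.ones(x.shape[0], 1, device=x.device, dtype=x.dtype)
    return torch.cat([x, ones], dim=1)
```

Both functions return the same values; the difference only matters when `x` lives on a GPU and the function is called inside a hot loop. Profiling would be needed to confirm how much such changes help in practice.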

Out of sample generalisation

What is the expected stability of these methods for out-of-sample predictions?
I ran a simple example using a power-curve function (see below) restricted to [1, 100].
The model performs quite well on the training set. Outside the training range it quickly diverges.

from kan import *

model = KAN(width=[1,2,1], grid=3, k=2, symbolic_enabled=False)
a,b,c = 0.7, 0.2, 0.1
f = lambda x: a*x**-b + c
dataset = create_dataset(f, n_var=1, ranges=[1,100])
model.train(dataset, opt="LBFGS", steps=20);


x = torch.tensor(np.arange(1,1000,0.1)).reshape(-1,1)[1:]
y_actual = f(x)
y_pred = model(x)[:,0].detach().numpy()

plt.scatter(x.numpy(),y_actual, color="red")
plt.scatter(x.numpy(),y_pred, color="blue")
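To quantify the divergence rather than judge it from a scatter plot, the error can be reported separately inside and outside the training range. The sketch below uses a cubic polynomial fitted with NumPy as a stand-in predictor, which is an assumption for illustration and not a KAN; the helper `rmse_by_range` is hypothetical and accepts any callable predictor.

```python
import numpy as np

def rmse_by_range(predict, f, train_range, eval_range, n=2000):
    """Return (RMSE inside train_range, RMSE outside it) over eval_range."""
    lo, hi = train_range
    x = np.linspace(eval_range[0], eval_range[1], n)
    err = predict(x) - f(x)
    inside = (x >= lo) & (x <= hi)
    rmse = lambda e: float(np.sqrt(np.mean(e ** 2)))
    return rmse(err[inside]), rmse(err[~inside])

# Same power curve as in the example above.
a, b, c = 0.7, 0.2, 0.1
f = lambda x: a * x ** -b + c

# Stand-in model: a cubic fitted only on the training range [1, 100].
x_train = np.linspace(1, 100, 500)
coeffs = np.polyfit(x_train, f(x_train), deg=3)
predict = lambda x: np.polyval(coeffs, x)

rmse_in, rmse_out = rmse_by_range(predict, f, (1, 100), (1, 1000))
print(f"in-range RMSE: {rmse_in:.4g}, out-of-range RMSE: {rmse_out:.4g}")
```

Any model fitted only on [1, 100] has no data constraining it beyond that interval, so a large gap between the two numbers is expected for flexible function classes in general.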

Weird runtime error

When executing hellokan.ipynb the cell model.train(dataset, opt="LBFGS", steps=50); throws this error:

RuntimeError Traceback (most recent call last)
Cell In[14], line 1
----> 1 model.train(dataset, opt="LBFGS", steps=50)

File ~/python-test/pykan/kan/KAN.py:896, in KAN.train(self, dataset, opt, steps, log, lamb, lamb_l1, lamb_entropy, lamb_coef, lamb_coefdiff, update_grid, grid_update_num, loss_fn, lr, stop_grid_update_step, batch, small_mag_threshold, small_reg_factor, metrics, sglr_avoid, save_fig, in_vars, out_vars, beta, save_fig_freq, img_folder, device)
893 test_id = np.random.choice(dataset['test_input'].shape[0], batch_size_test, replace=False)
895 if _ % grid_update_freq == 0 and _ < stop_grid_update_step and update_grid:
--> 896 self.update_grid_from_samples(dataset['train_input'][train_id].to(device))
898 if opt == "LBFGS":
899 optimizer.step(closure)

File ~/python-test/pykan/kan/KAN.py:242, in KAN.update_grid_from_samples(self, x)
240 for l in range(self.depth):
241 self.forward(x)
--> 242 self.act_fun[l].update_grid_from_samples(self.acts[l])

File ~/python-test/pykan/kan/KANLayer.py:218, in KANLayer.update_grid_from_samples(self, x)
216 grid_uniform = torch.cat([grid_adaptive[:, [0]] - margin + (grid_adaptive[:, [-1]] - grid_adaptive[:, [0]] + 2 * margin) * a for a in np.linspace(0, 1, num=self.grid.shape[1])], dim=1)
217 self.grid.data = self.grid_eps * grid_uniform + (1 - self.grid_eps) * grid_adaptive
--> 218 self.coef.data = curve2coef(x_pos, y_eval, self.grid, self.k, device=self.device)

File ~/python-test/pykan/kan/spline.py:135, in curve2coef(x_eval, y_eval, grid, k, device)
133 # x_eval: (size, batch); y_eval: (size, batch); grid: (size, grid); k: scalar
134 mat = B_batch(x_eval, grid, k, device=device).permute(0, 2, 1)
--> 135 coef = torch.linalg.lstsq(mat.to('cpu'), y_eval.unsqueeze(dim=2).to('cpu')).solution[:, :, 0] # sometimes 'cuda' version may diverge
136 return coef.to(device)

RuntimeError: false INTERNAL ASSERT FAILED at "../aten/src/ATen/native/BatchLinearAlgebra.cpp":1539, please report a bug to PyTorch. torch.linalg.lstsq: (Batch element 0): Argument 6 has illegal value. Most certainly there is a bug in the implementation calling the backend library.

Does anybody have a solution to this? Thanks, Winux-Arch
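The root cause of this assertion is not established here. As a diagnostic sketch, one could check the inputs for non-finite values (a possible but unconfirmed trigger for LAPACK argument errors) and try a different LAPACK driver before falling back to a pseudo-inverse. The function name `robust_lstsq` is hypothetical. The `driver` keyword and the driver names `gelsy` and `gelsd` are part of `torch.linalg.lstsq` for CPU inputs.

```python
import torch

def robust_lstsq(A, B, drivers=("gelsy", "gelsd")):
    """Batched least squares on CPU with input checks and fallbacks.

    A: (..., m, n), B: (..., m, k). Returns the solution of shape (..., n, k).
    """
    A, B = A.to("cpu"), B.to("cpu")
    if not (torch.isfinite(A).all() and torch.isfinite(B).all()):
        raise ValueError("lstsq inputs contain NaN or Inf; check the grid and activations")
    for driver in drivers:
        try:
            return torch.linalg.lstsq(A, B, driver=driver).solution
        except RuntimeError:
            continue  # try the next LAPACK driver
    # Last resort: the Moore-Penrose pseudo-inverse gives the minimum-norm solution.
    return torch.linalg.pinv(A) @ B
```

If the `ValueError` fires, the problem lies upstream of the solver (for example in the grid update), and switching drivers would not help.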

(Not issue) some notes

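Recording some points related to this algorithm:

Beyond the analysis in the paper, what exactly are the advantages and disadvantages of the KAN approach?
Continuous vs. discrete
Very high-dimensional inputs (e.g., 1024x1024 images: still use conv or patches first, then apply KAN to the smaller feature representation?)
Inputs with a lot of redundant information (e.g., images are sensitive at edges, but the middle of an object can be a large patch of nearly uniform color with no texture and very little information)
Mappings between very different spaces (e.g., from an image to an integer label id, or from a long sequence of text tokens to a category)
Going from a very fuzzy tensor representation to a relatively crisp concept: is this essentially the same as abstracting a composition of symbolic functions, or different?
Potential for tensorized optimization
Sensitivity to parameter initialization (e.g., when using trigonometric periodic functions as activations, or designing the network along Fourier-series lines, the initialization has to cover different frequencies, otherwise convergence suffers)
For someone with deep mathematical expertise in approximation theory, who knows at least the current overall landscape of the field, what would it look like to design this kind of atomic "approximation layer" for NNs from scratch?
Related to this: from Taylor series, to Fourier series, to the universal approximation theorem, to the Kolmogorov–Arnold representation theorem, and also the Weierstrass approximation theorem / Stone–Weierstrass theorem, ...
Besides the ability to approximate arbitrarily well, what other points need to be considered? I'll offer a few to start the discussion:
Mathematical aspects: numerical ranges; continuous vs. discrete; ...
Other aspects: parameter count; amount of computation after hardware optimization; whether it is tightly coupled to the number of inputs; approximation speed; assuming there is a ground-truth "God function" to approximate, how close the learned function is to it (assuming several possible compositional forms, how close one gets under limited data and training steps; new test data may differ a lot); whether one must always pursue minimizing parameters; whether the fewest parameters is the same as the minimal description; ...
Can different algorithm implementations yield very different representations? For example, following different ideas, some try to approximate the large components as quickly as possible; some try to approximate all components; some try to disentangle the different inputs; one could even design for the components to be as orthogonal as possible. Do we choose based on our prior about the problem (assuming we know a little about the characteristics of the God function), or how else should we choose?
Both the paper and the code involve grid extension. I have two questions: 1) Can the grid be extended adaptively based on loss/gradients? 2) In terms of engineering, could a super-grids-network be implemented that starts with a small number of grid points, reducing the churn of creating/copying/destroying networks?

Not an issue, just some open questions from my own study.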

Low R^2 Scores in KAN Model Regression

I am currently employing the KAN model for a basic engineering regression problem and have observed unexpectedly low R^2 scores, even for the simplest configurations. Given this, I suspect there might be an issue with the implementation in my script.

Would you be able to review my code to identify any potential errors or suggest improvements? I have attached the script for your reference. Any insights or recommendations you could provide would be greatly appreciated as I aim to optimize the model's performance.

import pandas as pd
import torch
from kan import KAN
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score, explained_variance_score, max_error
import tkinter as tk
from tkinter import filedialog

def load_data():
    root = tk.Tk()
    root.withdraw()  # hide the GUI window
    filepath = filedialog.askopenfilename(title="Select an Excel file", filetypes=(("Excel files", "*.xlsx *.xls"), ("All files", "*.*")))
    root.destroy()
    if not filepath:
        print("No file selected.")
        return None, None
    data = pd.read_excel(filepath)
    X = data.iloc[:, :-1].values
    y = data.iloc[:, -1].values
    return X, y

def train_and_evaluate(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
    X_train = torch.tensor(X_train, dtype=torch.float32)
    X_test = torch.tensor(X_test, dtype=torch.float32)
    y_train = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)
    y_test = torch.tensor(y_test, dtype=torch.float32).view(-1, 1)

    grids = [5, 10, 20, 50, 100]
    for grid in grids:
        model = KAN(width=[X_train.shape[1], 50, 100, 1], grid=grid, k=3, noise_scale=0.1, seed=0)
        model.train({'train_input': X_train, 'train_label': y_train}, opt='LBFGS', steps=50, lamb=0.01)

        y_train_pred = model.predict(X_train)
        y_test_pred = model.predict(X_test)

        print(f"Results for Grid Size: {grid}")
        print_metrics(y_train, y_train_pred, "Training")
        print_metrics(y_test, y_test_pred, "Testing")

def print_metrics(true_values, predicted_values, dataset_type):
    mse = mean_squared_error(true_values, predicted_values)
    mae = mean_absolute_error(true_values, predicted_values)
    r2 = r2_score(true_values, predicted_values)
    evs = explained_variance_score(true_values, predicted_values)
    max_err = max_error(true_values, predicted_values)

    print(f"{dataset_type} - Mean Squared Error: {mse}")
    print(f"{dataset_type} - Mean Absolute Error: {mae}")
    print(f"{dataset_type} - R^2 Score: {r2}")
    print(f"{dataset_type} - Explained Variance Score: {evs}")
    print(f"{dataset_type} - Max Error: {max_err}")

def main():
    X, y = load_data()
    if X is not None and y is not None:
        train_and_evaluate(X, y)

if __name__ == "__main__":
    main()
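Two things may be worth checking in a script like the one above, offered as guesses rather than a diagnosis. First, the tracebacks elsewhere on this page show the training loop indexing `dataset['test_input']`, while this dataset dictionary has only the two train keys. Second, raw engineering features can span ranges far from the initial spline grid (assumed here to cover roughly [-1, 1] by default), so standardizing inputs with statistics from the training split may help. The helpers below are hypothetical and use plain PyTorch.

```python
import torch

def fit_standardizer(x_train, eps=1e-8):
    """Return a function that standardizes features using training-set statistics only."""
    mean = x_train.mean(dim=0, keepdim=True)
    std = x_train.std(dim=0, keepdim=True).clamp_min(eps)  # avoid division by zero
    return lambda x: (x - mean) / std

def r2_score_torch(y_true, y_pred):
    """Coefficient of determination, computed directly on tensors."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)
```

The same standardizer would be applied to both `X_train` and `X_test`, and predictions would be detached from the autograd graph before being passed to any metric.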

if k == 0:
    value = (x >= grid[:, :-1]) * (x < grid[:, 1:])
else:
    B_km1 = B_batch(x[:, 0], grid=grid[:, :, 0], k=k - 1, extend=False, device=device)
    value = (x - grid[:, :-(k + 1)]) / (grid[:, k:-1] - grid[:, :-(k + 1)]) * B_km1[:, :-1] + (grid[:, k + 1:] - x) / (grid[:, k + 1:] - grid[:, 1:(-k)]) * B_km1[:, 1:]
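The recursion in the fragment above is the Cox-de Boor formula for B-spline basis functions. For readers who want to check it in isolation, here is a simplified, self-contained sketch for a single one-dimensional knot vector. It is not the library's `B_batch`: the function name `bspline_basis` and the shapes are assumptions chosen for clarity, and the knots must be strictly increasing.

```python
import torch

def bspline_basis(x, grid, k):
    """Evaluate all order-k B-spline basis functions on a 1-D knot vector.

    x: (N,) sample points; grid: (G,) strictly increasing knots.
    Returns a tensor of shape (N, G - 1 - k).
    """
    xc = x.unsqueeze(1)    # (N, 1)
    g = grid.unsqueeze(0)  # (1, G)
    if k == 0:
        # Indicator of the half-open knot interval [t_i, t_{i+1}).
        return ((xc >= g[:, :-1]) & (xc < g[:, 1:])).to(x.dtype)
    B_km1 = bspline_basis(x, grid, k - 1)  # (N, G - k)
    left = (xc - g[:, :-(k + 1)]) / (g[:, k:-1] - g[:, :-(k + 1)]) * B_km1[:, :-1]
    right = (g[:, k + 1:] - xc) / (g[:, k + 1:] - g[:, 1:-k]) * B_km1[:, 1:]
    return left + right

# 5 intervals on [0, 1], extended by k = 3 knots on each side.
knots = torch.arange(-3, 9, dtype=torch.float64) * 0.2
x = torch.linspace(0.01, 0.99, 50, dtype=torch.float64)
basis = bspline_basis(x, knots, k=3)
print(basis.shape, basis.sum(dim=1).min().item(), basis.sum(dim=1).max().item())
```

Inside the interior interval the basis functions sum to one, and outside the extended knot range every basis function is zero.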