Tutorial: Solving the Kuramoto–Sivashinsky Equation with Averaging Neural Operator¶


In this tutorial, we will build a Neural Operator using the AveragingNeuralOperator model and the SupervisedSolver. By the end of this tutorial, you will be able to train a Neural Operator to learn the operator for time-dependent PDEs.

Let's start by importing the necessary modules.

In [1]:
## routine needed to run the notebook on Google Colab
try:
    import google.colab

    IN_COLAB = True
except ImportError:
    IN_COLAB = False
if IN_COLAB:
    !pip install "pina-mathlab[tutorial]"
    # get the data
    !mkdir "data"
    !wget "https://github.com/mathLab/PINA/raw/refs/heads/master/tutorials/tutorial10/data/Data_KS.mat" -O "data/Data_KS.mat"
    !wget "https://github.com/mathLab/PINA/raw/refs/heads/master/tutorials/tutorial10/data/Data_KS2.mat" -O "data/Data_KS2.mat"

import torch
import matplotlib.pyplot as plt
import warnings

from scipy import io
from pina import Trainer, LabelTensor
from pina.model import AveragingNeuralOperator
from pina.solver import SupervisedSolver
from pina.problem.zoo import SupervisedProblem

warnings.filterwarnings("ignore")

Data Generation¶

In this tutorial, we will focus on solving the Kuramoto-Sivashinsky (KS) equation, a fourth-order nonlinear PDE. The equation is given by:

$$ \frac{\partial u}{\partial t}(x,t) = -u(x,t)\frac{\partial u}{\partial x}(x,t) - \frac{\partial^{4}u}{\partial x^{4}}(x,t) - \frac{\partial^{2}u}{\partial x^{2}}(x,t). $$

In this equation, $x \in \Omega = [0, 64]$ represents a spatial location, and $t \in \mathbb{T} = [0, 50]$ represents time. The function $u(x, t)$ is the value of the function at each point in space and time, with $u(x, t) \in \mathbb{R}$. We denote the solution space as $\mathbb{U}$, where $u \in \mathbb{U}$.

We impose homogeneous boundary conditions on the derivative of $u$ (i.e., Neumann-type conditions) at the boundary of the domain $\partial \Omega$:

$$ \frac{\partial u}{\partial x}(x,t) = 0 \quad \forall (x,t) \in \partial \Omega \times \mathbb{T}. $$

The initial conditions are sampled from a distribution over truncated Fourier series with random coefficients $\{A_k, \ell_k, \phi_k\}_k$, as follows:

$$ u(x,0) = \sum_{k=1}^N A_k \sin\left(2 \pi \frac{\ell_k x}{L} + \phi_k\right), $$

where:

  • $A_k \in [-0.4, -0.3]$,
  • $\ell_k = 2$,
  • $\phi_k = 2\pi \quad \forall k=1,\dots,N$.
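
To make this construction concrete, below is a minimal sketch of how a single initial condition of this form could be sampled on a spatial grid. It is purely illustrative: the grid size n_x and the number of modes N are assumptions of ours, and the dataset used below was generated separately.

L = 64.0  # length of the spatial domain
n_x = 256  # hypothetical number of spatial grid points
N = 10  # hypothetical number of Fourier modes
x = torch.linspace(0, L, n_x)
# sample the random amplitudes; l_k and phi_k are fixed as stated above
A = torch.empty(N).uniform_(-0.4, -0.3)
l_k, phi_k = 2.0, 2 * torch.pi
u0 = sum(a * torch.sin(2 * torch.pi * l_k * x / L + phi_k) for a in A)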

We have already generated data for different initial conditions. The goal is to build a Neural Operator that, given $u(x,t)$, outputs $u(x,t+\delta)$, where $\delta$ is a fixed time step.

We will cover the Neural Operator architecture later, but for now, let’s start by loading the data.

Note: The numerical solution was obtained using a pseudospectral method for the spatial derivatives and an implicit Runge–Kutta scheme of order 5 for the temporal dynamics.
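
For reference, a pseudospectral scheme evaluates the spatial derivatives in Fourier space. The snippet below is a hedged sketch of how the KS right-hand side could be computed this way; it assumes a periodic, uniformly spaced grid (the standard setting for FFT-based pseudospectral methods) and is not the code that actually generated the data.

def ks_rhs(u, L=64.0):
    # illustrative: spatial derivatives of u via FFT, then the KS right-hand side
    n = u.shape[-1]
    k = 2 * torch.pi * torch.fft.rfftfreq(n, d=L / n)  # wavenumbers
    u_hat = torch.fft.rfft(u)
    u_x = torch.fft.irfft(1j * k * u_hat, n=n)
    u_xx = torch.fft.irfft(-(k**2) * u_hat, n=n)
    u_xxxx = torch.fft.irfft((k**4) * u_hat, n=n)
    return -u * u_x - u_xxxx - u_xx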

In [2]:
# load data
data = io.loadmat("data/Data_KS.mat")

# converting to label tensor
initial_cond_train = LabelTensor(
    torch.tensor(data["initial_cond_train"], dtype=torch.float),
    ["t", "x", "u0"],
)
initial_cond_test = LabelTensor(
    torch.tensor(data["initial_cond_test"], dtype=torch.float), ["t", "x", "u0"]
)
sol_train = LabelTensor(
    torch.tensor(data["sol_train"], dtype=torch.float), ["u"]
)
sol_test = LabelTensor(torch.tensor(data["sol_test"], dtype=torch.float), ["u"])

print("Data Loaded")
print(f" shape initial condition: {initial_cond_train.shape}")
print(f" shape solution: {sol_train.shape}")
Data Loaded
 shape initial condition: torch.Size([100, 12800, 3])
 shape solution: torch.Size([100, 12800, 1])

The data is saved in the form [B, N, D], where:

  • B is the batch size (i.e., how many initial conditions we sample),
  • N is the number of points in the mesh (the product of the number of discretization points in $x$ and in $t$),
  • D is the dimension of the problem (in this case, we have three variables: $[u, t, x]$).
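
As a quick sanity check of this layout, we can recover the space and time discretizations of a single sample from its coordinate labels. This is illustrative only; it simply reuses the LabelTensor.extract API imported above.

coords = initial_cond_train[0].extract(["x", "t"])
dim_x = len(torch.unique(coords.extract("x")))
dim_t = len(torch.unique(coords.extract("t")))
print(dim_x, dim_t, dim_x * dim_t)  # the product should equal N = 12800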

We are now going to plot some trajectories!

In [3]:
# helper function
def plot_trajectory(coords, real, no_sol=None):
    # find the x-t shapes
    dim_x = len(torch.unique(coords.extract("x")))
    dim_t = len(torch.unique(coords.extract("t")))
    # if we don't have the Neural Operator solution we simply plot the real one
    if no_sol is None:
        fig, axs = plt.subplots(1, 1, figsize=(15, 5), sharex=True, sharey=True)
        c = axs.imshow(
            real.reshape(dim_t, dim_x).T.detach(),
            extent=[0, 50, 0, 64],
            cmap="PuOr_r",
            aspect="auto",
        )
        axs.set_title("Real solution")
        fig.colorbar(c, ax=axs)
        axs.set_xlabel("t")
        axs.set_ylabel("x")
    # otherwise we plot the real one, the Neural Operator one, and their difference
    else:
        fig, axs = plt.subplots(1, 3, figsize=(15, 5), sharex=True, sharey=True)
        axs[0].imshow(
            real.reshape(dim_t, dim_x).T.detach(),
            extent=[0, 50, 0, 64],
            cmap="PuOr_r",
            aspect="auto",
        )
        axs[0].set_title("Real solution")
        axs[1].imshow(
            no_sol.reshape(dim_t, dim_x).T.detach(),
            extent=[0, 50, 0, 64],
            cmap="PuOr_r",
            aspect="auto",
        )
        axs[1].set_title("NO solution")
        c = axs[2].imshow(
            (real - no_sol).abs().reshape(dim_t, dim_x).T.detach(),
            extent=[0, 50, 0, 64],
            cmap="PuOr_r",
            aspect="auto",
        )
        axs[2].set_title("Absolute difference")
        fig.colorbar(c, ax=axs.ravel().tolist())
        for ax in axs:
            ax.set_xlabel("t")
            ax.set_ylabel("x")
    plt.show()


# a sample trajectory (we use sample 20, feel free to change it)
sample_number = 20
plot_trajectory(
    coords=initial_cond_train[sample_number].extract(["x", "t"]),
    real=sol_train[sample_number].extract("u"),
)
[Figure: real solution $u(x,t)$ for the selected training trajectory]

As we can see, as time progresses, the solution becomes chaotic, making it very difficult to learn! We will now focus on building a Neural Operator using the SupervisedSolver class to tackle this problem.

Averaging Neural Operator¶

We will build a neural operator $\texttt{NO}$ that takes as input the solution at time $t=0$ for any $x\in\Omega$ and the time $t$ at which we want to evaluate the solution, and returns the solution of the KS equation $u(x, t)$. Mathematically:

$$ \texttt{NO}_\theta : \mathbb{U} \rightarrow \mathbb{U}, $$

such that

$$ \texttt{NO}_\theta[u(t=0)](x, t) \rightarrow u(x, t). $$

There are many ways to approximate this operator, for example with a 2D FNO (for regular meshes), a DeepONet, a Continuous Convolutional Neural Operator, or a MIONet. In this tutorial, we will use the Averaging Neural Operator, presented in The Nonlocal Neural Operator: Universal Approximation, which is a Kernel Neural Operator with the integral kernel:

$$ K(v) = \sigma\left(Wv(x) + b + \frac{1}{|\Omega|}\int_\Omega v(y)dy\right) $$

where:

  • $v(x) \in \mathbb{R}^{\rm{emb}}$ is the value at $x$ of the function $v$ being updated, with $\rm{emb}$ being the embedding (hidden) dimension.
  • $\sigma$ is a non-linear activation function.
  • $W \in \mathbb{R}^{\rm{emb} \times \rm{emb}}$ is a tunable matrix.
  • $b \in \mathbb{R}^{\rm{emb}}$ is a tunable bias.
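
To see how little machinery this kernel needs, here is a minimal, hypothetical torch implementation of one averaging layer; the class name and details are ours, and the layer actually shipped with PINA may differ.

class AveragingKernelLayer(torch.nn.Module):
    # illustrative sketch of K(v) = sigma(W v(x) + b + average of v over the domain)
    def __init__(self, emb_dim, func=torch.nn.GELU):
        super().__init__()
        self.linear = torch.nn.Linear(emb_dim, emb_dim)  # implements W v(x) + b
        self.func = func()

    def forward(self, v):
        # v has shape [B, N, emb]; averaging over the N mesh points
        # approximates (1/|Omega|) * integral of v(y) dy
        avg = v.mean(dim=1, keepdim=True)
        return self.func(self.linear(v) + avg)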

In PINA, many Kernel Neural Operators are already implemented. The modular components of the Kernel Neural Operator class allow you to create new ones by composing base kernel layers.

Note: We will use the already built class AveragingNeuralOperator. As a constructive exercise, try to use the KernelNeuralOperator class to build a kernel neural operator from scratch. You might employ the different layers that we have in PINA, such as FeedForward and AveragingNeuralOperator layers.

In [4]:
class SIREN(torch.nn.Module):
    def forward(self, x):
        return torch.sin(x)


embedding_dimension = 40  # hyperparameter: embedding dimension
input_dimension = 3  # ['t', 'x', 'u0']
number_of_coordinates = 2  # ['x', 't']
lifting_net = torch.nn.Linear(input_dimension, embedding_dimension)
projecting_net = torch.nn.Linear(embedding_dimension + number_of_coordinates, 1)
model = AveragingNeuralOperator(
    lifting_net=lifting_net,
    projecting_net=projecting_net,
    coordinates_indices=["x", "t"],
    field_indices=["u0"],
    n_layers=4,
    func=SIREN,
)

Super easy! Notice that we use the SIREN activation function, which is discussed in more detail in the paper Implicit Neural Representations with Periodic Activation Functions.
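
Before training, we can also sanity-check the input and output shapes with a forward pass of the untrained model on a single sample. This is a hedged check: it assumes that slicing a LabelTensor preserves its labels, consistent with how samples are indexed elsewhere in this tutorial.

with torch.no_grad():
    dummy_out = model(initial_cond_train[:1])  # input of shape [1, 12800, 3]
print(dummy_out.shape)  # expected: torch.Size([1, 12800, 1])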

Solving the KS problem¶

We will now focus on solving the KS equation using the SupervisedSolver class and the AveragingNeuralOperator model. As done in the FNO tutorial, we now create the Neural Operator problem class with SupervisedProblem.

In [5]:
# initialize problem
problem = SupervisedProblem(
    initial_cond_train,
    sol_train,
    input_variables=initial_cond_train.labels,
    output_variables=sol_train.labels,
)
# initialize solver
solver = SupervisedSolver(problem=problem, model=model)
# create the trainer: train on CPU and skip the model summary at the start of training (optional)
trainer = Trainer(
    solver=solver,
    max_epochs=40,
    accelerator="cpu",
    enable_model_summary=False,
    batch_size=5,
    train_size=1.0,
    val_size=0.0,
    test_size=0.0,
)
trainer.train()
You are using the plain ModelCheckpoint callback. Consider using LitModelCheckpoint which with seamless uploading to Model registry.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer.fit` stopped: `max_epochs=40` reached.

We can now visualize some plots for the solutions!

In [6]:
sample_number = 2
no_sol = solver(initial_cond_test)
plot_trajectory(
    coords=initial_cond_test[sample_number].extract(["x", "t"]),
    real=sol_test[sample_number].extract("u"),
    no_sol=no_sol[sample_number],
)
[Figure: real solution, NO solution, and their absolute difference for the selected test trajectory]

As we can see, we can obtain nice results considering the small training time and the difficulty of the problem! Let's take a look at the training and testing error:

In [7]:
from pina.loss import PowerLoss

error_metric = PowerLoss(p=2)  # we use the MSE loss

with torch.no_grad():
    no_sol_train = solver(initial_cond_train)
    err_train = error_metric(
        sol_train.extract("u"), no_sol_train
    ).mean()  # we average the error over trajectories
    no_sol_test = solver(initial_cond_test)
    err_test = error_metric(
        sol_test.extract("u"), no_sol_test
    ).mean()  # we average the error over trajectories
    print(f"Training error: {float(err_train):.3f}")
    print(f"Testing error: {float(err_test):.3f}")
Training error: 0.077
Testing error: 0.065

As we can see, the error is pretty small, which aligns with the observations from the previous plots.

What's Next?¶

You have completed the tutorial on solving time-dependent PDEs using Neural Operators in PINA. Great job! Here are some potential next steps you can explore:

  1. Train the network for longer or with different layer sizes: Experiment with various configurations, such as adjusting the number of layers or hidden dimensions, to further improve accuracy and observe the impact on performance.

  2. Use a more challenging dataset: Try the more complex dataset Data_KS2.mat, where $A_k \in [-0.5, 0.5]$, $\ell_k \in \{1, 2, 3\}$, and $\phi_k \in [0, 2\pi]$, for a more difficult task (see the loading sketch after this list). This dataset may require longer training and testing.

  3. ... and many more...: Explore other models, such as the FNO or DeepONet, or implement your own operator using the KernelNeuralOperator class to compare performance and find the best model for your task.
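
For reference, loading the second dataset would follow the same pattern used above, under the assumption that Data_KS2.mat stores its arrays under the same keys as Data_KS.mat:

data2 = io.loadmat("data/Data_KS2.mat")
initial_cond_train2 = LabelTensor(
    torch.tensor(data2["initial_cond_train"], dtype=torch.float), ["t", "x", "u0"]
)
sol_train2 = LabelTensor(torch.tensor(data2["sol_train"], dtype=torch.float), ["u"])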

For more resources and tutorials, check out the PINA Documentation.