Code discussion about lululxvi/deepxde

Comments (22)

lululxvi commented on June 1, 2024 3

It is easy to use mini-batch. Let me first explain the details of the following code for generating the training points:

@run_if_all_none("train_x", "train_y")
def train_next_batch(self, batch_size=None):
    self.train_x = self.train_points()
    self.train_x = np.vstack((self.bc_points(), self.train_x))
    self.train_y = self.soln(self.train_x) if self.soln else None
    return self.train_x, self.train_y

1. self.train_x = self.train_points(). This line generates all the points (inside the domain, on the boundary, and anchors) according to the arguments specified in PDE(...). The order of these points does not matter. Assume we have N points in total.
2. self.train_x = np.vstack((self.bc_points(), self.train_x)). For each BC/IC, we first pick the points for this BC/IC from the N points in Step 1, and then copy these points and insert them at the beginning. Assume we picked N1 points for BC1, N2 points for BC2, etc.

Therefore, the training points are organized as follows: the first N1 points are for BC1, the following N2 points are for BC2, ..., and finally the N points are for the PDE.

Note: Certain points will be repeated more than once, if they are used for both PDE and BCs.
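
As an illustration of this layout, here is a small NumPy sketch. It is not DeepXDE's internal code: the 1D interval, the point counts, and the helper names on_left and on_right are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: all N training points on [-1, 1], including the two boundary points.
interior = rng.uniform(-0.9, 0.9, size=(8, 1))
all_points = np.vstack((interior, [[-1.0]], [[1.0]]))  # N = 10

# Hypothetical BC filters: BC1 lives at x = -1, BC2 at x = +1.
def on_left(x):
    return np.isclose(x[:, 0], -1.0)

def on_right(x):
    return np.isclose(x[:, 0], 1.0)

# Step 2: copy each BC's points and put the copies in front.
bc1_points = all_points[on_left(all_points)]   # N1 points
bc2_points = all_points[on_right(all_points)]  # N2 points
train_x = np.vstack((bc1_points, bc2_points, all_points))

n1, n2, n = len(bc1_points), len(bc2_points), len(all_points)
print(train_x.shape)  # (N1 + N2 + N, 1)
```

The two boundary points appear twice in train_x: once in the BC blocks at the front and once inside the final block of N points.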

Next, let us see how to implement mini-batch:

- A call to train_next_batch() re-generates all the training points. If PDE is initialized with train_distribution="random" (which is the default behavior), a call to train_next_batch() generates a new set of random training points.
- train_next_batch() is called in each SGD iteration, so each iteration would re-generate the whole dataset, which is expensive if the dataset is large. To address this, I use the run_if_all_none decorator to check whether train_x and train_y are both None: if they are, we run train_next_batch; otherwise, we directly return train_x and train_y. Hence, we only run train_next_batch once to generate the dataset.

So, to use mini-batch,

1. The easiest way is to remove the run_if_all_none decorator, so that different points are used in each SGD iteration.
2. A more efficient way is to modify train_next_batch so that we re-generate the points every 10000 iterations (for example). Alternatively, we can generate a large dataset, split it into several batches, and then loop over them.
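
A minimal sketch of the caching behavior and of periodic re-generation, in plain Python. The decorator below is a simplified stand-in written for this example and may differ from the one in DeepXDE; the classes CachedData and PeriodicData and the parameter period are hypothetical.

```python
from functools import wraps
import random

def run_if_all_none(*attrs):
    """Run the method only while all named attributes are None; otherwise return the cached values."""
    def decorator(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            cached = [getattr(self, a) for a in attrs]
            if all(v is None for v in cached):
                return func(self, *args, **kwargs)
            return tuple(cached)
        return wrapper
    return decorator


class CachedData:
    """Generates the dataset once, then reuses it on every later call."""
    def __init__(self):
        self.train_x, self.train_y = None, None
        self.n_generated = 0

    @run_if_all_none("train_x", "train_y")
    def train_next_batch(self, batch_size=None):
        self.n_generated += 1
        self.train_x = [random.random() for _ in range(5)]
        self.train_y = [2 * x for x in self.train_x]
        return self.train_x, self.train_y


class PeriodicData:
    """Re-generates the dataset every `period` calls (hypothetical parameter)."""
    def __init__(self, period):
        self.period = period
        self.calls = 0
        self.n_generated = 0
        self.train_x, self.train_y = None, None

    def train_next_batch(self, batch_size=None):
        if self.calls % self.period == 0:
            self.n_generated += 1
            self.train_x = [random.random() for _ in range(5)]
            self.train_y = [2 * x for x in self.train_x]
        self.calls += 1
        return self.train_x, self.train_y


cached = CachedData()
periodic = PeriodicData(period=10)
for _ in range(30):
    cached.train_next_batch()
    periodic.train_next_batch()

print(cached.n_generated, periodic.n_generated)  # prints "1 3"
```

Over 30 calls, the cached version generates its points once, while the periodic version generates them at calls 0, 10, and 20.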

from deepxde.

lululxvi commented on June 1, 2024 1

It is the num_domain when you define PDE or TimePDE


lululxvi commented on June 1, 2024 1

dde.callbacks.PDEPointResampler can sample mini-batch for PDEs and BCs.


lululxvi commented on June 1, 2024 1

> If I understood you correctly, the dde.callbacks.PDEPointResampler provides mini-batches in the following manner. For example, I have a geometry interval [-1,1], and I need to train my network on the num_domain = 1KK points inside this interval to reach the required accuracy. To avoid OOM problems/exceeding the computational limits, I can simply set num_domain = 1K and dde.callbacks.PDEPointResampler will pick random 1K points inside [-1,1] every period. So finally if I resample 1K enough times, will PINN "learn" a lot of intermediate values inside my [-1,1] interval and, therefore, interpolate just as well as the whole single-batch 1KK dataset?

Yes.

> If yes, should I train longer/use smaller periods to let the network learn these intermediate values better?

Probably yes, there are hyperparameters you should tune.
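
To illustrate why repeated resampling lets the network see far more of the interval than any single batch, here is a NumPy sketch. It only measures how much of [-1, 1] the sampled points visit; it does not call DeepXDE and says nothing about the final accuracy. The bin count and the number of resampling periods are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 1000       # resolution used to measure coverage of [-1, 1]
batch_size = 1000   # plays the role of num_domain = 1K
n_resamples = 50    # number of resampling periods (illustrative)

def bins_hit(points):
    """Indices of the bins of [-1, 1] that contain at least one point."""
    idx = np.floor((points + 1.0) / 2.0 * n_bins).astype(int)
    return set(np.clip(idx, 0, n_bins - 1).tolist())

# Coverage of a single batch of random points.
first = bins_hit(rng.uniform(-1.0, 1.0, batch_size))

# Coverage accumulated over all resampled batches.
seen = set(first)
for _ in range(n_resamples - 1):
    seen |= bins_hit(rng.uniform(-1.0, 1.0, batch_size))

print(f"one batch covers {len(first)} of {n_bins} bins")
print(f"{n_resamples} batches cover {len(seen)} of {n_bins} bins")
```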


lululxvi commented on June 1, 2024

1. I haven't tried multiple GPUs, but it should not be hard. TF supports multiple GPUs, and you can also use Horovod. This may require a small modification of the code.
2. Which problem are you solving? For most problems, I believe you don't need so many points. Compared to finite difference/element methods, we need far fewer points to achieve the same accuracy. You can also use the residual-based adaptive refinement method to reduce the number of points; see the paper (https://arxiv.org/abs/1907.04502).
3. I don't use mini-batch, for the previous reason. But it is very easy to use mini-batch. The only thing to modify is the function train_next_batch(self, batch_size=None) in the class PDE. As you can see, I actually have the argument batch_size. During training, we call train_next_batch for each SGD iteration. Currently, I just ignore batch_size and return the whole dataset.
4. You are right. We don't have the overfitting issue, so we don't need dropout, unless you need uncertainty, as shown in this paper (https://www.sciencedirect.com/science/article/pii/S0021999119305340).


kpratik41 commented on June 1, 2024

I was just trying to implement the NSFnets paper (March 2020) as part of my learning process. They used a very large number of points in the domain (144,000). The GPU that I am using has 12 GB of memory, and I have 4 such GPUs. If I use just one, I get an OOM error for more than 40,000 points.

The results I am getting with fewer points are not satisfactory, so I wanted to use more points. Also, I am currently using the anchors functionality in your code to specify points, rather than specifying num_domain, num_boundary, and num_initial, because I felt that a uniform number of points in time and space might lead to lower loss values.

data = dde.data.TimePDE( geomtime, 3, pde, [ic1, ic2, bc1, bc2, bc4, bc5, bc8, bc9, bc10], anchors=points)

The total training loss for me never goes below 10^-2, while in the paper I see that the loss is a lot smaller (in the range of 10^-4).

I have read the paper you mentioned in point 2. I will try to read the paper mentioned in point 4.


lululxvi commented on June 1, 2024

Unless you are trying to do something very similar to this paper, this paper may not be a good example for learning, because they are solving complex equations.

I am not involved in the NSFnets paper, but here are some details that might be useful for you to reproduce their results.

- They use a single GPU with mini-batch.
- They use random points in the paper. Generally, random points are not worse than uniform points.
- To achieve a good result, some details need to be taken care of, e.g., learning rate scheduling, batch size, etc. All these details should be carefully tuned.

@xiaoweijin Welcome to add more details here!


kpratik41 commented on June 1, 2024

@lululxvi Regarding your comment 3 about implementing mini-batch: currently your code is structured in such a way that the position of the points in the generated dataset (train_x) determines which error function (IC, Dirichlet, Neumann) is used.

So if I just specify batch_size there, the code logic would probably break, because in each batch the losses are calculated based on the position of the points.

Also, anyone implementing batch_size should ideally sample initial, boundary, and domain points in each batch, so that the batch is representative of the entire dataset.

For these reasons, I felt that running on multiple GPUs would require less code modification than implementing mini-batch.

Also, in the pde.py file there is a decorator 'run_if_any_none("train_x", "train_y")'.
What exactly is this step doing? Is there any parameter that we are trying to set/initialize here? Thanks for your inputs.


lululxvi commented on June 1, 2024

If you want to use multiple GPUs: I don't have much experience with TensorFlow's support for multiple GPUs, but if you use Horovod, the code can be modified easily (maybe a few lines of code).


kpratik41 commented on June 1, 2024

Regarding mini-batch, you mentioned 2 points:

1. Remove the decorator run_if_all_none. This ended up being too slow even for a very small number of points. It is almost similar to running on a GPU.
2. This seems like a good idea. I will try to implement it.

I also tried using Horovod, but it threw a few errors. I believe Horovod and its dependencies are installed correctly, since hvd.init() worked perfectly. I will try to google the errors and, if that does not help, I will reach out to you.

I also tried to add the decay parameter at the compile stage. The docs say: "decay: String. Name of decay to the initial learning rate." I tried using decay = inverse time, and it threw an error. I looked deeper and found that the decay[1] and decay[2] parameters are needed to use this.

Don't you think this part of your code requires some modification? Thanks again for your inputs.


lululxvi commented on June 1, 2024

For the second approach of mini-batch, make sure that each time you have the same N1, N2, etc.
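
One way to keep N1, N2, and N constant is to shuffle and split each group of points separately and then stack them in the same order in every batch. The NumPy sketch below assumes separate pools for BC1, BC2, and PDE points; the pool sizes, the per-batch counts, and the function name make_batches are illustrative and not part of DeepXDE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative pools of points (1D for brevity); sizes are arbitrary.
bc1_pool = rng.uniform(size=(40, 1))
bc2_pool = rng.uniform(size=(60, 1))
pde_pool = rng.uniform(size=(1000, 1))

n_batches = 4
n1, n2, n = 10, 15, 250  # per-batch counts, identical for every batch

def make_batches():
    """Shuffle each pool, then cut it into n_batches equal parts."""
    parts = []
    for pool, size in ((bc1_pool, n1), (bc2_pool, n2), (pde_pool, n)):
        shuffled = rng.permutation(pool)  # shuffles along the first axis
        parts.append([shuffled[i * size:(i + 1) * size] for i in range(n_batches)])
    # Keep the layout [BC1 | BC2 | PDE] inside every batch.
    return [np.vstack((parts[0][i], parts[1][i], parts[2][i])) for i in range(n_batches)]

batches = make_batches()

# Position-based slices stay valid because the counts never change.
bc1_slice = slice(0, n1)
bc2_slice = slice(n1, n1 + n2)
pde_slice = slice(n1 + n2, n1 + n2 + n)

for batch in batches:
    print(batch[bc1_slice].shape, batch[bc2_slice].shape, batch[pde_slice].shape)
```

Calling make_batches() again before each pass over the data gives a fresh shuffle while preserving the layout that the position-based losses rely on.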

Another useful piece of information: according to the author of the NSFnets paper, a large batch size does not lead to better results.

The docs for decay are wrong... Here is the correct version:

decay: List. Name and parameters of decay to the initial learning rate. One of the following options:
- inverse time decay: ["inverse time", decay_steps, decay_rate]
- cosine decay: ["cosine", decay_steps, alpha]
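
For reference, here is a plain-Python sketch of the two schedules. It assumes that these options map onto the usual TensorFlow inverse-time and cosine decay formulas (without staircase); the function names are made up for this example and are not DeepXDE functions.

```python
import math

def inverse_time_decay(lr, step, decay_steps, decay_rate):
    """lr / (1 + decay_rate * step / decay_steps)"""
    return lr / (1 + decay_rate * step / decay_steps)

def cosine_decay(lr, step, decay_steps, alpha):
    """Cosine annealing from lr down to alpha * lr over decay_steps steps."""
    step = min(step, decay_steps)
    cosine = 0.5 * (1 + math.cos(math.pi * step / decay_steps))
    return lr * ((1 - alpha) * cosine + alpha)

# Under the assumption above, this mirrors decay=["inverse time", 1000, 0.5] with lr=1e-3.
for step in (0, 1000, 5000):
    print(step, inverse_time_decay(1e-3, step, 1000, 0.5))

# Under the same assumption, this mirrors decay=["cosine", 1000, 0.1] with lr=1e-3.
for step in (0, 500, 1000):
    print(step, cosine_decay(1e-3, step, 1000, 0.1))
```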


kpratik41 commented on June 1, 2024

I tried running it on multiple GPUs using Horovod and made the following changes in the model.py file:

import horovod.tensorflow as hvd

# Initialize Horovod
hvd.init()

    def train(self, ...):
        if self.train_state.step == 0:
            print("Initializing variables...")
            self.sess.run(tf.global_variables_initializer())
            # Horovod: hook that broadcasts the initial variables from rank 0
            self.hooks = [hvd.BroadcastGlobalVariablesHook(0)]
        ...
        ...
        print("Training model...\n")
        # Save checkpoints only on worker 0 to prevent other workers from corrupting them.
        self.checkpoint_dir = '/tmp/train_logs' if hvd.rank() == 0 else None

    def _open_tfsession(self):
        tfconfig = tf.ConfigProto()
        tfconfig.gpu_options.allow_growth = True
        tfconfig.gpu_options.visible_device_list = str(hvd.local_rank())  # added: one GPU per process
        self.config = tfconfig  # added: keep the config for later use
        self.sess = tf.Session(config=tfconfig)
        self.saver = tf.train.Saver(max_to_keep=None)
        self.train_state.set_tfsession(self.sess)

Then I made the following change in the get_train_op function in the train.py file:

    with tf.control_dependencies(update_ops):
        lr = lr * hvd.size()  # scale the learning rate by the number of workers
        optim = _get_optimizer(optimizer, lr)
        train_op = hvd.DistributedOptimizer(optim).minimize(loss, global_step=global_step)
    return train_op

With only the above changes, the code ran on multiple GPUs, but they were not working in sync, and the epoch output was printed 4 times, one for each GPU. I ran the file using the following command:

horovodrun -np 4 -H localhost:4 python file.py

I was following the Horovod docs, which show an example for TF1.

After that I made the following change in the model.py file:

I replaced the following code:

    def _train_sgd(self, epochs, display_every, uncertainty):
        for i in range(epochs):
            self.callbacks.on_epoch_begin()
            self.callbacks.on_batch_begin()

            self.train_state.set_data_train(
                *self.data.train_next_batch(self.batch_size)
            )

            self.sess.run(self.train_op, feed_dict=self._get_feed_dict(True, True, 0, self.train_state.X_train, self.train_state.y_train),)

with

    def _train_sgd(self, epochs, display_every, uncertainty):
        for i in range(epochs):
            self.callbacks.on_epoch_begin()
            self.callbacks.on_batch_begin()

            self.train_state.set_data_train(
                *self.data.train_next_batch(self.batch_size)
            )


            with tf.train.MonitoredTrainingSession(checkpoint_dir=self.checkpoint_dir,  config=self.config,  hooks=self.hooks) as mon_sess:
                while not mon_sess.should_stop():
                    # Perform synchronous training.
                    mon_sess.run(self.sess.run(self.train_op, feed_dict=self._get_feed_dict(True, True, 0, self.train_state.X_train, self.train_state.y_train ),))

It throws the following error at the MonitoredTrainingSession step above:

"Global step should be created to use StepCounterHook."

Do you have any idea why this happens? Thanks.


lululxvi commented on June 1, 2024

@XuhuiM Any idea?


lululxvi commented on June 1, 2024

@kpratik41 The error could be from MonitoredTrainingSession. Could you try not to use MonitoredTrainingSession? See https://horovod.readthedocs.io/en/latest/tensorflow.html.

Another thing is that on different GPUs you should use different data for the PDE/BCs while keeping the sizes the same. So PDE.train_next_batch() should return different training points for different GPUs, but N, N1, N2, ... should be the same on every GPU. See an example here: https://github.com/XuhuiM/Distributed-training-Horovod/blob/master/pinn_hvd_data.py. Note that it is not necessary to use all the BC points on all GPUs.
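
A minimal NumPy sketch of this idea: each worker seeds its own random generator from its rank, so the points differ across workers while the array sizes stay identical. The function sample_for_rank, the base seed, and the toy BC points are hypothetical; in an actual Horovod run the rank would come from hvd.rank() instead of a loop.

```python
import numpy as np

def sample_for_rank(rank, n_domain, n_bc, base_seed=1234):
    """Draw a rank-specific set of points on [-1, 1] with fixed sizes."""
    rng = np.random.default_rng(base_seed + rank)  # different stream per worker
    domain = rng.uniform(-1.0, 1.0, size=(n_domain, 1))
    # Toy BC points: randomly chosen copies of the two end points.
    bc = rng.choice([-1.0, 1.0], size=(n_bc, 1))
    # Same layout on every worker: BC points first, then domain points.
    return np.vstack((bc, domain))

# Stand-in for four Horovod workers; here we loop over the ranks by hand.
per_rank = [sample_for_rank(rank, n_domain=100, n_bc=4) for rank in range(4)]

for rank, points in enumerate(per_rank):
    print(rank, points.shape)
```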


engsbk commented on June 1, 2024

Thank you very much for your amazing work. Had a few questions.

I was able to run it successfully on a single GPU, but it gave me an OOM error when I increased the number of domain points to a large value. How do I make it run on multiple GPUs (using TF 1.14)? Could you give me some direction on this? Have you ever run this code on multiple GPUs?

How did you manage to run it on a single GPU? Did you have to change the code, or did you simply run it on a machine with a GPU?
I have an NVIDIA GeForce RTX 2060 SUPER on my machine; however, the code still takes a significant time to run.
I don't notice any change in runtime compared with running it on a normal CPU.

Thanks.


lululxvi commented on June 1, 2024

If you correctly install the GPU-version TF with all the libraries, the code will run on GPU without any change. Check your GPU usage to see if the GPU is being used or not.


engsbk commented on June 1, 2024

> 1. I haven't tried multiple GPUs, but it should not be hard. TF supports multiple GPUs, and you can also use Horovod. This may require a small modification of the code.
>
> 2. Which problem are you solving? For most problems, I believe you don't need so many points. Compared to finite difference/element methods, we need far fewer points to achieve the same accuracy. You can also use the residual-based adaptive refinement method to reduce the number of points; see the paper (https://arxiv.org/abs/1907.04502).
>
> 3. I don't use mini-batch, for the previous reason. But it is very easy to use mini-batch. The only thing to modify is the function train_next_batch(self, batch_size=None) in the class PDE. As you can see, I actually have the argument batch_size. During training, we call train_next_batch for each SGD iteration. Currently, I just ignore batch_size and return the whole dataset.
>
> 4. You are right. We don't have the overfitting issue, so we don't need dropout, unless you need uncertainty, as shown in this paper (https://www.sciencedirect.com/science/article/pii/S0021999119305340).

I came across this discussion while searching for ways to make my results more accurate. I thought I was applying mini-batch in my code, but after reading this post, I want to make sure I'm implementing it correctly. Do I still need to modify the function train_next_batch(self, batch_size=None) in the class PDE to be able to use mini-batch? Or has the source code been updated to incorporate mini-batch without doing so (given that this discussion is 2 years old)?

Currently, I'm just using these lines:

n = 10
activation = f"LAAF-{n} silu"  # "LAAF-10 silu"

resampler = dde.callbacks.PDEResidualResampler(period=100)
net = dde.maps.FNN([3] + [32] * 10 + [1], activation, "Glorot normal")
model = dde.Model(data, net)
model.compile("adam", lr=5e-6, loss_weights=[1, 1, 100])
model.train(epochs=20000, batch_size=32, callbacks=[resampler], display_every=1000)
dde.optimizers.set_LBFGS_options(maxcor=50, ftol=1e-20, maxiter=1e5)
# You can set ftol so that L-BFGS will stop once the training loss is smaller than ftol
model.compile("L-BFGS-B", loss_weights=[10, 1, 1])
losshistory, train_state = model.train(epochs=50000, batch_size=32)

Is this sufficient, or do I need to modify train_next_batch(self, batch_size=None)?


lululxvi commented on June 1, 2024

This discussion is out of date. PDEResidualResampler is the correct way to do mini-batch for PDE residual points. @engsbk, you didn't use PDEResidualResampler correctly. See the example at https://deepxde.readthedocs.io/en/latest/demos/pinn_forward/diffusion.1d.resample.html


engsbk commented on June 1, 2024

> This discussion is out of date. PDEResidualResampler is the correct way to do mini-batch for PDE residual points. @engsbk, you didn't use PDEResidualResampler correctly. See the example at https://deepxde.readthedocs.io/en/latest/demos/pinn_forward/diffusion.1d.resample.html

Thank you for the clarification, @lululxvi!
One more question about the diffusion example:
resampler = dde.callbacks.PDEResidualResampler(period=100)
I understand that the resampling is done every 100 training epochs, correct? If so, where can I specify the batch size?


nkoudounas commented on June 1, 2024

Hello

I am trying to integrate mini-batch training by splitting the dataset into 2 mini-batches. I am well aware of the PDE and Model classes, and I am pretty sure that what I have done is correct (I shuffle the domain and BC points before each epoch and then split the BC points so that each mini-batch has the same proportion of BC and domain points). I also keep num_bcs constant in order to use bc.error() correctly. My problem is that I keep getting stuck on a certain BC, in the sense that its loss does not go down. I would like to ask why PDEresampler is the correct way to go and why splitting the dataset does not work (at least in my case).


alabaykazakh commented on June 1, 2024

> It is the num_domain when you define PDE or TimePDE

Hello @lululxvi,

If I understood you correctly, the dde.callbacks.PDEPointResampler provides mini-batches in the following manner. For example, I have a geometry interval [-1,1], and I need to train my network on the num_domain = 1KK points inside this interval to reach the required accuracy. To avoid OOM problems/exceeding the computational limits, I can simply set num_domain = 1K and dde.callbacks.PDEPointResampler will pick random 1K points inside [-1,1] every period. So finally if I resample 1K enough times, will PINN "learn" a lot of intermediate values inside my [-1,1] interval and, therefore, interpolate just as well as the whole single-batch 1KK dataset?

If yes, should I train longer/use smaller periods to let the network learn these intermediate values better?


nkoudounas commented on June 1, 2024

> dde.callbacks.PDEPointResampler can sample mini-batch for PDEs and BCs.

IMHO, "mini-batch" is misleading here. This callback resamples the data points used for training and feeds them all at once into the NN. In the long run it may have the same effect on the gradients as mini-batch training, but that is another discussion!


Code discussion about deepxde HOT 22 CLOSED
