Comments (22)
It is easy to use mini-batch. Let me first explain the details of the following code for generating the training points:
@run_if_all_none("train_x", "train_y")
def train_next_batch(self, batch_size=None):
    self.train_x = self.train_points()
    self.train_x = np.vstack((self.bc_points(), self.train_x))
    self.train_y = self.soln(self.train_x) if self.soln else None
    return self.train_x, self.train_y
1. self.train_x = self.train_points(). This line generates all the points (inside the domain, on the boundary, and anchors) according to the arguments specified in PDE(...). The order of these points does not matter. Assume we have N points in total.
2. self.train_x = np.vstack((self.bc_points(), self.train_x)). For each BC/IC, we first pick the points for this BC/IC from all the N points in Step 1, then make a copy of these points and insert them at the beginning. Assume we picked N1 points for BC1, N2 points for BC2, etc.
3. Therefore, the training points are organized as follows: the first N1 points are for BC1, the following N2 points are for BC2, ..., and the final N points are for the PDE.
Note: Certain points will be repeated more than once if they are used for both the PDE and the BCs.
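The layout described above can be illustrated with a small NumPy sketch. It does not use DeepXDE; the 1D interval, the two endpoint BCs, and all variable names are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: all N training points on [-1, 1] (hypothetical 1D problem).
# The two endpoints are included so that each BC has a point to pick up.
interior = rng.uniform(-1.0, 1.0, size=(8, 1))
all_points = np.vstack((interior, [[-1.0]], [[1.0]]))  # N = 10

# Step 2: each BC picks its own points out of the N points.
bc1_points = all_points[np.isclose(all_points[:, 0], -1.0)]  # N1 = 1
bc2_points = all_points[np.isclose(all_points[:, 0], 1.0)]   # N2 = 1

# Copies of the BC points are stacked in front of the N points.
train_x = np.vstack((bc1_points, bc2_points, all_points))

n1, n2 = len(bc1_points), len(bc2_points)
bc1_slice = train_x[:n1]          # rows used for the BC1 loss
bc2_slice = train_x[n1:n1 + n2]   # rows used for the BC2 loss
pde_slice = train_x[n1 + n2:]     # rows used for the PDE loss
print(train_x.shape)  # (12, 1): N1 + N2 + N rows
```

The endpoints appear twice in train_x, once in a BC slice and once in the PDE slice, which is the repetition mentioned in the note.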
Next, let us see how to implement mini-batch:
- A call to train_next_batch() will re-generate all the training points. If PDE is initialized with train_distribution="random" (which is the default behavior), a call to train_next_batch() will generate a new set of random training points.
- train_next_batch() is called in each SGD iteration, so in each iteration we would re-generate the whole dataset, which is expensive if the dataset is large. To address this issue, I use the run_if_all_none decorator to check whether train_x is None: if it is None, we run train_next_batch; otherwise we directly return train_x and train_y. Hence, we only run train_next_batch once to generate the dataset.
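A minimal sketch of how such a decorator could work is shown here. It is written from the description above, not copied from the DeepXDE source, and ToyData is a hypothetical class used only to show the caching effect.

```python
from functools import wraps


def run_if_all_none(*attrs):
    """Run the wrapped method only while all named attributes are None."""
    def decorator(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            cached = [getattr(self, name) for name in attrs]
            if all(value is None for value in cached):
                return func(self, *args, **kwargs)
            # Something is already cached: skip the expensive call.
            return tuple(cached)
        return wrapper
    return decorator


class ToyData:
    """Hypothetical stand-in for the PDE data class."""
    def __init__(self):
        self.train_x = None
        self.train_y = None
        self.calls = 0  # counts how often points are really generated

    @run_if_all_none("train_x", "train_y")
    def train_next_batch(self, batch_size=None):
        self.calls += 1
        self.train_x = [0.0, 0.5, 1.0]
        self.train_y = [0.0, 0.25, 1.0]
        return self.train_x, self.train_y


data = ToyData()
first = data.train_next_batch()
second = data.train_next_batch()
print(data.calls)  # 1: the second call returned the cached points
```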
So, to use mini-batch:
- The easiest way is to just remove the run_if_all_none decorator; then different points will be used in each SGD iteration.
- A more efficient way is to modify train_next_batch so that we re-generate the points (for example) every 10000 iterations. Alternatively, we can generate a large dataset, split it into several batches, and then loop over them.
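The "re-generate every so many iterations" idea could look roughly like the sketch below. PeriodicSampler is a hypothetical class, not part of DeepXDE, and it only samples domain points on [-1, 1] to keep the example short.

```python
import numpy as np


class PeriodicSampler:
    """Hypothetical data class that redraws its points every `period` calls."""

    def __init__(self, num_domain, period, seed=0):
        self.num_domain = num_domain
        self.period = period
        self.rng = np.random.default_rng(seed)
        self.calls = 0
        self.train_x = None

    def train_next_batch(self, batch_size=None):
        # Redraw on the first call and then once every `period` calls.
        if self.train_x is None or self.calls % self.period == 0:
            self.train_x = self.rng.uniform(
                -1.0, 1.0, size=(self.num_domain, 1)
            )
        self.calls += 1
        return self.train_x


sampler = PeriodicSampler(num_domain=5, period=3)
a = sampler.train_next_batch().copy()
b = sampler.train_next_batch()
c = sampler.train_next_batch()
d = sampler.train_next_batch()  # fourth call: a new set of points
print(np.array_equal(a, c), np.array_equal(a, d))
```

In a real setup the BC points would have to be redrawn in the same way, with N1, N2, ... kept fixed.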
from deepxde.
It is the num_domain when you define PDE or TimePDE.
from deepxde.
dde.callbacks.PDEPointResampler can sample mini-batch for PDEs and BCs.
from deepxde.
If I understood you correctly, the dde.callbacks.PDEPointResampler provides mini-batches in the following manner. For example, I have a geometry interval [-1,1], and I need to train my network on num_domain = 1KK points inside this interval to reach the required accuracy. To avoid OOM problems/exceeding the computational limits, I can simply set num_domain = 1K, and dde.callbacks.PDEPointResampler will pick random 1K points inside [-1,1] every period. So finally, if I resample 1K enough times, will PINN "learn" a lot of intermediate values inside my [-1,1] interval and, therefore, interpolate just as well as the whole single-batch 1KK dataset?
Yes.
If yes, should I train longer/use smaller periods to let the network learn these intermediate values better?
Probably yes, there are hyperparameters you should tune.
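To get a feeling for why repeated resampling covers the interval, here is a NumPy-only sketch. It does not call DeepXDE; it just redraws 1K uniform points on [-1, 1] several times and measures which fraction of a fine grid of bins has been visited. The number of redraws and the bin resolution are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_domain = 1000     # points held in memory at once
num_resamples = 50    # how many times the point set is redrawn
num_bins = 10_000     # resolution used to measure coverage of [-1, 1]

seen = np.zeros(num_bins, dtype=bool)
coverage = []
for _ in range(num_resamples):
    x = rng.uniform(-1.0, 1.0, size=num_domain)
    # Map each point to the index of the bin that contains it.
    bins = np.minimum(((x + 1.0) / 2.0 * num_bins).astype(int), num_bins - 1)
    seen[bins] = True
    coverage.append(seen.mean())

print(f"1 draw: {coverage[0]:.3f}, {num_resamples} draws: {coverage[-1]:.3f}")
```

A single draw can touch at most 10% of the bins, while the accumulated draws touch nearly all of them. This says nothing about training accuracy by itself, which still depends on the period and the number of iterations.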
from deepxde.
1. I haven't tried multiple GPUs, but it should not be hard. TF supports multiple GPUs, and you can also use Horovod. This may require a small modification of the code.
2. Which problem are you solving? For most problems, I believe you don't need so many points. Compared to finite difference/element methods, we need far fewer points to achieve the same accuracy. You can also use the residual-based adaptive refinement method to reduce the number of points, see the paper (https://arxiv.org/abs/1907.04502).
3. I don't use mini-batch, for the previous reason. But it is very easy to use mini-batch. The only thing to modify is the function train_next_batch(self, batch_size=None) in the class PDE. As you can see, I actually have the argument batch_size. During training, we call train_next_batch for each SGD iteration. Currently, I just ignore batch_size and return the whole dataset.
4. You are right. We don't have the overfitting issue, so we don't need dropout, unless you need uncertainty, as shown in this paper (https://www.sciencedirect.com/science/article/pii/S0021999119305340).
from deepxde.
I was just trying to implement the NSFnets paper (March 2020) as part of my learning process. They used a very large number of points in the domain (144,000). The GPU that I am using has 12 GB of memory, and I have 4 such GPUs. If I use just one, I get an OOM error for more than 40,000 points.
The results I am getting with fewer points are not satisfactory, so I wanted to use more points. Also, I am currently using the anchors functionality in your code to specify points rather than specifying num_domain, num_boundary, and num_initial, because I felt that a uniform number of points in time and space might lead to lower loss values.
data = dde.data.TimePDE(geomtime, 3, pde, [ic1, ic2, bc1, bc2, bc4, bc5, bc8, bc9, bc10], anchors=points)
The total training loss for me never goes below 10^-2, while in the paper I see the loss is a lot smaller (in the range of 10^-4).
I have read the paper you mentioned in point 2. I will try to read the paper mentioned in point 4.
from deepxde.
Unless you are trying to do something very similar to this paper, this paper may not be a good example for learning, because they are solving complex equations.
I am not involved in the NSFnets paper, but here are some details that might be useful for you to repeat their results.
- They use a single GPU with mini batch.
- They use random points in the paper. Generally random points are not worse than uniform points.
- To achieve a good result, some details need to be taken care of, e.g., learning rate scheduling, batch size, etc. All these details should be carefully tuned.
@xiaoweijin Welcome to add more details here!
from deepxde.
@lululxvi Regarding your comment 3 about implementing mini-batch: currently your code is structured in such a way that the position of the points in the generated dataset (train_x) determines which error function (IC, Dirichlet, Neumann) is used.
So if I just specify batch_size there, the code logic would probably break, because in each batch the losses are calculated based on the position of the points.
Also, if someone is implementing batch_size, they should ideally sample initial, boundary, and domain points in each batch so that the batch is representative of the entire dataset.
For these reasons, I felt that it would be easier to run on multiple GPUs, with less code modification, than to implement mini-batch.
Also, in the pde.py file there is a decorator 'run_if_any_none("train_x", "train_y")'.
What exactly is this step doing? Is there any parameter that we are trying to set/initialize here? Thanks for your inputs.
from deepxde.
If you want to use multiple GPUs: I don't have much experience with TensorFlow's own multi-GPU support, but if you use Horovod, the code can be modified easily (maybe a few lines of code).
from deepxde.
Regarding mini-batch, you mentioned 2 points:
- Remove the decorator run_if_all_none. This ended up being too slow even for a very small number of points. It is almost similar to running on a GPU.
- This seems like a good idea. I will try to implement it.
I also tried using Horovod, but it threw a few errors. I believe Horovod and its dependencies are installed correctly, since hvd.init() worked perfectly. I will try to google the errors; if that does not help, I will reach out to you.
I also tried to add the decay parameter in the compile stage. decay: String. Name of decay to the initial learning rate.
I tried using decay = inverse time and it threw an error. I looked deeper and found that the decay[1], decay[2] parameters are needed for using this.
Don't you think this part of your code requires some modifications? Thanks again for your inputs.
from deepxde.
For the second approach of mini-batch, make sure that each time you have the same N1, N2, etc.
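One way to keep N1, N2, ... identical across batches is to split every BC set and the domain set into equal chunks and stack one chunk of each per batch. This is only a NumPy sketch under that assumption; equal_chunks and make_batches are hypothetical helpers, not DeepXDE functions.

```python
import numpy as np


def equal_chunks(points, num_batches, rng):
    """Shuffle the rows and cut them into equally sized chunks (remainder dropped)."""
    shuffled = rng.permutation(points)
    per_batch = len(shuffled) // num_batches
    return [shuffled[i * per_batch:(i + 1) * per_batch] for i in range(num_batches)]


def make_batches(bc_sets, domain, num_batches, rng):
    """Build batches laid out as [BC1 | BC2 | ... | domain] with fixed sizes."""
    bc_chunks = [equal_chunks(points, num_batches, rng) for points in bc_sets]
    domain_chunks = equal_chunks(domain, num_batches, rng)
    batches = []
    for i in range(num_batches):
        parts = [chunks[i] for chunks in bc_chunks] + [domain_chunks[i]]
        batches.append(np.vstack(parts))
    return batches


rng = np.random.default_rng(0)
bc1 = np.full((20, 1), -1.0)                     # points for BC1 at x = -1
bc2 = np.full((20, 1), 1.0)                      # points for BC2 at x = +1
domain = rng.uniform(-1.0, 1.0, size=(1000, 1))  # PDE residual points

batches = make_batches([bc1, bc2], domain, num_batches=4, rng=rng)
print([batch.shape for batch in batches])  # four batches of shape (260, 1)
```

Every batch then has N1 = 5, N2 = 5, and N = 250 in the same order, so position-based loss slicing keeps working.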
Another useful piece of information is that, according to the author of the NSFnets paper, a large batch size does not lead to better results.
The docs for decay are wrong... Here it is:
decay: List. Name and parameters of decay to the initial learning rate. One of the following options:
- inverse time decay: ["inverse time", decay_steps, decay_rate]
- cosine decay: ["cosine", decay_steps, alpha]
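For reference, the two schedules can be written out in plain Python. This sketch assumes the options are forwarded to TensorFlow's inverse time decay and cosine decay (non-staircase form) and follows the formulas in the TensorFlow documentation; the numeric values in the example list are hypothetical.

```python
import math


def inverse_time_decay(lr, step, decay_steps, decay_rate):
    """lr / (1 + decay_rate * step / decay_steps)."""
    return lr / (1.0 + decay_rate * step / decay_steps)


def cosine_decay(lr, step, decay_steps, alpha=0.0):
    """Cosine annealing from lr down to alpha * lr over decay_steps."""
    step = min(step, decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / decay_steps))
    return lr * ((1.0 - alpha) * cosine + alpha)


# The list form from the corrected docs: [name, decay_steps, decay_rate].
decay = ["inverse time", 1000, 0.5]  # hypothetical values
name, decay_steps, decay_rate = decay

for step in (0, 1000, 10000):
    print(step, inverse_time_decay(1e-3, step, decay_steps, decay_rate))
```

Passing only the string "inverse time" leaves decay_steps and decay_rate undefined, which matches the error reported above about decay[1] and decay[2].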
from deepxde.
I tried running it on multiple GPUs using Horovod and made the following changes in the model.py file:
import horovod.tensorflow as hvd

# Initialize Horovod
hvd.init()

def train(self,....):
    if self.train_state.step == 0:
        print("Initializing variables...")
        self.sess.run(tf.global_variables_initializer())
    ###horovod
    self.hooks = [hvd.BroadcastGlobalVariablesHook(0)]
    ...
    ...
    print("Training model...\n")
    # Save checkpoints only on worker 0 to prevent other workers from corrupting them.
    self.checkpoint_dir = '/tmp/train_logs' if hvd.rank() == 0 else None

def _open_tfsession(self):
    tfconfig = tf.ConfigProto()
    tfconfig.gpu_options.allow_growth = True
    tfconfig.gpu_options.visible_device_list = str(hvd.local_rank())  # <--- changed line
    self.config = tfconfig  # <--- changed line
    self.sess = tf.Session(config=tfconfig)
    self.saver = tf.train.Saver(max_to_keep=None)
    self.train_state.set_tfsession(self.sess)
Then I made the following change in the train.py file, in the get_train_op function:
with tf.control_dependencies(update_ops):
    lr = lr * hvd.size()
    optim = _get_optimizer(optimizer, lr)
    train_op = hvd.DistributedOptimizer(optim).minimize(loss, global_step=global_step)
return train_op
With only these changes, the code ran on multiple GPUs, but they were not working in sync, and the epoch output was printed 4 times (once for each GPU). I ran the file using the following command:
horovodrun -np 4 -H localhost:4 python file.py
I was following the Horovod docs, which show an example for TF1.
After that, I made the following change in the model.py file. I replaced the following code:
def _train_sgd(self, epochs, display_every, uncertainty):
    for i in range(epochs):
        self.callbacks.on_epoch_begin()
        self.callbacks.on_batch_begin()
        self.train_state.set_data_train(
            *self.data.train_next_batch(self.batch_size)
        )
        self.sess.run(self.train_op, feed_dict=self._get_feed_dict(True, True, 0, self.train_state.X_train, self.train_state.y_train),)
with
def _train_sgd(self, epochs, display_every, uncertainty):
    for i in range(epochs):
        self.callbacks.on_epoch_begin()
        self.callbacks.on_batch_begin()
        self.train_state.set_data_train(
            *self.data.train_next_batch(self.batch_size)
        )
        with tf.train.MonitoredTrainingSession(checkpoint_dir=self.checkpoint_dir, config=self.config, hooks=self.hooks) as mon_sess:
            while not mon_sess.should_stop():
                # Perform synchronous training.
                mon_sess.run(self.sess.run(self.train_op, feed_dict=self._get_feed_dict(True, True, 0, self.train_state.X_train, self.train_state.y_train ),))
It throws the following error at the MonitoredTrainingSession step above:
"Global step should be created to use StepCounterHook."
Do you have any idea why this happens? Thanks.
from deepxde.
@XuhuiM Any idea?
from deepxde.
@kpratik41 The error could be from MonitoredTrainingSession. Could you try not to use MonitoredTrainingSession? See https://horovod.readthedocs.io/en/latest/tensorflow.html.
Another thing is that on different GPUs, you should use different data for the PDE/BCs while keeping the sizes the same. So PDE.train_next_batch() should return different training points for different GPUs, but N, N1, N2, ... should be the same. See an example here https://github.com/XuhuiM/Distributed-training-Horovod/blob/master/pinn_hvd_data.py. Note, it is not necessary to use all the BC points on all GPUs.
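A simple way to get "different points, same sizes" per GPU is to seed the sampler with the worker rank. The sketch below is NumPy-only; points_for_rank is a hypothetical helper, and in an actual Horovod run the rank argument would come from hvd.rank().

```python
import numpy as np


def points_for_rank(rank, num_domain, num_bc, seed=1234):
    """Return [BC | domain] points that differ per rank but have fixed sizes."""
    # Seeding with (seed, rank) gives each worker its own random stream.
    rng = np.random.default_rng([seed, rank])
    bc = rng.choice([-1.0, 1.0], size=(num_bc, 1))         # endpoints of [-1, 1]
    domain = rng.uniform(-1.0, 1.0, size=(num_domain, 1))  # PDE residual points
    return np.vstack((bc, domain))


x_rank0 = points_for_rank(rank=0, num_domain=100, num_bc=4)
x_rank1 = points_for_rank(rank=1, num_domain=100, num_bc=4)
print(x_rank0.shape, x_rank1.shape)  # same shape on every rank
```

Because the sizes match, the position-based split into BC and PDE rows is identical on every worker, while the sampled locations differ.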
from deepxde.
Thank you very much for your amazing work. Had a few questions.
- I was able to successfully run it on a single GPU, but it gave me an OOM error when I increased the domain points to a large value. How do I make it run on multiple GPUs (using TF 1.14)? Could you provide me some direction regarding this? Have you ever run this code on multiple GPUs?
How did you manage to run it on a single GPU? Did you have to change the code? Or simply run it on a machine with a GPU?
I have an NVIDIA GeForce RTX 2060 SUPER on my machine; however, the code is still taking a significant time to run...
I don't notice a change in runtime when I run it on a normal CPU.
Thanks.
from deepxde.
If you correctly install the GPU-version TF with all the libraries, the code will run on GPU without any change. Check your GPU usage to see if the GPU is being used or not.
from deepxde.
1. I haven't tried multiple GPUs, but it should not be hard. TF supports multiple GPUs, and you can also use Horovod. This may require a small modification of the code.
2. Which problem are you solving? For most problems, I believe you don't need so many points. Compared to finite difference/element methods, we need far fewer points to achieve the same accuracy. You can also use the residual-based adaptive refinement method to reduce the number of points, see the paper (https://arxiv.org/abs/1907.04502).
3. I don't use mini-batch, for the previous reason. But it is very easy to use mini-batch. The only thing to modify is the function train_next_batch(self, batch_size=None) in the class PDE. As you can see, I actually have the argument batch_size. During training, we call train_next_batch for each SGD iteration. Currently, I just ignore batch_size and return the whole dataset.
4. You are right. We don't have the overfitting issue, so we don't need dropout, unless you need uncertainty, as shown in this paper (https://www.sciencedirect.com/science/article/pii/S0021999119305340).
I came across this discussion while searching for how to make my results more accurate. I thought I was applying mini-batch in my code, but after reading this post, I just want to make sure I'm implementing it correctly. Do I still need to modify the function train_next_batch(self, batch_size=None) in the class PDE to be able to use mini-batch? Or has the source code been updated to incorporate mini-batch without doing so (given that this discussion is from 2 years back)?
Currently, I'm just using these lines:
n = 10
activation = f"LAAF-{n} silu" # "LAAF-10 silu"
resampler = dde.callbacks.PDEResidualResampler(period=100)
net = dde.maps.FNN([3] + [32] * 10 + [1], activation, "Glorot normal")
model = dde.Model(data, net)
model.compile("adam", lr=5e-6, loss_weights = [1, 1, 100])
model.train(epochs=20000, batch_size = 32, callbacks=[resampler], display_every=1000)
dde.optimizers.set_LBFGS_options(maxcor=50, ftol=1e-20, maxiter=1e5)
model.compile("L-BFGS-B",loss_weights = [10, 1, 1])#You can set ftol so that L-BFGS will stop once the training loss is smaller than ftol
losshistory, train_state = model.train(epochs = 50000, batch_size = 32)
Is this sufficient, or do I need to modify train_next_batch(self, batch_size=None)?
from deepxde.
The discussion is out of date. PDEResidualResampler is the correct way to do mini-batch for PDE residual points. @engsbk, you didn't use PDEResidualResampler in the right way. See the example at https://deepxde.readthedocs.io/en/latest/demos/pinn_forward/diffusion.1d.resample.html
from deepxde.
The discussion is out of date. PDEResidualResampler is the correct way to do mini-batch for PDE residual points. @engsbk, you didn't use PDEResidualResampler in the right way. See the example at https://deepxde.readthedocs.io/en/latest/demos/pinn_forward/diffusion.1d.resample.html
Thank you for the clarification @lululxvi !
One more question about the diffusion example:
resampler = dde.callbacks.PDEResidualResampler(period=100)
I understood that the resampling is done every 100 training epochs, correct? If so, then where can I specify the batch size?
from deepxde.
Hello
I am trying to integrate mini-batch training by splitting the dataset into 2 mini-batches. I am well aware of the PDE and Model classes, and I am pretty sure what I have done is correct (shuffling the domain and BC points before each epoch and then splitting the BC points in order to have the same proportion of BC and domain points in each mini-batch). Also, I keep num_bcs constant in order to use bc.error() correctly. My problem is that I keep getting stuck on a certain BC, meaning its loss is not going down. I would like to ask why PDEresampler is the correct way to go and why splitting the dataset is not working (at least in my case).
from deepxde.
It is the num_domain when you define PDE or TimePDE.
Hello @lululxvi,
If I understood you correctly, the dde.callbacks.PDEPointResampler provides mini-batches in the following manner. For example, I have a geometry interval [-1,1], and I need to train my network on num_domain = 1KK points inside this interval to reach the required accuracy. To avoid OOM problems/exceeding the computational limits, I can simply set num_domain = 1K, and dde.callbacks.PDEPointResampler will pick random 1K points inside [-1,1] every period. So finally, if I resample 1K enough times, will PINN "learn" a lot of intermediate values inside my [-1,1] interval and, therefore, interpolate just as well as the whole single-batch 1KK dataset?
If yes, should I train longer/use smaller periods to let the network learn these intermediate values better?
from deepxde.
dde.callbacks.PDEPointResampler can sample mini-batch for PDEs and BCs.
IMHO "mini batch" here is misleading. This callback resamples the data points used for training and feeds them all at once into the NN. In the long run it may have the same effect on the gradients as mini-batch training, but this is another discussion!
from deepxde.