Comments (4)
I get the same error after updating. All I did was re-run a session that had just worked fine.
from everydream-trainer.
The same is happening to me on RunPod, without .txt files.
Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]/opt/conda/lib/python3.7/site-packages/pytorch_lightning/utilities/data.py:73: UserWarning: Trying to infer the batch_size from an ambiguous collection. The batch size we found is 5. To avoid any miscalculations, use self.log(..., batch_size=batch_size).
"Trying to infer the batch_size from an ambiguous collection. The batch size we"
Epoch 0: 0%| | 0/676 [00:00<?, ?it/s]/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:231: UserWarning: You called self.log('global_step', ...) in your training_step but the value needs to be floating point. Converting it to torch.float32.
f"You called self.log({self.meta.name!r}, ...) in your {self.meta.fx} but the value needs to"
Epoch 0: 5%| | 32/676 [01:12<24:27, 2.28s/it, loss=0.151, v_num=0, train/loss
Training halted. Summoning checkpoint as last.ckpt
Traceback (most recent call last):
File "main.py", line 754, in <module>
trainer.fit(model, data)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
results = self._run_stage()
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
return self._run_train()
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1353, in _run_train
self.fit_loop.run()
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 266, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 171, in advance
batch = next(data_fetcher)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/utilities/fetching.py", line 184, in __next__
return self.fetching_function()
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/utilities/fetching.py", line 259, in fetching_function
self._fetch_next_batch(self.dataloader_iter)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/utilities/fetching.py", line 273, in _fetch_next_batch
batch = next(iterator)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/supporters.py", line 558, in __next__
return self.request_next_batch(self.loader_iters)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/supporters.py", line 570, in request_next_batch
return apply_to_collection(loader_iters, Iterator, next)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/utilities/apply_func.py", line 99, in apply_to_collection
return function(data, *args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
return self._process_data(data)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
data.reraise()
File "/opt/conda/lib/python3.7/site-packages/torch/_utils.py", line 461, in reraise
raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/workspace/everydream-trainer/main.py", line 193, in __getitem__
return self.data[idx]
File "/workspace/everydream-trainer/ldm/data/every_dream.py", line 70, in __getitem__
del self.image_train_items[j].image
del self.image_train_items[j].image
AttributeError: image
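For anyone reading the traceback above: the last frame fails on a `del` of an attribute that is no longer set on the object, which is what Python reports as `AttributeError`. Below is a minimal sketch of that failure mode and one possible guard. `ImageTrainItem` and `release_image` are hypothetical stand-ins written for illustration. They are not the trainer's actual classes, and this is not necessarily how the hotfix addressed it.

```python
class ImageTrainItem:
    """Hypothetical stand-in for a dataset item that caches a decoded image."""

    def __init__(self, caption):
        self.caption = caption
        self.image = object()  # placeholder for real pixel data


def delete_twice_raises(item):
    """Return True if deleting the image attribute twice raises AttributeError."""
    del item.image
    try:
        del item.image  # attribute is already gone at this point
    except AttributeError:
        return True
    return False


def release_image(item):
    """Drop the cached image if it is present; safe to call repeatedly."""
    if hasattr(item, "image"):
        del item.image


# Unguarded: the second delete fails, as in the traceback.
print(delete_twice_raises(ImageTrainItem("a photo")))

# Guarded: calling it twice is harmless.
item = ImageTrainItem("a photo")
release_image(item)
release_image(item)
```

Whether a guard like this is appropriate depends on why the attribute was missing in the first place; if the same index is being fetched twice, the guard would only hide that.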
fixed with the last commit, thanks
Yeah hotfix went out for that, should be g2g. Open a new issue if it happens again.