Comments (8)
Do we know where we use google.protobuf? Is it specific to l5kit, zarr, or is it used by pytorch?
It's specific to l5kit; we use it for the semantic map.
from l5kit.
The downside is that each worker process loads its own copy of the entire pytorch library, which somehow takes around 4GB of commit memory. During training, each worker can take up to 7GB of commit memory, so you need to turn on virtual memory for the buffer.
I see. I guess there is no "one solution for them all" in this case :)
Hi @RocketFlash, DistributedDataParallel has never been tested with L5Kit, so I'm not surprised that it's not working. I'll try to take a look into it!
I got the same error when running on Windows. I didn't use DistributedDataParallel; I just followed the example notebook with num_workers > 0.
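For context on why this bites on Windows specifically: there multiprocessing uses the spawn start method, so the DataLoader must pickle the entire Dataset object to ship it to each worker, and any unpicklable attribute (here the protobuf semantic map held by the rasterizer) aborts worker startup. A minimal sketch of that failure mode, using a threading.Lock as a stand-in for the unpicklable protobuf container:

```python
import pickle
import threading


class WrappedDataset:
    """Stand-in for a Dataset holding an unpicklable handle
    (like the protobuf semantic-map container inside l5kit's rasterizer)."""
    def __init__(self):
        self.samples = [1, 2, 3]
        self.handle = threading.Lock()  # unpicklable, just like the protobuf object


def can_pickle(obj):
    """Return True if obj survives pickling, which spawn-based workers require."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


print(can_pickle([1, 2, 3]))         # True: plain data is fine
print(can_pickle(WrappedDataset()))  # False: worker startup fails the same way
```

On Linux the default fork start method sidesteps this because the child inherits memory instead of unpickling it, which is why the same notebook runs fine there.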
I got the same error when running on Windows
Same version of python (3.8)?
I got the same error when running on Windows
Same version of python (3.8)?
Oh, I am using python 3.7. Is that why?
FYI, this is the kind of error that I got. I am not using DistributedDataParallel.
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<timed exec> in <module>
c:\users\louis\appdata\local\programs\python\python37\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
    289 return _SingleProcessDataLoaderIter(self)
    290 else:
--> 291 return _MultiProcessingDataLoaderIter(self)
    292
    293 @property
c:\users\louis\appdata\local\programs\python\python37\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
    735 # before it starts, and __del__ tries to join but will get:
    736 # AssertionError: can only join a started process.
--> 737 w.start()
    738 self._index_queues.append(index_queue)
    739 self._workers.append(w)
c:\users\louis\appdata\local\programs\python\python37\lib\multiprocessing\process.py in start(self)
    110 'daemonic processes are not allowed to have children'
    111 _cleanup()
--> 112 self._popen = self._Popen(self)
    113 self._sentinel = self._popen.sentinel
    114 # Avoid a refcycle if the target function holds an indirect
c:\users\louis\appdata\local\programs\python\python37\lib\multiprocessing\context.py in _Popen(process_obj)
    221 @staticmethod
    222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
    224
    225 class DefaultContext(BaseContext):
c:\users\louis\appdata\local\programs\python\python37\lib\multiprocessing\context.py in _Popen(process_obj)
    320 def _Popen(process_obj):
    321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
    323
    324 class SpawnContext(BaseContext):
c:\users\louis\appdata\local\programs\python\python37\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
    87 try:
    88 reduction.dump(prep_data, to_child)
--> 89 reduction.dump(process_obj, to_child)
    90 finally:
    91 set_spawning_popen(None)
c:\users\louis\appdata\local\programs\python\python37\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
    58 def dump(obj, file, protocol=None):
    59 '''Replacement for pickle.dump() using ForkingPickler.'''
--> 60 ForkingPickler(file, protocol).dump(obj)
    61
    62 #
TypeError: can't pickle google.protobuf.pyext._message.RepeatedCompositeContainer objects
```
Do we know where we use google.protobuf? Is it specific to l5kit, zarr, or is it used by pytorch?
FYI, I have a workaround: wrap the Dataset in another class that only constructs the dataset and the rasterizer after the object has been loaded into the workers. My code for doing that, in a mydataset.py script:
```python
class MyTrainDataset:
    def __init__(self, cfg, dm):
        self.cfg = cfg
        self.dm = dm

    def initialize(self, worker_id):
        print('initialize called with worker_id', worker_id)
        from l5kit.data import ChunkedDataset
        from l5kit.dataset import AgentDataset  # , EgoDataset
        from l5kit.rasterization import build_rasterizer
        rasterizer = build_rasterizer(self.cfg, self.dm)
        train_cfg = self.cfg["train_data_loader"]
        train_zarr = ChunkedDataset(self.dm.require(train_cfg["key"])).open(cached=False)  # try to turn off cache
        self.dataset = AgentDataset(self.cfg, train_zarr, rasterizer)

    def __len__(self):
        # NOTE: You have to figure out the actual length beforehand, since once the
        # rasterizer and/or AgentDataset has been constructed you cannot pickle this
        # wrapper anymore, so we can't compute the size from the real dataset here.
        # The DataLoader requires __len__ to determine the sampling.
        return 22496709

    def __getitem__(self, index):
        return self.dataset[index]


from torch.utils.data import get_worker_info


def my_dataset_worker_init_func(worker_id):
    worker_info = get_worker_info()
    dataset = worker_info.dataset
    dataset.initialize(worker_id)
```
Then you can load it in the training Jupyter notebook as:
```python
from mydataset import MyTrainDataset, my_dataset_worker_init_func

train_dataset = MyTrainDataset(cfg, dm)
train_dataloader = DataLoader(
    train_dataset,
    shuffle=True,
    batch_size=16,
    num_workers=2,
    persistent_workers=True,
    worker_init_fn=my_dataset_worker_init_func,
)
tr_it = iter(train_dataloader)
```
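One way to avoid hard-coding 22496709 in __len__: compute the real length once in a plain single-process script (where building AgentDataset is safe, since no worker pickling is involved) and cache it to a small file. This helper is a hypothetical sketch, not part of l5kit; cached_length and the cache path are my own names:

```python
import json
from pathlib import Path


def cached_length(cache_path, compute_len):
    """Run an expensive dataset-length computation once, then reuse the cached value."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text())["length"]
    n = compute_len()  # e.g. lambda: len(AgentDataset(cfg, train_zarr, rasterizer))
    path.write_text(json.dumps({"length": n}))
    return n
```

MyTrainDataset.__len__ could then return this cached value instead of a literal, so the wrapper stays correct when the config or zarr changes.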
The downside is that each worker process loads its own copy of the entire pytorch library, which somehow takes around 4GB of commit memory. During training, each worker can take up to 7GB of commit memory, so you need to turn on virtual memory for the buffer.