Comments (13)
Crash happened when running this cell:
```python
# Random performance without fine-tuning.
get_accuracy(params_repl)
```
Hi Tyler,
Do you run out of CPU RAM or GPU/TPU RAM?
Also, how much RAM do you have?
(You can check with `!free -mh`.)
The provided Colab works fine with CIFAR datasets and the default settings (default Colab currently has 12G of RAM).
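For reference, a minimal sketch of how to check both from inside the notebook (assuming a standard Colab runtime where `psutil` is preinstalled; this is not a cell from the notebook):

```python
import psutil
import jax

# Host (CPU) RAM, same numbers `!free -mh` reports:
vm = psutil.virtual_memory()
print(f'Host RAM: {vm.total / 2**30:.1f} GiB total, '
      f'{vm.available / 2**30:.1f} GiB available')

# Devices visible to JAX; TPU cores only show up after the TPU setup cell ran:
print('JAX devices:', jax.devices())
```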
I cannot run the code on Colab either; I hit the same problem, where running the cells crashes the notebook.
I just checked, the Colab runs fine (at least up to the "Fine-tune" section that is below the "Random performance without fine-tuning." comment that you point out above).
I get an error at the last two steps of the Fine-tune section:
```python
# The world's simplest training loop.
# Completes in ~20 min on the TPU runtime.
for step, batch, lr_repl in zip(
    tqdm.notebook.trange(1, total_steps + 1),
    ds_train.as_numpy_iterator(),
    lr_iter
):
  opt_repl, loss_repl, update_rngs = update_fn_repl(
      opt_repl, lr_repl, batch, update_rngs)
```
Thank you very much! I'm looking forward to your reply.
Hi Tyler,
> Crash happened when running this cell:
>
> ```python
> # Random performance without fine-tuning.
> get_accuracy(params_repl)
> ```
Did you run the "TPU setup: Boilerplate for connecting JAX to TPU" cell?
If you double-click that cell, you will find the following code:
```python
#@markdown TPU setup: Boilerplate for connecting JAX to TPU.
import os
if 'google.colab' in str(get_ipython()) and 'COLAB_TPU_ADDR' in os.environ:
  # Make sure the Colab Runtime is set to Accelerator: TPU.
  import requests
  if 'TPU_DRIVER_MODE' not in globals():
    url = 'http://' + os.environ['COLAB_TPU_ADDR'].split(':')[0] + ':8475/requestversion/tpu_driver0.1-dev20191206'
    resp = requests.post(url)
    TPU_DRIVER_MODE = 1
  # The following is required to use TPU Driver as JAX's backend.
  from jax.config import config
  config.FLAGS.jax_xla_backend = "tpu_driver"
  config.FLAGS.jax_backend_target = "grpc://" + os.environ['COLAB_TPU_ADDR']
  print('Registered TPU:', config.FLAGS.jax_backend_target)
else:
  print('No TPU detected. Can be changed under "Runtime/Change runtime type".')
```
I think this code registers your TPU.
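As a quick sanity check (my own sketch, not a cell from the notebook), you can confirm JAX actually sees the TPU after running that cell:

```python
import jax

# On a successfully registered Colab TPU this prints 8 and a list of
# TPU devices; without the setup cell it falls back to a single CPU device.
print(jax.device_count())
print(jax.devices())
```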
I get an error at the last two steps of the Fine-tune section:
```python
# The world's simplest training loop.
# Completes in ~20 min on the TPU runtime.
for step, batch, lr_repl in zip(
    tqdm.notebook.trange(1, total_steps + 1),
    ds_train.as_numpy_iterator(),
    lr_iter
):
  opt_repl, loss_repl, update_rngs = update_fn_repl(
      opt_repl, lr_repl, batch, update_rngs)
```

Thank you very much! I'm looking forward to your reply.
To be specific, use `import flax.optim as optim`.
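A sketch of what that looks like in context (assuming a Flax version that still ships `flax.optim`; `params` below is a placeholder for the model parameters, not a name from the notebook):

```python
# Import the optimizer module explicitly rather than relying on it being
# re-exported from the top-level `flax` package.
import flax.optim as optim

# Hypothetical usage with placeholder `params`:
opt = optim.Momentum(beta=0.9).create(params)
```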
Thanks for your reply. That does work!
Do you know why the official optimizer doesn't work?
I would also like to ask whether you have used this code on a multi-host TPU (such as v3-32 or v3-64).
Thank you very much!
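To illustrate what I mean by multi-host, a minimal sketch using standard JAX topology calls (the expected counts in the comment are my assumption for a v3-32, not verified output):

```python
import jax

# Each host in a multi-host slice runs the same program and only sees its
# local cores. On a v3-32 I would expect: 4 processes, 8 local devices
# per host, and 32 devices globally.
print(jax.process_count(), jax.local_device_count(), jax.device_count())
```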
I just re-ran the code in the provided Colab on TPU and it worked without problem.
@Tylersuard can you try again?
(I think it was some temporary regression in the Colab setup and/or JAX code that was independent of the code in this repo.)
@andsteing I just tried again, same issue:
```python
# The world's simplest training loop.
# Completes in ~20 min on the TPU runtime.
for step, batch, lr_repl in zip(
    tqdm.notebook.trange(1, total_steps + 1),
    ds_train.as_numpy_iterator(),
    lr_iter
):
  opt_repl, loss_repl, update_rngs = update_fn_repl(
      opt_repl, lr_repl, batch, update_rngs)
```
Response:
Your session crashed after using all available RAM.
Log:
| Timestamp | Level | Message |
|---|---|---|
| Apr 21, 2021, 6:52:28 AM | WARNING | WARNING:root:kernel 0cb38d64-9225-470e-badb-c668f208fe42 restarted |
| Apr 21, 2021, 6:52:28 AM | INFO | KernelRestarter: restarting kernel (1/5), keep random ports |
| Apr 21, 2021, 6:27:14 AM | WARNING | 2021-04-21 13:27:14.614322: W tensorflow/core/kernels/data/cache_dataset_ops.cc:798] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead. |
| Apr 21, 2021, 6:23:38 AM | INFO | Adapting to protocol v5.1 for kernel 0cb38d64-9225-470e-badb-c668f208fe42 |
| Apr 21, 2021, 6:23:37 AM | INFO | Kernel started: 0cb38d64-9225-470e-badb-c668f208fe42 |
| Apr 21, 2021, 6:23:29 AM | INFO | Use Control-C to stop this server and shut down all kernels (twice to skip confirmation). |
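Side note on the tf.data warning in that log: the order of `cache()` and `take()` matters. A small self-contained illustration (not code from the notebook):

```python
import tensorflow as tf

ds = tf.data.Dataset.range(100)

# Flagged pattern: cache() first caches the *full* dataset, and a partial
# read via take(k) discards the cache, re-reading the source every epoch.
bad = ds.cache().take(10).repeat()

# Suggested order: take(k) first, so only those k elements get cached.
good = ds.take(10).cache().repeat()
```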
Just to confirm:
- You're loading this notebook: https://colab.sandbox.google.com/github/google-research/vision_transformer/blob/master/vit_jax.ipynb
- You're using the Runtime type "TPU".
- You start with a fresh kernel.
- You run end-to-end without modifications.
Because I tried multiple times and was not able to reproduce your error.
I used a slightly different link, the one shown in this repo's README.
@andsteing OK, I got it to work. For some reason it does not work with the "high-RAM" instance option, but it does work with the regular one. Thank you for your help.