theovercomer8 / captionr
GIT/BLIP/CLIP Caption tool
License: MIT License
This issue on the LAVIS repository (salesforce/LAVIS#118) shows someone who managed to fit BLIP2 onto their 24 GB VRAM card. Would it be possible to apply that approach to this repo?
Using the --output option generates a path error on Linux. (Not sure if the same happens on Windows?)
Traceback (most recent call last):
File "/home/machinelearning/tools/captionr/captionr/captionr_class.py", line 261, in process_img
with open(outputfilename, "w", encoding="utf8") as file:
FileNotFoundError: [Errno 2] No such file or directory: "[PosixPath('/home/machinelearning/datasets/processing/output_folder')]/image_123456.txt"
I got it working by changing the line:
else:
    if isinstance(config.output, pathlib.PosixPath):
        dirname = str(config.output)
I'm not sure that's a robust fix, though.
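A slightly more general version of that fix might normalize whatever config.output holds into a plain string before it is joined into the filename. This is only a sketch: judging from the `[PosixPath('...')]` in the traceback, config.output may also arrive as a one-element list, and `resolve_output_dir` is a hypothetical helper, not part of captionr:

```python
import pathlib

def resolve_output_dir(output):
    """Normalize an output setting to a plain string path.

    Accepts a str, a pathlib.Path, or a one-element list/tuple of either.
    The traceback shows "[PosixPath(...)]" embedded in the filename, i.e.
    a list's repr was joined into the path instead of the path itself.
    """
    if isinstance(output, (list, tuple)) and output:
        output = output[0]
    return str(pathlib.Path(output))
```

With that helper, `resolve_output_dir([pathlib.Path("/home/me/out")])` yields the plain string `/home/me/out`, which `open()` accepts.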
INFO:root:Loaded ViT-H-14 model config.
INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
ERROR:root:Error loading cached table artists: invalid load key, 'E'.
and then the process stops at:
Preprocessing artists: 0%| | 0/10 [00:00<?, ?it/s]
Not sure why, but on quite a few of my zips I get "An error occurred while downloading the file: A UTF-8 locale is required. Got ANSI_X3.4-1968". I can bypass this, and everything works fine, by running unzip -j /ziplocation -d /content/dataset and going directly to the caption cell.
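For reference, a workaround commonly used in Colab notebooks for this ANSI_X3.4-1968 locale error is to monkey-patch Python's preferred encoding before running the cell. This is only a sketch; it patches the running Python process, not the system locale:

```python
import locale

# Colab sometimes reports ANSI_X3.4-1968 as its locale, which breaks
# tools that insist on UTF-8. Overriding getpreferredencoding makes
# Python code in this process behave as if the locale were UTF-8.
locale.getpreferredencoding = lambda *args: "UTF-8"
```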
Hey! Thanks for your hard work creating captionr. Unfortunately, I'm seeing this error on both native Windows and WSL. Any help is appreciated!
Python version is 3.8.10.
Command:
python captionr.py /mnt/d/model_training/deltron/images/500px/people --blip2_question_file /mnt/d/model_training/deltron/captions/blip2/question.txt --prepend_text "a photo of " --existing=skip --cap_length=75 --blip_pass --use_blip2 --blip2_model blip2_opt/pretrain_opt6.7b --clip_model_name=ViT-L-14/openai --uniquify_tags --device=cuda --extension=txt
Exception:
ERROR:root:Exception during BLIP captioning
Traceback (most recent call last):
File "/mnt/d/model_training/code/captionr/captionr/captionr_class.py", line 139, in process_img
new_caption = config._blip.caption(img)
File "/mnt/d/model_training/code/captionr/captionr/blip2_cap.py", line 22, in caption
return self.model.generate({"image": image})[0]
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/lavis/models/blip2_models/blip2_opt.py", line 213, in generate
outputs = self.opt_model.generate(
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/transformers/generation/utils.py", line 1490, in generate
return self.beam_search(
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/transformers/generation/utils.py", line 2749, in beam_search
outputs = self(
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/lavis/models/blip2_models/modeling_opt.py", line 1037, in forward
outputs = self.model.decoder(
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/lavis/models/blip2_models/modeling_opt.py", line 703, in forward
inputs_embeds = torch.cat([query_embeds, inputs_embeds], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list.
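The traceback says the two tensors fed to torch.cat disagree in dimension 0 (25 vs 5), which looks like one side was expanded for beam search (batch 5 × 5 beams = 25) while the other was not — a plausible symptom of running salesforce-lavis against a newer transformers than it was written for (here 4.28.0.dev0). A minimal illustration of the concatenation constraint itself, using NumPy as a stand-in for torch.cat (the shapes below are made up for illustration):

```python
import numpy as np

# query_embeds has been expanded for beam search (batch 5 * 5 beams = 25),
# but inputs_embeds still carries the un-expanded batch of 5. Concatenating
# along axis 1 requires every OTHER dimension to match, so this fails.
query_embeds = np.zeros((25, 32, 8))
inputs_embeds = np.zeros((5, 4, 8))
try:
    np.concatenate([query_embeds, inputs_embeds], axis=1)
except ValueError as err:
    print("concat failed:", err)
```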
Pip packages:
Package Version
altair 4.2.2
antlr4-python3-runtime 4.9.3
asttokens 2.2.1
attrs 22.2.0
backcall 0.2.0
backports.zoneinfo 0.2.1
blinker 1.5
blip-vit 0.0.3
blis 0.7.9
braceexpand 0.1.7
cachetools 5.3.0
catalogue 2.0.8
certifi 2022.12.7
cfgv 3.3.1
charset-normalizer 3.1.0
click 8.1.3
cmake 3.26.0
confection 0.0.4
contexttimer 0.3.3
contourpy 1.0.7
cycler 0.11.0
cymem 2.0.7
decorator 5.1.1
decord 0.6.0
distlib 0.3.6
einops 0.6.0
entrypoints 0.4
executing 1.2.0
fairscale 0.4.4
filelock 3.10.0
fonttools 4.39.2
ftfy 6.1.1
gitdb 4.0.10
GitPython 3.1.31
huggingface-hub 0.13.2
identify 2.5.21
idna 3.4
imageio 2.26.0
importlib-metadata 6.0.0
importlib-resources 5.12.0
iopath 0.1.10
ipython 8.11.0
jedi 0.18.2
Jinja2 3.1.2
jsonschema 4.17.3
kaggle 1.5.13
kiwisolver 1.4.4
langcodes 3.3.0
lazy-loader 0.1
Levenshtein 0.20.9
lit 15.0.7
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
mdurl 0.1.2
mpmath 1.3.0
murmurhash 1.0.9
networkx 3.0
nodeenv 1.7.0
numpy 1.24.2
omegaconf 2.3.0
open-clip-torch 2.16.0
opencv-python-headless 4.5.5.64
opendatasets 0.1.22
packaging 23.0
pandas 1.5.3
parso 0.8.3
pathy 0.10.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
pip 21.1.1
pkgutil-resolve-name 1.3.10
platformdirs 3.1.1
plotly 5.13.1
portalocker 2.7.0
pre-commit 3.2.0
preshed 3.0.8
prompt-toolkit 3.0.38
protobuf 3.20.3
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 11.0.0
pycocoevalcap 1.2
pycocotools 2.0.6
pydantic 1.10.6
pydeck 0.8.0
Pygments 2.14.0
Pympler 1.0.1
pyparsing 3.0.9
pyrsistent 0.19.3
python-dateutil 2.8.2
python-Levenshtein 0.20.9
python-magic 0.4.27
python-slugify 8.0.1
pytz 2022.7.1
pytz-deprecation-shim 0.1.0.post0
PyWavelets 1.4.1
PyYAML 6.0
rapidfuzz 2.13.7
regex 2022.10.31
requests 2.28.2
rich 13.3.2
salesforce-lavis 1.0.0
scikit-image 0.20.0
scipy 1.9.1
semver 2.13.0
sentencepiece 0.1.97
setuptools 56.0.0
six 1.16.0
smart-open 6.3.0
smmap 5.0.0
spacy 3.5.1
spacy-legacy 3.0.12
spacy-loggers 1.0.4
srsly 2.4.6
stack-data 0.6.2
streamlit 1.20.0
sympy 1.11.1
tenacity 8.2.2
text-unidecode 1.3
thefuzz 0.19.0
thinc 8.1.9
tifffile 2023.3.15
timm 0.4.12
tokenizers 0.13.2
toml 0.10.2
toolz 0.12.0
torch 2.0.0+cu117
torchvision 0.15.1+cu117
tornado 6.2
tqdm 4.65.0
traitlets 5.9.0
transformers 4.28.0.dev0
triton 2.0.0
typer 0.7.0
typing-extensions 4.5.0
tzdata 2022.7
tzlocal 4.3
urllib3 1.26.15
validators 0.20.0
virtualenv 20.21.0
wasabi 1.1.1
watchdog 2.3.1
wcwidth 0.2.6
webdataset 0.2.43
wheel 0.40.0
zipp 3.15.0
DeepDanbooru is a caption model trained on tags from boorus. It is mostly anime-focused, but it also works remarkably well on non-anime images. Booru-style prompts (example: 1girl, simple background, watercolor, ...) work very well in text2img.
DeepDanbooru was originally used on Danbooru to tag images, but it is now also used to fine-tune SD.
Since then, a couple of modifications (like removing underscores from tags) have been made so that it works better with SD.
See demo and links to models here:
https://huggingface.co/spaces/SmilingWolf/wd-v1-4-tags
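The underscore tweak mentioned above is simple to reproduce. A minimal sketch (`clean_booru_tags` is a hypothetical helper, not DeepDanbooru's API):

```python
def clean_booru_tags(tags):
    """Replace underscores with spaces so booru tags read naturally in SD prompts."""
    return [tag.replace("_", " ") for tag in tags]

print(clean_booru_tags(["1girl", "simple_background", "watercolor_(medium)"]))
# -> ['1girl', 'simple background', 'watercolor (medium)']
```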
Getting this when using blip = true:
ERROR:root:Exception during BLIP captioning
Traceback (most recent call last):
File "/content/gdrive/MyDrive/captionr/captionr/captionr_class.py", line 139, in process_img
new_caption = config._blip.caption(img)
File "/content/gdrive/MyDrive/captionr/captionr/blip_cap.py", line 56, in caption
caption = self.blip_model.generate(
File "/usr/local/lib/python3.8/dist-packages/blip/models/blip.py", line 156, in generate
outputs = self.text_decoder.generate(input_ids=input_ids,
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 1490, in generate
return self.beam_search(
File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 2749, in beam_search
outputs = self(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 886, in forward
outputs = self.bert(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 781, in forward
encoder_outputs = self.encoder(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 445, in forward
layer_outputs = layer_module(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 361, in forward
cross_attention_outputs = self.crossattention(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 277, in forward
self_outputs = self.self(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 178, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (8) must match the size of tensor b (64) at non-singleton dimension 0
I have fail_phrases that are being hit during one pass, but it's still writing the caption txt files. Can I have it not write any caption file at all if a fail phrase is triggered? In other words, I want to skip captioning completely if any of the fail_phrases are found.
e.g.
python captionr.py I:\AI\tmp_1024x768\sharpened\AI\vtmp\faces-first-half\ --existing=skip --cap_length=30 --git_pass --clip_model_name=ViT-H-14/laion2b_s32b_b79k --clip_flavor --clip_max_flavors=8 --clip_method=interrogate_fast --fail_phrases="a fat,birth,a sign that says,fetus,writing that says,that says,with the word,neatly coming out,extreme face contortion,extremely bizarre disturbing,cell phone,latina,very creepy,membrane,bbw,wrinkled,checknig her phone,sac,severe,blurry,out of focus,multiple eyes,morphing,very very surreal,kaleidoscopic,cursed image,horror photo,many eyes on head,multiple eyes,boy,seen through a kaleidoscope,vomit,face morph" --uniquify_tags --device=cuda --extension=txt
Thanks- I love captionr!
P.S. "checknig her phone" is not a typo!
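For what it's worth, the requested behavior could be sketched as a guard before the file write. `should_write_caption` is a hypothetical helper, not captionr's actual code, and `FAIL_PHRASES` below is just a subset of the list in the command above:

```python
# Hypothetical subset of the --fail_phrases list, for illustration only.
FAIL_PHRASES = ["a sign that says", "out of focus", "checknig her phone"]

def should_write_caption(caption, fail_phrases=FAIL_PHRASES):
    """Return False if any fail phrase appears in the caption,
    so the caller can skip writing the .txt file entirely."""
    lowered = caption.lower()
    return not any(phrase in lowered for phrase in fail_phrases)
```

The caller would then only open and write the caption file when `should_write_caption(caption)` is True.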
It is looking for the .pkl file one folder up, not in the project folder.
File "G:\captionr\captionr\clip_interrogator.py", line 76, in load_clip_model
with open(os.path.join(config.cache_path,'ViT-H-14_laion2b_s32b_b79k_flavors.pkl'), 'wb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: 'G:\data\ViT-H-14_laion2b_s32b_b79k_flavors.pkl'
It's looking in G:\data but should be looking in G:\captionr\data\.
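One way to make the cache location independent of the current working directory is to resolve it relative to the script itself. This is only a sketch (`PROJECT_DIR` and the `data` subfolder name are assumptions, not captionr's actual config):

```python
import os

# Resolve the cache folder relative to this file's location
# (e.g. G:\captionr\data) instead of relative to the CWD's parent,
# so the .pkl is found no matter where the tool is launched from.
PROJECT_DIR = os.path.dirname(os.path.abspath(__file__))
cache_path = os.path.join(PROJECT_DIR, "data")
os.makedirs(cache_path, exist_ok=True)
```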
Hi,
I'm getting the following error on Linux (Ubuntu Jammy).
The same error occurs with Python 3.8 and 3.10 in a virtual venv.
INFO:root:PREVIEW MODE ENABLED. No caption files will be written.
0%| | 0/3 [00:00<?, ?it/s]ERROR:root:Exception during BLIP captioning
Traceback (most recent call last):
File "/home/machinelearning/tools/captionr/captionr/captionr_class.py", line 139, in process_img
new_caption = config._blip.caption(img)
File "/home/machinelearning/tools/captionr/captionr/blip_cap.py", line 48, in caption
size = self.config.blip_image_eval_size
AttributeError: 'BLIP' object has no attribute 'config'
Other models seem to be working fine, including CoCa, GIT, and CLIP.
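Until that attribute is wired up properly, a defensive variant of the failing line could fall back to 384, the image size BLIP models are commonly evaluated at. This is purely a sketch of the idea, not the project's code:

```python
class BLIP:
    def __init__(self, config=None):
        # The reported AttributeError suggests config is never set on
        # some code path; storing it in __init__ (even as None) plus a
        # getattr fallback makes .caption() robust either way.
        self.config = config

    def eval_size(self):
        # Read blip_image_eval_size off the config if present,
        # otherwise fall back to BLIP's usual 384-pixel eval size.
        cfg = getattr(self, "config", None)
        return getattr(cfg, "blip_image_eval_size", None) or 384
```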
I tried it out for the first time today. When I try your sample arguments, I get this error:
RuntimeError: Model config for coca_ViT-L-14 not found.
How can I fix this?