theovercomer8 / captionr
GIT/BLIP/CLIP Caption tool
License: MIT License
This issue on the LAVIS repository (salesforce/LAVIS#118) shows someone who managed to fit BLIP2 onto their 24 GB VRAM card. Would it be possible to apply that approach to this repo?
Using the --output option generates a path error on Linux. (Not sure if the same happens on Windows?)
Traceback (most recent call last):
File "/home/machinelearning/tools/captionr/captionr/captionr_class.py", line 261, in process_img
with open(outputfilename, "w", encoding="utf8") as file:
FileNotFoundError: [Errno 2] No such file or directory: "[PosixPath('/home/machinelearning/datasets/processing/output_folder')]/image_123456.txt"
I got it working by changing the line:
else:
    if isinstance(config.output, pathlib.PosixPath):
        dirname = str(config.output)
I'm not sure that's a robust fix, though.
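A slightly more general version of that fix might normalize whatever config.output holds into a plain string before it is joined into the filename. This is only a sketch: judging from the `[PosixPath('...')]` in the traceback, config.output may also arrive as a one-element list, and `resolve_output_dir` is a hypothetical helper, not part of captionr:

```python
import pathlib

def resolve_output_dir(output):
    """Normalize an output setting to a plain string path.

    Accepts a str, a pathlib.Path, or a one-element list/tuple of either.
    The traceback shows "[PosixPath(...)]" embedded in the filename, i.e.
    a list's repr was joined into the path instead of the path itself.
    """
    if isinstance(output, (list, tuple)) and output:
        output = output[0]
    return str(pathlib.Path(output))
```

With that helper, `resolve_output_dir([pathlib.Path("/home/me/out")])` yields the plain string `/home/me/out`, which `open()` accepts.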
INFO:root:Loaded ViT-H-14 model config.
INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
ERROR:root:Error loading cached table artists: invalid load key, 'E'.
and then the process stops at:
Preprocessing artists: 0%| | 0/10 [00:00<?, ?it/s]
Not sure why, but on quite a few of my zips I get "An error occurred while downloading the file: A UTF-8 locale is required. Got ANSI_X3.4-1968". I can bypass this, and everything works fine, by running unzip -j /ziplocation -d /content/dataset and going directly to the caption cell.
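For reference, a workaround commonly used in Colab notebooks for this ANSI_X3.4-1968 locale error is to monkey-patch Python's preferred encoding before running the cell. This is only a sketch; it patches the running Python process, not the system locale:

```python
import locale

# Colab sometimes reports ANSI_X3.4-1968 as its locale, which breaks
# tools that insist on UTF-8. Overriding getpreferredencoding makes
# Python code in this process behave as if the locale were UTF-8.
locale.getpreferredencoding = lambda *args: "UTF-8"
```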
Hey! Thanks for your hard work creating captionr. Unfortunately, I'm seeing this error on both native Windows and WSL. Any help is appreciated!
Python version is 3.8.10.
Command:
python captionr.py /mnt/d/model_training/deltron/images/500px/people --blip2_question_file /mnt/d/model_training/deltron/captions/blip2/question.txt --prepend_text "a photo of " --existing=skip --cap_length=75 --blip_pass --use_blip2 --blip2_model blip2_opt/pretrain_opt6.7b --clip_model_name=ViT-L-14/openai --uniquify_tags --device=cuda --extension=txt
Exception:
ERROR:root:Exception during BLIP captioning
Traceback (most recent call last):
File "/mnt/d/model_training/code/captionr/captionr/captionr_class.py", line 139, in process_img
new_caption = config._blip.caption(img)
File "/mnt/d/model_training/code/captionr/captionr/blip2_cap.py", line 22, in caption
return self.model.generate({"image": image})[0]
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/lavis/models/blip2_models/blip2_opt.py", line 213, in generate
outputs = self.opt_model.generate(
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/transformers/generation/utils.py", line 1490, in generate
return self.beam_search(
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/transformers/generation/utils.py", line 2749, in beam_search
outputs = self(
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/lavis/models/blip2_models/modeling_opt.py", line 1037, in forward
outputs = self.model.decoder(
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/lavis/models/blip2_models/modeling_opt.py", line 703, in forward
inputs_embeds = torch.cat([query_embeds, inputs_embeds], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list.
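The traceback says the two tensors fed to torch.cat disagree in dimension 0 (25 vs 5), which looks like one side was expanded for beam search (batch 5 × 5 beams = 25) while the other was not — a plausible symptom of running salesforce-lavis against a newer transformers than it was written for (here 4.28.0.dev0). A minimal illustration of the concatenation constraint itself, using NumPy as a stand-in for torch.cat (the shapes below are made up for illustration):

```python
import numpy as np

# query_embeds has been expanded for beam search (batch 5 * 5 beams = 25),
# but inputs_embeds still carries the un-expanded batch of 5. Concatenating
# along axis 1 requires every OTHER dimension to match, so this fails.
query_embeds = np.zeros((25, 32, 8))
inputs_embeds = np.zeros((5, 4, 8))
try:
    np.concatenate([query_embeds, inputs_embeds], axis=1)
except ValueError as err:
    print("concat failed:", err)
```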
Pip packages:
Package Version
altair 4.2.2
antlr4-python3-runtime 4.9.3
asttokens 2.2.1
attrs 22.2.0
backcall 0.2.0
backports.zoneinfo 0.2.1
blinker 1.5
blip-vit 0.0.3
blis 0.7.9
braceexpand 0.1.7
cachetools 5.3.0
catalogue 2.0.8
certifi 2022.12.7
cfgv 3.3.1
charset-normalizer 3.1.0
click 8.1.3
cmake 3.26.0
confection 0.0.4
contexttimer 0.3.3
contourpy 1.0.7
cycler 0.11.0
cymem 2.0.7
decorator 5.1.1
decord 0.6.0
distlib 0.3.6
einops 0.6.0
entrypoints 0.4
executing 1.2.0
fairscale 0.4.4
filelock 3.10.0
fonttools 4.39.2
ftfy 6.1.1
gitdb 4.0.10
GitPython 3.1.31
huggingface-hub 0.13.2
identify 2.5.21
idna 3.4
imageio 2.26.0
importlib-metadata 6.0.0
importlib-resources 5.12.0
iopath 0.1.10
ipython 8.11.0
jedi 0.18.2
Jinja2 3.1.2
jsonschema 4.17.3
kaggle 1.5.13
kiwisolver 1.4.4
langcodes 3.3.0
lazy-loader 0.1
Levenshtein 0.20.9
lit 15.0.7
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
mdurl 0.1.2
mpmath 1.3.0
murmurhash 1.0.9
networkx 3.0
nodeenv 1.7.0
numpy 1.24.2
omegaconf 2.3.0
open-clip-torch 2.16.0
opencv-python-headless 4.5.5.64
opendatasets 0.1.22
packaging 23.0
pandas 1.5.3
parso 0.8.3
pathy 0.10.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
pip 21.1.1
pkgutil-resolve-name 1.3.10
platformdirs 3.1.1
plotly 5.13.1
portalocker 2.7.0
pre-commit 3.2.0
preshed 3.0.8
prompt-toolkit 3.0.38
protobuf 3.20.3
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 11.0.0
pycocoevalcap 1.2
pycocotools 2.0.6
pydantic 1.10.6
pydeck 0.8.0
Pygments 2.14.0
Pympler 1.0.1
pyparsing 3.0.9
pyrsistent 0.19.3
python-dateutil 2.8.2
python-Levenshtein 0.20.9
python-magic 0.4.27
python-slugify 8.0.1
pytz 2022.7.1
pytz-deprecation-shim 0.1.0.post0
PyWavelets 1.4.1
PyYAML 6.0
rapidfuzz 2.13.7
regex 2022.10.31
requests 2.28.2
rich 13.3.2
salesforce-lavis 1.0.0
scikit-image 0.20.0
scipy 1.9.1
semver 2.13.0
sentencepiece 0.1.97
setuptools 56.0.0
six 1.16.0
smart-open 6.3.0
smmap 5.0.0
spacy 3.5.1
spacy-legacy 3.0.12
spacy-loggers 1.0.4
srsly 2.4.6
stack-data 0.6.2
streamlit 1.20.0
sympy 1.11.1
tenacity 8.2.2
text-unidecode 1.3
thefuzz 0.19.0
thinc 8.1.9
tifffile 2023.3.15
timm 0.4.12
tokenizers 0.13.2
toml 0.10.2
toolz 0.12.0
torch 2.0.0+cu117
torchvision 0.15.1+cu117
tornado 6.2
tqdm 4.65.0
traitlets 5.9.0
transformers 4.28.0.dev0
triton 2.0.0
typer 0.7.0
typing-extensions 4.5.0
tzdata 2022.7
tzlocal 4.3
urllib3 1.26.15
validators 0.20.0
virtualenv 20.21.0
wasabi 1.1.1
watchdog 2.3.1
wcwidth 0.2.6
webdataset 0.2.43
wheel 0.40.0
zipp 3.15.0
DeepDanbooru is a caption model trained on tags from boorus. It is mostly anime-focused, but it also works remarkably well on non-anime images. Booru-style prompts (example: 1girl, simple background, watercolor, ...) work very well in text2img.
DeepDanbooru was originally used on Danbooru to tag images, but it is now also used to fine-tune SD.
Since then, a couple of modifications (like removing underscores from tags) have been made so that it works better with SD.
See demo and links to models here:
https://huggingface.co/spaces/SmilingWolf/wd-v1-4-tags
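The underscore tweak mentioned above is simple to reproduce. A minimal sketch (`clean_booru_tags` is a hypothetical helper, not DeepDanbooru's API):

```python
def clean_booru_tags(tags):
    """Replace underscores with spaces so booru tags read naturally in SD prompts."""
    return [tag.replace("_", " ") for tag in tags]

print(clean_booru_tags(["1girl", "simple_background", "watercolor_(medium)"]))
# -> ['1girl', 'simple background', 'watercolor (medium)']
```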
Getting this when using blip = true:
ERROR:root:Exception during BLIP captioning
Traceback (most recent call last):
File "/content/gdrive/MyDrive/captionr/captionr/captionr_class.py", line 139, in process_img
new_caption = config._blip.caption(img)
File "/content/gdrive/MyDrive/captionr/captionr/blip_cap.py", line 56, in caption
caption = self.blip_model.generate(
File "/usr/local/lib/python3.8/dist-packages/blip/models/blip.py", line 156, in generate
outputs = self.text_decoder.generate(input_ids=input_ids,
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 1490, in generate
return self.beam_search(
File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 2749, in beam_search
outputs = self(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 886, in forward
outputs = self.bert(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 781, in forward
encoder_outputs = self.encoder(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 445, in forward
layer_outputs = layer_module(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 361, in forward
cross_attention_outputs = self.crossattention(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 277, in forward
self_outputs = self.self(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 178, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (8) must match the size of tensor b (64) at non-singleton dimension 0
I have fail_phrases that are being hit during one pass, but it's still writing the caption txt files. Can I have it not write any caption file at all if a fail phrase is triggered? In other words, I want to skip captioning completely if any of the fail_phrases are found.
e.g.
python captionr.py I:\AI\tmp_1024x768\sharpened\AI\vtmp\faces-first-half\ --existing=skip --cap_length=30 --git_pass --clip_model_name=ViT-H-14/laion2b_s32b_b79k --clip_flavor --clip_max_flavors=8 --clip_method=interrogate_fast --fail_phrases="a fat,birth,a sign that says,fetus,writing that says,that says,with the word,neatly coming out,extreme face contortion,extremely bizarre disturbing,cell phone,latina,very creepy,membrane,bbw,wrinkled,checknig her phone,sac,severe,blurry,out of focus,multiple eyes,morphing,very very surreal,kaleidoscopic,cursed image,horror photo,many eyes on head,multiple eyes,boy,seen through a kaleidoscope,vomit,face morph" --uniquify_tags --device=cuda --extension=txt
Thanks- I love captionr!
P.S. "checknig her phone" is not a typo!
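For what it's worth, the requested behavior could be sketched as a guard before the file write. `should_write_caption` is a hypothetical helper, not captionr's actual code, and `FAIL_PHRASES` below is just a subset of the list in the command above:

```python
# Hypothetical subset of the --fail_phrases list, for illustration only.
FAIL_PHRASES = ["a sign that says", "out of focus", "checknig her phone"]

def should_write_caption(caption, fail_phrases=FAIL_PHRASES):
    """Return False if any fail phrase appears in the caption,
    so the caller can skip writing the .txt file entirely."""
    lowered = caption.lower()
    return not any(phrase in lowered for phrase in fail_phrases)
```

The caller would then only open and write the caption file when `should_write_caption(caption)` is True.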
It is looking for the .pkl file one folder up, not in the project folder.
File "G:\captionr\captionr\clip_interrogator.py", line 76, in load_clip_model
with open(os.path.join(config.cache_path,'ViT-H-14_laion2b_s32b_b79k_flavors.pkl'), 'wb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: 'G:\data\ViT-H-14_laion2b_s32b_b79k_flavors.pkl'
It's looking in G:\data but should be looking in G:\captionr\data\.
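One way to make the cache location independent of the current working directory is to resolve it relative to the script itself. This is only a sketch (`PROJECT_DIR` and the `data` subfolder name are assumptions, not captionr's actual config):

```python
import os

# Resolve the cache folder relative to this file's location
# (e.g. G:\captionr\data) instead of relative to the CWD's parent,
# so the .pkl is found no matter where the tool is launched from.
PROJECT_DIR = os.path.dirname(os.path.abspath(__file__))
cache_path = os.path.join(PROJECT_DIR, "data")
os.makedirs(cache_path, exist_ok=True)
```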
Hi,
I'm getting the following error on Linux (Ubuntu Jammy).
The same error occurs with Python 3.8 and 3.10 in a virtual venv.
INFO:root:PREVIEW MODE ENABLED. No caption files will be written.
0%| | 0/3 [00:00<?, ?it/s]ERROR:root:Exception during BLIP captioning
Traceback (most recent call last):
File "/home/machinelearning/tools/captionr/captionr/captionr_class.py", line 139, in process_img
new_caption = config._blip.caption(img)
File "/home/machinelearning/tools/captionr/captionr/blip_cap.py", line 48, in caption
size = self.config.blip_image_eval_size
AttributeError: 'BLIP' object has no attribute 'config'
Other models seem to be working fine, including CoCa, GIT, and CLIP.
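Until that attribute is wired up properly, a defensive variant of the failing line could fall back to 384, the image size BLIP models are commonly evaluated at. This is purely a sketch of the idea, not the project's code:

```python
class BLIP:
    def __init__(self, config=None):
        # The reported AttributeError suggests config is never set on
        # some code path; storing it in __init__ (even as None) plus a
        # getattr fallback makes .caption() robust either way.
        self.config = config

    def eval_size(self):
        # Read blip_image_eval_size off the config if present,
        # otherwise fall back to BLIP's usual 384-pixel eval size.
        cfg = getattr(self, "config", None)
        return getattr(cfg, "blip_image_eval_size", None) or 384
```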
I tried it out for the first time today. When I try your sample arguments, I get this error:
RuntimeError: Model config for coca_ViT-L-14 not found.
How can I fix this?