
captionr's People

Contributors

theovercomer8


captionr's Issues

Error with Posix Path with --output

Using the --output option generates a path error on Linux. (Not sure whether the same happens on Windows.)

Traceback (most recent call last):
  File "/home/machinelearning/tools/captionr/captionr/captionr_class.py", line 261, in process_img
    with open(outputfilename, "w", encoding="utf8") as file:
FileNotFoundError: [Errno 2] No such file or directory: "[PosixPath('/home/machinelearning/datasets/processing/output_folder')]/image_123456.txt"

I got it working by changing this block:

    else:
        if isinstance(config.output, pathlib.PosixPath):
            dirname = str(config.output)


I'm not sure whether that is a good fix, though.
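
For illustration, a minimal sketch of a more general normalization (resolve_output_dir is a hypothetical helper; the assumption, taken from the traceback above, is that --output can arrive as a one-element list of Path objects rather than a single path):

    from pathlib import Path

    def resolve_output_dir(output):
        """Normalize the --output value into a plain directory string.

        Hypothetical helper: depending on how the option is declared,
        argparse may hand over a one-element list of Paths, a single
        Path, or a plain string, so unwrap all three cases.
        """
        if isinstance(output, (list, tuple)) and output:
            output = output[0]      # unwrap e.g. [PosixPath('/some/output_folder')]
        return str(Path(output))    # works for Path and str alike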

Error loading Cache Table Invalid Load Key

I opened a couple of different Colab accounts today, each with a different zip, and both were still exhibiting this behavior after a couple of runtime restarts, using the stock Colab link from GitHub.

I was hoping you might be able to offer some insight. Thank you!
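
For what it's worth, "invalid load key" is the message pickle.load raises when a cached .pkl file is truncated or isn't actually a pickle (for example, a saved HTML error page), so deleting the cached *_flavors.pkl files and letting them regenerate may help. A defensive loader could look roughly like this (a sketch, not captionr's actual code):

    import logging
    import os
    import pickle

    def load_cache_table(cache_file):
        """Load a cached table, tolerating a corrupt or truncated file.

        Sketch only: on an 'invalid load key' UnpicklingError (or a
        truncated file), drop the cache and return None so the caller
        can rebuild it from scratch.
        """
        try:
            with open(cache_file, "rb") as fd:
                return pickle.load(fd)
        except (pickle.UnpicklingError, EOFError) as err:
            logging.warning("Cache %s unreadable (%s); rebuilding", cache_file, err)
            os.remove(cache_file)
            return None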

RuntimeError: Sizes of tensors must match except in dimension 1. [WSL on Windows]

Hey! Thanks for your hard work creating captionr. Unfortunately, I'm seeing this error on both Windows native and WSL. Any help is appreciated!

Python version is 3.8.10.

Command:

python captionr.py /mnt/d/model_training/deltron/images/500px/people --blip2_question_file /mnt/d/model_training/deltron/captions/blip2/question.txt --prepend_text "a photo of " --existing=skip --cap_length=75 --blip_pass --use_blip2 --blip2_model blip2_opt/pretrain_opt6.7b --clip_model_name=ViT-L-14/openai --uniquify_tags --device=cuda --extension=txt

Exception:

ERROR:root:Exception during BLIP captioning
Traceback (most recent call last):
  File "/mnt/d/model_training/code/captionr/captionr/captionr_class.py", line 139, in process_img
    new_caption = config._blip.caption(img)
  File "/mnt/d/model_training/code/captionr/captionr/blip2_cap.py", line 22, in caption
    return self.model.generate({"image": image})[0]
  File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/lavis/models/blip2_models/blip2_opt.py", line 213, in generate
    outputs = self.opt_model.generate(
  File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/transformers/generation/utils.py", line 1490, in generate
    return self.beam_search(
  File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/transformers/generation/utils.py", line 2749, in beam_search
    outputs = self(
  File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/lavis/models/blip2_models/modeling_opt.py", line 1037, in forward
    outputs = self.model.decoder(
  File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/d/model_training/code/captionr/venv/lib/python3.8/site-packages/lavis/models/blip2_models/modeling_opt.py", line 703, in forward
    inputs_embeds = torch.cat([query_embeds, inputs_embeds], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list.

Pip packages:

Package Version

altair 4.2.2
antlr4-python3-runtime 4.9.3
asttokens 2.2.1
attrs 22.2.0
backcall 0.2.0
backports.zoneinfo 0.2.1
blinker 1.5
blip-vit 0.0.3
blis 0.7.9
braceexpand 0.1.7
cachetools 5.3.0
catalogue 2.0.8
certifi 2022.12.7
cfgv 3.3.1
charset-normalizer 3.1.0
click 8.1.3
cmake 3.26.0
confection 0.0.4
contexttimer 0.3.3
contourpy 1.0.7
cycler 0.11.0
cymem 2.0.7
decorator 5.1.1
decord 0.6.0
distlib 0.3.6
einops 0.6.0
entrypoints 0.4
executing 1.2.0
fairscale 0.4.4
filelock 3.10.0
fonttools 4.39.2
ftfy 6.1.1
gitdb 4.0.10
GitPython 3.1.31
huggingface-hub 0.13.2
identify 2.5.21
idna 3.4
imageio 2.26.0
importlib-metadata 6.0.0
importlib-resources 5.12.0
iopath 0.1.10
ipython 8.11.0
jedi 0.18.2
Jinja2 3.1.2
jsonschema 4.17.3
kaggle 1.5.13
kiwisolver 1.4.4
langcodes 3.3.0
lazy-loader 0.1
Levenshtein 0.20.9
lit 15.0.7
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
mdurl 0.1.2
mpmath 1.3.0
murmurhash 1.0.9
networkx 3.0
nodeenv 1.7.0
numpy 1.24.2
omegaconf 2.3.0
open-clip-torch 2.16.0
opencv-python-headless 4.5.5.64
opendatasets 0.1.22
packaging 23.0
pandas 1.5.3
parso 0.8.3
pathy 0.10.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
pip 21.1.1
pkgutil-resolve-name 1.3.10
platformdirs 3.1.1
plotly 5.13.1
portalocker 2.7.0
pre-commit 3.2.0
preshed 3.0.8
prompt-toolkit 3.0.38
protobuf 3.20.3
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 11.0.0
pycocoevalcap 1.2
pycocotools 2.0.6
pydantic 1.10.6
pydeck 0.8.0
Pygments 2.14.0
Pympler 1.0.1
pyparsing 3.0.9
pyrsistent 0.19.3
python-dateutil 2.8.2
python-Levenshtein 0.20.9
python-magic 0.4.27
python-slugify 8.0.1
pytz 2022.7.1
pytz-deprecation-shim 0.1.0.post0
PyWavelets 1.4.1
PyYAML 6.0
rapidfuzz 2.13.7
regex 2022.10.31
requests 2.28.2
rich 13.3.2
salesforce-lavis 1.0.0
scikit-image 0.20.0
scipy 1.9.1
semver 2.13.0
sentencepiece 0.1.97
setuptools 56.0.0
six 1.16.0
smart-open 6.3.0
smmap 5.0.0
spacy 3.5.1
spacy-legacy 3.0.12
spacy-loggers 1.0.4
srsly 2.4.6
stack-data 0.6.2
streamlit 1.20.0
sympy 1.11.1
tenacity 8.2.2
text-unidecode 1.3
thefuzz 0.19.0
thinc 8.1.9
tifffile 2023.3.15
timm 0.4.12
tokenizers 0.13.2
toml 0.10.2
toolz 0.12.0
torch 2.0.0+cu117
torchvision 0.15.1+cu117
tornado 6.2
tqdm 4.65.0
traitlets 5.9.0
transformers 4.28.0.dev0
triton 2.0.0
typer 0.7.0
typing-extensions 4.5.0
tzdata 2022.7
tzlocal 4.3
urllib3 1.26.15
validators 0.20.0
virtualenv 20.21.0
wasabi 1.1.1
watchdog 2.3.1
wcwidth 0.2.6
webdataset 0.2.43
wheel 0.40.0
zipp 3.15.0

Add model based on DeepDanbooru

DeepDanbooru is a captioning model trained on tags from boorus. It is mostly anime, but it works incredibly well with non-anime images too. Booru-style prompts (example: 1girl, simple background, watercolor, ...) work very well in text2img.

DeepDanbooru was originally used on Danbooru to tag images, but it is now also used when fine-tuning SD. Since then, a couple of modifications (like removing underscores in tags) have been made to it so that it works better with SD.

See demo and links to models here:
https://huggingface.co/spaces/SmilingWolf/wd-v1-4-tags
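
As a rough sketch of what such a pass could look like, the snippet below runs one of the WD 1.4 tagger ONNX exports from the Space above. The repo id, file names, input size, and BGR preprocessing are assumptions about those exports, not something captionr currently does:

    import csv

    import numpy as np
    import onnxruntime as ort
    from huggingface_hub import hf_hub_download
    from PIL import Image

    # Assumed repo/file layout for one of the WD 1.4 tagger exports.
    REPO = "SmilingWolf/wd-v1-4-vit-tagger-v2"

    def danbooru_tags(img_path, threshold=0.35):
        model_path = hf_hub_download(REPO, "model.onnx")
        tags_path = hf_hub_download(REPO, "selected_tags.csv")
        session = ort.InferenceSession(model_path)

        # Assumed preprocessing: square float32 image in BGR order, 448x448.
        img = Image.open(img_path).convert("RGB").resize((448, 448))
        arr = np.asarray(img, dtype=np.float32)[:, :, ::-1]   # RGB -> BGR
        arr = np.ascontiguousarray(arr)[None, ...]            # add batch dim

        probs = session.run(None, {session.get_inputs()[0].name: arr})[0][0]
        with open(tags_path, newline="") as f:
            names = [row["name"] for row in csv.DictReader(f)]  # assumed column name

        # Replace underscores so the tags read naturally in SD prompts.
        return [n.replace("_", " ") for n, p in zip(names, probs) if p >= threshold]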

The size of tensor a (8) must match the size of tensor b (64) at non-singleton dimension 0

I'm getting this when using blip = true.

ERROR:root:Exception during BLIP captioning
Traceback (most recent call last):
  File "/content/gdrive/MyDrive/captionr/captionr/captionr_class.py", line 139, in process_img
    new_caption = config._blip.caption(img)
  File "/content/gdrive/MyDrive/captionr/captionr/blip_cap.py", line 56, in caption
    caption = self.blip_model.generate(
  File "/usr/local/lib/python3.8/dist-packages/blip/models/blip.py", line 156, in generate
    outputs = self.text_decoder.generate(input_ids=input_ids,
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 1490, in generate
    return self.beam_search(
  File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 2749, in beam_search
    outputs = self(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 886, in forward
    outputs = self.bert(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 781, in forward
    encoder_outputs = self.encoder(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 445, in forward
    layer_outputs = layer_module(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 361, in forward
    cross_attention_outputs = self.crossattention(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 277, in forward
    self_outputs = self.self(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/blip/models/med.py", line 178, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (8) must match the size of tensor b (64) at non-singleton dimension 0

don't write captions for fail phrases

I have fail_phrases that are hitting during one pass, but it's still writing the caption .txt files. Can I have it not write any caption (the entire file) if a fail phrase is triggered? In other words, I want to skip captioning completely if any of the fail_phrases are found.

e.g.

python captionr.py I:\AI\tmp_1024x768\sharpened\AI\vtmp\faces-first-half\ --existing=skip --cap_length=30 --git_pass --clip_model_name=ViT-H-14/laion2b_s32b_b79k --clip_flavor --clip_max_flavors=8 --clip_method=interrogate_fast --fail_phrases="a fat,birth,a sign that says,fetus,writing that says,that says,with the word,neatly coming out,extreme face contortion,extremely bizarre disturbing,cell phone,latina,very creepy,membrane,bbw,wrinkled,checknig her phone,sac,severe,blurry,out of focus,multiple eyes,morphing,very very surreal,kaleidoscopic,cursed image,horror photo,many eyes on head,multiple eyes,boy,seen through a kaleidoscope,vomit,face morph" --uniquify_tags --device=cuda --extension=txt

Thanks - I love captionr!

p.s. checknig her phone is not a typo!
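
In case it helps, here is a minimal sketch of the requested behaviour (should_skip is a hypothetical helper; the assumption is that fail_phrases arrives as the comma-separated string shown in the command above):

    def should_skip(caption, fail_phrases):
        """Return True when the caption contains any fail phrase."""
        caption_lower = caption.lower()
        return any(phrase.strip().lower() in caption_lower
                   for phrase in fail_phrases.split(","))

    # Hypothetical use inside process_img, before the .txt file is written:
    # if should_skip(final_caption, config.fail_phrases):
    #     logging.info("Fail phrase hit; skipping caption for this image")
    #     return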

Wrong path

It is looking for the .pkl one folder up, not in the project folder.
File "G:\captionr\captionr\clip_interrogator.py", line 76, in load_clip_model
with open(os.path.join(config.cache_path,'ViT-H-14_laion2b_s32b_b79k_flavors.pkl'), 'wb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: 'G:\data\ViT-H-14_laion2b_s32b_b79k_flavors.pkl'

It is looking in G:\data but should be looking in G:\captionr\data\.
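
One possible fix, sketched under the assumption that clip_interrogator.py lives in <repo>/captionr/, is to resolve the data folder relative to the script itself instead of the current working directory:

    import os

    def default_cache_dir():
        r"""Resolve <repo>/data next to the captionr checkout.

        Two dirname() calls walk up from <repo>/captionr/clip_interrogator.py
        to <repo>, so the cache lands in <repo>/data (G:\captionr\data in the
        report above) regardless of where the script is launched from.
        """
        repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
        path = os.path.join(repo_root, "data")
        os.makedirs(path, exist_ok=True)
        return path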

BLIP error with config

Hi,

I'm getting the following error on Linux (Ubuntu Jammy). The same error occurs with Python 3.8 and 3.10 in a virtual environment (venv).

INFO:root:PREVIEW MODE ENABLED. No caption files will be written.
  0%|                                                                                                                                                         | 0/3 [00:00<?, ?it/s]ERROR:root:Exception during BLIP captioning
Traceback (most recent call last):
  File "/home/machinelearning/tools/captionr/captionr/captionr_class.py", line 139, in process_img
    new_caption = config._blip.caption(img)
  File "/home/machinelearning/tools/captionr/captionr/blip_cap.py", line 48, in caption
    size = self.config.blip_image_eval_size
AttributeError: 'BLIP' object has no attribute 'config'

Other models seem to be working fine, including CoCa, GIT, and CLIP.
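
From the traceback it looks like caption() reads self.config while __init__ never stores it. A stripped-down sketch of the kind of fix (this is not captionr's actual class, and the constructor arguments are assumed):

    class BLIP:
        """Illustrative sketch only, not the real captionr wrapper."""

        def __init__(self, config, blip_model):
            self.config = config          # assumed fix: keep a reference to the config
            self.blip_model = blip_model

        def caption(self, img):
            size = self.config.blip_image_eval_size  # now resolves instead of raising
            ...  # preprocess img to size and call self.blip_model.generate()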
