
keras-cv's Introduction

KerasCV


KerasCV is a library of modular computer vision components that work natively with TensorFlow, JAX, or PyTorch. Built on Keras 3, these models, layers, metrics, callbacks, etc., can be trained and serialized in any framework and re-used in another without costly migrations. See "Configuring your backend" below for more details on multi-framework KerasCV.

KerasCV can be understood as a horizontal extension of the Keras API: the components are new first-party Keras objects that are too specialized to be added to core Keras. They receive the same level of polish and backwards compatibility guarantees as the core Keras API, and they are maintained by the Keras team.

Our APIs assist in common computer vision tasks such as data augmentation, classification, object detection, segmentation, image generation, and more. Applied computer vision engineers can leverage KerasCV to quickly assemble production-grade, state-of-the-art training and inference pipelines for all of these common tasks.


Installation

KerasCV supports both Keras 2 and Keras 3. We recommend Keras 3 for all new users, as it enables using KerasCV models and layers with JAX, TensorFlow and PyTorch.

Keras 2 Installation

To install the latest KerasCV release with Keras 2, simply run:

pip install --upgrade keras-cv tensorflow

Keras 3 Installation

There are currently two ways to install Keras 3 with KerasCV. To install the latest changes for KerasCV and Keras, you can use our nightly package.

pip install --upgrade keras-cv-nightly tf-nightly

To install the stable versions of KerasCV and Keras 3, you should install Keras 3 after installing KerasCV. This is a temporary step while TensorFlow is pinned to Keras 2, and will no longer be necessary after TensorFlow 2.16.

pip install --upgrade keras-cv tensorflow
pip install --upgrade keras

Important

Keras 3 will not function with TensorFlow 2.14 or earlier.

Configuring your backend

If you have Keras 3 installed in your environment (see installation above), you can use KerasCV with any of JAX, TensorFlow, and PyTorch. To do so, set the KERAS_BACKEND environment variable. For example:

export KERAS_BACKEND=jax

Or in Colab, with:

import os
os.environ["KERAS_BACKEND"] = "jax"

import keras_cv

Important

Make sure to set the KERAS_BACKEND environment variable before importing any Keras libraries; it will be used to set up Keras when it is first imported.

Once that configuration step is done, you can just import KerasCV and start using it on top of your backend of choice:

import keras_cv
import keras
import numpy as np

filepath = keras.utils.get_file(origin="https://i.imgur.com/gCNcJJI.jpg")
image = np.array(keras.utils.load_img(filepath))
image_resized = keras.ops.image.resize(image, (640, 640))[None, ...]

model = keras_cv.models.YOLOV8Detector.from_preset(
    "yolo_v8_m_pascalvoc",
    bounding_box_format="xywh",
)
predictions = model.predict(image_resized)

Quickstart

import tensorflow as tf
import keras_cv
import tensorflow_datasets as tfds
import keras

# Create a preprocessing pipeline with augmentations
BATCH_SIZE = 16
NUM_CLASSES = 3
augmenter = keras_cv.layers.Augmenter(
    [
        keras_cv.layers.RandomFlip(),
        keras_cv.layers.RandAugment(value_range=(0, 255)),
        keras_cv.layers.CutMix(),
    ],
)

def preprocess_data(images, labels, augment=False):
    labels = tf.one_hot(labels, NUM_CLASSES)
    inputs = {"images": images, "labels": labels}
    outputs = inputs
    if augment:
        outputs = augmenter(outputs)
    return outputs['images'], outputs['labels']

train_dataset, test_dataset = tfds.load(
    'rock_paper_scissors',
    as_supervised=True,
    split=['train', 'test'],
)
train_dataset = train_dataset.batch(BATCH_SIZE).map(
    lambda x, y: preprocess_data(x, y, augment=True),
    num_parallel_calls=tf.data.AUTOTUNE,
).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).map(
    preprocess_data,
    num_parallel_calls=tf.data.AUTOTUNE,
).prefetch(tf.data.AUTOTUNE)

# Create a model using a pretrained backbone
backbone = keras_cv.models.EfficientNetV2Backbone.from_preset(
    "efficientnetv2_b0_imagenet"
)
model = keras_cv.models.ImageClassifier(
    backbone=backbone,
    num_classes=NUM_CLASSES,
    activation="softmax",
)
model.compile(
    loss='categorical_crossentropy',
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    metrics=['accuracy']
)

# Train your model
model.fit(
    train_dataset,
    validation_data=test_dataset,
    epochs=8,
)

Contributors

If you'd like to contribute, please see our contributing guide.

To find an issue to tackle, please check our call for contributions.

We would like to engage the Keras community not only in bug reporting, but also in active development of new features. To that end, here is the process for contributing to this repository:

  1. Contributors are always welcome to help us fix an issue, add tests, or improve documentation.
  2. If contributors would like to create a backbone, we usually require a pre-trained weight set with the model for one dataset as the first PR, and a training script as a follow-up. The training script should preferably help us reproduce the results claimed in the paper. The backbone should be generic, but the training script can contain paper-specific parameters such as learning rate schedules and weight decays. The training script will be used to produce leaderboard results. Exceptions apply to large transformer-based models which are difficult to train. If this is the case, contributors should let us know so the team can help by training the model or providing GCP resources.
  3. If contributors would like to create a meta arch, please try to align with our roadmap and create a PR for design review to make sure the meta arch is modular.
  4. If contributors would like to create a new input format which is not on our roadmap for the next 6 months, e.g., keypoint, please create an issue and ask for a sponsor.
  5. If contributors would like to support a new task which is not on our roadmap for the next 6 months, e.g., 3D reconstruction, please create an issue and ask for a sponsor.

Thank you to all of our wonderful contributors!

Pretrained Weights

Many models in KerasCV come with pre-trained weights. With the exception of StableDiffusion and the standard Vision Transformer, all of these weights are trained using Keras and KerasCV components and training scripts in this repository. While some models are not trained with the same parameters or preprocessing pipeline as defined in their original publications, the KerasCV team ensures strong numerical performance. Performance metrics for the provided pre-trained weights can be found in the training history for each documented task. An example of this can be found in the ImageNet classification training history for backbone models. All results are reproducible using the training scripts in this repository.

Historically, many models have been trained on image datasets rescaled via manually crafted normalization schemes. The most common variant subtracts the ImageNet mean pixel and then divides by the ImageNet pixel standard deviation. This scheme is an artifact of the days of manual feature engineering, but it is no longer required to achieve state-of-the-art results with modern deep learning architectures. Because of this, KerasCV standardizes on images rescaled with a simple 1/255 rescaling layer. This can be seen in all KerasCV training pipelines and code examples.
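As a minimal sketch (using the standard Keras Rescaling layer), the preprocessing step looks like this:

import keras

# Simple 1/255 rescaling, as used throughout KerasCV, instead of the
# older ImageNet mean/std normalization.
inputs = keras.Input(shape=(224, 224, 3))
x = keras.layers.Rescaling(1.0 / 255)(inputs)  # pixel values now in [0, 1]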

Custom Ops

Note that in some of the 3D object detection layers, custom TF ops are used. The binaries for these ops are not shipped in our PyPI package in order to keep our wheels pure Python.

If you'd like to use these custom ops, you can install from source using the instructions below.

Installing KerasCV with Custom Ops from Source

Installing custom ops from source requires the Bazel build system (version >= 5.4.0). Steps to install Bazel can be found here.

git clone https://github.com/keras-team/keras-cv.git
cd keras-cv

python3 build_deps/configure.py

bazel build build_pip_pkg
export BUILD_WITH_CUSTOM_OPS=true
bazel-bin/build_pip_pkg wheels

pip install wheels/keras_cv-*.whl

Note that GitHub Actions workflows exist to release KerasCV with custom ops, but they are currently disabled. You can use these actions in your own fork to create wheels for Linux (manylinux2014), macOS (both x86 and ARM), and Windows.

Disclaimer

KerasCV provides access to pre-trained models via the keras_cv.models API. These pre-trained models are provided on an "as is" basis, without warranties or conditions of any kind. The following underlying models are provided by third parties, and are subject to separate licenses: StableDiffusion, Vision Transformer

Citing KerasCV

If KerasCV helps your research, we appreciate your citations. Here is the BibTeX entry:

@misc{wood2022kerascv,
  title={KerasCV},
  author={Wood, Luke and Tan, Zhenyu and Stenbit, Ian and Bischof, Jonathan and Zhu, Scott and Chollet, Fran\c{c}ois and Sreepathihalli, Divyashree and Sampath, Ramesh and others},
  year={2022},
  howpublished={\url{https://github.com/keras-team/keras-cv}},
}


keras-cv's Issues

Add Swin-Transformer to keras.applications

If you open a GitHub issue, here is our policy:

It must be a bug, a feature request, or a significant problem with the documentation (for small docs fixes please send a PR instead).
The form below must be filled out.

Here's why we have that policy:

Keras developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.

System information.

TensorFlow version (you are using): 2.6
Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.

Describe the feature clearly here. Be sure to convey here why the requested feature is needed. Any brief description of the use-case would help.

Paper: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Original Code: https://github.com/microsoft/Swin-Transformer?utm_source=catalyzex.com

It's a variant of the transformer model and achieves state-of-the-art performance or comparable performance with the best CNN-based models. It also contains enough citations (~250 at this moment) for addition to the package.

On ImageNet-1K and -22K, the results below compare Swin Transformer with EfficientNet (CNN) models.

| Model  | Img Size | Top-1K acc | Model             | Img Size | Top-1K acc | Top-22K acc |
|--------|----------|------------|-------------------|----------|------------|-------------|
| E3     | 300      | 81.6       | EfficientNetV2-S  | -        | 83.9       | 84.9        |
| E5     | 456      | 83.6       | EfficientNetV2-M  | -        | 85.1       | 86.2        |
| E7     | 600      | 84.3       | EfficientNetV2-L  | -        | 85.7       | 86.8        |
| -      | -        | -          | EfficientNetV2-XL | -        | -          | 87.3        |
| Swin-T | 224      | 81.3       | Swin-B            | 224      | -          | 85.2        |
| Swin-S | 224      | 83.0       | Swin-B            | 384      | -          | 86.4        |
| Swin-B | 224      | 83.5       | Swin-L            | 384      | -          | 87.3        |
| Swin-B | 384      | 84.5       | -                 | -        | -          | -           |

Will this change the current api? How?
Yes. It will change as follows

tensorflow.keras.applications.SwinT
tensorflow.keras.applications.SwinS
tensorflow.keras.applications.SwinB
tensorflow.keras.applications.SwinL

Who will benefit from this feature?
Keras users.

Contributing

  • Do you want to contribute a PR? (yes/no): yes.
  • If yes, please read this page for instructions
  • Briefly describe your candidate solution (if contributing):

Reorder channel layer for smooth native RGB - BGR

This is a commonly performed operation in image processing, making it a good fit for keras-cv. While this is not a super complicated operation (it's really just a lambda into a tf.gather), the resulting code is more readable. Let's include this in keras-cv.
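As a rough sketch of that gather-based approach (this ReorderChannel layer is hypothetical, not an existing keras-cv API):

import tensorflow as tf

# Hypothetical layer sketch: channel reordering is just a gather along the
# channel axis.
class ReorderChannel(tf.keras.layers.Layer):
    def __init__(self, order, axis=-1, **kwargs):
        super().__init__(**kwargs)
        self.order = order
        self.axis = axis

    def call(self, inputs):
        return tf.gather(inputs, self.order, axis=self.axis)

rgb2bgr_layer = ReorderChannel(order=[2, 1, 0])
bgr_images = rgb2bgr_layer(tf.random.uniform((1, 8, 8, 3)))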

Originally discussed here: keras-team/keras#15705

Re the design of the layer signature...

The current proposal is to implement a layer that looks like this:

rgb2bgr_layer = tf.keras.layers.ReorderChannel(order=[2, 1, 0], axis=-1)

Perhaps we may actually want to use an einsum-inspired syntax, such as:

rgb2bgr_layer = tf.keras.layers.ReorderChannel('rgb->bgr', axis=-1)

This would be really readable to anyone stumbling upon a new codebase, and generalizes quite well to any number of channels.

Please feel free to comment below with any additional thoughts

End to end model.fit() performance testing w/ COCO metrics

Some comments on this:

  • we can use a Keras callback to track the runtime of each step (a sketch follows this list)
  • we must omit the first batch, as it will include the Keras compilation step
  • we can test the model runtime with and without each metric; this will give us a reference for how expensive the metrics are. No metric should increase runtime by more than a factor of 2x
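A minimal sketch of such a timing callback, assuming only standard Keras callback hooks (the class name is illustrative):

import time
import tensorflow as tf

class StepTimer(tf.keras.callbacks.Callback):
    """Records per-batch training time, skipping the first (compilation) batch."""

    def __init__(self):
        super().__init__()
        self.times = []

    def on_train_batch_begin(self, batch, logs=None):
        self._start = time.time()

    def on_train_batch_end(self, batch, logs=None):
        if batch == 0:
            return  # omit the first batch; it includes tracing/compilation
        self.times.append(time.time() - self._start)

# Usage: model.fit(dataset, callbacks=[StepTimer()]), then compare mean step
# times with and without each COCO metric passed to model.compile().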

Add `ResNeXt [50, 101]` to keras.applications

System information.

TensorFlow version (you are using): 2.5
Are you willing to contribute it (Yes/No) : Yes

Describe the feature and the current behavior/state.

ResNeXt is a well-known classification model that is conspicuously missing from keras.applications. The idea presented in the paper is compelling, and I think the model is a good fit for the keras package.

Will this change the current api? How?
Yes.

from tensorflow.keras.applications.resnext50 import ResNeXt50
from tensorflow.keras.applications.resnext101 import ResNeXt101

Who will benefit from this feature?
ML engineers and researchers who use tf.keras.

Others
Other implementations: https://github.com/qubvel/classification_models

Vectorize/Optimize the COCORecall metric

Lots of the for loops in the COCORecall update_state method could be reduced to tf.einsum() calls. Additionally, we should be able to vectorize area computation in the iou function, along with some other operations in the overall computation of the metric. tf.TensorArrays are quite slow, so we should look to remove those if possible.
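For illustration, a minimal sketch of the kind of broadcast-based vectorization meant here (corner-format boxes assumed; this is not the actual KerasCV implementation):

import tensorflow as tf

def pairwise_iou(boxes1, boxes2):
    # boxes1: [N, 4], boxes2: [M, 4] in [y1, x1, y2, x2] format.
    # Returns an [N, M] IoU matrix computed with broadcasting, no Python
    # loops or TensorArrays.
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])

    top = tf.maximum(boxes1[:, None, 0], boxes2[None, :, 0])
    left = tf.maximum(boxes1[:, None, 1], boxes2[None, :, 1])
    bottom = tf.minimum(boxes1[:, None, 2], boxes2[None, :, 2])
    right = tf.minimum(boxes1[:, None, 3], boxes2[None, :, 3])

    intersection = tf.maximum(bottom - top, 0.0) * tf.maximum(right - left, 0.0)
    union = area1[:, None] + area2[None, :] - intersection
    return intersection / tf.maximum(union, 1e-8)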

Document "Auxiliary Tasks"

These are not the focus of KerasCV, but they are in scope for creating state-of-the-art models for one of the focus tasks.

A great example of this is model visualization

Standardized / preferred way to implement blocks and models.

Given that model requirements are being gathered in the discussion, is there a preferred way to implement them?

There are multiple ways to implement blocks and models:

  1. keras.applications way -> models and blocks are functional.
  2. Model Garden way -> the model is functional, but blocks are layer subclasses (even though the model is a direct subclass, it does not override the call method).
  3. Blocks and models are subclasses of keras.layers.Layer and keras.Model respectively, and both implement the call method.

Each way has its own benefits and drawbacks. Is one of the above preferred? Or maybe something entirely different?
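For reference, a minimal sketch of option 3 (class names are illustrative only, not an existing API):

import tensorflow as tf
from tensorflow import keras

class ConvBlock(keras.layers.Layer):
    # Block as a Layer subclass that overrides call().
    def __init__(self, filters, **kwargs):
        super().__init__(**kwargs)
        self.conv = keras.layers.Conv2D(filters, 3, padding="same")
        self.bn = keras.layers.BatchNormalization()

    def call(self, inputs, training=None):
        return tf.nn.relu(self.bn(self.conv(inputs), training=training))

class TinyNet(keras.Model):
    # Model as a Model subclass that overrides call().
    def __init__(self, num_classes, **kwargs):
        super().__init__(**kwargs)
        self.block = ConvBlock(32)
        self.pool = keras.layers.GlobalAveragePooling2D()
        self.head = keras.layers.Dense(num_classes)

    def call(self, inputs, training=None):
        return self.head(self.pool(self.block(inputs, training=training)))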

CutMix augmentation for Object Detection task.

[Reposting from here as a future reference for potential contributors.]

CutMix - Paper - Cited by ~ 865
MixUp - Paper - Cited by ~ 2675


Currently, this augmentation can only be applied to classification tasks. Since KerasCV targets general vision tasks, supporting tasks like object detection may require adding a utility that accepts a bbox_params argument as well. A rough sketch of the requested usage follows the examples below.

[Image: example of mixup for object detection, region=full-images]
[Image: example of mixup for object detection, region=random]
[Image: example of cutmix for object detection]

ref: https://www.kaggle.com/ankursingh12/data-augmentation-for-object-detection
ref: https://www.kaggle.com/shonenkov/oof-evaluation-mixup-efficientdet
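A rough sketch of what the requested usage could look like, following KerasCV's dictionary-based input convention; the bounding-box support for CutMix shown here is hypothetical and not an existing API:

import keras_cv

# Hypothetical: CutMix consuming bounding boxes alongside images and labels.
augmenter = keras_cv.layers.CutMix()
outputs = augmenter({
    "images": images,            # [batch, height, width, 3]
    "labels": labels,            # one-hot class labels
    "bounding_boxes": {          # hypothetical extension for detection tasks
        "boxes": boxes,          # e.g. "xywh" format
        "classes": classes,
    },
})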

Create a CONTRIBUTING.MD guide

From a PR review:

Do we want to have a paper citation threshold or any other evaluation metric?

We want an adoption metric of some kind, and a citation threshold (~50) is a convenient metric.

What about the component maintainership policy? Can we scale by giving the community maintainership/ownership of components?

We'll try. Someone who contributes a component will be called to fix issues with it if any arise.

I also think that we cannot simply accumulate components over time; how are we going to evaluate and handle deprecations?

Deprecation decisions are always the result of a cost/benefit analysis. The picture varies from component to component and over the lifetime of the repo, so it's decided on a case-by-case basis.

I suppose we could cover this partly in the README.MD and partly in CONTRIBUTING.MD.

Yes, this is mostly information that should go in the contributors' guide.

Channel/Spatial/Element wise Attention Modules

The timm package provides a number of soft attention modules for building network blocks, for example channel, spatial, and element-wise attention, among many others. I think they would be a good fit here.
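As a hedged illustration, here is a minimal sketch of one such channel attention module (a squeeze-and-excitation style block); this is not taken from timm or keras-cv:

import tensorflow as tf
from tensorflow import keras

class SqueezeExcite(keras.layers.Layer):
    # Channel attention: global pool, bottleneck MLP, sigmoid gate per channel.
    def __init__(self, ratio=16, **kwargs):
        super().__init__(**kwargs)
        self.ratio = ratio

    def build(self, input_shape):
        channels = int(input_shape[-1])
        self.squeeze = keras.layers.GlobalAveragePooling2D(keepdims=True)
        self.reduce = keras.layers.Dense(max(channels // self.ratio, 1), activation="relu")
        self.expand = keras.layers.Dense(channels, activation="sigmoid")

    def call(self, inputs):
        scale = self.expand(self.reduce(self.squeeze(inputs)))
        return inputs * scale  # reweight channels by learned attention

features = tf.random.uniform((1, 32, 32, 64))
attended = SqueezeExcite()(features)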

fill_utils tests

This module came about due to some shared functionality that was originally tested within the layers. As long as this is shared, it would be good to unit test it.

RandomCutMix, RandomMixUp augmentation

These two well-known and very effective augmentation methods are widely used among ML practitioners. Keras already includes some general augmentation layers, and these two methods could be added as advanced augmentation layers.

CutMix - Paper - Cited by ~ 865
MixUp - Paper - Cited by ~ 2675

Let's add these as keras_cv.layers.RandomCutMix and keras_cv.layers.RandomMixUp.

COCORecall should support XLA compilation

Current error when running in mirrored mode:

ValueError: SyncOnReadVariable does not support assign_add in cross-replica context when aggregation is set to tf.VariableAggregation.SUM.

If we can compile to XLA we can also run on TPUs.
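A minimal sketch of how XLA compatibility could be exercised, using a stand-in Keras metric (in practice this would be the COCORecall instance under test):

import tensorflow as tf

metric = tf.keras.metrics.Recall()  # stand-in for the metric under test

@tf.function(jit_compile=True)  # jit_compile=True requests XLA compilation
def xla_update(y_true, y_pred):
    metric.update_state(y_true, y_pred)
    return metric.result()

y_true = tf.constant([1.0, 0.0, 1.0, 1.0])
y_pred = tf.constant([0.9, 0.2, 0.4, 0.8])
print(xla_update(y_true, y_pred))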

ResNet18

Note: Owen is already working on this.
