iondv / tensorflow-dataset

An IONDV application for forming and marking up datasets and for using those datasets to check the training results of differently configured TensorFlow models

License: Apache License 2.0

JavaScript 91.20% Shell 0.79% HTML 6.10% CSS 1.90%

dataset iondv iondv-app tensorflow tensorflow-models

tensorflow-dataset's Introduction

This page in Russian

IONDV. Tensorflow dataset app

IONDV. Tensorflow dataset app is an application based on the IONDV. Framework (code repository) for accumulating data, normalizing and marking up images, and creating, training and comparing TensorFlow models, with no programming needed for the standard functionality. The logic can also be fully customized, in the form of model modifications and dataset processing, by writing Node.js code.

The application uses the Fashion-MNIST dataset as an example for data import.

IONDV. Framework in brief

IONDV. Framework is an open source Node.js application that implements a digital tool platform for rapid development of web applications and microservices based on metadata, and it can be extended with modules. The main purpose of this set of solutions is to speed up the development of accounting web applications (ERP) using low-code technology. The platform consists of the following open source components: the IONDV. Framework itself, the modules and ready-made applications that expand its functionality, and the Studio (repository), an open source visual development environment for creating application metadata. Applications modeled as UML schemes can be launched in 80 seconds.

For more details, see the IONDV. Framework site.
Documentation is available in the GitHub repository.

Demo

Watch a brief video about creating and marking up a dataset, creating a neural network model, training the model and verifying the recognition quality, all without a single line of code: https://www.youtube.com/watch?v=529TwrJoEKQ

Demo access to the system without registration: https://tensorflow-dataset.iondv.com. The login is demo and the password is ion-demo. In demo mode, training is restricted to 30 patterns per run and dataset imports are limited to 50 objects per run. You can build the app locally on your computer and use it without restrictions; see the instructions below.

Description of features

A dataset is formed from system objects that can be created, loaded, deleted, or edited.

For each object, it is possible to determine whether it is included in the training sample or in the sample for verification.

The system allows creating models and parameterizing them, as well as creating various trained models on the current dataset.

Trained models can be downloaded in TensorFlow format.

For trained models, reports can be generated to compare the recognition quality.

TODO

How to use the demo

Fill the database with objects for training

The first thing to do is to fill the database with objects for training. The easiest way is to import ready-made objects. To do this, go to the datasets navigation tab and open the fashion-mnist set. This set should contain all the import settings and two attached files:

2k_train.csv for network training (a sample of 2000 elements from fashion-mnist)
300_test.csv for training verification (300 elements from fashion-mnist)

To import, click the Import data button in the object window. The import runs in the background. It takes about 30-40 seconds for all 2300 objects to be imported (you can check them on the Object navigation tab).
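The README does not describe the layout of these CSV files. As a rough sketch, assuming the commonly distributed Fashion-MNIST CSV layout (a class label followed by 784 pixel values per row, with any header row skipped beforehand), a single row could be parsed like this. The `parseCsvRow` helper is hypothetical and is not part of the application.

```javascript
// Fashion-MNIST images are 28x28 grayscale, i.e. 784 pixel values per image.
const IMAGE_SIZE = 28 * 28;

// Hypothetical helper: parse one row of the ASSUMED form
// "label,pixel1,...,pixel784" into a label and a pixel array.
function parseCsvRow(line) {
  const values = line.trim().split(',').map(Number);
  // Reject rows with the wrong number of fields or non-numeric fields
  // (a header row would fail here and must be skipped by the caller).
  if (values.length !== IMAGE_SIZE + 1 || values.some(Number.isNaN)) {
    throw new Error('Unexpected row format');
  }
  const [label, ...pixels] = values;
  return { label, pixels };
}
```

Each parsed row would then correspond to one dataset object, with the label giving its type and the pixels giving its normalized image.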

Train the model

The next step is to train the model. The Model tab should contain a ready-made model taken from the TensorFlow review from IBM. You can create your own, but because the model metadata is not yet fully worked out, it is better to use the ready-made one for the demo.

Having decided on the model, go to the Model snapshot tab. This tab holds snapshots of model states; one should already have been created in advance, and you need to open it. The snapshot contains a link to the model and lists all 10 types of objects from fashion-mnist. If you only need to train on certain types, you can edit the list. Next, compile the model: in the snapshot editing window, click Compile. After that, the files of the compiled model should appear in the Model file and Weights file fields.

At this stage, you can start training by clicking the Teach button. The model is trained on the data in parts, and each part goes through 10 epochs. Part sizes and the number of epochs are set in IMPORT_BATCH_SIZE, TRAINING_BATCH_SIZE and EPOCHS. Training and verification logs are to be developed soon.
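Training in parts can be pictured roughly as follows. This is a sketch rather than the application's code: the constant names come from the text above, but the value of IMPORT_BATCH_SIZE is illustrative, and `splitIntoParts`, `trainInParts` and the `trainPart` callback are hypothetical names.

```javascript
// Names mirror the settings mentioned in the README.
const IMPORT_BATCH_SIZE = 1000; // objects loaded per part (assumed value)
const EPOCHS = 10;              // each part goes through 10 epochs

// Split the dataset into parts of at most `size` objects.
function splitIntoParts(items, size) {
  const parts = [];
  for (let i = 0; i < items.length; i += size) {
    parts.push(items.slice(i, i + size));
  }
  return parts;
}

// `trainPart` is a hypothetical async callback; in a real setup it would
// convert the part to tensors and call model.fit() with
// { epochs, batchSize: TRAINING_BATCH_SIZE }.
async function trainInParts(items, trainPart) {
  const parts = splitIntoParts(items, IMPORT_BATCH_SIZE);
  for (const part of parts) {
    await trainPart(part, EPOCHS);
  }
  return parts.length;
}
```

Loading one part at a time keeps memory use bounded by the part size instead of the size of the whole dataset.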

Object recognition

When the model has finished training, it can be used for object recognition. To do this, create a new recognition object on the objectPrediction tab. Attach a snapshot of the model in the Model snapshot field. In the Object field, create a new object (or attach an existing one). Then fill in a name for the object; it can be any name. Next, attach a picture in the image or normalized image field. If the picture has not been normalized before loading, it should go in image. You can set crop settings in the crop settings field. After the name and picture have been set, save the object (without closing it).

After saving, click the Verify button in the upper-right corner of the screen to normalize the image and check it for uniqueness. If the verification passes, the object is added to the set. After that, it can be used in training: set the type in the Type field and specify Learn or Check in the State field, depending on whether the model will be trained or verified with this image.

When everything is ready, close the object, return to objectPrediction and click Save.

After that, the "Process with tensorflow" action becomes available. When you click this button, the network tries to recognize the object based on what it has learned. The "Logs" field lists, for each type, how likely it is that the object belongs to that type. The "Prediction" field displays how confident the network is in its decision, and the "recognized type" field shows the type that the object was eventually assigned to.
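The relation between the per-type probabilities, the prediction value and the recognized type can be illustrated with a small sketch. It assumes the network ends in a softmax layer that yields one probability per type; `pickRecognizedType` is a hypothetical helper, not a function of the application.

```javascript
// Hypothetical helper: given one probability per type, return the most
// probable type together with its probability.
function pickRecognizedType(probabilities, typeNames) {
  let best = 0;
  for (let i = 1; i < probabilities.length; i++) {
    if (probabilities[i] > probabilities[best]) best = i;
  }
  return { type: typeNames[best], prediction: probabilities[best] };
}
```

For probabilities `[0.1, 0.7, 0.2]` and types `['shirt', 'bag', 'coat']`, the recognized type would be `bag` with a prediction of 0.7.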

If the results are unsatisfactory, the network needs more examples for training. You can download the entire fashion-mnist dataset, which contains 61,000 elements. The instructions are in prebuild/readme.txt, paragraph 1. The sets are downloaded to a subfolder. Attach them to the dataset object, import them, and then retrain the model.

Loading and recognizing an object

Create an instance of the object class and upload the image to the image field or, if the image is already normalized, to normalizedImage. After loading, follow the workflow using the Verify button. If necessary, the image is normalized and checked for uniqueness in the set. If everything succeeds, the Verified field is ticked. After that, the object can be recognized.
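The README does not say how uniqueness is determined. One common approach, shown here purely as an assumption, is to fingerprint the bytes of the normalized image with a cryptographic hash and compare the fingerprint against those already stored. Both function names are hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical helper: SHA-256 fingerprint of the image bytes.
function imageFingerprint(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// An image counts as unique if its fingerprint is not in the known set.
function isUnique(buffer, knownFingerprints) {
  return !knownFingerprints.has(imageFingerprint(buffer));
}
```

A byte-level hash only detects exact duplicates; two visually identical images with different encodings would still be treated as distinct.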

Importing data from a ready-made set

In the Dataset class, specify the data source and click the object editing button at the top of the window. In demo mode (NODE_ENV = demo), 50 random patterns are loaded (or fewer, if the random selection falls on a pattern that is already in the set).

The amount is set in lib/util/importFromDataset.js by the DEMO_IMPORT_LIMIT constant.
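The demo-mode behavior described above (pick a fixed number of random patterns, then skip those already in the set) might look roughly like this. It is a sketch under assumptions, not the contents of lib/util/importFromDataset.js: only the constant name DEMO_IMPORT_LIMIT comes from the README, and `pickDemoPatterns` is hypothetical.

```javascript
const DEMO_IMPORT_LIMIT = 50; // constant name from the README, value per the text

// Pick up to `limit` random patterns, then drop the ones that are already
// in the set, so the result may contain fewer than `limit` patterns.
function pickDemoPatterns(sourceIds, existingIds, limit = DEMO_IMPORT_LIMIT) {
  const pool = [...sourceIds];
  // Fisher-Yates shuffle for an unbiased random order
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, limit).filter((id) => !existingIds.has(id));
}
```

Because filtering happens after the random pick, a run can import fewer than 50 patterns, which matches the behavior described in the text.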

Description of metadata

See the metadata structure in the picture.

The functions of the main classes:

object – dataset data object – contains an original image, a normalized image, the classification type specified by operator, and a link to the prediction results.
object prediction – a class that links the data object to the training result – contains links to the training result and the object, the prediction percentage, type, and logs.
learning result – contains the date of creation and editing, trained model file, and logs.
models – contains information about the model type, compilation parameters, and a collection of related layers.
layers – model layer – contains information about the layer name, activation, content, and others, depending on the specified layer.
dataset – contains information about the name, type of source, source of training and testing, and type of marked label.

Building the application

Building this application on Linux may require g++ to be installed (to build tfjs-node). It can usually be found in the OS package tree on its own or as part of a basic development bundle such as build-essential on Ubuntu.

Ubuntu g++ installation example:

apt install g++

apt install build-essential

Configuring the application

Maximum file size.

Sets the maximum size of uploaded files. It is configured for the "File" type attribute on the view form.

Example:

"options": {
    "maxSize": 256000000
}

The size is indicated in KB.

Allowed file types.

Specifies the valid extensions for uploaded files. It is configured through the "allowedFileTypes" property of the "File collection" attribute on the class form.

Example:

"allowedFileTypes":  ["csv", "zip"]


tensorflow-dataset's Issues

Simplify working with a trained model

1. A REST service that accepts a new image and returns its classification and probability, i.e. it creates a new classification object and, if the image does not exist yet, creates the image as well.
2. A small form on the portal that does the same: upload an image and get everything the REST service returns.

The selection of the active model needs to be implemented, if it has not been done yet.

Implement an image collector for Yandex, Google and Bing. Code MODAIB-17

The following needs to be implemented in the TensorFlow app:

An image collection task class. Metadata:
- a drop-down list: Yandex, Google, Bing.
- the search text
- a decimal field: the number of first images to download.
- a search parameters field. Always read-only for now, i.e. the function is not implemented yet.
The object class needs to be extended: if it is not there yet, add a hash function for image uniqueness, so that duplicates are not downloaded again.

Add a workflow to the object: Start collection. The workflow has a single step with no way back. When the object is at the Collected step, all fields are locked for editing.

When the workflow starts, run the parser model through Puppeteer; the model is taken from the freight project https://github.com/iondv/freight-quote . For now, put two parsers there, for Yandex and Google Images. Run the search, download the specified number of first images, and create objects for training.

Add a cutoff. The first one applies when NODE_ENV=demo mode is detected: 10 images.
The other one is set in deploy, 100 images by default.

Implement a second type of TensorFlow training: detection. Code MODAIB-18

Only the classification type is currently implemented in the project. The same markup we already make (selecting a square for cropping) can be used for detection. It is a bit more complicated there; I will provide the models, but you can also search Google.

The point is that this square is the image markup for detection. It makes sense to normalize the image to the standard sizes of 640x480 or 320x320: normalize by the long side and fill the remainder with black. Normalization for detection is based on a parameter of the detection neural network, i.e. a drop-down list there. The existing size normalization of the image is overwritten.
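The proposed normalization (scale the image to fit the target frame, then fill the remainder with black) comes down to a small calculation. The sketch below is only an illustration: `letterbox` is a hypothetical helper, and centering the scaled image inside the frame is an assumption that the issue does not state.

```javascript
// Hypothetical helper: scale an image so that it fits the target frame
// while keeping its aspect ratio, and report where the black padding goes.
// Centering the image in the frame is an assumption.
function letterbox(srcWidth, srcHeight, dstWidth, dstHeight) {
  const scale = Math.min(dstWidth / srcWidth, dstHeight / srcHeight);
  const width = Math.round(srcWidth * scale);
  const height = Math.round(srcHeight * scale);
  return {
    width,
    height,
    offsetX: Math.floor((dstWidth - width) / 2),
    offsetY: Math.floor((dstHeight - height) / 2),
  };
}
```

For a 1280x720 source and a 640x480 target, the image is scaled to 640x360 and placed 60 pixels from the top, leaving black bands above and below.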

Neural network type: the current type class needs to be reworked by creating a parent class and descendants. There are two descendants:

classification model
detection model.

What else needs to be reworked right away: check how the structure of the neural network model is currently defined. It should be possible to plug these models in, i.e. to choose an existing one or create a new one. That may mean adding an abstraction layer over the network and, accordingly, a reference to the model, while the trained result is stored here.

Add a description to the README

Second stage of improvements

when importing randomly from a set, an import of at least 50 elements must be guaranteed; it is unclear how to optimize this, because very large chunks of the database may have to be scanned
check the number of objects of each type and train the model on a type only when the number of objects for it is greater than a minimum (a parameter); it is questionable whether this is worth doing, given that the model can be trained further later; a second problem is that the number of model outputs depends on the types, so if a type is simply not trained, its synapses still remain and may have non-zero weights; done - a training batch should contain roughly the same number of objects of each type
loading logs into an object field: done
when model files are uploaded to an object manually through the web form and a file with the same name has already been uploaded, the server appends a digit to its name, which breaks all interaction with the model, because the names are hard-coded in the JSON. Keeping track of the names manually all the time would be problematic.
? the metadata of the model components (layers, parameters, etc.) probably also needs work; most of them are currently specified as text?
training on a single type on a "belongs" / "does not belong" basis
the model does not train on sets of more than 31k elements (more precisely, after a certain point the training results degrade completely and the network assigns all objects to a single class, while the accuracy is still displayed as 1.0; the suspected cause is the import of objects from the database or their preparation after retrieval: getting the image, converting it to a tensor, etc.)
limit the number of elements used for training in demo mode: done
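Two of the items above (skipping types that have too few objects, and keeping the number of objects per type roughly equal within a training batch) can be sketched together. This is an illustration only: `balancedBatch` is a hypothetical helper, and using a single `perType` value as both the minimum and the per-type quota is a simplifying assumption.

```javascript
// Hypothetical helper: group samples by type and take the same number of
// samples from each type. Types with fewer than `perType` samples are
// skipped entirely (assumption: one value serves as minimum and quota).
function balancedBatch(samples, perType) {
  const byType = new Map();
  for (const sample of samples) {
    if (!byType.has(sample.type)) byType.set(sample.type, []);
    byType.get(sample.type).push(sample);
  }
  const batch = [];
  for (const group of byType.values()) {
    if (group.length >= perType) batch.push(...group.slice(0, perType));
  }
  return batch;
}
```

As the issue notes, skipping a type does not remove its output from the model, so the number of outputs would still have to be derived from the types that are actually trained.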

Parameterizing the TensorFlow model

When a TensorFlow network is built, the code looks like this:

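// Assumes the @tensorflow/tfjs-node package (the README mentions tfjs-node).
const tf = require('@tensorflow/tfjs-node');

// Illustrative values for Fashion-MNIST (28x28 grayscale images, 10 types);
// the original snippet does not define these variables.
const imageWidth = 28;
const imageHeight = 28;
const imageChannels = 1;
const numOfClasses = 10;

// Define the model architecture
const buildModel = function () {
  const model = tf.sequential();
  // add the model layers
  model.add(tf.layers.conv2d({
    inputShape: [imageWidth, imageHeight, imageChannels],
    filters: 8,
    kernelSize: 5,
    padding: 'same',
    activation: 'relu'
  }));
  model.add(tf.layers.maxPooling2d({
    poolSize: 2,
    strides: 2
  }));
  model.add(tf.layers.conv2d({
    filters: 16,
    kernelSize: 5,
    padding: 'same',
    activation: 'relu'
  }));
  model.add(tf.layers.maxPooling2d({
    poolSize: 3,
    strides: 3
  }));
  // flatten the feature maps and classify with a softmax output per type
  model.add(tf.layers.flatten());
  model.add(tf.layers.dense({
    units: numOfClasses,
    activation: 'softmax'
  }));
  // compile the model
  model.compile({
    optimizer: 'adam',
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy']
  });
  return model;
};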

So we parameterize:

the model type (sequential in the example): a predefined reference list of values from the TensorFlow library
the model compilation parameters

    optimizer: 'adam',
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy']

the layer parameters: each layer is an object in a collection of parameters. The proposal is to make a separate descendant of the base layer class for each layer type and keep the layer-specific parameters there. That is, there is a baseLayer class with descendants.

Descendants of the baseLayer class

conv2d, with the parameters

    inputShape: [imageWidth, imageHeight, imageChannels],
    filters: 8,
    kernelSize: 5,
    padding: 'same',
    activation: 'relu'

maxPooling2d, with the parameters

    poolSize: 2,
    strides: 2

and so on.
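One way to picture this parameterization is a function that builds the model from a metadata object instead of hard-coded calls. The sketch below is hypothetical: the shape of `meta` (`type`, `layers`, `compile`) is invented for illustration and is not the application's metadata format. The `tf` object is passed in as an argument, so any TensorFlow.js build (for example the one from `require('@tensorflow/tfjs-node')`) can be used.

```javascript
// Hypothetical sketch: build a model from a metadata description.
// `meta.layers` is a list of { type, params } objects, where `type` is the
// name of a factory in tf.layers (conv2d, maxPooling2d, flatten, dense, ...).
function buildModelFromMetadata(tf, meta) {
  if (meta.type !== 'sequential') {
    throw new Error(`Unsupported model type: ${meta.type}`);
  }
  const model = tf.sequential();
  for (const layer of meta.layers) {
    if (typeof tf.layers[layer.type] !== 'function') {
      throw new Error(`Unknown layer type: ${layer.type}`);
    }
    // Each layer object carries its own parameters, as proposed above.
    model.add(tf.layers[layer.type](layer.params || {}));
  }
  model.compile(meta.compile);
  return model;
}
```

With this approach, adding a new layer type needs only a new metadata class with its parameters; the builder itself stays unchanged.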

Implement a second type of TensorFlow training: detection. Code MODAIB-18

classification model
detection model.

Add a description to the README

Improving the training parameters

In the training object being created, the model of training sources needs to be extended.
That is, add to the class the types on which the model is trained.

If the collection is empty, train on all available types.
If it is not empty, train only on the selected ones.

An option to train the model on a single type also needs to be added, with all the other types used as a type with the code "none", i.e. not the type being trained. For this, the proposal is to extend the model with two attributes: a "train on one type" flag and, when it is selected, a reference field to the type that is shown while the collection is hidden. All other types are used for training as "none" in random order.

This type of training is needed for a multi-agent network, where we train to recognize only one feature. An image can then be classified by several features, for example a T-shirt, trousers and sneakers, if this is not solved as a detection task.
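The two selection rules from this issue can be sketched as follows. Both helpers are hypothetical illustrations and their names are not taken from the application: the first applies the "empty collection means all types" rule, the second relabels everything except the target type as "none" and shuffles the result.

```javascript
// Rule 1: an empty collection of selected types means "train on all types".
function filterByTypes(samples, selectedTypes) {
  if (selectedTypes.length === 0) return samples;
  return samples.filter((s) => selectedTypes.includes(s.type));
}

// Rule 2: single-type training. Every sample that is not of the target
// type gets the type "none"; the result is shuffled (Fisher-Yates).
function relabelForSingleType(samples, targetType) {
  const relabeled = samples.map((s) => ({
    ...s,
    type: s.type === targetType ? targetType : 'none',
  }));
  for (let i = relabeled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [relabeled[i], relabeled[j]] = [relabeled[j], relabeled[i]];
  }
  return relabeled;
}
```

A model trained this way has two outputs (the target type and "none"), so several such models could each answer one "belongs / does not belong" question for the same image.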

Developing the fashion-goods application

After reviewing the first iteration in iondv/fashion-goods#3, we decided to take the application in a more universal direction: forming datasets and training models. That is, a move from recognition to forming a classified dataset.

Accordingly, the metadata changes:

1 goods: add an attribute saying what the uploaded object is (part of the training dataset, part of the verification dataset, or a product to be recognized), as a drop-down list or logical attributes.
2 Next to the source image field, add a field with the normalized image.
3 Add a normalization action to the object: it takes the source image and saves the normalized one, and sets the logical "can be used for training" attribute if the parameters match, otherwise false (the default value, which the operator cannot change).
4 Attach a simple editor to the uploaded object's field for cropping the image. A SEPARATE TASK.
5 Set the image sizes to normalize to in the deploy parameters.
6 Create a training class. The class has a Train model button, which trains the model on the training data. The trained model is saved in the object of the class! The class attributes are the TensorFlow parameters, i.e. they are taken from the model rather than from deploy.
7 Modify the TensorFlow Process button: recognition is based on a reference to the training model, i.e. it moves to the link class.
8 Create a collection of trainings: an intermediate class that links a product and the model selected for training. Add a Recognize button to it.

Do this in a separate repository.