Git Product home page Git Product logo

tfpackagetext's Introduction

TFpackageText

Table of Contents

Prerequisites

  • python >= 3.7
  • tensorflow == 1.15
  • fastapi == 0.75.2

Structure

.
├── Dockerfile                              # API service on containers
├── Jenkinsfile                             # Jenkins file used for automation
├── README.md
├── requirements.txt                        # required libraries
├── startup_api.sh                          # start API service
├── ..........
└── textMG                                  # source directory
    ├── APIs                                # API codes
    │   ├── api_logger                      # API logger
    │   ├── base_models                     # pydantic BaseModel
    │   ├── db                              # SQLAlchemy table definition
    │   ├── html                            # html files
    │   └── routers                         # API routers
    │       ├── data_generator.py
    │       ├── health_check.py
    │       ├── tensorflow_service.py       # all training/eval/pred/refer APIs 
    │       ├── users.py                    # user management APIs
    │       ├── users_token.py              # get secret keys db and token releated components
    │       └── .....
    ├── configs                             # args config files
    │   ├── config.py
    │   └── config_multiGPU.py
    ├── datasets                            # for tools used for data loading&processing
    │   ├── data_loader.py                  
    │   ├── dataset.py                      # tools for data manipulating
    │   └── generator.py                    # generator for data batch 
    ├── embeddings                          # word2vec model embeddings 
    ├── logs                                # logs file
    ├── models                              # deep learning models
    ├── test                                # testing for each components
    ├── tf_serving                          # tensorflow serving grpc and restful
    ├── utils                               # utils tools
    ├── main_multiGPU.py                    # main entrypoint for model training/eval/pred
    ├── api_run.py                          # main entrypoint for API services 
    ├── startup.sh                          # another entrypoint for model training/eval/pred
    └── ..........

Getting Started

Installation

$ pip install -r requirements.txt

or use it as a library

python setup.py sdist
pip install TFpackageText-0.0.1.tar.gz

Usage

from textMG import models, datasets, APIs

Model Training evaluating prediction

It is initially trained on multi-GPU environment

$ ./starup.sh -m [train|evel|pred]

or via APIs:

http://localhost:5000/api/v1/models/train
http://localhost:5000/api/v1/models/eval
http://localhost:5000/api/v1/models/pred
http://localhost:5000/api/v1/models/infer
http://localhost:5000/api/v1/models/pro_cons_infer

Rewrite pretrained model interface

original source code is trained via tf.estimater which is hard to debug and manipulate transformer models. The new interface will be able to run purely with session.

class Bert_module:
    def __init__(self, *args, **kwargs):
        self.bert = BertModel
        self.kwargs = kwargs

    def __call__(self, num_labels, max_len, hidden_size, reuse=tf.AUTO_REUSE, is_training=True, dropout=0.3, *args, **kwargs):
        with tf.variable_scope('bert_classification', reuse=reuse):
            output_weights = tf.get_variable(
                "output_weights", [num_labels, hidden_size],
                initializer=tf.truncated_normal_initializer(stddev=0.02))

set 'is_pretrained' to be True in order to use pretrained model as embeddings parser.add_argument('--is_pretrained', type=bool, default=True)

API calls

  1. Start API:
$ python api_run.py

SwaggerUI

or start API in a container

$ docker build -t api_docker .
$ docker run -t -p 80:5000 -v /home/projects/TFpackageText/textMG:/home/projects --name=api_docker_service api_docker

or run api on production

$ gunicorn api_run:app --workers 2 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:5000 --log-level debug
  1. Authentication:
generate a secret key and save it:
$ openssl rand -hex 32

generate hashed password:
$ from passlib.context import CryptContext
$ hashed_password = CryptContext(schemes=["bcrypt"], deprecated="auto").hash('admin')

create an admin user:
$ insert into users_v(id, first_name,last_name,email,hashed_password) values(1, 'admin', 'v', '[email protected]', '$hashed_password')

add secret key, algothrim .. into db:
$ insert into token_users_v(SECRET_KEY, ALGORITHM, ACCESS_TOKEN_EXPIRE_MINUTES, uid) values('ec039bcb8eada04c0de3cea9ced7440b949f6e2dadafd61ae1f8b687c214b4a9', 'HS256', 30, 1);

authorize yourself with below username/passwd on Swagger docs: http://localhost:5000/docs:
$ username: admin
$ password: admin

generate JWT tokens used in "-H 'Authorization: Bearer JWT tokens'":
$ curl -X 'POST' \
  'http://localhost:5000/token' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'grant_type=&username=admin&password=admin&scope=&client_id=&client_secret='

Swagger docs

http://localhost:5000/docs

Data Pipeline

data can be loaded directly

# load data from disk and generator.
generator = Generator()
iter = generator.get_next_patch(batch=2)
el = next(iter)

or via tf.data.Dataset.from_generator

train_dataset = tf.data.Dataset.from_generator(generator.get_next_patch,
                                               output_types={generator.input: tf.float32,
                                                             generator.output: tf.float32})
train_dataset = train_dataset.repeat().batch(args.batch_size * args.num_gpusORcpus).prefetch(1)

or via api

curl -X 'GET' \
  'http://localhost:5000/api/models/batch_load/2' \
  -H 'accept: application/json'
  -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ2aWM3IiwiZXhwIjoxNjU0NzA2NDk0fQ.VPNdR2Ow0vwl4Yx3JYmErSdCZP6G8RufSUAk1lavR2M'

Tensorflow serving

after training, the model for tf servering is saved to different path for restful and grpc.

# call build_SavedModel to build a SavedModel for tensor serving
build_SavedModel(conv_net, args.export_path_serving, args.savedmodel_version, X_batch_numGPU, best_sess)
$ docker pull tensorflow-serving:1.15.0
$ nohup docker run --name container_serving -p 8501:8501 -p 8500:8500 \
    ........
    --batching_parameters_file=/models/batch.config >  /home/projects/tf1_serving_log.out 2>&1 &

call api for reference via tfserving

curl -X 'POST' \
  'http://localhost:5000/api/v1/models/infer' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ2aWM3IiwiZXhwIjoxNjU0NzA2NDk0fQ.VPNdR2Ow0vwl4Yx3JYmErSdCZP6G8RufSUAk1lavR2M' \
  -d '{
  "batch_infer": [
    "string"
  ]
}'

or call reference api in producer/consumer model

curl -X 'POST' \
  'http://localhost:5000/api/v1/models/pro_cons_infer' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ2aWM3IiwiZXhwIjoxNjU0NzA2NDk0fQ.VPNdR2Ow0vwl4Yx3JYmErSdCZP6G8RufSUAk1lavR2M' \
  -d '{
  "batch_infer": [
    "string"
  ]
}'

Project Automation

edit Jenkinsfile to automate the process of training/evaluating/referring

node("jekins-agent1"){
    stage("Git clone"){
        sh 'cd /opt/jenkins/workspace/  && rm -rf TFpackegeText'
        sh 'git clone [email protected]:victorbai2/TFpackageText.git && tar -czvf package.tar.gz'
    }
    .......

Examples

predict:

curl -X 'POST' \
  'http://localhost:5000/api/v1/models/pred' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ2aWM3IiwiZXhwIjoxNjU0NzA2NDk0fQ.VPNdR2Ow0vwl4Yx3JYmErSdCZP6G8RufSUAk1lavR2M' \
  -d '{
  "inquiry": [
    "在北京举办...", "三点不足导致球队...."...
  ]
}'

Response Body:

{
  "data": {
    "records": {
      "0": {
        "category": "entertainment",
        "label": "1",
        "pred_logits": "[0.17890048027038574, 0.23429206013679504, 0.220640629529953, 0.17155364155769348, 0.1946132332086563]"
      },
      "1": {
        "category": "entertainment",
        "label": "1",
        "pred_logits": "[0.17201262712478638, 0.24676887691020966, 0.22940053045749664, 0.15629342198371887, 0.19552461802959442]"
      }
    },
    "time_used": "0.519162654876709"
  },
  "code": 200,
  "message": "success",
  "error": false
}

load data batch(batch_size=2):

curl -X 'GET' \
  'http://localhost:5000/api/v1/models/batch_load/2' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ2aWM3IiwiZXhwIjoxNjU0NzA2NDk0fQ.VPNdR2Ow0vwl4Yx3JYmErSdCZP6G8RufSUAk1lavR2M' \
  
Response Body:
{
  "data": {
    "batch": {
      "x_input": [
        [701,101,..],
        [1842,2182,...]
      ],
      "y_output": [
        [0,0,0,0,1],
        [0,0,0,0,1],
      ]
    },
    "input_shape": [2,30],
    "output_shape": 2
  },
  "code": 200,
  "message": "success",
  "error": false
}

Simple pressure test

$ ab -p tensorAPI.json  -T 'application/json' -H 'Content-Type: application/json'  \
  -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ2aWM3IiwiZXhwIjoxNjU0NzA2NDk0fQ.VPNdR2Ow0vwl4Yx3JYmErSdCZP6G8RufSUAk1lavR2M' \
  -c 500 -n 500 http://localhost:5000/api/v1/models/infer
$ ab -p tensorAPI.json  -T 'application/json' -H 'Content-Type: application/json'  \
  -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ2aWM3IiwiZXhwIjoxNjU0NzA2NDk0fQ.VPNdR2Ow0vwl4Yx3JYmErSdCZP6G8RufSUAk1lavR2M' \
  -c 500 -n 500 http://localhost:5000/api/v1/models/pro_cons_infer

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

VB

tfpackagetext's People

Contributors

victorbai2 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

tfpackagetext's Issues

"value_error.jsondecode","ctx":{"msg":"Expecting property name enclosed in double quotes

got an error as below, but I verified the data format and it does not seem to have any problem.
curl -X 'POST' \ 'http://192.168.1.14:5000/api/pred' \ -H 'accept: application/json' \ -H 'Content-Type: application/json' \ -d '{[ "首届草根情中日乒乓球友谊赛于年在日本横滨举行完", "文体旅游局佳兆业文化弘金地体育承办至今已举办 五届完", "克沃格尔以比击败澳大利亚名将莫顿她是继和年后第三度赢得此项赛事冠军", "手术" ] }'

{"detail":[{"loc":["body",1],"msg":"Expecting property name enclosed in double quotes: line 1 column 2 (char 1)","type":"value_error.jsondecode","ctx":{"msg":"Expecting property name enclosed in double quotes","doc":"{[\n \"首届草根情中日乒乓球友谊赛于年在日本横滨举行完\", \"文体旅游局佳兆业文化弘金地体育承办至今已举办五届完\", \"克沃格尔以比击败澳大利亚名将莫顿她是继和年后第三度赢得此项赛事冠军\", \"手术\"\n ]\n}","pos":1,"lineno":1,"colno":2}}]}%

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.