
p-o-entrepreneurship-team-a-code's Introduction

P&O Entrepreneurship - Team A - Virtual Company Assistant (code) a.k.a Cluster

About this project

This is the code base repository of our bachelor's thesis project.

Modules

The chatbot and its helper services depend on (and therefore all communicate with) the Cluster Connector Server, to which a connection is established using the cluster-connector (Python connector) or ClusterClient (C#/.NET Core connector) libraries. All connector-related code can be found in the Cluster Connector repository. Another part of Cluster is the Cluster Moderator, a tool used by someone who moderates the questions and answers that users provide to the chatbot.

Documentation

Documentation can be found at Clusterdocs.

p-o-entrepreneurship-team-a-code's People

Contributors: yvesdhondt, willemcossey, vrolixthomas, louiscallens, heckej, robinparmentier, martijnvandoorennajz, dependabot[bot]

Forkers: yvesdhondt

p-o-entrepreneurship-team-a-code's Issues

Implement server-chatbot communication protocol

Because the communication protocol for server-chatbot communication had not been fully decided yet, it has not been implemented correctly in the ClusterClient message models.
The protocol is now more or less settled and will be implemented as described on the wiki.

Question about `ProcessNLPMatchQuestionsResponse`

        /// <summary>
        /// Logic: receives a MatchedQuestionsModel
        /// and decides whether there is a good match + the answer to this question.
        /// </summary>
        /// <param name="matchQuestionModels"></param>
        /// <returns>
        /// The result on Bernd's side is then:
        /// 1) There is a good match
        /// 2) We have to keep searching
        /// 3) There is no match and we are out of questions
        ///
        /// I leave it to you to decide what this one model / these multiple models should look like;
        /// a `return model;` at the end of the function is all I need :)
        /// </returns>
        public static Object ProcessNLPMatchQuestionsResponse(List<MatchQuestionModelResponse> matchQuestionModels)
        {
            return null;
        }

As you can see above, ProcessNLPMatchQuestionsResponse gets a list of MatchQuestionModelResponse objects as input. Given the definition of MatchQuestionModelResponse:

    [Serializable]
    public class MatchQuestionModelResponse : BaseModel
    {
        private int _question_id = -1;
        private MatchQuestionModelInfo[] _possible_matches = null;
        private int _msg_id = -1;

        public int question_id { get => _question_id; set => _question_id = value; }
        public MatchQuestionModelInfo[] possible_matches { get => _possible_matches; set => _possible_matches = value; }
        public int msg_id { get => _msg_id; set => _msg_id = value; }

        public bool IsComplete()
        {
            return possible_matches != null && _question_id != -1 && _msg_id != -1;
        }
    }

shouldn't the task of ProcessNLPMatchQuestionsResponse just be to check ONE MatchQuestionModelResponse for a match in its MatchQuestionModelInfos instead of checking a list of MatchQuestionModelResponses? So basically:

        public static Object ProcessNLPMatchQuestionsResponse(MatchQuestionModelResponse matchQuestionModel)
        {

            return null;
        }
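
If so, a minimal sketch of that single-response variant could look as follows. Note that the MatchResult model and the 0.75 probability threshold are assumptions introduced purely for illustration; they are not part of the current code.

        // Sketch only: MatchResult and the 0.75 threshold are hypothetical.
        public class MatchResult
        {
            public int question_id { get; set; }          // the question that was asked
            public int matched_question_id { get; set; }  // -1 when no match was found
            public bool has_match { get; set; }
        }

        public static MatchResult ProcessNLPMatchQuestionsResponse(MatchQuestionModelResponse matchQuestionModel)
        {
            // Pick the candidate with the highest probability, if any.
            MatchQuestionModelInfo best = null;
            foreach (var candidate in matchQuestionModel.possible_matches ?? new MatchQuestionModelInfo[0])
            {
                if (best == null || candidate.prob > best.prob)
                    best = candidate;
            }

            // Only accept the match when it is confident enough.
            bool hasMatch = best != null && best.prob >= 0.75f;
            return new MatchResult
            {
                question_id = matchQuestionModel.question_id,
                matched_question_id = hasMatch ? best.question_id : -1,
                has_match = hasMatch
            };
        }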

NLP offensiveness Logic processing

Similar to #44.

Handle what to do with received offensiveness models.

  1. Create logic models (plain C# objects; they won't be sent as JSON.. yet?) for (a sketch is given below):
  • representing a sentence that is offensive
  • representing a sentence that is non-offensive
  2. Discuss with chatbot and moderator what to do with both offensive and non-offensive sentences.
    Moderator: manual review along with the user id, for example?
    Chatbot: a notification to the user with a warning?
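
A minimal sketch of what those two logic models could look like, following the style of the existing response models. The class names and fields below are assumptions, not existing code.

    // Sketch only: class names and fields are hypothetical.
    [Serializable]
    public class OffensiveSentenceModel : BaseModel
    {
        public int msg_id { get; set; } = -1;
        public int user_id { get; set; } = -1;   // so the moderator can review the sender
        public string sentence { get; set; } = null;

        public bool IsComplete()
        {
            return sentence != null && msg_id != -1 && user_id != -1;
        }
    }

    [Serializable]
    public class NonOffensiveSentenceModel : BaseModel
    {
        public int msg_id { get; set; } = -1;
        public string sentence { get; set; } = null;

        public bool IsComplete()
        {
            return sentence != null && msg_id != -1;
        }
    }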

First aid to questions on architecture

Whenever you are wondering to which component some (new) feature/functionality should belong, you can follow these rules of thumb and start a discussion below in case of uncertainties:

The feature

  • implements a decision making process -> logic
  • implements (part of) the communication protocol between the server and modules/services (chatbot, NLP ...):
    • Server side -> cluster connector, cluster api
    • Client side in Python -> cluster-connector (a.k.a Python connector, Python api)
    • Client side in C# -> ClusterClient (a.k.a C# connector)
  • implements part of the dialog with a user -> chatbot
  • provides some 'intelligent' functionality to analyse user input -> NLP tools
  • ...

Update `SendQuestion()` documentation

SendQuestion() currently throws a TimeoutException when no response is received from the server before a timeout occurs. However, as this is an async method, the exception cannot be caught by its caller. Therefore the easiest option at the moment is to simply return null.
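
A minimal sketch of the proposed behaviour. The body shown here is illustrative only: SendAndAwaitResponse is a hypothetical helper standing in for the existing send-and-wait logic, and the actual signature of SendQuestion() may differ.

        // Sketch only: SendAndAwaitResponse is a hypothetical placeholder.
        public static async Task<string> SendQuestion(string question)
        {
            try
            {
                // Send the question and wait for the server's answer (existing logic).
                return await SendAndAwaitResponse(question);
            }
            catch (TimeoutException)
            {
                // Instead of letting the exception escape the async method,
                // signal "no answer received in time" with null.
                return null;
            }
        }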

Support request-response flow to retrieve unanswered questions for user

The ClusterClient module currently expects the server to push questions that should be answered, i.e. to send them whenever they become available. However, in the last meeting it appeared to be clearer for the logic to handle this situation using a plain old request-response flow.
The way ClusterClient works now is described in this figure:
(figure: current push-based flow)

The way the proposed request-response flow works is shown here (a message-level sketch is also given after the notes below):
(figure: proposed request-response flow)

Some extra notes:

  • In the first scenario the server can decide to wait for a bunch of questions, bundle them, and send them all at once, so only one message has to be sent per group of, say, X questions. If the buffer kept in ClusterClient happens to be empty when a user wants to answer questions, it can still send a request to the server. ClusterClient could decide itself whether an extra request is needed: it knows when it last sent a request, so it can set a reasonable timeout between two 'hard' requests to give the server some time. It could also keep track of the unanswered questions asked by users (which it already does, although those questions may still be answered by the NLP at that point), so it basically already knows whether there is any chance the server will ever send unanswered questions. However, when the server replies with a 'sorry, have to ask the forum' message, ClusterClient will consider this to be the answer, unless this type of answer can be distinguished from real answers.
  • In the second scenario two messages are sent over the websocket connection for every user who wants to answer a question. This seems less efficient to me than the first scenario, though perhaps more comprehensible.
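
For concreteness, a rough sketch of the two messages the proposed request-response flow could exchange, written as ClusterClient-style models. All names below are assumptions; the actual protocol is the one described on the wiki.

    // Sketch only: hypothetical request/response models for the proposed flow.
    [Serializable]
    public class GetUnansweredQuestionsRequest : BaseModel
    {
        public int user_id { get; set; } = -1;
    }

    [Serializable]
    public class GetUnansweredQuestionsResponse : BaseModel
    {
        public int user_id { get; set; } = -1;
        public string[] questions { get; set; } = null;  // empty when nothing is available
    }

ClusterClient would send a GetUnansweredQuestionsRequest whenever a user wants to answer questions (or when its buffer is empty), and the server would reply with a single GetUnansweredQuestionsResponse, instead of pushing questions whenever they become available.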

ClusterLogic NLP Responses

being worked on
Fill out ClusterLogic > NLPHandler > ProcessNLPResponse

'Response' here means that the server receives the response of the NLP tool.

Example:

        public static Object ProcessNLPMatchQuestionsResponse(List<MatchQuestionModelResponse> matchQuestionModels)
        {
            return null;
        }

With MatchQuestionModelResponse being:

    [Serializable]
    public class MatchQuestionModelResponse : BaseModel
    {
        private int _question_id = -1;
        private MatchQuestionModelInfo[] _possible_matches = null;
        private int _msg_id = -1;

        public int question_id { get => _question_id; set => _question_id = value; }
        public MatchQuestionModelInfo[] possible_matches { get => _possible_matches; set => _possible_matches = value; }
        public int msg_id { get => _msg_id; set => _msg_id = value; }

        public bool IsComplete()
        {
            return possible_matches != null && _question_id != -1 && _msg_id != -1;
        }
    }

    [Serializable]
    public class MatchQuestionModelInfo : BaseModel
    {
        
        private int _question_id = -1;
        private float _prob = -1;

        public int question_id { get => _question_id; set => _question_id = value; }
        public float prob { get => _prob; set => _prob = value; }

        public bool IsComplete()
        {
            return _question_id != -1 && prob != -1;
        }
    }

The above is a C# representation of the NLP response, returning possible related and matching questions.

'matchQuestionModels' is a list containing all models the server has received. Please process these models based on their data. Example:
The above represents a model in which the server receives, for a given question, a list of question ids with probabilities representing their similarity to that question. The logic SHOULD decide which question can be used as a similar question to the given one.

TODO:

  1. Prepare and return a model that either (a sketch of such result models is given below):
  • represents a similar question to the given one, or
  • represents the case in which no matching question could be found.
  2. Decide what to do with this model (decide on paper first).
    --> Discuss with chatbot and Forum: questions without similarities will probably be sent to the forum. Questions with similarities will probably be processed by the server, and answers to the highest matching question will be sent to the chatbot, to be sent on to the user.
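
As a starting point, a minimal sketch of the two result models described in point 1. The class names and fields are assumptions for the sake of discussion.

    // Sketch only: hypothetical result models for the two outcomes.
    [Serializable]
    public class SimilarQuestionFoundModel : BaseModel
    {
        public int question_id { get; set; } = -1;          // the newly asked question
        public int matched_question_id { get; set; } = -1;  // the existing, similar question
        public float prob { get; set; } = -1;                // confidence of the match
    }

    [Serializable]
    public class NoMatchFoundModel : BaseModel
    {
        public int question_id { get; set; } = -1;  // question to be forwarded to the forum
    }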

Improve security communication NLP/chatbot - Cluster Connector

Currently connections to the Cluster API server can (and should) be made over HTTPS (SSL), which encrypts the information sent over the connection. However, there is no need to authenticate to use the API, so no matter the strength of the SSL encryption, we currently don't control who is using the API. Proposed solutions include:

  • extending the message protocol with a hash of the message and some secret, which is known only by the client (NLP/chatbot) and the server (sketched below)
  • using some kind of RSA signing (in which case the server would keep a list of trusted public keys)
  • only allowing msg_ids that are expected (i.e. which have been generated by the server and not yet sent by the client)
  • HTTP header authentication
  • ...
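
For the first proposal, a minimal sketch of how a message could be tagged with an HMAC over its contents using the shared secret. How the secret is distributed and where the tag is placed in the message are left open here; the class and parameter names are assumptions.

    // Sketch only: illustrates the 'hash of message + shared secret' proposal.
    using System;
    using System.Security.Cryptography;
    using System.Text;

    public static class MessageSigner
    {
        // The server recomputes this tag and compares it to authenticate the sender.
        public static string ComputeTag(string messageJson, byte[] sharedSecret)
        {
            using (var hmac = new HMACSHA256(sharedSecret))
            {
                byte[] tag = hmac.ComputeHash(Encoding.UTF8.GetBytes(messageJson));
                return Convert.ToBase64String(tag);
            }
        }
    }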

More efficient implementation get_next_request()

get_next_request() currently doesn't ask the server for new tasks as long as there are still tasks in the (hidden) task list. One way to improve the efficiency of get_next_request() is to return a pending task immediately when one is available in the task list and to ask the server for new tasks in a separate thread.
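
The pattern is sketched below in C# for consistency with the other snippets (get_next_request() itself lives in the Python connector); the buffer and helper names are illustrative only.

    // Sketch only: return a cached task immediately and refill in the background.
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    public class TaskBuffer
    {
        private readonly ConcurrentQueue<string> _tasks = new ConcurrentQueue<string>();

        public string GetNextRequest()
        {
            if (_tasks.TryDequeue(out string task))
            {
                // A pending task is available: hand it out right away and
                // ask the server for new tasks in a separate thread.
                Task.Run(() => FetchTasksFromServer());
                return task;
            }

            // Buffer empty: fall back to a blocking fetch.
            FetchTasksFromServer();
            _tasks.TryDequeue(out task);
            return task;
        }

        // Hypothetical helper that asks the server for new tasks and enqueues them.
        private void FetchTasksFromServer() { /* ... */ }
    }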

Blacklist support

The NLP tools need to retrieve a blacklist containing offensive words from the server. This list changes over time, so the NLP tools cannot store the list permanently.
A few possibilities are the following:

  • the server sends the list along with every match questions request
  • the NLP connector sends a request to the server to check whether it has the most up-to-date blacklist and, if it doesn't, the server sends the blacklist
  • the server keeps a hash of the date when the blacklist was last updated and sends this hash along with every match questions request. The NLP connector saves this hash and compares it on every request. When the stored hash differs from the one sent by the server, the NLP connector requests the updated blacklist and caches the response along with the newest hash. Instead of a hash, the date itself could be used, or just an incremented number of course. (This option is sketched below.)
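
A minimal sketch of the third option, shown in C# like the other snippets here (the real connector code lives in cluster-connector); all names are illustrative.

    // Sketch only: cache the blacklist and refresh it when the server's version tag changes.
    public class BlacklistCache
    {
        private string _version = null;           // hash/date/counter last seen from the server
        private string[] _blacklist = new string[0];

        // Called on every match questions request, which carries the server's current version tag.
        public string[] GetBlacklist(string serverVersion)
        {
            if (_version == null || _version != serverVersion)
            {
                // Stored tag differs: request the updated blacklist once and
                // cache it together with the newest tag.
                _blacklist = RequestBlacklistFromServer();
                _version = serverVersion;
            }
            return _blacklist;
        }

        // Hypothetical helper representing the extra request to the server.
        private string[] RequestBlacklistFromServer()
        {
            return new string[0];
        }
    }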
