
cocoa's Introduction

CoCoA (Collaborative Communicating Agents)

CoCoA is a dialogue framework written in Python, providing tools for data collection through a text-based chat interface and model development in PyTorch (largely based on OpenNMT).

This repo contains code for the following tasks:

  • MutualFriends: two agents, each with a private list of friends with multiple attributes (e.g. school, company), try to find their mutual friends through a conversation.
  • CraigslistBargain: a buyer and a seller negotiate the price of an item for sale on Craigslist.
  • DealOrNoDeal: two agents negotiate to split a group of items with different points among them. The items are books, hats and balls.

Papers:

Note: We have not fully integrated the MutualFriends task with the cocoa package. For now please refer to the mutualfriends branch for the ACL 2017 paper.


Installation

Dependencies: Python 2.7, PyTorch 0.4.1.

NOTE: MutualFriends still depends on TensorFlow 1.2 and uses different learning modules. See details on the mutualfriends branch.

pip install -r requirements.txt
python setup.py develop

Main concepts/classes

Schema and scenarios

A dialogue is grounded in a scenario. A schema defines the structure of scenarios. For example, a simple scenario that specifies the dialogue topic is

Topic
Artificial Intelligence

and its schema (in JSON) is

{
    "attributes": [
        {
            "value_type": "topic",
            "name": "Topic"
        }
    ]
}
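
As a minimal sketch (this is not the cocoa Schema class), the example schema can be loaded with Python's standard json module to inspect its attributes. The variable names below are illustrative assumptions.

```python
import json

# The example schema, with the attribute wrapped in an object so that it
# parses as valid JSON.
schema_text = """
{
    "attributes": [
        {"value_type": "topic", "name": "Topic"}
    ]
}
"""

schema = json.loads(schema_text)

# Map each attribute name to its value type.
attribute_types = {attr["name"]: attr["value_type"] for attr in schema["attributes"]}
print(attribute_types)
```

A scenario generated from this schema would then supply one value (e.g. "Artificial Intelligence") for each attribute name.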

Systems and sessions

A dialogue agent is instantiated in a session which receives and sends messages. A system is used to create multiple sessions (that may run in parallel) of a specific agent type. For example, system = NeuralSystem(model) loads a trained model and system.new_session() is called to create a new session whenever a human user is available to chat.
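
The relationship between a system and its sessions can be sketched as follows. EchoSystem and EchoSession are hypothetical stand-ins, not classes from the cocoa package, and the agent_id argument to new_session() is an assumption made for illustration.

```python
class EchoSession(object):
    """Hypothetical session: remembers the last message and echoes it back."""

    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.last_received = None

    def receive(self, text):
        # Store the partner's most recent message.
        self.last_received = text

    def send(self):
        # Reply based on what was received; None means nothing to say yet.
        if self.last_received is None:
            return None
        return "echo: " + self.last_received


class EchoSystem(object):
    """Hypothetical system: set up once, then used to create many sessions."""

    def __init__(self, name):
        # A real system would load its trained model or rules here, once.
        self.name = name

    def new_session(self, agent_id):
        # Each chat gets its own session with independent state.
        return EchoSession(agent_id)


system = EchoSystem("echo")
session_a = system.new_session(0)
session_b = system.new_session(1)
session_a.receive("hello")
print(session_a.send())
print(session_b.send())
```

The point of the split is that expensive setup happens once in the system, while per-conversation state lives in each session.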

Events and controllers

A dialogue controller takes two sessions and has them send/receive events until the task is finished or terminated. The most common event is message, which sends some text data. There are also task-related events, such as select in MutualFriends, which sends the selected item.
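
A controller loop of this kind might look like the sketch below. ScriptedSession and run_dialogue are hypothetical names, events are plain dicts, and ending the dialogue on the first select event is a simplification rather than the actual task logic.

```python
class ScriptedSession(object):
    """Hypothetical session that sends a fixed list of events, then stops."""

    def __init__(self, script):
        self.script = list(script)
        self.received = []

    def send(self):
        # Return the next scripted event, or None when nothing is left.
        return self.script.pop(0) if self.script else None

    def receive(self, event):
        self.received.append(event)


def run_dialogue(sessions, max_turns=10):
    """Alternate between two sessions, passing each event to the partner."""
    events = []
    for turn in range(max_turns):
        agent = turn % 2
        event = sessions[agent].send()
        if event is None:
            break  # the current agent has nothing more to send
        event = dict(event, agent=agent)  # tag the event with its sender
        events.append(event)
        sessions[1 - agent].receive(event)
        if event["action"] == "select":
            break  # simplification: treat a select as the end of the task
    return events


a = ScriptedSession([
    {"action": "message", "data": "Do you know Alice?"},
    {"action": "select", "data": "Alice"},
])
b = ScriptedSession([{"action": "message", "data": "Yes, I do."}])
events = run_dialogue([a, b])
for event in events:
    print("%d %s %s" % (event["agent"], event["action"], event["data"]))
```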

Examples and datasets

A dialogue is represented as an example which has a scenario, a series of events, and some metadata (e.g. example id). Examples can be read from / written to a JSON file in the following structure:

examples.json
|--[i]
|  |--"uuid": "<uuid>"
|  |--"scenario_uuid": "<uuid>"
|  |--"scenario": "{scenario dict}"
|  |--"agents": {0: "agent type", 1: "agent type"}
|  |--"outcome": {"reward": R}
|  |--"events"
|     |--[j]
|        |--"action": "action"
|        |--"data": "event data"
|        |--"agent": agent_id
|        |--"time": "event sent time"

A dataset reads in training and testing examples from JSON files.
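
To illustrate the structure, the sketch below parses a tiny in-memory stand-in for examples.json and groups message text by agent. It uses only the standard json module rather than the cocoa dataset classes. The sample values and the messages_by_agent helper are assumptions. JSON object keys must be strings, so the agent ids under "agents" appear as "0" and "1" here.

```python
import json

# A one-example stand-in for examples.json with made-up values.
raw = json.loads("""
[
  {
    "uuid": "example-0",
    "scenario_uuid": "scenario-0",
    "scenario": {},
    "agents": {"0": "human", "1": "rulebased"},
    "outcome": {"reward": 1},
    "events": [
      {"action": "message", "data": "hi", "agent": 0, "time": "0.0"},
      {"action": "message", "data": "hello", "agent": 1, "time": "1.5"}
    ]
  }
]
""")


def messages_by_agent(example):
    """Group the text of 'message' events by the agent that sent them."""
    grouped = {}
    for event in example["events"]:
        if event["action"] == "message":
            grouped.setdefault(event["agent"], []).append(event["data"])
    return grouped


for example in raw:
    print("%s: %s" % (example["uuid"], messages_by_agent(example)))
```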

Code organization

CoCoA is designed to be modular so that one can add their own task/modules easily. All tasks depend on the cocoa package. See documentation in the task folder for task-specific details.

Data collection

We provide basic infrastructure (see cocoa.web) to set up a website that pairs two users or a user and a bot to chat in a given scenario.

Generate scenarios

The first step is to create a .json schema file and then (randomly) generate a set of scenarios that the dialogue will be situated in.

The website pairs a user with another user or a bot (if available). A dialogue scenario is displayed and the two agents can chat with each other. Users are then directed to a survey to rate their partners (optional). All dialogue events are logged in a SQL database.

Our server is built with Flask. The backend (cocoa/web/main/backend.py) contains code for pairing, logging, and dialogue quality checks. The frontend code is in task/web/templates.

To deploy the web server, run

cd <name-of-your-task>;
PYTHONPATH=. python web/chat_app.py --port <port> --config web/app_params.json --schema-path <path-to-schema> --scenarios-path <path-to-scenarios> --output <output-dir>
  • Data and logs will be saved in <output-dir>. Important: this will delete everything in <output-dir> if it is not empty.
  • --num-scenarios: total number of scenarios to sample from. Each scenario will have num_HITs / num_scenarios chats. You can also specify ratios of number of chats for each system in the config file. Note that the final result will be an approximation of these numbers due to concurrent database calls.

To collect data from Amazon Mechanical Turk (AMT), workers should be directed to the link http://your-url:<port>/?mturk=1. ?mturk=1 makes sure that workers will receive a Mturk code at the end of the task to submit the HIT.

Dump data from the SQL database to a JSON file (see Examples and datasets for the JSON structure).

cd <name-of-your-task>;
PYTHONPATH=. python ../scripts/web/dump_db.py --db <output-dir>/chat_state.db --output <output-dir>/transcripts/transcripts.json --surveys <output-dir>/transcripts/surveys.json --schema <path-to-schema> --scenarios-path <path-to-scenarios> 

Render JSON transcript to HTML:

PYTHONPATH=. python ../scripts/visualize_transcripts.py --dialogue-transcripts <path-to-json-transcript> --html-output <path-to-output-html-file> --css-file ../chat_viewer/css/my.css

Other options for HTML visualization:

  • --survey-transcripts: path to survey.json if survey is enabled during data collection.
  • --survey-only: only visualize dialogues with submitted surveys.
  • --summary: statistics of the dialogues.

Dialogue agents

To add an agent for a task, you need to implement a system <name-of-your-task>/systems/<agent-name>_system.py and a session <name-of-your-task>/sessions/<agent-name>_session.py.

Model training and testing

See the documentation under each task (e.g., ./craigslistbargain).

Evaluation

To deploy bots to the web interface, add the "models" field in the website config file, e.g.

"models": {
    "rulebased": {
        "active": true,
        "type": "rulebased"
    }
}

See also the instructions for setting up the web server under Data collection.
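
As a quick sanity check before deploying, a config containing a "models" field can be parsed to list which bots are marked active. This is a sketch using the standard json module, not the web server's own loading code. The "example-bot" entry and the active_models helper are hypothetical.

```python
import json

# A config fragment wrapped in an object so that it is valid JSON.
# "example-bot" is a made-up inactive entry for illustration.
config_text = """
{
    "models": {
        "rulebased": {"active": true, "type": "rulebased"},
        "example-bot": {"active": false, "type": "rulebased"}
    }
}
"""


def active_models(config):
    """Return the sorted names of models whose "active" flag is true."""
    models = config.get("models", {})
    return sorted(name for name, spec in models.items() if spec.get("active"))


config = json.loads(config_text)
print(active_models(config))
```

Parsing the file this way also catches JSON syntax problems, such as a trailing comma, before the server is started.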

cocoa's People

Contributors

anushabala, derekchen14, hhexiy, mihail911, percyliang


cocoa's Issues

KeyError: 'post_id'

read_examples: data/train.json
Traceback (most recent call last):
File "parse_dialogue.py", line 60, in <module>
examples = read_examples(args.transcripts, args.max_examples, Scenario)
File "/home/prateekagarwal/cocoa/cocoa/core/dataset.py", line 120, in read_examples
examples.append(Example.from_dict(raw, Scenario))
File "/home/prateekagarwal/cocoa/cocoa/core/dataset.py", line 29, in from_dict
scenario = Scenario.from_dict(None, raw['scenario'])
File "/home/prateekagarwal/cocoa/craigslistbargain/core/scenario.py", line 34, in from_dict
return Scenario(raw['uuid'], raw['post_id'], raw['category'], None, scenario_attributes, [KB.from_dict(scenario_attributes, kb) for kb in raw['kbs']])
KeyError: 'post_id'

Getting this error on running the step which parses training data.
I simply installed the requirements and then started running the steps mentioned under building the bot.

How to solve problems in Chat with the bot

I'm trying the "Chat with the bot" step in both the command line interface and the web interface,
and I ran into some problems.

In the command line interface, a TypeError occurred.
I can't resolve this error on my own.
The error is described in detail below.

In the web interface, no error occurred, but I don't understand
what to do after the output shown below is printed.

What should I do in each section?

Preparation of inputting commands

I stored the pre-trained models as follows:

craigslistbargain/
 ├ checkpoint/
 │ └ lf2lf/
 │   └ config.json
 │   └ model_best.pt
 ├ mappings/
 │ └ lf2lf/
 │   └ kb.glove.pt
 │   └ vocab.pkl
 ├ model.pkl
 ├ price_tracker.pkl
 ├ templates.pkl

Command line interface

input:

PYTHONPATH=. python ../scripts/generate_dataset.py --schema-path data/craigslist-schema.json --scenarios-path data/dev-scenarios.json --results-path bot-chat-transcripts.json --max-examples 20 --agents rulebased cmd --price-tracker price_tracker.pkl --agent-checkpoints checkpoint/lf2lf/model_best.pt "" --max-turns 20 --random-seed 1 --sample --temperature 0.2

output:

[nltk_data] Downloading package punkt to /home/mocchaso/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/mocchaso/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
Traceback (most recent call last):
  File "../scripts/generate_dataset.py", line 70, in <module>
    for name, model_path in zip(args.agents, args.agent_checkpoints)]
  File "/mnt/c/users/administrator/my_graduation_research/cocoa_src/craigslistbargain/systems/__init__.py", line 14, in get_system
    templates = Templates.from_pickle(args.templates)
  File "/mnt/c/users/administrator/my_graduation_research/cocoa_src/cocoa/model/generator.py", line 81, in from_pickle
    templates = read_pickle(path)
  File "/mnt/c/users/administrator/my_graduation_research/cocoa_src/cocoa/core/util.py", line 28, in read_pickle
    with open(path, 'rb') as fin:
TypeError: coercing to Unicode: need string or buffer, NoneType found

Web interface

input

PYTHONPATH=. python web/chat_app.py --port 5000 --config web/app_params.json --schema-path data/craigslist-schema.json --scenarios-path data/dev-scenarios.json --output web_output

output:

[nltk_data] Downloading package punkt to /home/mocchaso/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
You are using pip version 9.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
You are using pip version 9.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
1 systems loaded
human
App setup complete

Natural Language Generation for Dealornodeal

When I run self-play mode in the DealOrNoDeal task, the two agents communicate at the dialogue act level. How can I turn those dialogue acts into natural language? Thank you!

TypeError: where() takes at most 2 arguments (3 given)

PYTHONPATH=. python src/main.py --schema-path data/schema.json --scenarios-path data/scenarios.json --train-examples-paths data/train.json --test-examples-paths data/dev.json --stop-words data/common_words.txt --min-epochs 10 --checkpoint checkpoint --rnn-type lstm --learning-rate 0.5 --optimizer adagrad --print-every 50 --model attn-copy-encdec --gpu 1 --rnn-size 100 --grad-clip 0 --num-items 12 --batch-size 32 --stats-file stats.json --entity-encoding-form type --entity-decoding-form type --node-embed-in-rnn-inputs --msg-aggregation max --word-embed-size 100 --node-embed-size 50 --entity-hist-len -1 --learned-utterance-decay
/ve_tf0.11_py2/venv/lib/python2.7/site-packages/fuzzywuzzy/fuzz.py:35: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
read_examples: data/train.json
read_examples: data/dev.json
Building lexicon...
Created lexicon: 522092 phrases mapping to 1314 entities, 3.291269 entities per phrase
Using rule-based lexicon...
3.96 s
test: 0 dialogues out of 0 examples
train: 7257 dialogues out of 8967 examples
dev: 878 dialogues out of 1083 examples
Vocabulary size: 8435
Traceback (most recent call last):
File "src/main.py", line 110, in <module>
model = build_model(schema, mappings, model_args)
File "//cocoa/src/model/encdec.py", line 69, in build_model
model = GraphEncoderDecoder(encoder_word_embedder, decoder_word_embedder, graph_embedder, encoder, decoder, pad, select)
File "//cocoa/src/model/encdec.py", line 760, in __init__
super(GraphEncoderDecoder, self).__init__(encoder_word_embedder, decoder_word_embedder, encoder, decoder, pad, select, scope)
File "//cocoa/src/model/encdec.py", line 639, in __init__
self.build_model(encoder_word_embedder, decoder_word_embedder, encoder, decoder, scope)
File "//cocoa/src/model/encdec.py", line 659, in build_model
encoder.build_model(encoder_word_embedder, encoder_input_dict, time_major=False)
File "//cocoa/src/model/encdec.py", line 283, in build_model
super(GraphEncoder, self).build_model(word_embedder, input_dict, time_major=time_major, scope=scope)
File "//cocoa/src/model/encdec.py", line 193, in build_model
inputs = self._build_rnn_inputs(word_embedder, time_major)
File "//cocoa/src/model/encdec.py", line 267, in _build_rnn_inputs
word_embeddings = word_embedder.embed(self.inputs, zero_pad=True)
File "//cocoa/src/model/word_embedder.py", line 17, in embed
embeddings = tf.where(inputs == self.pad, tf.zeros_like(embeddings), embeddings)
TypeError: where() takes at most 2 arguments (3 given)

IndexError: too many indices for array

When I try to chat with the bot in the web interface, I get the error below:

Traceback (most recent call last):
  File "/home/.local/lib/python2.7/site-packages/gevent/pywsgi.py", line 976, in handle_one_response
    self.run_application()
  File "/home/.local/lib/python2.7/site-packages/gevent/pywsgi.py", line 923, in run_application
    self.result = self.application(self.environ, self.start_response)
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/.local/lib/python2.7/site-packages/flask_socketio/__init__.py", line 42, in __call__
    start_response)
  File "/home/.local/lib/python2.7/site-packages/engineio/middleware.py", line 67, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/.local/lib/python2.7/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/cocoa/cocoa/web/views/chat.py", line 48, in check_inbox
    event = backend.receive(uid)
  File "/home/cocoa/cocoa/web/main/backend.py", line 838, in receive
    controller.step(self)
  File "/home/cocoa/cocoa/core/controller.py", line 109, in step
    event = session.send()
  File "/home/cocoa/cocoa/sessions/timed_session.py", line 60, in send
    self.queued_event.append(self.session.send())
  File "/home/cocoa/craigslistbargain/sessions/neural_session.py", line 64, in send
    tokens = self.generate()
  File "/home/cocoa/craigslistbargain/sessions/neural_session.py", line 158, in generate
    output_data = self.generator.generate_batch(batch, gt_prefix=self.gt_prefix, enc_state=enc_state)
  File "/home/cocoa/cocoa/neural/generator.py", line 119, in generate_batch
    for b in range(batch_size)]
  File "/home/cocoa/cocoa/neural/generator.py", line 109, in get_bos
    bos = batch.decoder_inputs[gt_prefix-1][b].data.cpu().numpy()[0]
IndexError: too many indices for array
2019-03-10T12:46:45Z {'REMOTE_PORT': '49500', 'HTTP_HOST': 'xxx', 'REMOTE_ADDR': '::ffff:xxx', (hidden keys: 24)} failed with IndexError

I don't know the reason for it, and my command is:

PYTHONPATH=. python web/chat_app.py --host 0.0.0.0 --port 5000 --config web/app_params_allsys.json --schema-path data/craigslist-schema.json --scenarios-path data/scenarios.json --price-tracker-model data/price_tracker.pkl --templates data/templates.pkl --policy data/model.pkl

Other information:

System: Ubuntu18.04
Python: 2.7

Please let me know if there is a solution to this problem, thanks.

NameError when launching web server

NameError: global name 'datetime' is not defined in cocoa/craigslistbargain/options.py.

It occurs when I run the following script:

$ PYTHONPATH=. python web/chat_app.py --port 8081 --config web/app_params.json --schema-path data/craigslist-schema.json --scenarios-path data/train-scenarios.json --output output

SL(act)+ Rule model

Hi,

How can I replicate SL(act)+Rule model? Could you please provide me the commands? Thank you!

Modify System class and add Session class

  • Add new_session() function (and any other required functions) to System
  • System should load up the model at startup
  • Session should provide an interface to send/receive messages from the model

Integration with AMT for crowdsourcing

Hi,

Thank you for providing this code.

Could you please share some example/script as to how we integrate this system with AMT for crowdsourcing dialogues?

Thanks.

Modular approach question

Hi,

I am trying to reproduce the SL/RL(act) model and the overall example.

Reading the paper and running the code I noticed you refer to a Hybrid Policy (paper) which is basically the SL(act)+rule and there is also a hybrid type of agents which refers to what I want to reproduce.

When I am going through your code for the Modular approach, I noticed that when RL is applied, you specify the pt-neural type of agents:

mkdir checkpoint/lf2lf-margin;
PYTHONPATH=. python reinforce.py --schema-path data/craigslist-schema.json \
  --scenarios-path data/train-scenarios.json \
  --valid-scenarios-path data/dev-scenarios.json \
  --price-tracker price_tracker.pkl \
  --agent-checkpoints checkpoint/lf2lf/model_best.pt checkpoint/lf2lf/model_best.pt \
  --model-path checkpoint/lf2lf-margin \
  --optim adagrad --learning-rate 0.001 \
  --agents pt-neural pt-neural \
  --report-every 500 --max-turns 20 --num-dialogues 5000 \
  --sample --temperature 0.5 --max-length 20 --reward margin

Later, at the End-to-End approach, you mention that in order to run the RL finetune: "We just need to change the agent type to --agents hybrid hybrid".

So, my question is that, shouldn't those two be at the exact opposite side, meaning the Modular approach with hybrid type agents and the End-to-End approach with pt-neural?

I might be also missing something here - something that I haven't understood correctly. I would really appreciate your kind help.

Thank you in advance!

Model name problem

Could you please clarify the meaning of each model name in the code for the paper "Decoupling Strategy and Generation in Negotiation Dialogues"?

  1. "rulebased"
  2. "hybrid"
  3. "cmd"
  4. "fb-neural"
  5. "pt-neural"

Thank you very much!

What is the solution to ImportError: cannot import name LIWC?

Hi, He He!
I've learned a lot from running and reading your code!
However, I ran into a problem when setting up the web server, i.e. when I run python ../scripts/visualize_transcripts.py
I get:

Traceback (most recent call last):
File "../scripts/visualize_transcripts.py", line 4, in <module>
from analysis.visualizer import Visualizer
File "/data/linshuai/cocoa/craigslistbargain/analysis/visualizer.py", line 8, in <module>
from analyze_strategy import StrategyAnalyzer
File "/data/linshuai/cocoa/craigslistbargain/analysis/analyze_strategy.py", line 25, in <module>
from liwc import LIWC
ImportError: cannot import name LIWC

I've installed liwc via pip install. However, it doesn't seem to work with LIWC.
I also wonder what this line of code is for:
self.liwc = LIWC.from_pkl(liwc_path)

I'd appreciate your help!

IndexError: index 0 is out of bounds for axis 0 with size 0

Hi,
I tried to run python src/scripts/generate_scenarios.py --schema-path data/schema.json --scenarios-path data/scenarios.json --num-scenarios 500 --random-attributes --random-items --alphas 0.3 1 3

I got this error:
Traceback (most recent call last):
File "src/scripts/generate_scenarios.py", line 158, in <module>
s = generate_scenario(schema)
File "src/scripts/generate_scenarios.py", line 93, in generate_scenario
index = random_multinomial(distrib)
File "/home/ ... /cocoa/src/basic/util.py", line 11, in random_multinomial
accum += probs[i]
IndexError: index 0 is out of bounds for axis 0 with size 0

Question: Cannot execute craigslistbargain/web/chat_app.py

Hello.

In order to try the cocoa system, I added the paths below to /home/(user_name)/.pyenv/versions/anaconda3-5.3.0/envs/py27/lib/python2.7/site-packages/easy-install.pth.

  • (auto-written by setup.py) /mnt/c/users/(admin_name)/cocoa-master
  • /mnt/c/users/(user_name)/cocoa-master/cocoa
  • /mnt/c/users/(user_name)/cocoa-master/onmt
  • /mnt/c/users/(user_name)/cocoa-master/craigslistbargain
  • /mnt/c/users/(user_name)/cocoa-master/craigslistbargain/web
  • /mnt/c/users/(user_name)/cocoa-master/craigslistbargain/core

After that, I ran craigslistbargain/web/chat_app.py on Python 2.7.15.
However, this execution failed. The error is as follows:

Traceback (most recent call last):
  File "chat_app.py", line 19, in <module>
    from core.scenario import Scenario
ImportError: No module named scenario

What should I do?
I would appreciate it if you could tell me the solution.

Environment

  • Ubuntu 18.04 (Linux Subsystem of Windows 10 Education)
  • Python 2.7.15 on Anaconda virtual environment
    • Pytorch 0.4.1.post2
