
⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.

Home Page: https://cdqa-suite.github.io/cdQA-website/

License: Apache License 2.0

Topics: reading-comprehension, question-answering, deep-learning, natural-language-processing, information-retrieval, bert, artificial-intelligence, nlp, pytorch, transformers

cdqa's Introduction

cdQA: Closed Domain Question Answering


An End-To-End Closed Domain Question Answering System. Built on top of the HuggingFace transformers library.

⛔ [NOT MAINTAINED] This repository is no longer maintained, but is being kept around for educational purposes. If you want a maintained alternative to cdQA, check out: https://github.com/deepset-ai/haystack

cdQA in detail

If you are interested in understanding how the system works and its implementation, we wrote an article on Medium with a high-level explanation.

We also made a presentation during the #9 NLP Breakfast organised by Feedly. You can check it out here.


Installation

With pip

pip install cdqa

From source

git clone https://github.com/cdqa-suite/cdQA.git
cd cdQA
pip install -e .

Hardware Requirements

Experiments have been done with:

  • CPU 👉 AWS EC2 t2.medium Deep Learning AMI (Ubuntu) Version 22.0
  • GPU 👉 AWS EC2 p3.2xlarge Deep Learning AMI (Ubuntu) Version 22.0 + a single Tesla V100 16GB.

Getting started

Preparing your data

Manual

To use cdQA, you need to create a pandas DataFrame with the following columns:

title             | paragraphs
The Article Title | [Paragraph 1 of Article, ..., Paragraph N of Article]
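For example, a minimal DataFrame with a single document could be built like this (the title and paragraph texts are placeholders):

import pandas as pd

# Each row is one document: a title plus the list of its paragraphs.
df = pd.DataFrame({
    'title': ['The Article Title'],
    'paragraphs': [[
        'Paragraph 1 of Article',
        'Paragraph 2 of Article',
    ]],
})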

With converters

The objective of the cdqa converters is to make it easy to create this DataFrame from your raw document database. For instance, the pdf_converter can create a cdqa DataFrame from a directory containing .pdf files:

from cdqa.utils.converters import pdf_converter

df = pdf_converter(directory_path='path_to_pdf_folder')

You will need to install Java OpenJDK to use this converter. We currently have converters for:

  • pdf
  • markdown

We plan to improve and add more converters in the future. Stay tuned!
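A markdown converter is listed above as well; the sketch below assumes it mirrors the pdf_converter interface (the exact function name and signature in cdqa.utils.converters are assumptions and should be verified):

from cdqa.utils.converters import md_converter  # name assumed to mirror pdf_converter

# Build a cdqa DataFrame from a directory of .md files (path is a placeholder).
df = md_converter(directory_path='path_to_markdown_folder')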

Downloading pre-trained models and data

You can download the models and data manually from the GitHub releases or use our download functions:

from cdqa.utils.download import download_squad, download_model, download_bnpp_data

directory = 'path-to-directory'

# Downloading data
download_squad(dir=directory)
download_bnpp_data(dir=directory)

# Downloading pre-trained BERT fine-tuned on SQuAD 1.1
download_model('bert-squad_1.1', dir=directory)

# Downloading pre-trained DistilBERT fine-tuned on SQuAD 1.1
download_model('distilbert-squad_1.1', dir=directory)

Training models

Fit the pipeline on your corpus using the pre-trained reader:

import pandas as pd
from ast import literal_eval
from cdqa.pipeline import QAPipeline

df = pd.read_csv('your-custom-corpus-here.csv', converters={'paragraphs': literal_eval})

cdqa_pipeline = QAPipeline(reader='bert_qa.joblib') # use 'distilbert_qa.joblib' for DistilBERT instead of BERT
cdqa_pipeline.fit_retriever(df=df)

If you want to fine-tune the reader on your custom SQuAD-like annotated dataset:

cdqa_pipeline = QAPipeline(reader='bert_qa.joblib') # use 'distilbert_qa.joblib' for DistilBERT instead of BERT
cdqa_pipeline.fit_reader('path-to-custom-squad-like-dataset.json')

Save the reader model after fine-tuning:

cdqa_pipeline.dump_reader('path-to-save-bert-reader.joblib')

Making predictions

To get the best prediction given an input query:

cdqa_pipeline.predict(query='your question')

To get the N best predictions:

cdqa_pipeline.predict(query='your question', n_predictions=N)

You can also change the weight of the retriever score relative to the reader score in the computation of the final ranking score (the default is 0.35, which was found to be the best weight on the development set of SQuAD 1.1-open):

cdqa_pipeline.predict(query='your question', retriever_score_weight=0.35)
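The returned prediction bundles the answer with its source document. A minimal sketch for inspecting it, assuming the tuple layout (answer, title, paragraph) used in the example notebooks:

query = 'your question'
prediction = cdqa_pipeline.predict(query=query)

# Assumed tuple layout: (answer, title, paragraph)
print('query: {}'.format(query))
print('answer: {}'.format(prediction[0]))
print('title: {}'.format(prediction[1]))
print('paragraph: {}'.format(prediction[2]))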

Evaluating models

In order to evaluate models on your custom dataset you will need to annotate it first. The annotation and evaluation process can be done in 4 steps:

  1. Convert your pandas DataFrame into a JSON file in SQuAD format:

    from cdqa.utils.converters import df2squad
    
    json_data = df2squad(df=df, squad_version='v1.1', output_dir='.', filename='dataset-name')
  2. Use an annotator to add ground truth question-answer pairs:

    Please refer to our cdQA-annotator, a web-based annotator for closed-domain question answering datasets with SQuAD format.

  3. Evaluate the pipeline object:

    from cdqa.utils.evaluation import evaluate_pipeline
    
    evaluate_pipeline(cdqa_pipeline, 'path-to-annotated-dataset.json')
  4. Evaluate the reader:

    from cdqa.utils.evaluation import evaluate_reader
    
    evaluate_reader(cdqa_pipeline, 'path-to-annotated-dataset.json')

Notebook Examples

We prepared some notebook examples under the examples directory.

You can also play directly with these notebook examples using Binder or Google Colaboratory:

Notebook                         | Hardware   | Platform
[1] First steps with cdQA        | CPU or GPU | Binder, Colab
[2] Using the PDF converter      | CPU or GPU | Binder, Colab
[3] Training the reader on SQuAD | GPU        | Colab

Binder and Google Colaboratory provide temporary environments and may be slow to start, but we recommend them if you want to get started with cdQA easily.

Deployment

Manual

You can deploy a cdQA REST API by executing:

export dataset_path=path-to-dataset.csv
export reader_path=path-to-reader-model

FLASK_APP=api.py flask run -h 0.0.0.0

You can now make requests to test your API (here using HTTPie):

http localhost:5000/api query=='your question here'
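Equivalently, a minimal sketch of the same request with Python's requests library (host, port, and the query parameter follow the HTTPie example above):

import requests

# Query the local cdQA REST API started above.
response = requests.get('http://localhost:5000/api', params={'query': 'your question here'})
print(response.json())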

If you wish to serve a user interface on top of your cdQA system, follow the instructions of cdQA-ui, a web interface developed for cdQA.

Contributing

Read our Contributing Guidelines.

References

Type | Title | Author | Year
📹 Video | Stanford CS224N: NLP with Deep Learning, Lecture 10 – Question Answering | Christopher Manning | 2019
📰 Paper | Reading Wikipedia to Answer Open-Domain Questions | Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes | 2017
📰 Paper | Neural Reading Comprehension and Beyond | Danqi Chen | 2018
📰 Paper | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | 2018
📰 Paper | Contextual Word Representations: A Contextual Introduction | Noah A. Smith | 2019
📰 Paper | End-to-End Open-Domain Question Answering with BERTserini | Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin | 2019
📰 Paper | Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering | Wei Yang, Yuqing Xie, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin | 2019
📰 Paper | Passage Re-ranking with BERT | Rodrigo Nogueira, Kyunghyun Cho | 2019
📰 Paper | MRQA: Machine Reading for Question Answering | Jonathan Berant, Percy Liang, Luke Zettlemoyer | 2019
📰 Paper | Unsupervised Question Answering by Cloze Translation | Patrick Lewis, Ludovic Denoyer, Sebastian Riedel | 2019
💻 Framework | Scikit-learn: Machine Learning in Python | Pedregosa et al. | 2011
💻 Framework | PyTorch | Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan | 2016
💻 Framework | Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch | Hugging Face | 2018

LICENSE

Apache-2.0

cdqa's People

Contributors

andrelmfarias, andreshazard, belov38, catch-n-release, fmikaelian, harudori, mamrou, nirkhut, osans-tel, rsjain1978, thomasvn


cdqa's Issues

Split run_squad.py in processing/train/predict

The idea is to turn /reader/run_squad.py into a script that can be imported without main(), and to break main() into subparts that would be added to processing/train/predict Python scripts in /pipeline.

predict() method should also give back index of document + paragraph

We could add a document index to squad_examples and then to test_examples?

# Assumes generate_squad_examples and BertProcessor are already imported from the cdqa package
import os
from joblib import load

squad_examples = generate_squad_examples(question=question,
                                         article_indices=article_indices,
                                         metadata=df)

test_processor = BertProcessor(bert_model='bert-base-uncased', do_lower_case=True, is_training=False)
test_examples, test_features = test_processor.fit_transform(X=squad_examples)

# Load the serialized reader and run prediction on the processed examples
model = load(os.path.join('models/bert_qa_squad_v1.1_sklearn', 'bert_qa_squad_v1.1_sklearn.joblib'))
final_prediction, all_predictions, all_nbest_json, scores_diff_json = model.predict(X=(test_examples, test_features))

Cannot load bert sklearn .joblib model

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.
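A possible workaround (a sketch, not part of cdQA): if joblib's unpickling of the model's tensors routes through torch.load, temporarily forcing map_location='cpu' lets a GPU-saved model load on a CPU-only machine.

import torch
from joblib import load

# Temporarily patch torch.load so every storage is mapped to the CPU
# while joblib deserializes the GPU-saved model.
_original_torch_load = torch.load

def _cpu_torch_load(*args, **kwargs):
    kwargs.setdefault('map_location', 'cpu')
    return _original_torch_load(*args, **kwargs)

torch.load = _cpu_torch_load
try:
    model = load('models/bert_qa_squad_v1.1_sklearn/bert_qa_squad_v1.1_sklearn.joblib')
finally:
    torch.load = _original_torch_load  # restore the original loader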

Ability to tune parameters at prediction time

Following huggingface/transformers#126:

Parameters predict_fp16, max_seq_length and predict_batch_size should be tunable at prediction time.

For training:

python run_squad.py \
  --bert_model bert-base-uncased \
  --do_train \
  --do_predict \
  --do_lower_case \
  --train_file $SQUAD_DIR/train-v1.1.json \
  --predict_file $SQUAD_DIR/dev-v1.1.json \
  --train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_squad/

For prediction with fp16:

python run_squad.py \
  --bert_model bert-base-uncased \
  --do_predict \
  --predict_fp16 \
  --do_lower_case \
  --train_file $SQUAD_DIR/train-v1.1.json \
  --predict_file $SQUAD_DIR/dev-v1.1.json \
  --predict_batch_size 128 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_squad/

Find a name for our QA software

QA software usually has "QA" in its name, e.g. DrQA.

Do you have any ideas for naming our software? I was thinking about words linked to the ability to answer everything, with a slightly mystical feel.

Let's brainstorm!

nbest_predictions.json is empty after predict()

Question: Who is the creator of Artificial Intelligence?

Predictions returned by predictions = model.predict(X=(test_examples, test_features)) are:

(OrderedDict([('2398202a-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
              ('239828b8-41b4-11e9-beaa-796013f1ec43',
               'Chronicle of a revolution'),
              ('2398294e-41b4-11e9-beaa-796013f1ec43',
               'machine learning, deep learning, language processing, etc.'),
              ('23983056-41b4-11e9-beaa-796013f1ec43', 'Edouard d’Archimbaud'),
              ('2398309c-41b4-11e9-beaa-796013f1ec43', 'AI'),
              ('239830e2-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
              ('23983128-41b4-11e9-beaa-796013f1ec43', 'Marvin Lee Minsky'),
              ('23983164-41b4-11e9-beaa-796013f1ec43',
               'Artificial Intelligence is in fact likely to surpass humans in performing tasks that require reasoning and learning.'),
              ('239831a0-41b4-11e9-beaa-796013f1ec43', 'Watson'),
              ('239831e6-41b4-11e9-beaa-796013f1ec43', 'Google'),
              ('2398322c-41b4-11e9-beaa-796013f1ec43', 'Accenture'),
              ('23983268-41b4-11e9-beaa-796013f1ec43', 'AI'),
              ('239832a4-41b4-11e9-beaa-796013f1ec43', 'Partnership on AI'),
              ('239832e0-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
              ('23983326-41b4-11e9-beaa-796013f1ec43', 'Edouard d’Archimbaud'),
              ('23983362-41b4-11e9-beaa-796013f1ec43', 'data scientists'),
              ('2398339e-41b4-11e9-beaa-796013f1ec43', 'Edouard d’Archimbaud'),
              ('239833e4-41b4-11e9-beaa-796013f1ec43',
               'AI system’s ability to learn “by example” or “by experience”.'),
              ('23983420-41b4-11e9-beaa-796013f1ec43',
               'Deep learning is a learning technology that uses artificial neural networks, which approximate human learning to process “raw data”.'),
              ('2398345c-41b4-11e9-beaa-796013f1ec43', 'Alan Turing'),
              ('23983498-41b4-11e9-beaa-796013f1ec43', 'TEDxParis'),
              ('239834d4-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
              ('23983510-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
              ('23983a60-41b4-11e9-beaa-796013f1ec43', 'change management'),
              ('23983ad8-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
              ('23983b1e-41b4-11e9-beaa-796013f1ec43', 'Julien Dinh'),
              ('23983f92-41b4-11e9-beaa-796013f1ec43', 'Julien Dinh')]),
 OrderedDict(),
 OrderedDict())

The ground truth is Marvin Lee Minsky, available in context 23983128-41b4-11e9-beaa-796013f1ec43:

{'context': 'One of the creators of Artificial Intelligence, Marvin Lee Minsky, notably defines it as “the construction of computer programs that engage in tasks that are, for now, more satisfactorily accomplished by humans because they require high-level mental processes”. ',
    'qas': [{'answers': [],
      'question': 'Who is the creator of Artificial Intelligence?',
      'id': '23983128-41b4-11e9-beaa-796013f1ec43'}]},
  • How to get the best answer from predictions (see #36) ?
  • What is nbest_predictions.json (empty in my case) ?

Originally posted by @fmikaelian in #33 (comment)

FileNotFoundError at prediction time

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-20-bff63677b429> in <module>()
----> 1 final_prediction, all_predictions, all_nbest_json, scores_diff_json = model.predict(X=(test_examples, test_features))

~/cdQA/cdqa/reader/bertqa_sklearn.py in predict(self, X)
   1195             self.verbose_logging,
   1196             self.version_2_with_negative,
-> 1197             self.null_score_diff_threshold)
   1198 
   1199         return final_prediction, all_predictions, all_nbest_json, scores_diff_json

~/cdQA/cdqa/reader/bertqa_sklearn.py in write_predictions(all_examples, all_features, all_results, n_best_size, max_answer_length, do_lower_case, output_prediction_file, output_nbest_file, output_null_log_odds_file, verbose_logging, version_2_with_negative, null_score_diff_threshold)
    636     final_prediction = list(final_predictions_sorted.items())[0][1]['text']
    637 
--> 638     with open(output_prediction_file, "w") as writer:
    639         writer.write(json.dumps(all_predictions, indent=4) + "\n")
    640 

FileNotFoundError: [Errno 2] No such file or directory: 'logs/bert_qa_squad_v1.1_sklearn/predictions.json'
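A possible workaround (an assumption based on the path in the traceback, not an official fix): create the expected output directory before calling predict(), so that write_predictions can open its output files.

import os

# The reader writes predictions.json into this directory; create it up front.
os.makedirs('logs/bert_qa_squad_v1.1_sklearn', exist_ok=True)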

NameError: name 'device' is not defined in predict() method

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-14-65cedbb88ef8> in <module>()
      2 test_examples, test_features = test_processor.fit_transform(X=squad_examples)
      3 # model = load('model.joblib')
----> 4 predictions = model.predict(X=(test_examples, test_features))

~/cdQA/cdqa/reader/bertqa_sklearn.py in predict(self, X)
   1037             if len(all_results) % 1000 == 0:
   1038                 logger.info("Processing example: %d" % (len(all_results)))
-> 1039             input_ids = input_ids.to(device)
   1040             input_mask = input_mask.to(device)
   1041             segment_ids = segment_ids.to(device)

NameError: name 'device' is not defined

Wrong URL for the SQuAD evaluate-v1.1.py script in download.py

The URL 'https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py' used to download the evaluate-v1.1.py script actually downloads an HTML page.

The correct URL should be: 'https://raw.githubusercontent.com/allenai/bi-att-flow/master/squad/evaluate-v1.1.py'

Disable logger info for BertProcessor()

03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   *** Example ***
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   unique_id: 1000000000
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   example_index: 0
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   doc_span_index: 0
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   tokens: [CLS] [UNK] is the creator of [UNK] [UNK] ? [SEP] [UNK] [UNK] launches the prototype [UNK] , first online community for corporate clients [SEP]
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   token_to_orig_map: 10:0 11:1 12:2 13:3 14:4 15:5 16:5 17:6 18:7 19:8 20:9 21:10 22:11
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   token_is_max_context: 10:True 11:True 12:True 13:True 14:True 15:True 16:True 17:True 18:True 19:True 20:True 21:True 22:True
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   input_ids: 101 100 2003 1996 8543 1997 100 100 1029 102 100 100 18989 1996 8773 100 1010 2034 3784 2451 2005 5971 7846 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   segment_ids: 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   *** Example ***
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   unique_id: 1000000001
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   example_index: 1
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   doc_span_index: 0
03/11/2019 09:42:42 - INFO - cdqa.reader.bertqa_sklearn -   tokens: [CLS] [UNK] is the creator of [UNK] [UNK] ? [SEP] [UNK] [UNK] has progressed at lightning speed in recent years . [UNK] are now able to beat humans in [UNK] matches , understand natural language , reason and learn . [UNK] a result , software and robots have something to offer in every field to make business more productive , profitable and innovative . [UNK] of a revolution fore ##to ##ld . [SEP]
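One way to silence these messages from user code (a sketch using the standard logging module; the logger name is taken from the output above):

import logging

# Suppress the per-example INFO output from the cdqa reader
logging.getLogger('cdqa.reader.bertqa_sklearn').setLevel(logging.WARNING)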
