Comments (4)
The function read_squad_examples() takes a JSON file as input. This JSON is converted to a list of dicts in SQuAD format with json.load(reader)["data"].
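As a minimal illustration of that conversion, the sketch below loads a tiny, made-up SQuAD-style document from an in-memory string instead of a file. The title, context, id and question are placeholders, not data from this project.

```python
import io
import json

# A tiny, made-up document in SQuAD layout (all values are placeholders).
squad_json = """
{
  "version": "1.1",
  "data": [
    {
      "title": "Example article",
      "paragraphs": [
        {
          "context": "Example paragraph text.",
          "qas": [
            {"id": "q1", "question": "Example question?", "answers": []}
          ]
        }
      ]
    }
  ]
}
"""

# io.StringIO stands in for the opened JSON file ("reader").
with io.StringIO(squad_json) as reader:
    input_data = json.load(reader)["data"]

# input_data is a list with one dict per article.
for article in input_data:
    for paragraph in article["paragraphs"]:
        for qa in paragraph["qas"]:
            print(article["title"], "|", qa["id"], "|", qa["question"])
```

Each article dict holds a title and a list of paragraphs, and each paragraph holds a context plus its qas list, which is the shape the reader has to be fed at prediction time.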
We could therefore do the following at prediction time:
- Get user question
- Run document retriever based on question
- Generate this list of dicts in SQuAD format, with the same question asked on every paragraph and an empty qas key.
- Run the document reader on this list of dicts
- Sort predictions
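The steps above could be wired together roughly as follows. This is only a sketch: answer_question, toy_retrieve and toy_read are hypothetical names, the retriever and reader are passed in as plain callables, and the reader is assumed to return (answer, score) pairs, which is not something the current reader is confirmed to do.

```python
from typing import Callable, Dict, List, Tuple


def answer_question(
    question: str,
    retrieve: Callable[[str], List[Dict]],
    read: Callable[[List[Dict]], List[Tuple[str, float]]],
) -> List[Tuple[str, float]]:
    """Run the prediction-time steps for a single user question."""
    # Run the document retriever on the question.
    articles = retrieve(question)
    # Build one SQuAD-style entry per article, asking the same question
    # on every paragraph and leaving qas empty.
    examples = [
        {
            "title": article["title"],
            "paragraphs": [
                {"context": context, "qas": [], "question": question}
                for context in article["paragraphs"]
            ],
        }
        for article in articles
    ]
    # Run the document reader (assumed to return (answer, score) pairs).
    predictions = read(examples)
    # Sort predictions, best score first.
    return sorted(predictions, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins so the sketch runs on its own; they are not cdQA components.
def toy_retrieve(question: str) -> List[Dict]:
    return [{"title": "Doc", "paragraphs": ["Alpha wrote it.", "Beta read it."]}]


def toy_read(examples: List[Dict]) -> List[Tuple[str, float]]:
    # Scores each paragraph by its length, purely for demonstration.
    return [
        (paragraph["context"], float(len(paragraph["context"])))
        for example in examples
        for paragraph in example["paragraphs"]
    ]


ranked = answer_question("Who wrote it?", toy_retrieve, toy_read)
print(ranked[0])
```

Passing the retriever and reader as callables keeps the sketch independent of how either component is actually implemented.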
from cdqa.
Actually, the converter we built does something similar to this.
import uuid

from tqdm import tqdm


def generate_squad_examples(question, article_indices, metadata):
    """Ask `question` on every paragraph of the selected articles."""
    squad_examples = []
    # Keep only the articles selected by the retriever.
    metadata_sliced = metadata.loc[article_indices]
    for index, row in tqdm(metadata_sliced.iterrows()):
        temp = {'title': row['title'],
                'paragraphs': []}
        for paragraph in row['paragraphs']:
            # One entry per paragraph, each with a fresh unique id.
            temp['paragraphs'].append({'context': paragraph,
                                       'qas': [],
                                       'question': question,
                                       'id': str(uuid.uuid1())})
        squad_examples.append(temp)
    return squad_examples
Then we can call this in the example:
squad_examples = generate_squad_examples(question='Who is the creator of Artificial Intelligence?',
                                         article_indices=article_indices,
                                         metadata=df)
Outputs:
[{'title': 'Artificial Intelligence: more revolutionary than the Internet!',
'paragraphs': [{'context': 'BNP Paribas launches the prototype AGORA, first online community for corporate clients',
'qas': [],
'question': 'Who is the creator of Artificial Intelligence?',
'id': 'bec64330-3b40-11e9-8dad-0242ac110012'},
{'context': 'Artificial Intelligence has progressed at lightning speed in recent years. Machines are now able to beat humans in Go matches, understand natural language, reason and learn. As a result, software and robots have something to offer in every field to make business more productive, profitable and innovative. Chronicle of a revolution foretold.',
'qas': [],
'question': 'Who is the creator of Artificial Intelligence?',
'id': 'bec6701c-3b40-11e9-8dad-0242ac110012'},
{'context': 'Artificial Intelligence refers to a set of technologies – machine learning, deep learning, language processing, etc. – that share one common feature in that they rely on a computer system capable of analyzing, understanding, learning and discovering connections between things, facts and events as well as manipulating concepts. It should come as no surprise that machines have acquired these extraordinary abilities. Just like flying cars, autonomous and hyper-intelligent humanoid robots have been a major part of science fiction for decades.',
'qas': [],
'question': 'Who is the creator of Artificial Intelligence?',
'id': 'bec67102-3b40-11e9-8dad-0242ac110012'},
{'context': '“Artificial Intelligence is a word that has been around for 60 years, but which ultimately refers to nothing more than software. Machines are very good at performing repetitive tasks and can help humans work more efficiently. But they cannot take their own initiatives and can only make progress by interacting with people”, explains Edouard d’Archimbaud, manager of the Data Science & Artificial Intelligence Lab at BNP Paribas CIB. ',
'qas': [],
'question': 'Who is the creator of Artificial Intelligence?',
'id': 'bec67238-3b40-11e9-8dad-0242ac110012'}]},
{'title': 'Sugiyama to lead Japan in France Fed Cup clash (AFP)',
'paragraphs': [{'context': 'Machine learning, deep learning, artificial intelligence—Julien Dinh, Senior Research Lead at...',
'qas': [],
'question': 'Who is the creator of Artificial Intelligence?',
'id': 'bec68e6c-3b40-11e9-8dad-0242ac110012'}]}]
Question: Who is the creator of Artificial Intelligence?
Predictions returned by predictions = model.predict(X=(test_examples, test_features)) are:
(OrderedDict([('2398202a-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
('239828b8-41b4-11e9-beaa-796013f1ec43',
'Chronicle of a revolution'),
('2398294e-41b4-11e9-beaa-796013f1ec43',
'machine learning, deep learning, language processing, etc.'),
('23983056-41b4-11e9-beaa-796013f1ec43', 'Edouard d’Archimbaud'),
('2398309c-41b4-11e9-beaa-796013f1ec43', 'AI'),
('239830e2-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
('23983128-41b4-11e9-beaa-796013f1ec43', 'Marvin Lee Minsky'),
('23983164-41b4-11e9-beaa-796013f1ec43',
'Artificial Intelligence is in fact likely to surpass humans in performing tasks that require reasoning and learning.'),
('239831a0-41b4-11e9-beaa-796013f1ec43', 'Watson'),
('239831e6-41b4-11e9-beaa-796013f1ec43', 'Google'),
('2398322c-41b4-11e9-beaa-796013f1ec43', 'Accenture'),
('23983268-41b4-11e9-beaa-796013f1ec43', 'AI'),
('239832a4-41b4-11e9-beaa-796013f1ec43', 'Partnership on AI'),
('239832e0-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
('23983326-41b4-11e9-beaa-796013f1ec43', 'Edouard d’Archimbaud'),
('23983362-41b4-11e9-beaa-796013f1ec43', 'data scientists'),
('2398339e-41b4-11e9-beaa-796013f1ec43', 'Edouard d’Archimbaud'),
('239833e4-41b4-11e9-beaa-796013f1ec43',
'AI system’s ability to learn “by example” or “by experience”.'),
('23983420-41b4-11e9-beaa-796013f1ec43',
'Deep learning is a learning technology that uses artificial neural networks, which approximate human learning to process “raw data”.'),
('2398345c-41b4-11e9-beaa-796013f1ec43', 'Alan Turing'),
('23983498-41b4-11e9-beaa-796013f1ec43', 'TEDxParis'),
('239834d4-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
('23983510-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
('23983a60-41b4-11e9-beaa-796013f1ec43', 'change management'),
('23983ad8-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
('23983b1e-41b4-11e9-beaa-796013f1ec43', 'Julien Dinh'),
('23983f92-41b4-11e9-beaa-796013f1ec43', 'Julien Dinh')]),
OrderedDict(),
OrderedDict())
The ground truth is Marvin Lee Minsky, available in context 23983128-41b4-11e9-beaa-796013f1ec43:
{'context': 'One of the creators of Artificial Intelligence, Marvin Lee Minsky, notably defines it as “the construction of computer programs that engage in tasks that are, for now, more satisfactorily accomplished by humans because they require high-level mental processes”. ',
'qas': [{'answers': [],
'question': 'Who is the creator of Artificial Intelligence?',
'id': '23983128-41b4-11e9-beaa-796013f1ec43'}]},
- How to get the best answer from predictions (see #36)?
- What is nbest_predictions.json (empty in my case)?
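One possible direction for the first question, sketched under assumptions: if the reader exposed per-candidate scores, the best answer could be picked by comparing scores across paragraphs. The structure below mirrors the n-best output of the original BERT run_squad.py script (a list of candidates per example id, each with text, start_logit and end_logit). Whether the reader here produces the same structure is an assumption, best_answer is a hypothetical helper, and the ids and numbers are invented.

```python
from collections import OrderedDict

# Invented n-best data: example id -> candidate answers for that paragraph.
nbest = OrderedDict([
    ("id-1", [
        {"text": "BNP Paribas", "start_logit": 1.2, "end_logit": 0.9},
        {"text": "AGORA", "start_logit": 0.3, "end_logit": 0.1},
    ]),
    ("id-2", [
        {"text": "Marvin Lee Minsky", "start_logit": 5.1, "end_logit": 4.8},
    ]),
    ("id-3", []),  # a paragraph for which no candidate was produced
])


def best_answer(nbest):
    """Return (example_id, text, score) for the top candidate, or None."""
    best = None
    for example_id, candidates in nbest.items():
        for candidate in candidates:
            # Assumed heuristic: sum of start and end logits as the score.
            score = candidate["start_logit"] + candidate["end_logit"]
            if best is None or score > best[2]:
                best = (example_id, candidate["text"], score)
    return best


print(best_answer(nbest))
```

Summing the start and end logits is only a heuristic; raw logits from different paragraphs are not guaranteed to be comparable, so any cross-paragraph ranking built this way would need to be validated.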