A question answering system that extracts answers to questions posed in natural language from Wikipedia. Inspired by IBM Watson and START. The answers extracted by the system are currently moderately accurate. Follow the creator's blog at shirishkadam.com for updates on progress.
Elasticsearch is used to store and index the scraped and parsed texts from Wikipedia.
Elasticsearch 6.X
The installation guide can be found in the Elasticsearch Documentation.
You may need to start the Elasticsearch service manually.
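Before running the system, it can help to verify that Elasticsearch is actually reachable. A minimal, stdlib-only sketch (the default host and port `localhost:9200` are an assumption; adjust them to your cluster's configuration):

```python
# Minimal connectivity check for a local Elasticsearch node.
# Assumes the default host/port (localhost:9200); adjust as needed.
import json
import urllib.request
import urllib.error

def elasticsearch_reachable(url="http://localhost:9200", timeout=2):
    """Return the node's info dict if it answers, else None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None

info = elasticsearch_reachable()
if info is None:
    print("Elasticsearch not reachable; start the service first.")
else:
    print("Elasticsearch reachable, version:", info["version"]["number"])
```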
$ git clone https://github.com/5hirish/adam_qas.git
$ cd adam_qas
$ pip install -r requirements.txt
$ python -m qas.adam "When was linux kernel version 4.0 released ?"
Note: The above installation downloads the best-matching default English language model for spaCy. To improve accuracy, you can install larger models as well. Read more in the spaCy docs.
$ python -m spacy download en_core_web_md
Find more in-depth documentation about the system, along with its research paper and system architecture, here.
Python package dependencies are listed in requirements.txt.
- Extract information from Wikipedia
- Classify questions with regular expression (default)
- Classify questions with a SVM (optional)
- Vector space model used for answer extraction
- Rank candidate answers
- Merge top 5 answers into one response
- Replace the Wikipedia APIs with a custom scraper
- Store extracted data in a database (Elasticsearch)
- Resolve anaphora in both questions and answers
- Build a machine-learning query constructor rather than a rule-based one
- Improve the vector space language model for answer extraction
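The default regular-expression question classifier can be sketched roughly as follows. The patterns and class labels below are illustrative assumptions for this sketch, not the project's actual rules:

```python
import re

# Coarse question classes keyed by the leading interrogative.
# These patterns are illustrative; the real rules live in the qas package.
QUESTION_PATTERNS = [
    (re.compile(r"^\s*who\b", re.I), "PERSON"),
    (re.compile(r"^\s*when\b", re.I), "DATE"),
    (re.compile(r"^\s*where\b", re.I), "LOCATION"),
    (re.compile(r"^\s*how (many|much)\b", re.I), "QUANTITY"),
    (re.compile(r"^\s*(what|which)\b", re.I), "ENTITY"),
]

def classify_question(question):
    """Return a coarse answer-type label for a natural language question."""
    for pattern, label in QUESTION_PATTERNS:
        if pattern.search(question):
            return label
    return "UNKNOWN"

print(classify_question("When was linux kernel version 4.0 released ?"))  # DATE
```

An SVM classifier (the optional mode above) would replace this rule table with features learned from labeled questions.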
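The vector space ranking of candidate answers can be sketched with term-frequency vectors and cosine similarity. This is a simplified stand-in for the project's actual answer extractor, using naive whitespace tokenization:

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector over lowercased whitespace tokens."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def rank_candidates(query, candidates, top_n=5):
    """Rank candidate sentences by cosine similarity to the query."""
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(c)), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]

candidates = [
    "Linux kernel 4.0 was released on 12 April 2015.",
    "The Linux kernel was created by Linus Torvalds.",
    "Wikipedia is a free online encyclopedia.",
]
print(rank_candidates("when was linux kernel 4.0 released", candidates)[0])
```

The top-ranked sentences would then feed the answer-merging step that combines the top 5 candidates into one response.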
Please see our contributing documentation for some tips on getting started.
- @5hirish - Shirish Kadam