The open source version of the Amazon Comprehend docs. You can submit feedback and change requests by opening issues in this repo, or by proposing changes in a pull request.
Regarding the use of 10-20% of the documents for testing, shouldn't it say
"we ensure that there are no labels in the test set which we have seen before"
instead of
"we ensure that there are no labels in the test set which we have never seen before"
The Python script fails at lines 18-21:
File "bin/comprehend-ssie-annotation-tool-cli.py", line 18, in <module>
from utils.s3_helper import S3Client
ModuleNotFoundError: No module named 'utils.s3_helper'
Could there be missing steps in the "Uploading a PDF to an S3 bucket" section?
Or is the current script outdated?
Tested, and the script works. Perhaps update the instructions to say that pipenv shell must be run before executing the script; this step is missing from the AWS page.
In the documentation "Training a Custom Classifier" (in how-document-classification-training.md) there is a mention of escaped commas in the training dataset CSV file, but it applies to labels only and no example is given.
I tried several escaping characters that usually work with CSV files, such as \ or ", but none of them worked.
I ended up replacing the comma with its HTML entity: "&#44;".
What is the official escaping method for a comma in the document or label? Could you please provide an example?
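For comparison, here is a small Python sketch of the two approaches mentioned above: standard CSV quoting as produced by the csv module, and replacing commas with a substitute string. Whether Amazon Comprehend accepts the quoted form is exactly the open question here and is not confirmed by this sketch. The function names to_csv_line and replace_commas are hypothetical, and the label-then-document column order is an assumption.

```python
import csv
import io


def to_csv_line(label, document):
    """Write one label/document row with standard CSV quoting.

    The csv module wraps any field that contains a comma in double quotes.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([label, document])
    return buf.getvalue()


def replace_commas(text, replacement):
    """Workaround: swap every comma for a caller-chosen replacement string."""
    return text.replace(",", replacement)


if __name__ == "__main__":
    # Standard quoting: the document field is wrapped in double quotes.
    print(to_csv_line("SPAM", "Hello, world"), end="")
    # Workaround: no comma is left in the field, so no quoting is needed.
    print(to_csv_line("SPAM", replace_commas("Hello, world", ";")), end="")
```

The first call emits SPAM,"Hello, world" and the second emits SPAM,Hello; world. The semicolon is only a placeholder for whatever replacement string is chosen.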