At the moment the tool doesn't report how many patents are analysed when a CPC classification filter is applied. For example, the console output for `python detect.py -cpc=Y02 -ps=USPTO-random-10000` is currently:
Patents readied; 10,000 patents loaded
10,000 patents available after publication date sift
Sifting patents for Y02: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:01<00:00, 9848.67patent/s]
Dropped 0 patents due to empty abstracts
The desired output is:
Patents readied; 10,000 patents loaded
10,000 patents available after publication date sift
Sifting patents for Y02: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:01<00:00, 9848.67patent/s]
Dropped 0 patents due to empty abstracts
XX,XXX patents with Y02 classification analysed **_<- this is the new code needed_**
This is so the user can tell whether the analysis was performed on a suitable sample size.
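
A minimal sketch of how the extra line could be produced, assuming the CPC codes for each patent are available as a list of strings once the sift has finished. The function names `count_cpc_matches` and `format_cpc_summary` and the sample data are hypothetical, not taken from the tool's code; the real implementation would presumably count rows in whatever structure `detect.py` already holds after empty abstracts are dropped.

```python
def count_cpc_matches(patent_cpc_lists, cpc_prefix):
    """Count patents with at least one CPC code starting with cpc_prefix.

    patent_cpc_lists is a hypothetical stand-in for the tool's data:
    one list of CPC code strings per patent.
    """
    return sum(
        any(code.startswith(cpc_prefix) for code in codes)
        for codes in patent_cpc_lists
    )


def format_cpc_summary(count, cpc_prefix):
    """Build the summary line, with thousands separators to match the other output."""
    return f"{count:,} patents with {cpc_prefix} classification analysed"


# Illustrative data only: three patents, two of which carry a Y02 code.
sample = [["Y02E10/50", "H01L31/00"], ["G06F17/30"], ["Y02T10/70"]]
matched = count_cpc_matches(sample, "Y02")
print(format_cpc_summary(matched, "Y02"))
```

Printing the count after the "Dropped ... empty abstracts" line would mean the number reflects what is actually analysed rather than what merely matched the classification.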