emareg / paper-checker
Find simple grammar mistakes in scientific documents.
It would be nice if each generated report had a legend at the top explaining the meaning of the different colors. Additionally, for each option in the README, it would be nice to add a line like "The results are then shown in {color}".
Hyphenated words that happen to fall at the end of a line are reconstructed without the hyphen.
In my paper I have this example:
"...but with very contrasting power-
performance thread....."
This becomes
"but with very contrasting powerperformance thread"
after pdf2text. No idea if it's solvable, but I thought I'd let you know.
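One common heuristic for this, sketched here under assumptions: when a line ends in a hyphen, drop the hyphen only if the joined word is in a known vocabulary, and keep it otherwise. The function name `join_hyphenated_lines` and the tiny `KNOWN_WORDS` set are hypothetical and not part of paper-checker.

```python
import re

# Hypothetical vocabulary; a real checker would load a full dictionary instead.
KNOWN_WORDS = {"performance", "contrasting", "implementation"}

# A word fragment ending in "-" at the end of a line, then the next line's first word.
_LINE_BREAK_HYPHEN = re.compile(r"(\w+)-\n\s*(\w+)")


def join_hyphenated_lines(text, vocabulary=KNOWN_WORDS):
    """Join words split across lines, keeping hyphens that look intentional."""

    def _join(match):
        left, right = match.group(1), match.group(2)
        joined = left + right
        if joined.lower() in vocabulary:
            # The joined form is a known word, so the hyphen was only a line break.
            return joined
        # Otherwise assume a genuine compound and keep the hyphen.
        return left + "-" + right

    return _LINE_BREAK_HYPHEN.sub(_join, text)
```

With this sketch, "power-" followed by "performance" stays "power-performance", while "implemen-" followed by "tation" becomes "implementation". The heuristic is only as good as the vocabulary it is given.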
Currently, `bs4` has to be installed manually for the plagiarism checker. It would be good to have a `requirements.txt` or some other way to install dependencies automatically.
Google provides an API endpoint for search queries that returns JSON responses containing all the information the plagiarism checker needs. For example, searching for "lectures" in the custom search engine from the API documentation returns the following:
{
  "kind": "customsearch#search",
  "url": {
    "type": "application/json",
    "template": "https://...."
  },
  "queries": {
    "request": [
      {
        //...
      }
    ],
    "nextPage": [
      {
        "title": "Google Custom Search - lectures",
        "totalResults": "781000000",
        "searchTerms": "lectures",
        "count": 10,
        "startIndex": 11,
        "inputEncoding": "utf8",
        "outputEncoding": "utf8",
        "safe": "off",
        "cx": "017576662512468239146:omuauf_lfve"
      }
    ]
  },
  "context": {
    "title": "CS Curriculum",
    "facets": [
      [
        {
          "anchor": "Lectures",
          "label": "lectures",
          "label_with_op": "more:lectures"
        }
      ],
      [
        {
          "anchor": "Assignments",
          "label": "assignments",
          "label_with_op": "more:assignments"
        }
      ],
      [
        {
          "anchor": "Reference",
          "label": "reference",
          "label_with_op": "more:reference"
        }
      ]
    ]
  },
  "searchInformation": {
    "searchTime": 0.350489,
    "formattedSearchTime": "0.35",
    "totalResults": "781000000",
    "formattedTotalResults": "781,000,000"
  },
  "items": [
    {
      "kind": "customsearch#result",
      "title": "Introduction to Machine Learning",
      "htmlTitle": "Introduction to Machine Learning",
      "link": "https://see.stanford.edu/Course/CS229",
      "displayLink": "see.stanford.edu",
      "snippet": "Slides from Andrew's lecture on getting machine learning algorithms to work in \npractice can be found here. Previous projects: A list of last year's final projects ...",
      "htmlSnippet": "Slides from Andrew's \u003cb\u003electure\u003c/b\u003e on getting machine learning algorithms to work in \u003cbr\u003e\npractice can be found here. Previous projects: A list of last year's final projects ...",
      "cacheId": "vB97xQjhxVcJ",
      "formattedUrl": "https://see.stanford.edu/Course/CS229",
      "htmlFormattedUrl": "https://see.stanford.edu/Course/CS229",
      "pagemap": {
        "cse_thumbnail": [
          {
            "src": "https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQ2_-hJWbczpcTOUvBJuymIrbHevHrTlAL-EhyPo--xfmFh0F0Ts8iCmOc",
            "width": "148",
            "height": "208"
          }
        ],
        "metatags": [
          {
            "viewport": "width=device-width, initial-scale=1"
          }
        ],
        "cse_image": [
          {
            "src": "https://see.stanford.edu/Content/Images/Instructors/ng.jpg"
          }
        ]
      },
      "labels": [
        {
          "name": "lectures",
          "displayName": "Lectures",
          "label_with_op": "more:lectures"
        }
      ]
    },
    // There are more results here
  ]
}
For more information: https://developers.google.com/custom-search/v1/overview
Should this type of search replace the current search, or should it be added as an additional search?
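As a rough illustration of how the plagiarism checker could consume this endpoint, here is a minimal sketch using only the standard library. The function names are hypothetical, and the API key and search engine ID (`cx`) would have to come from the user's own Google account; none of this is existing paper-checker code.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the Custom Search JSON API (see the overview link above).
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, query):
    """Build the request URL; key, cx and q are the required parameters."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_results(response):
    """Keep only the fields a plagiarism check is likely to need."""
    return [
        {
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": item.get("snippet", ""),
        }
        for item in response.get("items", [])
    ]


def search(api_key, engine_id, query):
    """Run one query against the API (needs network access and valid credentials)."""
    url = build_search_url(api_key, engine_id, query)
    with urllib.request.urlopen(url) as reply:
        return extract_results(json.load(reply))
```

Keeping `extract_results` separate from the network call means the parsing can be tested against a saved response such as the one shown above, without spending API quota.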
In `spelling.py`, the files `names_geo.dic` and `names_people.dic` are used in lines 207 and 208, but they are missing from the `src/dictionary` folder:
dictionary = read_dictionary(dictionary, 'src/dictionary/names_geo.dic')
dictionary = read_dictionary(dictionary, 'src/dictionary/names_people.dic')
Hey,
in multiline captions the checker complains about missing dots.
\caption{
abcdef hijkl.
}
Instead, it recommends:
\caption{
abcdef hijkl.
.}
Also, it recommends "an unanimous", although "a unanimous" is correct. (It begins with a "y" sound, as in "a year".)
Thanks!
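For the multiline caption case, one possible approach is to strip surrounding whitespace from the caption body before looking at its last character. The sketch below is an assumption about how such a check could work, not the checker's actual code; `captions_missing_period` is a hypothetical name, and the simple regex does not handle nested braces inside a caption.

```python
import re

# Matches \caption{...} across line breaks; stops at the first closing brace,
# so captions containing nested {...} groups are not handled by this sketch.
_CAPTION = re.compile(r"\\caption\{(.*?)\}", re.DOTALL)


def captions_missing_period(tex):
    """Return the caption texts that do not end with a period."""
    missing = []
    for body in _CAPTION.findall(tex):
        text = body.strip()  # ignore newlines and indentation around the text
        if text and not text.endswith("."):
            missing.append(text)
    return missing
```

With the whitespace stripped first, the caption from the example above ends in "." and is no longer reported.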
When using the `-s` option for spellchecking, with or without the other options, spelling errors are marked in yellow/orange at the line numbers on the left, whereas they are marked in pink in the text.
It is misleading that, even though the plagiarism check runs and is logged to the terminal, its results do not show up in the HTML report. It should at least be documented that its output goes only to the terminal/stdout.
Here I list some false positives that were reported. They could easily be reproduced in test cases:
| Text | Output |
|---|---|
| by at least | You have repeated an adposition, which is probably not intended.: by at → by |
| from \cite{...} with | You have repeated an adposition, which is probably not intended.: from with → from |
| of \cite{...} | Do not use prepositions to end your sentences.: of. → . |
| for in-depth | You have repeated an adposition, which is probably not intended.: for in- → for - |
| so far | Informal word, could be substituted. so ⇒ Therefore, |
| \num{554400} | Large number, you should use a thousand separator.: 554400 |
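The `\num{554400}` case could be avoided by ignoring numbers that are already wrapped in a formatting macro. The following is a minimal sketch of that idea; the function name and the five-digit threshold are assumptions, since the checker's real rule is not shown here.

```python
import re

# Arguments of \num{...} are formatted by LaTeX, so they are removed
# before searching for unseparated numbers.
_NUM_MACRO = re.compile(r"\\num\{[^}]*\}")

# Assumed threshold: a run of five or more digits counts as a "large number".
_LARGE_NUMBER = re.compile(r"\b\d{5,}\b")


def find_unseparated_numbers(text):
    """Return large numbers written without a thousand separator."""
    without_macros = _NUM_MACRO.sub("", text)
    return _LARGE_NUMBER.findall(without_macros)
```

A pair of inputs like `We ran \num{554400} trials.` and `We ran 554400 trials.` would make a small regression test: only the second should be reported.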
Reproduce: run papercheck on a .tex file including a table (e.g. example/testfile.tex)
Expected: textstats should count at least one table
Actual: textstats shows 0 as table count
Problem: for some reason, `textstats.py` does not count `\begin{table}`, even though the regex seems to be correct.
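The regex itself is not shown in this report, so the cause is unknown. One frequent pitfall with patterns like this in Python is that `\b` in a non-raw string literal is a backspace character, so `"\begin"` never matches the text `\begin`. The sketch below shows a raw-string pattern that does count table environments; `count_tables` is a hypothetical name, not the function in `textstats.py`.

```python
import re

# Raw string: the regex engine sees \\begin, i.e. a literal backslash then "begin".
# The optional \*? also counts the starred table* environment.
TABLE_ENV = re.compile(r"\\begin\{table\*?\}")


def count_tables(tex):
    """Count table and table* environments in a LaTeX source string."""
    return len(TABLE_ENV.findall(tex))


# For comparison: without the r prefix, "\b" is the backspace character (0x08),
# so this string does not start with a backslash at all.
NON_RAW_PREFIX = "\begin"
```

If the real pattern turns out to be correct, other things worth checking are whether the text is stripped of environments before the count runs, or whether the match result is discarded.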
Another useful check for the TeX checker would be to identify variables which are used in the text but not in math mode (
The CLI provides the option `-o FILE, --output FILE  write report to FILE`, which is never used. The output filename is hardcoded.
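A minimal sketch of how the parsed `--output` value could be passed through to the report writer. `build_parser`, `write_report`, and the default name `report.html` are assumptions for illustration; the project's actual function names and default filename may differ.

```python
import argparse


def build_parser():
    """Command-line parser with the -o/--output option from the help text."""
    parser = argparse.ArgumentParser(prog="papercheck")
    parser.add_argument("file", help="document to check")
    parser.add_argument(
        "-o", "--output", metavar="FILE",
        default="report.html",  # assumed default, stands in for the hardcoded name
        help="write report to FILE",
    )
    return parser


def write_report(html, output_path):
    """Write the report to the path chosen on the command line."""
    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write(html)
```

In the entry point, this would be used roughly as `args = build_parser().parse_args()` followed by `write_report(html, args.output)`, so the hardcoded name is only a fallback.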
`__main__.py` searches for a subfolder `papercheck`, which is not present in the zip file:
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "./paperchecker/__main__.py", line 67, in <module>
ModuleNotFoundError: No module named 'papercheck'
`__main__.py:67` states `from papercheck.checker.grammar import checkGrammar`.