A spider that collects the cooperating organizations of authors from Web of Science.
Input: data/authors
- line structure: id;author-name;author-organization
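The input line structure can be read with a small parser. This is a minimal sketch, not the code in crawl.py: the names `AuthorRecord` and `parse_author_line` are hypothetical, and it assumes that any extra semicolons belong to the organization field.

```python
from typing import NamedTuple


class AuthorRecord(NamedTuple):
    """One line of data/authors (hypothetical container, not from crawl.py)."""
    author_id: str
    name: str
    organization: str


def parse_author_line(line: str) -> AuthorRecord:
    """Parse a line of the form id;author-name;author-organization."""
    # Split on the first two semicolons only. Assumption: a semicolon that
    # appears later in the line is part of the organization field.
    parts = line.rstrip("\n").split(";", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 3 ';'-separated fields, got {len(parts)}: {line!r}")
    return AuthorRecord(*(part.strip() for part in parts))
```

Lines that do not have three fields raise `ValueError`, so malformed input is reported rather than silently skipped.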
Output:
- valid directory
  - each file in valid corresponds to one author; the file name is the author id.
  - line structure: author-name;author-organization;cooperative-organization;organization-frequency
- non_found directory
  - each file corresponds to one author.
- non_organ directory
  - each file corresponds to one author.
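The line structure of the files in valid can be illustrated as follows. This is a sketch under stated assumptions, not the crawler's actual code: `format_valid_lines` is a hypothetical helper, and it assumes that organization-frequency is the number of times a cooperating organization was seen for the author, with one output line per organization.

```python
from collections import Counter
from typing import Iterable, List


def format_valid_lines(name: str, organization: str,
                       cooperating: Iterable[str]) -> List[str]:
    """Build lines of the form
    author-name;author-organization;cooperative-organization;organization-frequency
    """
    # Assumption: frequency = number of occurrences of each cooperating
    # organization in the collected results for this author.
    counts = Counter(cooperating)
    # most_common() orders by descending count; ties keep first-seen order.
    return [f"{name};{organization};{org};{freq}"
            for org, freq in counts.most_common()]
```

The returned lines could then be written to a file named after the author id inside the valid directory.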
Usage:
- python crawl.py data/authors