The application parses a web page and answers the following questions:
- Which HTML version does the document use?
- What is the page title?
- How many headings of each level does the document contain?
- How many internal and external links does the document contain? How many of them, if any, are inaccessible?
- Does the page contain a login form?
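Most of these questions reduce to simple checks over the parsed markup. The sketch below is a rough illustration only. It uses Python's standard `html.parser` as a stand-in for jsoup, and its heuristics are assumptions, not necessarily the project's rules:
- the doctype decides the HTML version;
- a link is internal when its host matches the page's host;
- a form containing a password input counts as a login form.

Detecting inaccessible links would also need one HTTP request per link, so that check is left out. All names (`PageInfoParser`, `html_version`, `analyze`) are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageInfoParser(HTMLParser):
    """Collects the raw facts needed to answer the questions (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.doctype = None
        self.title = ""
        self.headings = Counter()
        self.links = []            # absolute URLs of <a href> targets
        self.has_login_form = False
        self._in_title = False
        self._form_depth = 0       # > 0 while inside a <form>

    def handle_decl(self, decl):
        # Called with e.g. "DOCTYPE html" for <!DOCTYPE html>.
        if decl.upper().startswith("DOCTYPE"):
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.headings[tag] += 1
        elif tag == "a" and attrs.get("href"):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "form":
            self._form_depth += 1
        elif tag == "input" and self._form_depth > 0:
            # Assumption: a password field inside a form marks a login form.
            if (attrs.get("type") or "").lower() == "password":
                self.has_login_form = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "form" and self._form_depth > 0:
            self._form_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def html_version(doctype):
    """Heuristic mapping from a doctype declaration to an HTML version."""
    if doctype is None:
        return "unknown"
    d = " ".join(doctype.upper().split())
    if d == "DOCTYPE HTML":
        return "HTML5"
    # Check longer identifiers first ("HTML 4.01" contains "HTML 4.0").
    for marker in ("XHTML 1.1", "XHTML 1.0", "HTML 4.01", "HTML 4.0", "HTML 3.2"):
        if marker in d:
            return marker
    return "unknown"


def analyze(html, base_url):
    """Answers the questions for one page; link accessibility is not checked."""
    parser = PageInfoParser(base_url)
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc
    internal = sum(1 for url in parser.links if urlparse(url).netloc == host)
    return {
        "html_version": html_version(parser.doctype),
        "title": parser.title.strip(),
        "headings": dict(parser.headings),
        "internal_links": internal,
        "external_links": len(parser.links) - internal,
        "has_login_form": parser.has_login_form,
    }


page = """<!DOCTYPE html>
<html><head><title> Demo </title></head><body>
<h1>A</h1><h2>B</h2><h2>C</h2>
<a href="/about">About</a> <a href="https://other.org/">Other</a>
<form action="/login"><input name="user"><input type="password" name="pw"></form>
</body></html>"""
info = analyze(page, "https://example.com/")
```

For this sample page, `info` reports HTML5, the title "Demo", one `h1` and two `h2` headings, one internal and one external link, and a login form.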
Run the tests:
$ activator test
Run the application:
$ activator run
Alternatively, start the Activator UI and run the application from there:
$ activator ui
Build a distribution package:
$ activator dist
The HTML page is parsed with the jsoup library.
UrlScraper is a service that acts as the front end to the parsing back end. It also caches the parsing results.
ParserActor is an actor that parses the content and applies Extractors to pull out the required information.
Extractors each extract one kind of information. The design goal was to make adding new types of information extraction simple.
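The Extractor design described above can be sketched as a plug-in interface. Each extractor sees every start tag and reports one result. The parser fans tag events out to all registered extractors, and a caching front end parses each URL only once.

This is a minimal Python sketch, not the project's Scala/Akka code. The actor is modelled as a plain function call, and a caller-supplied `fetch` function stands in for the HTTP download. Every name here (`Extractor`, `parse_with`, `CachingScraper`, `HeadingCount`, `LoginForm`) is hypothetical.

```python
from abc import ABC, abstractmethod
from html.parser import HTMLParser


class Extractor(ABC):
    """Extracts one kind of information from a page (hypothetical interface)."""
    name = "extractor"

    @abstractmethod
    def on_start_tag(self, tag, attrs): ...

    @abstractmethod
    def result(self): ...


class HeadingCount(Extractor):
    name = "headings"

    def __init__(self):
        self.counts = {}

    def on_start_tag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.counts[tag] = self.counts.get(tag, 0) + 1

    def result(self):
        return self.counts


class LoginForm(Extractor):
    name = "has_login_form"

    def __init__(self):
        self.found = False

    def on_start_tag(self, tag, attrs):
        # Simplification: any password input counts as a login form.
        if tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.found = True

    def result(self):
        return self.found


class _Dispatcher(HTMLParser):
    """Forwards every start tag to all extractors (stand-in for the parser actor)."""

    def __init__(self, extractors):
        super().__init__()
        self.extractors = extractors

    def handle_starttag(self, tag, attrs):
        for extractor in self.extractors:
            extractor.on_start_tag(tag, attrs)


def parse_with(html, extractor_classes):
    """Parses once, runs fresh extractor instances, and collects their results."""
    extractors = [cls() for cls in extractor_classes]
    dispatcher = _Dispatcher(extractors)
    dispatcher.feed(html)
    dispatcher.close()
    return {e.name: e.result() for e in extractors}


class CachingScraper:
    """Front end that parses each URL once and serves repeats from a cache."""

    def __init__(self, fetch, extractor_classes):
        self.fetch = fetch                  # callable: url -> HTML string
        self.extractor_classes = extractor_classes
        self._cache = {}

    def scrape(self, url):
        if url not in self._cache:
            self._cache[url] = parse_with(self.fetch(url), self.extractor_classes)
        return self._cache[url]


# Usage: a stub fetch function stands in for the real HTTP download.
fetched = []

def stub_fetch(url):
    fetched.append(url)
    return "<h1>Hi</h1><form><input type='password'></form>"

scraper = CachingScraper(stub_fetch, [HeadingCount, LoginForm])
first = scraper.scrape("https://example.com/")
second = scraper.scrape("https://example.com/")  # served from the cache
```

With this shape, adding a new type of information means writing one more `Extractor` subclass and passing it in the list. Neither the dispatcher nor the cache has to change.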