Stop writing a separate parser script for every website. With Parsify, a single script of a few lines plus a configuration file is enough to adapt your parser to different websites.
pip install parsify
Make sure you have your configuration file (usually handbook.json) ready.
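Before creating the engine, it can help to confirm that the handbook file exists and contains valid JSON. The following is a minimal sketch that uses only the Python standard library. `load_handbook` is a hypothetical helper written for illustration and is not part of Parsify.

```python
import json
from pathlib import Path


def load_handbook(path="handbook.json"):
    """Return the parsed handbook, or raise a descriptive error.

    Hypothetical helper for illustration; not part of Parsify's API.
    """
    handbook_path = Path(path)
    if not handbook_path.is_file():
        raise FileNotFoundError(f"Handbook file not found: {handbook_path}")
    with handbook_path.open(encoding="utf-8") as fh:
        try:
            return json.load(fh)
        except json.JSONDecodeError as exc:
            # Re-raise with the file name so the broken file is easy to find.
            raise ValueError(f"{handbook_path} is not valid JSON: {exc}") from exc
```

Calling `load_handbook()` before constructing the engine surfaces a missing or malformed file early, with a message that names the file.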
import parsify as pf
# Create Parsify engine
ngn = pf.Engine(handbook='handbook.json')
# Run a single step
# Provide the step name as an argument
# The step must exist in Engine.current_parser
# The step should not use any "dynamic_variables" when run with this method
# By default, Engine.current_parser is the first parser in the Handbook
step_result = ngn.stepshot(step='get_products')
# print(step_result)
# Parse a single website (must be configured in "handbook.json")
# Provide scope name as an argument
scope_result = ngn.scopeshot(parser='example.com')
# print(scope_result)
# Run all the parsers that are configured in "handbook.json"
final_result = ngn.parse()
# print(final_result)
- The handbook file should start with a "parser" key whose value is an array of parsers.
- Each parser in the array should have two keys:
  - "scope" - String: Name of the parser. Usually the website name, e.g. "example.com".
  - "steps" - Array: Steps to parse.
- Each step should have at least the following fields:
  - "name" - String: Unique name of the step. This name makes it possible to access the step's results and dynamic variables in subsequent steps (if needed).
  - "chain_id" - Integer: Steps with the same chain ID are executed as a sequence on every iteration.
  - "url" - String: Target URL of the request(s) for the current step.
  - "method" - String: Request method for the current step.
  - "output_path" - String: Path to the result data in the response. Use dots for nested data: for example, if the needed result is in response -> "data" -> "products", "output_path" should be "data.products".
  - "output" - Dictionary:
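The structure described above can be sketched as a Python dictionary that mirrors handbook.json. Every value below (the scope, step name, URL, method, and output path) is a made-up placeholder, and the "output" field is left out because its layout is not specified here. `validate_handbook` and `resolve_output_path` are hypothetical helpers, not Parsify functions. The resolver only illustrates how a dotted "output_path" such as "data.products" maps onto a nested response; Parsify's own lookup may differ.

```python
REQUIRED_PARSER_KEYS = ("scope", "steps")
REQUIRED_STEP_KEYS = ("name", "chain_id", "url", "method", "output_path")

# Placeholder handbook; all values are illustrative assumptions.
sample_handbook = {
    "parser": [
        {
            "scope": "example.com",
            "steps": [
                {
                    "name": "get_products",
                    "chain_id": 1,
                    "url": "https://example.com/api/products",
                    "method": "GET",
                    "output_path": "data.products",
                }
            ],
        }
    ]
}


def validate_handbook(handbook):
    """Return a list of problems found; an empty list means none were found."""
    parsers = handbook.get("parser")
    if not isinstance(parsers, list):
        return ['top-level "parser" key must hold an array of parsers']
    problems = []
    for parser in parsers:
        scope = parser.get("scope", "<unnamed>")
        for key in REQUIRED_PARSER_KEYS:
            if key not in parser:
                problems.append(f'parser {scope}: missing "{key}"')
        seen_names = set()
        for step in parser.get("steps", []):
            name = step.get("name", "<unnamed>")
            for key in REQUIRED_STEP_KEYS:
                if key not in step:
                    problems.append(f'step {name} in {scope}: missing "{key}"')
            # Assumption: step names only need to be unique within one parser.
            if name in seen_names:
                problems.append(f'step name "{name}" is not unique in {scope}')
            seen_names.add(name)
    return problems


def resolve_output_path(response, output_path):
    """Follow a dotted path such as "data.products" through nested dicts."""
    current = response
    for key in output_path.split("."):
        current = current[key]
    return current


response = {"data": {"products": [{"id": 1}, {"id": 2}]}}
print(validate_handbook(sample_handbook))  # []
print(resolve_output_path(response, "data.products"))  # [{'id': 1}, {'id': 2}]
```

Serializing `sample_handbook` with `json.dump` would produce a file with the same shape as the list above, which can serve as a starting point for a real handbook.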
Distributed under the MIT License. See the LICENSE file for more information.
Luka Sosiashvili - @lukasanukvari
Project Link: https://github.com/lukasanukvari/parsify
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.