This is a Flask server that provides an endpoint to scrape content from websites.
- Scrapes text content from the following HTML tags: h1, h2, h3, h4, p, and div.
- Simple GET request interface to extract text from any webpage.
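The extraction step can be sketched with Python's standard-library `html.parser`, collecting the text of each listed tag into a dictionary shaped like the API's `elements` field. This is an illustration only: the README does not say which libraries `requirements.txt` pulls in or how `api/index.py` actually walks the page. `TagTextParser` and `extract_text` are hypothetical names. Text inside nested tags counts toward every enclosing target tag, so a `<p>` inside a `<div>` shows up under both keys.

```python
from html.parser import HTMLParser

# Tags listed in the README's feature list.
TARGET_TAGS = ("h1", "h2", "h3", "h4", "p", "div")


class TagTextParser(HTMLParser):
    """Hypothetical stand-in for the scraper: group visible text by tag name."""

    def __init__(self):
        super().__init__()
        self.elements = {tag: [] for tag in TARGET_TAGS}
        self._stack = []  # currently open target tags: (tag, list of text chunks)

    def handle_starttag(self, tag, attrs):
        if tag in TARGET_TAGS:
            self._stack.append((tag, []))

    def handle_endtag(self, tag):
        if tag not in TARGET_TAGS:
            return
        # Close back to the most recent matching open tag, which also
        # tolerates unclosed inner tags such as a <p> without </p>.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                while len(self._stack) > i:
                    self._flush(*self._stack.pop())
                break

    def handle_data(self, data):
        # Text belongs to every enclosing target tag (e.g. a <p> inside a <div>).
        for _, chunks in self._stack:
            chunks.append(data)

    def _flush(self, tag, chunks):
        # Collapse runs of whitespace and skip tags with no text.
        text = " ".join("".join(chunks).split())
        if text:
            self.elements[tag].append(text)


def extract_text(html: str) -> dict:
    """Return {tag: [text, ...]} for the README's target tags."""
    parser = TagTextParser()
    parser.feed(html)
    parser.close()
    return parser.elements
```

Calling `extract_text("<h1>Title</h1><div><p>Body</p></div>")` yields `"Title"` under `h1` and `"Body"` under both `p` and `div`, with empty lists for the remaining tags.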
/scrap?url=<website>: Scrapes the content of the provided URL.
To scrape the content of a website, make a request to the endpoint like this:
curl -X GET "https://scraper-py.vercel.app/scrap?url=https://example.com"
Or simply paste the URL into your browser:
https://scraper-py.vercel.app/scrap?url=https://example.com
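The examples above pass the target URL unencoded, which works for `https://example.com`. A target that has its own query string (for example `...?a=1&b=2`) would have its `&` read as a separate parameter of `/scrap`, so a client should percent-encode it. The following is a minimal Python sketch using only the standard library. `build_scrape_url` and `scrape` are hypothetical helper names, and `scrape` assumes the endpoint answers with the JSON body shown below.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://scraper-py.vercel.app"


def build_scrape_url(target: str, base: str = BASE_URL) -> str:
    """Return the /scrap URL with `target` percent-encoded as the `url` parameter."""
    return f"{base}/scrap?{urlencode({'url': target})}"


def scrape(target: str, base: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Call the endpoint and decode its JSON response (requires network access)."""
    with urlopen(build_scrape_url(target, base), timeout=timeout) as resp:
        return json.load(resp)
```

Passing `base="http://127.0.0.1:5000"` points the same helpers at a locally running copy of the server.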
The API returns a JSON object containing the extracted text from the target page. Example response:
{
  "elements": {
    "h1": ["Heading 1 Text"],
    "h2": ["Subheading 1", "Subheading 2"],
    "h3": ["Another Subheading"],
    "h4": [],
    "p": ["Paragraph text content."],
    "div": ["Div content here."]
  },
  "url": "https://example.com/"
}
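A client can consume this response with the standard `json` module. The sketch below parses the example response above and derives two simple views of it: how many snippets came back per tag, and all headings in h1 to h4 order. `count_by_tag` and `headings` are hypothetical helper names, and the code assumes only the `elements` structure shown in the example.

```python
import json

# The example response shown above.
SAMPLE = """{
  "elements": {
    "h1": ["Heading 1 Text"],
    "h2": ["Subheading 1", "Subheading 2"],
    "h3": ["Another Subheading"],
    "h4": [],
    "p": ["Paragraph text content."],
    "div": ["Div content here."]
  },
  "url": "https://example.com/"
}"""


def count_by_tag(response: dict) -> dict:
    """Map each tag name to the number of text snippets returned for it."""
    return {tag: len(texts) for tag, texts in response["elements"].items()}


def headings(response: dict) -> list:
    """Return every heading text, h1 first, then h2, h3, and h4."""
    elements = response["elements"]
    return [text for tag in ("h1", "h2", "h3", "h4") for text in elements.get(tag, [])]


data = json.loads(SAMPLE)
print(count_by_tag(data))  # {'h1': 1, 'h2': 2, 'h3': 1, 'h4': 0, 'p': 1, 'div': 1}
print(headings(data))      # ['Heading 1 Text', 'Subheading 1', 'Subheading 2', 'Another Subheading']
```

`headings` iterates over an explicit tag order instead of the JSON key order, so the result does not depend on how the server orders the keys.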
Example: scraping the content of Cloudilic.
Make sure you have Python installed, then install the necessary libraries:
pip install -r requirements.txt
Run the Flask application with:
python api/index.py
Open your browser and visit:
http://127.0.0.1:5000/
You can then use the /scrap endpoint on the local server to scrape websites, for example:
http://127.0.0.1:5000/scrap?url=https://example.com
This server is deployed on Vercel and is available at https://scraper-py.vercel.app.