sushil-rgb / yellowpage-scraper

A YellowPage scraper is a Python script that extracts data from the YellowPages.com website. It can gather information such as business names, addresses, phone numbers, emails, and reviews.

Home Page: https://www.yellowpages.com

License: GNU General Public License v3.0

Python 100.00%
automation leadgeneration playwright-python python webscraper dataisbeautiful yellowpage-crawling asynchornous asyncio concurrent-programming

yellowpage-scraper's Introduction

YellowPage-scraper

Welcome to the YellowPage web scraper built with Python Playwright! This repository contains the code for a web scraper that extracts information from yellow pages websites. The scraper uses the Python Playwright library to automate browsing the website and extracting its data. To get started, you need Python installed on your machine. Install the required packages and Playwright's browser binaries by running the following commands:

pip install -r requirements.txt
playwright install

The repository includes the following files:

  • scraper.py: the main script, which starts the automation.
  • tools.py: contains the main scraping code.
  • output.xlsx: created by the script; contains the extracted data in xlsx format.

To run the script, simply navigate to the repository directory and run the following command:

python scraper.py

The script will then start extracting data from the website based on the configuration settings and will save the data to the output.xlsx file.
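The README does not show how scraped records are stored before they reach output.xlsx. The following is a minimal sketch only. It assumes a hypothetical `BusinessListing` record that holds the fields named in the project description (name, address, phone, email, reviews) and uses openpyxl to write the sheet. The repository may use a different library (pandas, for example) and different column names.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class BusinessListing:
    """Hypothetical record for one scraped listing (field names are assumptions)."""
    name: str
    address: str
    phone: str
    email: Optional[str] = None
    reviews: Optional[int] = None  # assumed: number of reviews shown on the listing


def to_rows(listings: List[BusinessListing]) -> List[list]:
    """Return a header row followed by one row per listing, in field order."""
    header = [f.name for f in fields(BusinessListing)]
    return [header] + [[getattr(item, col) for col in header] for item in listings]


def save_xlsx(listings: List[BusinessListing], path: str = "output.xlsx") -> None:
    """Write the rows to an .xlsx file (assumes openpyxl is installed)."""
    from openpyxl import Workbook  # imported lazily so to_rows() works without openpyxl

    workbook = Workbook()
    sheet = workbook.active
    for row in to_rows(listings):
        sheet.append(row)  # missing values (None) become empty cells
    workbook.save(path)
```

For example, save_xlsx([BusinessListing("Joe's Pizza", "123 Main St", "555-0100", reviews=12)]) writes a header row and one data row. Keeping the conversion to rows separate from the file writing means the same records could later be exported to CSV or a database without changing the scraping code.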

Please note that the script is designed to work with yellow pages websites and may not work with other types of websites. Additionally, the script may be blocked by the website if it detects excessive scraping activity, so please use it responsibly.
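If you extend the scraper to fetch many result pages concurrently (the project is tagged asyncio), one way to keep the request rate moderate is to cap the number of requests in flight and pause after each one. The following is an illustrative sketch, not the repository's own throttling. The `fetch` coroutine is a hypothetical stand-in for whatever loads and parses one page, and the default limits are arbitrary. Throttling lowers the risk of being blocked but does not guarantee it.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def gather_politely(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[T]],
    max_concurrency: int = 2,
    delay_s: float = 1.0,
) -> List[T]:
    """Fetch every URL with at most `max_concurrency` requests in flight.

    Each worker sleeps `delay_s` seconds after its request before releasing
    its slot, which spaces out consecutive requests. Results keep input order.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str) -> T:
        async with semaphore:
            result = await fetch(url)
            await asyncio.sleep(delay_s)  # hold the slot briefly to slow the request rate
            return result

    # asyncio.gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(worker(url) for url in urls))
```

A semaphore is used rather than a fixed sleep between sequential requests so that a small amount of parallelism is still possible, while the upper bound on simultaneous requests stays explicit.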

If you have any issues or suggestions for improvements, please feel free to open an issue on the repository or submit a pull request.

Thank you for using the YellowPage scraper!

yellowpage-scraper's People

Contributors

sushil-rgb

Forkers

nusiloot jyaba

yellowpage-scraper's Issues

Sync API vs. Async API

Hello:

I'm getting the following error and need help resolving the issue.

Error: It looks like you are using Playwright Sync API inside the asyncio loop.
Please use the Async API instead.

Thanks,

Manny
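This error appears when a `with sync_playwright()` block runs in a thread that already has a running asyncio event loop, as the main thread of a Jupyter kernel does. One possible workaround is to run the synchronous entry point in a worker thread, where no loop is running. This assumes the scraper's entry point is a plain synchronous function. The sketch below uses only the standard library. `run_scraper` in the usage note is a hypothetical name for the function that contains the `with sync_playwright() as p:` block.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def loop_is_running() -> bool:
    """Return True if the current thread is running an asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def run_in_worker_thread(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call a blocking function in a fresh thread and wait for its result.

    A new thread has no running event loop, so Playwright's Sync API can
    start there even when the calling thread (e.g. a Jupyter kernel) has one.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(func, *args, **kwargs).result()
```

In a notebook cell this would be used as data = run_in_worker_thread(run_scraper). The cell blocks until scraping finishes, which matches how the script behaves when run from the command line.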

Getting playwright error

Hi, I was trying your code and I'm getting an error in scraper.py (I'm using Jupyter Notebook). Below is the complete error:


Error Traceback (most recent call last)
Cell In[5], line 15
     12         data = yellowPages(p, make_headless)
     13         return data
---> 15 print(yellow())
     17 total_time = round(time.time()-start_time, 2)
     18 time_in_secs = round(total_time)

Cell In[5], line 11, in yellow()
     10 def yellow():
---> 11     with sync_playwright() as p:
     12         data = yellowPages(p, make_headless)
     13         return data

File c:\Users\106141\AppData\Local\Programs\Python\Python38\lib\site-packages\playwright\sync_api\_context_manager.py:44, in PlaywrightContextManager.__enter__(self)
     42     self._own_loop = True
     43 if self._loop.is_running():
---> 44     raise Error(
     45         """It looks like you are using Playwright Sync API inside the asyncio loop.
     46 Please use the Async API instead."""
     47     )
     49 # In Python 3.7, asyncio.Process.wait() hangs because it does not use ThreadedChildWatcher
     50 # which is used in Python 3.8+. This is unix specific and also takes care about
     51 # cleaning up zombie processes. See https://bugs.python.org/issue35621
...
     56     and isinstance(asyncio.get_child_watcher(), asyncio.SafeChildWatcher)
     57 ):

Error: It looks like you are using Playwright Sync API inside the asyncio loop.
Please use the Async API instead.
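The error message points to the fix: inside a running event loop, use Playwright's Async API (`async_playwright` from `playwright.async_api`) and `await` it. The following is a minimal sketch. `scrape_page_title` is a hypothetical stand-in for the repository's `yellowPages` logic, which would need the same async/await conversion. The actual extraction steps are not shown.

```python
import asyncio


async def scrape_page_title(url: str, headless: bool = True) -> str:
    """Open `url` with Playwright's Async API and return the page title.

    Hypothetical stand-in for the scraper's logic; the real extraction code
    would need the same async/await conversion.
    """
    # Imported inside the function so this module loads without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=headless)
        try:
            page = await browser.new_page()
            await page.goto(url)
            return await page.title()
        finally:
            await browser.close()


def main(url: str = "https://www.yellowpages.com") -> str:
    """Script entry point: asyncio.run() starts a fresh event loop."""
    return asyncio.run(scrape_page_title(url))
```

When running as a script, call main(). In a Jupyter cell, where a loop is already running and top-level await is supported, use title = await scrape_page_title("https://www.yellowpages.com") instead, because asyncio.run() raises a RuntimeError when called from a running loop.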
