
googlesearch's Introduction

googlesearch

googlesearch is a Python library for searching Google, easily. googlesearch uses requests and BeautifulSoup4 to scrape Google.

Installation

To install, run the following command:

python3 -m pip install googlesearch-python

Usage

To get results for a search term, simply use the search function in googlesearch. For example, to get results for "Google" in Google, just run the following program:

from googlesearch import search
search("Google")

Additional options

googlesearch supports a few additional options. By default, googlesearch returns 10 results. This can be changed. To get 100 results on Google, for example, run the following program:

from googlesearch import search
search("Google", num_results=100)

In addition, you can change the language Google searches in. For example, to get results in French, run the following program:

from googlesearch import search
search("Google", lang="fr")

To extract more information, such as the description or the result URL, use an advanced search:

from googlesearch import search
search("Google", advanced=True)
# Returns a generator of SearchResult objects
# Properties:
# - title
# - url
# - description
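For instance, a minimal sketch that consumes the advanced results using the properties listed above:

from googlesearch import search

# Each advanced result exposes the title, url and description properties.
for result in search("Google", advanced=True):
    print(result.title)
    print(result.url)
    print(result.description)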

If requesting more than 100 results, googlesearch will send multiple requests to go through the pages. To increase the time between these requests, use sleep_interval:

from googlesearch import search
search("Google", sleep_interval=5, num_results=200)

googlesearch's People

Contributors

chc-tw, codemee, denvercoder1, flashnuke, inscribedeeper, mohammed-ashour, nv7-github, omerfi


googlesearch's Issues

Having issue with www.google.com

I'm trying to fetch Google search results via a SOCKS5 proxy, but I encountered the error below. I can access Google via requests.get(), though.

search('Zhang, Hongzhang CAS - Dalian Institute of Chemical Physics')
urllib3.exceptions.MaxRetryError: SOCKSHTTPSConnectionPool(host='www.google.com', port=443): Max retries exceeded with url: /search?q=Zhang,+Hongzhang+CAS+-+Dalian+Institute+of+Chemical+Physics&num=11&hl=en (Caused by SSLError(SSLCertVerificationError("hostname 'www.google.com' doesn't match either of 'dropbox.com', 'www.dropbox.com', 'support.dropbox.com', 'live.dropbox.com', 'opensource.dropbox.com', 'linux.dropbox.com', 'texter.dropbox.com'")))

googlesearch giving wrong url

So, what I'm trying to do is open the first result of every Google search via Python to access its website. Everything works fine until I use the keyword "gmail": it always opens a Wikipedia page instead. A person on Stack Overflow suggested I come here to ask for help.

import webbrowser
from googlesearch import search

def open_website(name):
    def get_website(name):
        try:
            for web in search(name,stop=1):
                webbrowser.open(web)
                return True
        except Exception:
            return "I can't open your website, check your connection"
    if get_website(name) == None:
        return("Your website doesn't exist")
    else:
        return "Opening "+name+"..."
    
open_website("gmail") #you can use "facebook", "instagram", "duolingo" or something like that

Returning object generator vice the search result

I'm getting odd behavior where printing a list of the results gives me the usable links I want, but printing an individual result does not (it returns the generator object search).


I can work around this for now by just taking the first list entry, but I don't believe this is the intended behavior.
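For reference, since search returns a generator, materializing it (or iterating it) gives the individual URLs. A minimal sketch:

from googlesearch import search

# list() materializes every result; next() would pull just the first one.
results = list(search("Google", num_results=10))
print(results[0])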

def search never returns if too few results

Cool library btw.

Small issue when few results are returned by Google:

In the function def search(), if there are too few results, the function never returns.

For example, if you pass num_results=10 but Google returns fewer than 10 results, the function will run forever in the "while start < num_results:" loop.

A simple way to fix it is to use a counter that is incremented on each pass through the loop.

# init count at the top of the function
count = 0

# check the count inside the "while start < num_results:" loop
if count < num_results:
    print("Too few results")
    break

# increment the count just above the "sleep(sleep_interval)" line
count += 1
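For context, here is a self-contained sketch of how such an exit guard could be wired into a paging loop of this shape. The loop structure mirrors the snippets quoted in later issues in this thread; the request details, function name, and the found_any guard are illustrative, not the library's exact code:

from time import sleep

import requests
from bs4 import BeautifulSoup

def search_pages(escaped_term, num_results, lang="en", sleep_interval=0):
    """Illustrative paging loop with an exit guard: an empty result page
    stops the loop instead of spinning forever. No headers/proxy handling here."""
    start = 0
    while start < num_results:
        resp = requests.get(
            "https://www.google.com/search",
            params={"q": escaped_term, "num": num_results - start, "hl": lang, "start": start},
        )
        soup = BeautifulSoup(resp.text, "html.parser")
        result_block = soup.find_all("div", attrs={"class": "g"})

        found_any = False  # did this page yield any usable result?
        for result in result_block:
            link = result.find("a", href=True)
            title = result.find("h3")
            if link and title:
                start += 1
                found_any = True
                yield link["href"]

        if not found_any:
            # too few results: bail out instead of looping forever
            break
        sleep(sleep_interval)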

Extract ads links (possible improvement)

Hi,

Thanks for this project, I find it very useful and interesting.

But how can we get the links at the top of the results (the paid URLs / ads)?

Thanks a lot!

Some requests break the search

I found out there are some queries that make the search function stall for over a minute and then return 429, regardless of the waiting time.
For example, "Malaysia sugar tax, RM0.40 (US$0.086) per litre, more than 5 grams/100ml" takes a few seconds to retrieve the first 2 links, but at the 3rd it makes me wait 1:30 minutes, then returns 429, and the IP becomes unusable.
I tried the same query on Google Colab (which should not use my IP); to be sure, I also tried switching the internet connection to a phone hotspot and using an EC2 instance, and all lead to the same result (breaking at the 3rd link of the same query): some queries can break the algorithm.

Ideally, we should use the timeout params for the requests, but (I tried) it does not work in the case above.
While adding delays or user-agent can help prevent the 429 as a whole, I think this specific issue still needs to be addressed.

How can I use a proxy in this library?

I have code like this:

for j in search(usermessage + site, num_results=5, lang='ru', sleep_interval=2, proxy='http://104.26.2.46:80'):
    responce.append(j)

I installed a proxy but it doesn't work.
I saw issues where this topic was raised, but I still didn’t understand how to use a proxy.
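For what it's worth, the proxy handling quoted in a later issue in this thread builds a requests-style proxies dict from the string based on its scheme. One way to narrow the problem down is to check that the proxy itself can reach Google with plain requests; a minimal sketch (the proxy URL is the one from the question and purely illustrative):

import requests

# Illustrative sanity check: can this proxy reach Google at all, outside the library?
proxy = "http://104.26.2.46:80"
proxies = {"http": proxy, "https": proxy}
resp = requests.get("https://www.google.com", proxies=proxies, timeout=10)
print(resp.status_code)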

Search results are completely different than what it should be

Piece of code:

query = "Philippines Palawan"
for result in search(query, num_results=50):
    if "wikipedia" in result: do something
    else: something else

This gives me the following list:

https://www.tripadvisor.com/Tourism-g294255-Palawan_Island_Palawan_Province_Mimaropa-Vacations.html
https://www.hotels.com/go/philippines/best-palawan-things-to-do
https://guidetothephilippines.ph/articles/ultimate-guides/palawan-travel-guide
https://philippinetourismusa.com/top-destinations/palawan/
https://www.connections.be/en/tours/philippines/the-palawan-adventure
https://stock.adobe.com/search?k=%22palawan%20philippines%22
https://www.cntraveler.com/galleries/2015-07-13/visiting-the-most-beautiful-island-in-the-world-palawan-philippines
https://philippines.travel/destinations/palawan
https://en.wikivoyage.org/wiki/Palawan
https://www.travel-palawan.com/about-palawan/
https://www.lightfoottravel.com/en/asia/philippines/palawan
https://www.elnidoresorts.com/en/
https://www.instagram.com/dream_palawan/?hl=en
http://www.gis-reseau-asie.org/en/philippines-last-frontier-what-perspectives-palawan
https://pcsd.gov.ph/
https://ebird.org/region/PH-PLW
https://palawan-news.com/
https://en.unesco.org/biosphere/aspac/palawan
https://www.connections.be/en/tours/philippines/the-palawan-adventure
https://stock.adobe.com/search?k=%22palawan%20philippines%22
https://philippines.travel/destinations/palawan
https://en.wikivoyage.org/wiki/Palawan
https://www.travel-palawan.com/about-palawan/
https://www.lightfoottravel.com/en/asia/philippines/palawan
https://www.elnidoresorts.com/en/
https://www.instagram.com/dream_palawan/?hl=en
http://www.gis-reseau-asie.org/en/philippines-last-frontier-what-perspectives-palawan
https://pcsd.gov.ph/
https://ebird.org/region/PH-PLW
https://palawan-news.com/
https://en.unesco.org/biosphere/aspac/palawan
https://en.wikivoyage.org/wiki/Palawan
https://www.travel-palawan.com/about-palawan/
https://www.lightfoottravel.com/en/asia/philippines/palawan
https://www.elnidoresorts.com/en/
https://www.instagram.com/dream_palawan/?hl=en
http://www.gis-reseau-asie.org/en/philippines-last-frontier-what-perspectives-palawan
https://pcsd.gov.ph/
https://ebird.org/region/PH-PLW
https://palawan-news.com/
https://en.unesco.org/biosphere/aspac/palawan
https://ebird.org/region/PH-PLW
http://www.gis-reseau-asie.org/en/philippines-last-frontier-what-perspectives-palawan
https://palawan-news.com/
https://en.unesco.org/biosphere/aspac/palawan
https://ebird.org/region/PH-PLW
https://palawan-news.com/
https://en.unesco.org/biosphere/aspac/palawan
https://en.unesco.org/biosphere/aspac/palawan
https://en.unesco.org/biosphere/aspac/palawan

If, when I do a Google search, Wikipedia is the exact first result, why can't I find it in 50 results through the program?

Description comes back incomplete when there are dates in it

Hello everyone, this is a great project, I hope it gets bigger and better. One improvement I'll try to implement: when the result has dates in the description, the code cuts off the description and keeps only the date.


I see what the issue is in the code and I'll try to fix it; anyway, I'm creating this issue to keep track of it! :D

Infinite loop fetching when using `search` function

It appears the search function is broken, and calls to the search function get stuck in an infinite loop.

You can reproduce this easily with a simple script like this one:

from googlesearch import search
import logging

logging.basicConfig(level=logging.DEBUG)

print("Starting search...")
res = search("nhl bowen byram")
print("Finished search.")
list_of_urls = [x for x in res]
print(list_of_urls)

I also tried just converting the generator to a list, with the same outcome:

from googlesearch import search
import logging

logging.basicConfig(level=logging.DEBUG)

print("Starting search...")
res = search("nhl bowen byram")
print(list(res))
print("Finished search.")

The output is the following:

Hello World
finished
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.google.com:443
DEBUG:urllib3.connectionpool:https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.google.com:443
DEBUG:urllib3.connectionpool:https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.google.com:443
DEBUG:urllib3.connectionpool:https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None

To debug this further, I put trace statements in the package, and it looks like start and num_results are never updated:

    # Fetch
    start = 0
    while start < num_results:
        print(start, num_results)
        # Send request
        resp = _req(escaped_term, num_results - start,
                    lang, start, proxies, timeout)

Result:

01/30/2024 08:31:43 AM Results from Google: <generator object search at 0x122abcb30>
0 10
01/30/2024 08:31:43 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:43 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
0 10
01/30/2024 08:31:44 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:44 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
0 10
01/30/2024 08:31:45 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:45 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
0 10
01/30/2024 08:31:45 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:45 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
0 10
01/30/2024 08:31:45 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:45 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None

search word with quotation

Hi.
I search Google for "laminated glass" and "polyvinyl butyral" and see multiple pages of results, but when I send this phrase to the googlesearch module I get no results.
Why, and how can I fix it?
(I need exact results for this phrase.)
This is my code:

    from googlesearch import search

    phrase = '"laminated glass" and "polyvinyl butyral"'
    results = search(phrase)
    res_list = list(results)

thanks

Import Error

The error says: cannot import name 'search' from 'googlesearch'

So, I need help regarding this error.

generator object search at (solved)

Hi! I tried this code:

from googlesearch import search
search("Google", num_results=100)

and, this is the response:

<generator object search at 0x000001C366222960>

any idea?
thanks

Weird user-agent behavior and infinite while loop risk.

Thank you for the 1.2.2 update.

I've been working with the project a bit over the past week and have noted some potential problems.

The user_agents.py feature is a welcome addition, but it made the code fail for me (I am located in Europe).
v1.1.0 had a static, recent (and common) user agent (Windows 10), but the user agent list in the file has many highly specific/unused user agents.

Thus, each time I attempted a request, Google redirected me to a consent URL for cookie validation.

It appears that Google (at least in my region) flags unused/weird user agents and prompts them to accept cookies.

To accept the cookies programmatically, and not have to write a specific POST function to accept/reject them, you can pass this in your headers. It does the trick and you land on the right page:

headers = {
    "User-Agent": self.user_agent,
    "Cookie": "CONSENT=YES+cb.20220302-17-p0.en+FX+100; NID=0"
}

This led to another problem: getting redirected on the consent page (or even retrieving the html from the session with the cookies headers) led to a somewhat different html structure that did not contain:

result_block = soup.find_all('div', attrs={'class': 'g'})

--> result_block takes the value of an empty list

Because further down the while loop, start is only incremented if link and title and description exist, this results in an infinite while loop.

I would suggest consolidating the code to provide an exit in the case where result_block is an empty list.

Using widely used and recent user agents (latest macOS or Windows 10) seems to do the trick for me in the meantime:

import random

def get_useragent():
    return random.choice(_useragent_list)

_useragent_list = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.62',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0'
]

What a cool module!

Sorry, it's not really an issue that I'm raising.

When using the googlesearch module's search function, can an additional parameter be accepted, such as specifying site: www.wikipedia.com to the Google index? I want to limit the results so that when I execute the search function, all URL links related to the given query are retrieved from the Wikipedia website.
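Since the query is passed to Google as a plain string, one way to get this behavior today is to embed Google's own site: operator in the search term. A minimal sketch (the domain is just an example):

from googlesearch import search

# Google's site: operator travels inside the query string; no extra parameter is needed.
for url in search("machine learning site:en.wikipedia.org", num_results=10):
    print(url)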

search specific (e.g. google news)

Is there any way to search through a specific section of Google, such as Google News or Google Finance? I tried changing the init file where

url="https://www.news.google.com/search",

i.e. I added "news" between "www" and "google". Is it not that simple? Sorry for the noob question! There was no end to it, it would just continue to run.

metadata-generation-failed error

I encountered an error while trying to install a package using pip. The error message is "metadata-generation-failed".

Step to Reproduce

Run the following command in the terminal:
python3 -m pip install googlesearch-python

Collecting googlesearch-python
  Using cached googlesearch-python-1.2.0.tar.gz (7.4 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Version used for development

pip 23.0.1
Python 3.10.5

search() got an unexpected keyword argument 'num_results'

From yesterday onwards, the package is not working properly when used with num_results:
se = search("Google",num_results=10,lang="en")

1 from googlesearch import search
----> 2 se = search("Google",num_results=10,lang="en")
3 print(se)

Without it, it's working fine.

adding "+" while dorking

The rewritten code would be like this. Instead of:

def search(term, num_results=10, lang="en", proxy=None, advanced=False):
    escaped_term = term.replace(' ', '+')

you could use:

def search(term, num_results=10, lang="en", proxy=None, advanced=False):
    escaped_term = term.replace(' ', '%20')
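A related sketch: instead of a hand-rolled replace(), the standard library can do the escaping, which also covers quotes and literal '+' signs in dorks. This is an alternative suggestion, not the library's current code, and the example term is illustrative:

from urllib.parse import quote_plus

# quote_plus() encodes spaces as '+' and percent-encodes quotes, colons, etc.
term = '"laminated glass" AND site:example.com'
escaped_term = quote_plus(term)
print(escaped_term)  # %22laminated+glass%22+AND+site%3Aexample.com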

Does this library violate Google's Terms of Service?

Automated queries
Google's Terms of Service don't allow the sending of automated queries of any sort to our system without express permission in advance from Google. Sending automated queries consumes resources and includes using any software (such as WebPosition Gold) that sends automated queries to Google to determine how a website or web page ranks in Google search results for various queries. In addition to rank checking, other types of automated access to Google without permission are also a violation of our Webmaster Guidelines and Terms of Service.

https://developers.google.com/search/docs/advanced/guidelines/automated-queries

Loosen requirements on dependencies

Currently specific versions of beautifulsoup4 and requests are pinned, but it would be nice to loosen this up so this module can be used alongside other modules more easily.

Build file error (with fix)

I think your build file for building the wheel is not fully correct. To build the wheel using

python -m build .

you should:

  1. Add a MANIFEST.in file in the root directory containing

include requirements.txt

  2. Change your setup.py slightly to
from setuptools import setup

with open("README.md", "r", encoding='UTF-8') as fh:
    long_description = fh.read()

with open("requirements.txt", "r", encoding='UTF-8') as fh:
    requirements = fh.read().split("\n")

setup(
    name="googlesearch-python",
    version="1.2.0",
    author="Nishant Vikramaditya",
    author_email="[email protected]",
    description="A Python library for scraping the Google search engine.",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/Nv7-GitHub/googlesearch",
    packages=["googlesearch"],
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires=">=3.6",
    install_requires=requirements,
    include_package_data=True,  # Include additional files specified in MANIFEST.in
)

Tested with Python >= 3.9; it builds the wheel and the source dist.

Does this require another library for proxy use?

I'm getting an error for a simple example search with a proxy:

search("example search", proxy="http://191.243.218.249:53281")

I get the error:

Max retries exceeded with url: /search?q=something&num=12&hl=en&start=0 (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1108)')))

Any suggestions to fix?

I've tried to solve with https://stackoverflow.com/questions/33410577/python-requests-exceptions-sslerror-eof-occurred-in-violation-of-protocol

I change the user agent and stop getting results

Hello, I'm using this beautiful library and I really appreciate your hard work on it.

I have a problem when I make a simple change to the code.

I can't get any results when I make the user agent picked randomly from my file.

This is my query:

from googlesearch import search

t = search("google", num_results=10)
print(t)

But these are the changes I made in the code:

from bs4 import BeautifulSoup
from requests import get
import random

random_pick = open('useragent').read().splitlines()
user_agent = random.choice(random_pick)

def search(term, num_results=10, lang="en", proxy=None):
    usr_agent = {
        'User-Agent': user_agent}

    def fetch_results(search_term, number_results, language_code):
        escaped_search_term = search_term.replace(' ', '+')

        google_url = 'https://www.google.com/search?q={}&num={}&hl={}'.format(escaped_search_term, number_results + 1,
                                                                               language_code)
        proxies = None
        if proxy:
            if proxy[:5] == "https":
                proxies = {"https": proxy}
            else:
                proxies = {"http": proxy}

        response = get(google_url, headers=usr_agent, proxies=proxies)
        response.raise_for_status()

        return response.text

    def parse_results(raw_html):
        soup = BeautifulSoup(raw_html, 'html.parser')
        result_block = soup.find_all('div', attrs={'class': 'g'})
        for result in result_block:
            link = result.find('a', href=True)
            title = result.find('h3')
            if link and title:
                yield link['href']

    html = fetch_results(term, num_results, lang)
    return list(parse_results(html))
The result I got is an empty list --> []
Could you please figure out why? It only works when I put a fixed user agent.

Google "Featured Snippets"

Is there a way to get the featured snippets that Google shows when you search for something instead of just links?

why doesn't this search code work?

import json
import random
import re
import requests
from bs4 import BeautifulSoup

try:
        from googlesearch import search
        search("My name is Prince", num_results=100, lang="fr")  #search on google "My name is Prince"

        print(search)

except ImportError:
    print("No module named 'google' found")

    for j in search(query, tld="co", num=10, stop=10, pause=2):
         print(j)

Anyone facing Error 429? Client Error: Too Many Requests

I am receiving a 429 error after 5 or 6 searches. I know the code randomly chooses a different user agent, which shouldn't cause this error, but I got it anyway. I'm running my script from the command prompt and wanted to check whether other users are also facing the same issue. I also do not see any Retry-After in the error message, which doesn't help.


I would request the community to provide any insights on this or workarounds if any.

Thanks in advance.
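One possible workaround sketch (not a fix): back off and retry with an increasing delay when a 429 comes back, and keep sleep_interval generous. This assumes the 429 surfaces as a requests HTTPError, as the error messages quoted in this thread suggest; the helper name and delays are illustrative:

import time
from requests.exceptions import HTTPError
from googlesearch import search

def search_with_backoff(term, num_results=10, attempts=4, base_delay=60):
    # Retry the whole search with a growing pause whenever Google answers 429.
    for attempt in range(attempts):
        try:
            return list(search(term, num_results=num_results, sleep_interval=5))
        except HTTPError as exc:
            if exc.response is None or exc.response.status_code != 429:
                raise
            time.sleep(base_delay * (attempt + 1))
    raise RuntimeError("Still rate-limited after retries")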

Google Cloud Platform error 429

Good morning,

I get a 429 error when I use googlesearch on Google Cloud Platform (Cloud Functions), but I don't get any errors when I use it directly on my workstation. Do you have any idea why?

maximum number of results returned

Hi there,

Thank you for this brilliant dev work. Just a minor thing: I noticed that the maximum number of results seems to be 100.

Below is the simple line I run:

search('data', 500)

Would it be possible to explain why, and to enable a higher limit?

Thank you so much.
L

parameter num_results related error

for j in search(query, num_results=15):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: search() got an unexpected keyword argument 'num_results'

code:

from googlesearch import search

def search(query):
    results = []
    for j in search(query, num_results=15):
        results.append(j)
    return results

query = input("What do you want to search for? ")
results = search(query)
print(f"Here are the top {len(results)} results for your search:")
for result in results:
    print(result)

version info
Name: googlesearch-python
Version: 1.2.3

Python 3.11.3
pip 23.1.1
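The traceback above is consistent with the local def search(query) shadowing the imported search: the call inside the function resolves to the one-argument local function, which rejects num_results. A sketch that avoids the name collision (the new function name is just illustrative):

from googlesearch import search

def run_search(query):  # renamed so it no longer shadows googlesearch.search
    return [j for j in search(query, num_results=15)]

query = input("What do you want to search for? ")
results = run_search(query)
print(f"Here are the top {len(results)} results for your search:")
for result in results:
    print(result)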

HTTP 429 Too Many Requests

How many requests does Google allow to their website per day?
I keep getting this

  File "/googlesearch/__init__.py", line 305, in search
    html = get_page(url, user_agent, verify_ssl)
  File "/googlesearch/__init__.py", line 174, in get_page
    response = urlopen(request)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 429: Too Many Requests

search() gets stuck in an infinite "while loop" if the search term has zero results

start = 0
while start < num_results:
    # Send request
    resp = _req(escaped_term, num_results-start, lang, start, proxies)
    # Parse
    soup = BeautifulSoup(resp.text, 'html.parser')
    result_block = soup.find_all('div', attrs={'class': 'g'})
    for result in result_block:
        # Find link, title, description
        link = result.find('a', href=True)
        title = result.find('h3')
        description_box = result.find('div', {'style': '-webkit-line-clamp:2'})
        if description_box:
            description = description_box.find('span')
            if link and title and description:
                start += 1
                if advanced:
                    yield SearchResult(link['href'], title.text, description.text)
                else:
                    yield link['href']

The code cannot exit the while loop if the value of start never goes up, and start only goes up when a valid result is yielded.
For search terms that have zero results, the code gets stuck trying to find something forever.

Unable to use proxy

API_KEY = "2fdb6ced427de857f32870d733fe69b0"
query = "loli"
proxy = f"http://scraperapi.country_code=us:{API_KEY}@proxy-server.scraperapi.com:8001"

loli_links = [link for link in search(term=query, proxy=proxy)]

Not returning results

I have a script that uses this library and it has always worked normally. But today I went to use the script and the lib is no longer returning URLs; the error returned is "429 Client Error".

Update dependencies

I have a project where I'm using this package and BeautifulSoup4, but I can't update to 4.9.3 since this package depends on BeautifulSoup4==4.9.1.

I have tested this package locally with 4.9.3 and it seems to be working correctly.

Additionally, I noticed there are many extra dependencies in requirements.txt that don't seem to be used in the code. Could it be simplified to just the packages that are used?

Zero Search Result

There is a case in which searching Google returns no results (because of mistyping or other reasons), so there will be an infinite loop in the search function:

    while start < num_results:
        # Send request
        resp = _req(escaped_term, num_results - start,
                    lang, start, proxies, timeout)

        # Parse
        soup = BeautifulSoup(resp.text, "html.parser")
        result_block = soup.find_all("div", attrs={"class": "g"})
        for result in result_block:
            # Find link, title, description
            link = result.find("a", href=True)
            title = result.find("h3")
            description_box = result.find(
                "div", {"style": "-webkit-line-clamp:2"})
            if description_box:
                description = description_box.text
                if link and title and description:
                    start += 1
                    if advanced:
                        yield SearchResult(link["href"], title.text, description)
                    else:
                        yield link["href"]
        sleep(sleep_interval)

result_block is empty, so the start parameter always remains below num_results.
So I added a break just before the sleep line:

        if start == 0: break

Newest changes have not yet been released on PyPI

Hi @Nv7-GitHub,

I've been using this package, but it is preventing me from upgrading my beautifulsoup4 dependency from 4.9.1 to 4.9.3, since the released version of this package depends on 4.9.1.

The current requirements.txt requires beautifulsoup4 4.9.3, but this version of the file is not released on PyPI and cannot be installed with pip.

If you have the time, it would be great if you would be able to publish a new release.

You may want to consider a GitHub action to upload a Python Package using Twine when a release is created.

You may also want to consider adding Dependabot for more easily updating dependency versions.

Let me know if I can help with anything.

Sometimes returns the href of the search query as a result

the following bit of code prints out the search query as a result for me:

from googlesearch import search

se = search("Google",num_results=10,lang="en")

print(se[-1])

here's what the output of the script looks like : /search?q=Google&num=11&hl=en&tbm=isch&source=iu&ictx=1&fir=mM5eejaz-bUIsM%252C0UCf55-GTy6fDM%252C%252Fm%252F045c7b&vet=1&usg=AI4_-kS3fhB6I4-4YGkbI-0POxk60cjoEw&sa=X&ved=2ahUKEwi8y-PKyuzzAhVRZc0KHSjABlkQ_B16BAhIEAI#imgrc=mM5eejaz-bUIsM

It seems this is happening because of the descriptive panel with more information that comes up on the right side. The package tries to process it like any other result, ends up selecting an element with no href value, and then tries to get an href from it anyway. BeautifulSoup's default seems to be to return the page's own href value, and that's how it ends up at the end of the results.
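Until that is fixed, one illustrative workaround is to filter out Google-internal links, which are relative and start with "/search?" like the one quoted above:

from googlesearch import search

# Drop relative, Google-internal hits such as "/search?q=..." described above.
results = [u for u in search("Google", num_results=10, lang="en")
           if not u.startswith("/search")]
print(results)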
