
googlesearch's Introduction

googlesearch

googlesearch is a Python library for searching Google, easily. googlesearch uses requests and BeautifulSoup4 to scrape Google.

Installation

To install, run the following command:

python3 -m pip install googlesearch-python

Usage

To get results for a search term, simply use the search function in googlesearch. For example, to get results for "Google" in Google, just run the following program:

from googlesearch import search
search("Google")

Additional options

googlesearch supports a few additional options. By default, googlesearch returns 10 results. This can be changed. To get 100 results on Google, for example, run the following program:

from googlesearch import search
search("Google", num_results=100)

In addition, you can change the language Google searches in. For example, to get results in French, run the following program:

from googlesearch import search
search("Google", lang="fr")

To extract more information, such as the description or the result URL, use an advanced search:

from googlesearch import search
search("Google", advanced=True)
# Returns a generator of SearchResult objects
# Properties:
# - title
# - url
# - description
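For instance, a minimal sketch that consumes the advanced results using the properties listed above:

from googlesearch import search

# Each advanced result exposes the title, url and description properties.
for result in search("Google", advanced=True):
    print(result.title)
    print(result.url)
    print(result.description)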

If requesting more than 100 results, googlesearch will send multiple requests to go through the pages. To increase the time between these requests, use sleep_interval:

from googlesearch import search
search("Google", sleep_interval=5, num_results=200)

googlesearch's People

Contributors

chc-tw, codemee, denvercoder1, flashnuke, inscribedeeper, mohammed-ashour, nv7-github, omerfi


googlesearch's Issues

Having issue with www.google.com

I'm trying to fetch Google search results via a SOCKS5 proxy, but I encountered the error below. I can access Google via requests.get(), though.

search('Zhang, Hongzhang CAS - Dalian Institute of Chemical Physics')
urllib3.exceptions.MaxRetryError: SOCKSHTTPSConnectionPool(host='www.google.com', port=443): Max retries exceeded with url: /search?q=Zhang,+Hongzhang+CAS+-+Dalian+Institute+of+Chemical+Physics&num=11&hl=en (Caused by SSLError(SSLCertVerificationError("hostname 'www.google.com' doesn't match either of 'dropbox.com', 'www.dropbox.com', 'support.dropbox.com', 'live.dropbox.com', 'opensource.dropbox.com', 'linux.dropbox.com', 'texter.dropbox.com'")))

googlesearch giving wrong url

So, what I'm trying to do is open the first result of every Google search via Python to access its website. Everything works fine until I use the keyword "gmail": it always opens a Wikipedia page instead. A person on Stack Overflow suggested I come here to ask for help.

import webbrowser
from googlesearch import search

def open_website(name):
    def get_website(name):
        try:
            for web in search(name,stop=1):
                webbrowser.open(web)
                return True
        except Exception:
            return "I can't open your website, check your connection"
    if get_website(name) == None:
        return("Your website doesn't exist")
    else:
        return "Opening "+name+"..."
    
open_website("gmail") #you can use "facebook", "instagram", "duolingo" or something like that

Returning object generator vice the search result

I'm getting odd behavior where printing a list of the results gives me the usable links I want, but printing an individual result does not (it returns the generator object search).


I can work around this for now by just taking the first list entry, but I don't believe this is the intended behavior.
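For reference, since search returns a generator, materializing it (or iterating it) gives the individual URLs. A minimal sketch:

from googlesearch import search

# list() materializes every result; next() would pull just the first one.
results = list(search("Google", num_results=10))
print(results[0])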

def search never returns if too few results

Cool library btw.

Small issue when few results are returned by Google:

In the function def search(), if there are too few results, the function never returns.

For example, if you pass num_results=10 but Google returns fewer than 10 results, the function will run forever in the "while start < num_results:" loop.

A simple way to fix it is to use a counter that is incremented on each pass through the loop.

# init count at the top of the function
count = 0

# check the count inside the "while start < num_results:" loop
if count < num_results:
    print("Too few results")
    break

# increment the count just above the "sleep(sleep_interval)" line
count += 1
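For context, here is a self-contained sketch of how such an exit guard could be wired into a paging loop of this shape. The loop structure mirrors the snippets quoted in later issues in this thread; the request details, function name, and the found_any guard are illustrative, not the library's exact code:

from time import sleep

import requests
from bs4 import BeautifulSoup

def search_pages(escaped_term, num_results, lang="en", sleep_interval=0):
    """Illustrative paging loop with an exit guard: an empty result page
    stops the loop instead of spinning forever. No headers/proxy handling here."""
    start = 0
    while start < num_results:
        resp = requests.get(
            "https://www.google.com/search",
            params={"q": escaped_term, "num": num_results - start, "hl": lang, "start": start},
        )
        soup = BeautifulSoup(resp.text, "html.parser")
        result_block = soup.find_all("div", attrs={"class": "g"})

        found_any = False  # did this page yield any usable result?
        for result in result_block:
            link = result.find("a", href=True)
            title = result.find("h3")
            if link and title:
                start += 1
                found_any = True
                yield link["href"]

        if not found_any:
            # too few results: bail out instead of looping forever
            break
        sleep(sleep_interval)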

Extract ads links (possible improvement)

Hi,

Thanks for this project, I find it very useful and interesting.

But how can we get the links at the top of the results (the paid URLs / ads)?

Thanks a lot!

Some requests break the search

I found out there are some queries that make the search function stall for over a minute and then return 429, regardless of the waiting time.
For example, "Malaysia sugar tax, RM0.40 (US$0.086) per litre, more than 5 grams/100ml" takes a few seconds to retrieve the first 2 links, but at the 3rd it makes me wait 1:30 minutes, then returns 429, and the IP becomes unusable.
I tried the same query on Google Colab (which should not use my IP); to be sure, I also tried switching the internet connection to a phone hotspot and using an EC2 instance, and all lead to the same result (breaking at the 3rd link of the same query): some queries can break the algorithm.

Ideally, we should use the timeout params for the requests, but (I tried) it does not work in the case above.
While adding delays or user-agent can help prevent the 429 as a whole, I think this specific issue still needs to be addressed.

How can I use a proxy in this library?

I have code like this:

for j in search(usermessage + site, num_results=5, lang='ru', sleep_interval=2, proxy='http://104.26.2.46:80'):
    responce.append(j)

I installed a proxy but it doesn't work.
I saw issues where this topic was raised, but I still didn’t understand how to use a proxy.
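For what it's worth, the proxy handling quoted in a later issue in this thread builds a requests-style proxies dict from the string based on its scheme. One way to narrow the problem down is to check that the proxy itself can reach Google with plain requests; a minimal sketch (the proxy URL is the one from the question and purely illustrative):

import requests

# Illustrative sanity check: can this proxy reach Google at all, outside the library?
proxy = "http://104.26.2.46:80"
proxies = {"http": proxy, "https": proxy}
resp = requests.get("https://www.google.com", proxies=proxies, timeout=10)
print(resp.status_code)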

Search results are completely different than what it should be

Piece of code:

query = "Philippines Palawan"
for result in search(query, num_results=50):
    if "wikipedia" in result: do something
    else: something else

This gives me the following list:

https://www.tripadvisor.com/Tourism-g294255-Palawan_Island_Palawan_Province_Mimaropa-Vacations.html
https://www.hotels.com/go/philippines/best-palawan-things-to-do
https://guidetothephilippines.ph/articles/ultimate-guides/palawan-travel-guide
https://philippinetourismusa.com/top-destinations/palawan/
https://www.connections.be/en/tours/philippines/the-palawan-adventure
https://stock.adobe.com/search?k=%22palawan%20philippines%22
https://www.cntraveler.com/galleries/2015-07-13/visiting-the-most-beautiful-island-in-the-world-palawan-philippines
https://philippines.travel/destinations/palawan
https://en.wikivoyage.org/wiki/Palawan
https://www.travel-palawan.com/about-palawan/
https://www.lightfoottravel.com/en/asia/philippines/palawan
https://www.elnidoresorts.com/en/
https://www.instagram.com/dream_palawan/?hl=en
http://www.gis-reseau-asie.org/en/philippines-last-frontier-what-perspectives-palawan
https://pcsd.gov.ph/
https://ebird.org/region/PH-PLW
https://palawan-news.com/
https://en.unesco.org/biosphere/aspac/palawan
https://www.connections.be/en/tours/philippines/the-palawan-adventure
https://stock.adobe.com/search?k=%22palawan%20philippines%22
https://philippines.travel/destinations/palawan
https://en.wikivoyage.org/wiki/Palawan
https://www.travel-palawan.com/about-palawan/
https://www.lightfoottravel.com/en/asia/philippines/palawan
https://www.elnidoresorts.com/en/
https://www.instagram.com/dream_palawan/?hl=en
http://www.gis-reseau-asie.org/en/philippines-last-frontier-what-perspectives-palawan
https://pcsd.gov.ph/
https://ebird.org/region/PH-PLW
https://palawan-news.com/
https://en.unesco.org/biosphere/aspac/palawan
https://en.wikivoyage.org/wiki/Palawan
https://www.travel-palawan.com/about-palawan/
https://www.lightfoottravel.com/en/asia/philippines/palawan
https://www.elnidoresorts.com/en/
https://www.instagram.com/dream_palawan/?hl=en
http://www.gis-reseau-asie.org/en/philippines-last-frontier-what-perspectives-palawan
https://pcsd.gov.ph/
https://ebird.org/region/PH-PLW
https://palawan-news.com/
https://en.unesco.org/biosphere/aspac/palawan
https://ebird.org/region/PH-PLW
http://www.gis-reseau-asie.org/en/philippines-last-frontier-what-perspectives-palawan
https://palawan-news.com/
https://en.unesco.org/biosphere/aspac/palawan
https://ebird.org/region/PH-PLW
https://palawan-news.com/
https://en.unesco.org/biosphere/aspac/palawan
https://en.unesco.org/biosphere/aspac/palawan
https://en.unesco.org/biosphere/aspac/palawan

If, when I do a Google search, Wikipedia is the exact first result, why can't I find it in 50 results through the program?

Description comes back incomplete when there are dates in it

Hello everyone, this is a great project, I hope it gets bigger and better. One improvement I'll try to implement: when the result has dates in the description, the code cuts off the description and keeps only the date.


I see what the issue is in the code and I'll try to fix it; anyway, I'm creating this issue to keep track of it! :D

Infinite loop fetching when using `search` function

It appears the search function is broken, and calls to the search function get stuck in an infinite loop.

You can reproduce this easily with a simple script like this one:

from googlesearch import search
import logging

logging.basicConfig(level=logging.DEBUG)

print("Starting search...")
res = search("nhl bowen byram")
print("Finished search.")
list_of_urls = [x for x in res]
print(list_of_urls)

I also tried just converting the generator to a list, with the same outcome:

from googlesearch import search
import logging

logging.basicConfig(level=logging.DEBUG)

print("Starting search...")
res = search("nhl bowen byram")
print(list(res))
print("Finished search.")

The output is the following:

Hello World
finished
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.google.com:443
DEBUG:urllib3.connectionpool:https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.google.com:443
DEBUG:urllib3.connectionpool:https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.google.com:443
DEBUG:urllib3.connectionpool:https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None

To debug this further, I put trace statements in the package, and it looks like start and num_results are never updated:

    # Fetch
    start = 0
    while start < num_results:
        print(start, num_results)
        # Send request
        resp = _req(escaped_term, num_results - start,
                    lang, start, proxies, timeout)

Result:

01/30/2024 08:31:43 AM Results from Google: <generator object search at 0x122abcb30>
0 10
01/30/2024 08:31:43 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:43 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
0 10
01/30/2024 08:31:44 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:44 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
0 10
01/30/2024 08:31:45 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:45 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
0 10
01/30/2024 08:31:45 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:45 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
0 10
01/30/2024 08:31:45 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:45 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None

search word with quotation

Hi.
I search Google for "laminated glass" and "polyvinyl butyral" and see multiple pages of results, but when I send this phrase to the googlesearch module I get no results.
Why, and how can I fix it?
(I need exact results for this phrase.)
This is my code:

    from googlesearch import search

    phrase = '"laminated glass" and "polyvinyl butyral"'
    results = search(phrase)
    res_list = list(results)

thanks

Import Error

The error says: cannot import name 'search' from 'googlesearch'

So, I need help regarding this error.

generator object search at (solved)

Hi! I tried this code:

from googlesearch import search
search("Google", num_results=100)

and, this is the response:

<generator object search at 0x000001C366222960>

any idea?
thanks

Weird user-agent behavior and infinite while loop risk.

Thank you for the 1.2.2 update.

I've been working with the project a bit over the past week and have noted some potential problems.

The user_agents.py feature is a welcome addition, but it made the code fail for me (I am located in Europe).
v1.1.0 had a static, recent (and common) user agent (Windows 10), but the user agent list in the file has many highly specific/unused user agents.

Thus, each time I attempted a request, Google redirected me to a consent URL for cookie validation.

It appears that Google (at least in my region) flags unused/weird user agents and prompts them to accept cookies.

To accept the cookies programmatically, and not have to write a specific POST function to accept/reject them, you can pass this in your headers. It does the trick and you land on the right page:

headers = {
    "User-Agent": self.user_agent,
    "Cookie": "CONSENT=YES+cb.20220302-17-p0.en+FX+100; NID=0"
}

This led to another problem: getting redirected on the consent page (or even retrieving the html from the session with the cookies headers) led to a somewhat different html structure that did not contain:

result_block = soup.find_all('div', attrs={'class': 'g'})

--> result_block takes the value of an empty list

Because further down the while loop, start is only incremented if link and title and description exist, this results in an infinite while loop.

I would suggest consolidating the code to provide an exit in the case where result_block is an empty list.

Using widely used and recent user agents (latest macOS or Windows 10) seems to do the trick for me in the meantime:

import random

def get_useragent():
    return random.choice(_useragent_list)

_useragent_list = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.62',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0'
]

What a cool module!

Sorry, it's not really an issue that I'm raising.

When using the googlesearch module's search function, can an additional parameter be accepted, such as specifying site: www.wikipedia.com to the Google index? I want to limit the results so that when I execute the search function, all URL links related to the given query are retrieved from the Wikipedia website.
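Since the query is passed to Google as a plain string, one way to get this behavior today is to embed Google's own site: operator in the search term. A minimal sketch (the domain is just an example):

from googlesearch import search

# Google's site: operator travels inside the query string; no extra parameter is needed.
for url in search("machine learning site:en.wikipedia.org", num_results=10):
    print(url)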

search specific (e.g. google news)

Is there any way to search through a specific section of Google, such as Google News or Google Finance? I tried changing the init file where

url="https://www.news.google.com/search",

i.e. I added "news" between "www" and "google". Is it not that simple? Sorry for the noob question! There was no end to it, it would just continue to run.

metadata-generation-failed error

I encountered an error while trying to install a package using pip. The error message is "metadata-generation-failed".

Step to Reproduce

Run the following command in the terminal:
python3 -m pip install googlesearch-python

Collecting googlesearch-python
  Using cached googlesearch-python-1.2.0.tar.gz (7.4 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Version used for development

pip 23.0.1
Python 3.10.5

search() got an unexpected keyword argument 'num_results'

From yesterday onwards, the package is not working properly when used with num_results:
se = search("Google",num_results=10,lang="en")

1 from googlesearch import search
----> 2 se = search("Google",num_results=10,lang="en")
3 print(se)

Without it, it's working fine.

adding "+" while dorking

The rewritten code would be like this. Instead of:

def search(term, num_results=10, lang="en", proxy=None, advanced=False):
    escaped_term = term.replace(' ', '+')

you could use:

def search(term, num_results=10, lang="en", proxy=None, advanced=False):
    escaped_term = term.replace(' ', '%20')
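A related sketch: instead of a hand-rolled replace(), the standard library can do the escaping, which also covers quotes and literal '+' signs in dorks. This is an alternative suggestion, not the library's current code, and the example term is illustrative:

from urllib.parse import quote_plus

# quote_plus() encodes spaces as '+' and percent-encodes quotes, colons, etc.
term = '"laminated glass" AND site:example.com'
escaped_term = quote_plus(term)
print(escaped_term)  # %22laminated+glass%22+AND+site%3Aexample.com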

Does this library violate Google's Terms of Service?

Automated queries
Google's Terms of Service don't allow the sending of automated queries of any sort to our system without express permission in advance from Google. Sending automated queries consumes resources and includes using any software (such as WebPosition Gold) that sends automated queries to Google to determine how a website or web page ranks in Google search results for various queries. In addition to rank checking, other types of automated access to Google without permission are also a violation of our Webmaster Guidelines and Terms of Service.

https://developers.google.com/search/docs/advanced/guidelines/automated-queries

Loosen requirements on dependencies

Currently specific versions of beautifulsoup4 and requests are pinned, but it would be nice to loosen this up so this module can be used alongside other modules more easily.

Build file error (with fix)

I think your build file for building the wheel is not fully correct. To build the wheel using

python -m build .

you should:

  1. Add a MANIFEST.in file in the root directory containing

include requirements.txt

  2. Change your setup.py slightly to
from setuptools import setup

with open("README.md", "r", encoding='UTF-8') as fh:
    long_description = fh.read()

with open("requirements.txt", "r", encoding='UTF-8') as fh:
    requirements = fh.read().split("\n")

setup(
    name="googlesearch-python",
    version="1.2.0",
    author="Nishant Vikramaditya",
    author_email="[email protected]",
    description="A Python library for scraping the Google search engine.",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/Nv7-GitHub/googlesearch",
    packages=["googlesearch"],
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires=">=3.6",
    install_requires=requirements,
    include_package_data=True,  # Include additional files specified in MANIFEST.in
)

Tested with Python >= 3.9; it builds the wheel and the source dist.

Does this require another library for proxy use?

I'm getting an error for a simple example search with a proxy:

search("example search", proxy="http://191.243.218.249:53281")

I get the error:

Max retries exceeded with url: /search?q=something&num=12&hl=en&start=0 (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1108)')))

Any suggestions to fix?

I've tried to solve with https://stackoverflow.com/questions/33410577/python-requests-exceptions-sslerror-eof-occurred-in-violation-of-protocol

I change the user agent and stop getting results

Hello, I'm using this beautiful library and I really appreciate your hard work on it.

I have a problem when I make a simple change to the code.

I can't get any results when I make the user agent picked randomly from my file.

This is my query:

from googlesearch import search

t = search("google", num_results=10)
print(t)

But these are the changes I made in the code:

from bs4 import BeautifulSoup
from requests import get
import random

random_pick = open('useragent').read().splitlines()
user_agent = random.choice(random_pick)

def search(term, num_results=10, lang="en", proxy=None):
    usr_agent = {
        'User-Agent': user_agent}

    def fetch_results(search_term, number_results, language_code):
        escaped_search_term = search_term.replace(' ', '+')

        google_url = 'https://www.google.com/search?q={}&num={}&hl={}'.format(escaped_search_term, number_results + 1,
                                                                               language_code)
        proxies = None
        if proxy:
            if proxy[:5] == "https":
                proxies = {"https": proxy}
            else:
                proxies = {"http": proxy}

        response = get(google_url, headers=usr_agent, proxies=proxies)
        response.raise_for_status()

        return response.text

    def parse_results(raw_html):
        soup = BeautifulSoup(raw_html, 'html.parser')
        result_block = soup.find_all('div', attrs={'class': 'g'})
        for result in result_block:
            link = result.find('a', href=True)
            title = result.find('h3')
            if link and title:
                yield link['href']

    html = fetch_results(term, num_results, lang)
    return list(parse_results(html))
The result I got is an empty list --> []
Could you please figure out why? It only works when I put a fixed user agent.

Google "Featured Snippets"

Is there a way to get the featured snippets that Google shows when you search for something instead of just links?

why doesn't this search code work?

import json
import random
import re
import requests
from bs4 import BeautifulSoup

try:
        from googlesearch import search
        search("My name is Prince", num_results=100, lang="fr")  #search on google "My name is Prince"

        print(search)

except ImportError:
    print("No module named 'google' found")

    for j in search(query, tld="co", num=10, stop=10, pause=2):
         print(j)

Anyone facing Error 429? Client Error: Too Many Requests

I am receiving a 429 error after 5 or 6 searches. I know the code randomly chooses a different user agent, which shouldn't cause this error, but I got it anyway. I'm running my script from the command prompt and wanted to check whether other users are also facing the same issue. I also do not see any Retry-After in the error message, which doesn't help.


I would request the community to provide any insights on this or workarounds if any.

Thanks in advance.
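One possible workaround sketch (not a fix): back off and retry with an increasing delay when a 429 comes back, and keep sleep_interval generous. This assumes the 429 surfaces as a requests HTTPError, as the error messages quoted in this thread suggest; the helper name and delays are illustrative:

import time
from requests.exceptions import HTTPError
from googlesearch import search

def search_with_backoff(term, num_results=10, attempts=4, base_delay=60):
    # Retry the whole search with a growing pause whenever Google answers 429.
    for attempt in range(attempts):
        try:
            return list(search(term, num_results=num_results, sleep_interval=5))
        except HTTPError as exc:
            if exc.response is None or exc.response.status_code != 429:
                raise
            time.sleep(base_delay * (attempt + 1))
    raise RuntimeError("Still rate-limited after retries")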

Google Cloud Platform error 429

Good morning,

I get a 429 error when I use googlesearch on Google Cloud Platform (Cloud Functions), but I don't get any errors when I use it directly on my workstation. Do you have any idea why?

maximum number of results returned

Hi there,

Thank you for this brilliant dev work. Just a minor thing: I noticed that the maximum number of results seems to be 100.

Below is the simple line I run:

search('data', 500)

Would it be possible to explain why, and to enable a higher limit?

Thank you so much.
L

parameter num_results related error

for j in search(query, num_results=15):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: search() got an unexpected keyword argument 'num_results'

code:

from googlesearch import search

def search(query):
    results = []
    for j in search(query, num_results=15):
        results.append(j)
    return results

query = input("What do you want to search for? ")
results = search(query)
print(f"Here are the top {len(results)} results for your search:")
for result in results:
    print(result)

version info
Name: googlesearch-python
Version: 1.2.3

Python 3.11.3
pip 23.1.1
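The traceback above is consistent with the local def search(query) shadowing the imported search: the call inside the function resolves to the one-argument local function, which rejects num_results. A sketch that avoids the name collision (the new function name is just illustrative):

from googlesearch import search

def run_search(query):  # renamed so it no longer shadows googlesearch.search
    return [j for j in search(query, num_results=15)]

query = input("What do you want to search for? ")
results = run_search(query)
print(f"Here are the top {len(results)} results for your search:")
for result in results:
    print(result)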

HTTP 429 Too Many Requests

How many requests does Google allow to their website per day?
I keep getting this

  File "/googlesearch/__init__.py", line 305, in search
    html = get_page(url, user_agent, verify_ssl)
  File "/googlesearch/__init__.py", line 174, in get_page
    response = urlopen(request)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 429: Too Many Requests

search() gets stuck in an infinite "while loop" if the search term has zero results

start = 0
while start < num_results:
    # Send request
    resp = _req(escaped_term, num_results-start, lang, start, proxies)
    # Parse
    soup = BeautifulSoup(resp.text, 'html.parser')
    result_block = soup.find_all('div', attrs={'class': 'g'})
    for result in result_block:
        # Find link, title, description
        link = result.find('a', href=True)
        title = result.find('h3')
        description_box = result.find('div', {'style': '-webkit-line-clamp:2'})
        if description_box:
            description = description_box.find('span')
            if link and title and description:
                start += 1
                if advanced:
                    yield SearchResult(link['href'], title.text, description.text)
                else:
                    yield link['href']

The code cannot exit the while loop if the value of start never goes up, and start only goes up when a valid result is yielded.
For search terms that have zero results, the code gets stuck trying to find something forever.

Unable to use proxy

API_KEY = "2fdb6ced427de857f32870d733fe69b0"
query = "loli"
proxy = f"http://scraperapi.country_code=us:{API_KEY}@proxy-server.scraperapi.com:8001"

loli_links = [link for link in search(term=query, proxy=proxy)]

Not returning results

I have a script that uses this library and it has always worked normally. But today I went to use the script and the lib is no longer returning URLs; the error returned is "429 Client Error".

Update dependencies

I have a project where I'm using this package and BeautifulSoup4, but I can't update to 4.9.3 since this package depends on BeautifulSoup4==4.9.1.

I have tested this package locally with 4.9.3 and it seems to be working correctly.

Additionally, I noticed there are many extra dependencies in requirements.txt that don't seem to be used in the code. Could it be simplified to just the packages that are used?

Zero Search Result

There is a case in which searching Google returns no results (because of mistyping or other reasons), so there will be an infinite loop in the search function:

    while start < num_results:
        # Send request
        resp = _req(escaped_term, num_results - start,
                    lang, start, proxies, timeout)

        # Parse
        soup = BeautifulSoup(resp.text, "html.parser")
        result_block = soup.find_all("div", attrs={"class": "g"})
        for result in result_block:
            # Find link, title, description
            link = result.find("a", href=True)
            title = result.find("h3")
            description_box = result.find(
                "div", {"style": "-webkit-line-clamp:2"})
            if description_box:
                description = description_box.text
                if link and title and description:
                    start += 1
                    if advanced:
                        yield SearchResult(link["href"], title.text, description)
                    else:
                        yield link["href"]
        sleep(sleep_interval)

result_block is empty, so the start parameter always remains below num_results.
So I added a break just before the sleep line:

        if start == 0: break

Newest changes have not yet been released on PyPI

Hi @Nv7-GitHub,

I've been using this package, but it is preventing me from upgrading my beautifulsoup4 dependency from 4.9.1 to 4.9.3, since the released version of this package depends on 4.9.1.

The current requirements.txt requires beautifulsoup4 4.9.3, but this version of the file is not released on PyPI and cannot be installed with pip.

If you have the time, it would be great if you would be able to publish a new release.

You may want to consider a GitHub action to upload a Python Package using Twine when a release is created.

You may also want to consider adding Dependabot for more easily updating dependency versions.

Let me know if I can help with anything.

Sometimes returns the href of the search query as a result

the following bit of code prints out the search query as a result for me:

from googlesearch import search

se = search("Google",num_results=10,lang="en")

print(se[-1])

here's what the output of the script looks like : /search?q=Google&num=11&hl=en&tbm=isch&source=iu&ictx=1&fir=mM5eejaz-bUIsM%252C0UCf55-GTy6fDM%252C%252Fm%252F045c7b&vet=1&usg=AI4_-kS3fhB6I4-4YGkbI-0POxk60cjoEw&sa=X&ved=2ahUKEwi8y-PKyuzzAhVRZc0KHSjABlkQ_B16BAhIEAI#imgrc=mM5eejaz-bUIsM

It seems this is happening because of the descriptive panel with more information that comes up on the right side. The package tries to process it like any other result, ends up selecting an element with no href value, and then tries to get an href from it anyway. BeautifulSoup's default seems to be to return the page's own href value, and that's how it ends up at the end of the results.
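Until that is fixed, one illustrative workaround is to filter out Google-internal links, which are relative and start with "/search?" like the one quoted above:

from googlesearch import search

# Drop relative, Google-internal hits such as "/search?q=..." described above.
results = [u for u in search("Google", num_results=10, lang="en")
           if not u.startswith("/search")]
print(results)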
