
oxylabs / google-news-scraper


Use Google News API to obtain the latest global news for your project, including a wide range of sources, headlines, URLs, and publication dates from the Google News platform.


google-news-scraper's Introduction

Scraping Google News


Free Google News Scraper

Prerequisites

To run this tool, you need Python 3.11 installed on your system.

Installation

Open up a terminal window, navigate to this repository and run this command:

make install

Getting the topic to scrape

This tool is used to scrape Google News articles based on the topic they're listed in.

First, open Google News and look through the topics listed in the header at the top of the page.

[Image: topics listed in the Google News header]

Click the topic you wish to scrape. This example uses the Business topic.

Next, look at the URL in your browser and copy the string of characters that comes after /topics/. That string is your topic ID.

[Image: screenshot of the topic URL in the browser address bar]

In the URL shown in the screenshot, the topic ID would be CAAqJggKIiBDQkFTRWdvSUwyMHZNRGx6TVdZU0FtVnVHZ0pWVXlnQVAB.

Save this value; you'll need it to scrape the articles.
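If you would rather not copy the ID by hand, the step above can be sketched in Python. This is a minimal illustration, not part of the tool: the function name `extract_topic_id` is hypothetical, and the example URL (including its query string) is an assumed form of a Google News topic URL built around the Business topic ID shown above.

```python
from urllib.parse import urlparse


def extract_topic_id(url: str) -> str:
    """Return the path segment that follows /topics/ in a Google News URL."""
    # Split the URL path into non-empty segments; the query string is ignored.
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    if "topics" not in segments:
        raise ValueError("URL does not contain a /topics/ segment")
    index = segments.index("topics")
    if index + 1 >= len(segments):
        raise ValueError("No topic ID found after /topics/")
    return segments[index + 1]


# Assumed example URL; only the part after /topics/ comes from this README.
example_url = (
    "https://news.google.com/topics/"
    "CAAqJggKIiBDQkFTRWdvSUwyMHZNRGx6TVdZU0FtVnVHZ0pWVXlnQVAB"
    "?hl=en-US&gl=US&ceid=US%3Aen"
)
print(extract_topic_id(example_url))
```

The printed value is the string you would pass as TOPIC_ID.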

Scraping

To scrape articles from your selected topic, run this command in your terminal:

make scrape TOPIC_ID=<your_selected_topic_id>

With the Business topic ID selected before, the command should look like this:

make scrape TOPIC_ID=CAAqJggKIiBDQkFTRWdvSUwyMHZNRGx6TVdZU0FtVnVHZ0pWVXlnQVAB

After running the command, you should see this in your terminal:

[Image: terminal output of the scrape command]

When the tool has finished running, you should see a file named articles.csv in the directory where you ran the tool.

If you open the generated CSV file, the data should look something like this:

[Image: contents of the generated articles.csv file]
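To inspect the output programmatically, a short Python sketch like the one below may help. It assumes articles.csv starts with a header row; the README does not list the column names, so the code prints whatever columns it finds rather than hard-coding any. The helper name `load_articles` is hypothetical.

```python
import csv
from pathlib import Path


def load_articles(path="articles.csv"):
    """Read the CSV into a list of dicts keyed by the header row (assumed present)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


csv_path = Path("articles.csv")
# Only read the file if the scraper has already produced it.
if csv_path.exists():
    articles = load_articles(csv_path)
    print(f"Loaded {len(articles)} rows")
    if articles:
        print("Columns:", list(articles[0].keys()))
```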

Notes

If the code doesn't work or your project is larger in scale, refer to the second part of the tutorial, which shows how to scrape public data with Oxylabs Scraper API.

Oxylabs Google News API

You can claim a 7-day trial of Oxylabs Google News API that includes 5K free results. The tool delivers a list of sources, titles, URLs, and dates of articles published across the Google News portal. The API returns real-time data and gives access to localized results, all while avoiding blocks.

After you claim your trial, using Google News API consists of three main steps:

  1. Create your API user via our dashboard
  2. Send a request
  3. Retrieve the data in JSON or HTML

Request sample

In the example below, we use Google News API and make a request to collect search result pages for the search term adidas on the google.nl domain:

import requests
from pprint import pprint

# Structure payload.
payload = {
    'source': 'google_search',
    'domain': 'nl',
    'query': 'adidas',
    'parse': True,
    'context': [
        {'key': 'tbm', 'value': 'nws'},
    ],
}

# Get response.
response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=('USERNAME', 'PASSWORD'),
    json=payload,
)

# Print prettified response to stdout.
pprint(response.json())
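If you plan to send the same request for several search terms, the payload above can be wrapped in a small helper. This is only a sketch: `build_news_payload` is a hypothetical name, and the fields it fills in are exactly those used in the request sample, with no additional API parameters assumed.

```python
import json


def build_news_payload(query, domain="nl", parse=True):
    """Build a payload matching the request sample for a given search term."""
    return {
        "source": "google_search",
        "domain": domain,
        "query": query,
        "parse": parse,
        # Same context entry as the request sample above.
        "context": [{"key": "tbm", "value": "nws"}],
    }


# Show the payload that would be passed as json=... to requests.post.
print(json.dumps(build_news_payload("adidas"), indent=2))
```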

To see request samples in other languages and parameter values along with their descriptions, please take a look at our extensive Google News API documentation.


google-news-scraper's Issues

b64 decoding no longer working?

hello,

it seems the url b64 decoding no longer works. I couldn't find info re Google changing the encoding of their google news urls.
Are you aware of anything re this?

Thanks,
