Git Product home page Git Product logo

nodejsnewscrawler's Introduction

NewsCrawler

This is a software which scans a given set of news sources and extracts its headlines. The headlines are processed and saved for later analysis. The data can be accessed from a web frontend or a rest API. Its all about the analysis of news trends.

Features

  • Downloads article titles from the news data/sources.json
  • Saves them in a Redis database
  • Creates an inverted index for search
  • Creates left/right neighbour relations for every word (and day)
  • Creates same headline relations for every word (and day)
  • Counts the occurence for every word (and day)
  • Finds the most popular words for the day
  • Provides a rest api for data access
  • Web frontend which supports most of the backend functionalities
  • The frontend is responsive
  • Example news bot for facebook

Rest API

These are the API endpoints

/api/search/:query
/api/rightneighbour/:word
/api/rightneighbour/:word/:day
/api/leftneighbour/:word
/api/leftneighbour/:word/:day
/api/sameheadline/:query
/api/sameheadline/:query/:day
/api/count/:word
/api/popularwords/:day
/api/popularwords/
/api/popularwordhistory/:word
/api/link/:id
/api/sources/

To get the day today to give to the api you can use the following function

getToday = function(){
    return Math.floor(Date.now() / 1000 / 60 / 60 / 24);
}

Backend Dependencies (npm)

  • express
  • express-rest
  • cheerio
  • redis
  • request
  • node-schedule
  • fbgraph
  • nlp_compromise

Frontend Dependencies

  • bootstrap
  • JQuery

Setup

Download the files, install Redis, install the npm dpendencies and run the 'Start.js'. You can also specify the news sources in the data/sources.json. After start navigate with your favorite browser to http://localhost:3000 to see the frontend. If you want to use the bots, you have to create the config.json in the /data folder. There is an example config in the same folder.

Usage

To run the webserver and the downloader / processor just execute the 'Start.js' file

> node Start.js

Facebook bot

It is possible to create news bots for facebook pages with this framework. I created an example bot. Create a /data/config.json file for your facebook page and run

> node WhoGotKilledBot.js

If you want to see a bot in action: https://www.facebook.com/pg/whogotkilled/

The WhoGotKilledBot searches for news with the word "killed" and tires to extract the information who got killed in which place. It will post these articles every 10 minutes to the facebook page and avoids doublicated posts. There is also an TrumpNews bot.

Generating the most recent headlines

Try generating the most important headlines of the day by executing 'CreateHeadlines.js'. This is heighly experimental

> node CreateHeadlines.js
[ [ 'steven', 'mnuchin' ],
  [ 'womens', 'march' ],
  [ 'novak', 'djokovic', 'upset' ],
  [ 'largest', 'student', 'loan' ],
  [ 'agriculture', 'secretary' ],
  [ 'los', 'angeles' ],
  [ 'developer', 'rick', 'perry' ],
  [ 'treasury', 'pick' ],
  [ 'press', 'conference' ],
  [ 'chelsea', 'manning' ],
  [ 'avalanche', 'buries', 'hotel' ],
  [ 'second', 'round' ],
  [ 'full', 'article' ],
  [ 'takes', 'office' ],
  [ 'australian', 'open' ],
  [ 'least', '30', 'firefighters', 'killed' ],
  [ 'tehran', 'high', 'rise', 'collapses' ],
  [ 'peoples', 'choice', 'awards' ],
  [ 'first', 'lady' ],
  [ 'more', 'than', '100', 'lapd' ],
  [ 'donald', 'trumps', 'inauguration', 'day' ],
  ...

Frontend

alt tag

nodejsnewscrawler's People

Contributors

moritzgoeckel avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

nodejsnewscrawler's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.