loklak_scraper_js's People

Contributors

achint08, daminisatya, djmgit, hemantjadon, jigyasa-grover, kapillamba4, kavithaenair, mariobehling, orbiter, skrpl, vibhcool

loklak_scraper_js's Issues

Base scraper not working

The base scraper does not define an onInit() method, but an object of it is still created, so it throws an error.
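One possible fix is to give the base class a default, do-nothing onInit() hook. The following is a minimal sketch, assuming the base constructor is what calls onInit(); only the name BaseLoklakScraper appears in these issues, while the class body and ExampleScraper are hypothetical.

```javascript
// Hypothetical base class: only the name BaseLoklakScraper comes from the issues.
class BaseLoklakScraper {
  constructor(name) {
    this.name = name;
    // Safe even when the base class is instantiated directly,
    // because a default onInit() is defined below.
    this.onInit();
  }

  // Default no-op hook; subclasses override it to do their own setup.
  onInit() {}
}

// Hypothetical subclass showing the override.
class ExampleScraper extends BaseLoklakScraper {
  onInit() {
    this.ready = true;
  }
}

const base = new BaseLoklakScraper('base'); // does not throw
const example = new ExampleScraper('example');
```

With the default hook in place, creating the base object directly is harmless, and subclasses still get their own setup step.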

Improve Youtube scraper

Issue Type: Enhancement

Short Description

Implement the scraping logic of the Youtube scraper in ES6 by extending BaseLoklakScraper and using request-promise-native to send GET requests. The current implementation is written in ES5.1 and uses synchronous requests.

Wrong "library" output name of bundled files

  • Issue type: Bug report

Short description

Bundled files generated by Webpack have the wrong library name. The library name is used to import the CommonJS module.

Environment

  • Operating system: Independent of OS
  • Software source: Github repository

Steps to reproduce

  1. Execute the **webpack** command in the root directory.
  2. Files are generated in the build directory.
  3. The second word in any file in the build directory is the name of the file itself, e.g. for twitter.js it is **var twitter**.

Expected behaviour

The library name, i.e. the second word in the generated bundled file, should be the same as the name of the scraper class.

Actual behaviour

The library name is the name of the generated bundled file.

Would you like to work on the issue?

Yes

Add build script

  • Issue type: Feature request

Usecase

Users should be able to create an optimized build using webpack.

Description

The script should produce an optimized build and show progress while building.

webpack -p --progress

Adding Flickr scraper

Flickr is one of the ten most used social networking sites. Flickr's HTML can be scraped, so a Flickr scraper can be added.

Process date fetched to standard format in TimeAndDate scraper

The TimeAndDate scraper does not convert the fetched date into a standard format in which it can be used directly, such as:
Thu Apr 06 15:14:32 IST 2017 or 2017-04-06T09:44:32.000Z

The scraper needs to post-process the fetched data.
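As an illustration of the kind of post-processing meant here, the sketch below converts the first example format into the second. It assumes the input already looks like "Thu Apr 06 15:14:32 IST 2017" and that IST means India Standard Time (UTC+5:30), which is what the pair of examples above implies; the function name and the small time-zone table are hypothetical.

```javascript
// Offsets from UTC in minutes for the abbreviations this sketch understands.
// Assumption: IST is India Standard Time (UTC+5:30).
const TZ_OFFSET_MINUTES = { UTC: 0, IST: 330 };

const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Hypothetical helper: "Thu Apr 06 15:14:32 IST 2017" -> ISO 8601 string in UTC.
function toIsoString(raw) {
  const pattern = /^\w{3} (\w{3}) (\d{2}) (\d{2}):(\d{2}):(\d{2}) (\w+) (\d{4})$/;
  const match = pattern.exec(raw.trim());
  if (!match) {
    throw new Error(`Unrecognised date: ${raw}`);
  }
  const [, mon, day, hh, mm, ss, tz, year] = match;
  const month = MONTHS.indexOf(mon);
  const knownZone = Object.prototype.hasOwnProperty.call(TZ_OFFSET_MINUTES, tz);
  if (month === -1 || !knownZone) {
    throw new Error(`Unsupported month or time zone in: ${raw}`);
  }
  // Treat the wall-clock fields as UTC first, then subtract the zone offset.
  const wallClockAsUtc = Date.UTC(+year, month, +day, +hh, +mm, +ss);
  return new Date(wallClockAsUtc - TZ_OFFSET_MINUTES[tz] * 60000).toISOString();
}

const iso = toIsoString('Thu Apr 06 15:14:32 IST 2017');
// iso is "2017-04-06T09:44:32.000Z"
```

An explicit offset table is used because time-zone abbreviations are ambiguous and not reliably understood by Date.parse.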

Add API for using JS scrapers as loklak harvesting workers

Issue Description

Issue type: Parent issue

As of now, this JS has to be bundled before it can be used in other projects, and even then the functions have to be imported manually.

It would be good to have an API like the following, or similar:

import { loklakHarvester } from 'loklak_scrapers_js';

let myLoklakHarvester = loklakHarvester('http://api.loklak.org', 4)
                            .onHarvestStart((backend, query) => {
                                ...
                            })
                            .onHarvestComplete((backend, query, messages) => {
                                ...
                            })
                            .onHarvestError((backend, query, error) => {
                                ...
                            })
                            .onPushStart((backend, messages) => {
                                ...
                            })
                            .onPushComplete((backend, messages) => {
                                ...
                            })
                            .onPushError((backend, messages, error) => {
                                ...
                            })
                            .onSuggestionFetch((backend, suggestions) => {
                                ...
                            })
                            .onShutDown(() => {
                                ...
                            });

...

myLoklakHarvester.setBackend('http://backend.loklak.org');
myLoklakHarvester.setWorkers(3);

...

myLoklakHarvester.shutDown();

This would facilitate the use of loklak_scrapers_js in many projects and provide an easy, plug-and-play interface for any website.
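The chaining in the proposal above could be supported with a small factory that registers one handler per event and returns the same object each time. The sketch below covers only registration, the two setters, and shutdown; the emit helper is hypothetical, and no harvesting or pushing is implemented.

```javascript
// Event names mirror the hooks proposed in the issue.
const EVENTS = ['HarvestStart', 'HarvestComplete', 'HarvestError',
  'PushStart', 'PushComplete', 'PushError',
  'SuggestionFetch', 'ShutDown'];

function loklakHarvester(backend, workers) {
  const handlers = {};
  const harvester = {
    backend,
    workers,
    setBackend(url) { this.backend = url; return this; },
    setWorkers(count) { this.workers = count; return this; },
    // Hypothetical helper: invokes the handler registered for an event, if any.
    emit(event, ...args) {
      if (handlers[event]) handlers[event](...args);
    },
    shutDown() { this.emit('ShutDown'); },
  };
  // Generate onHarvestStart, onPushError, and so on.
  // Each registration returns the harvester so calls can be chained.
  EVENTS.forEach((event) => {
    harvester[`on${event}`] = (handler) => {
      handlers[event] = handler;
      return harvester;
    };
  });
  return harvester;
}

let stopped = false;
const myLoklakHarvester = loklakHarvester('http://api.loklak.org', 4)
  .onHarvestError((backendUrl, query, error) => { console.error(query, error); })
  .onShutDown(() => { stopped = true; });

myLoklakHarvester.setBackend('http://backend.loklak.org');
myLoklakHarvester.setWorkers(3);
myLoklakHarvester.shutDown();
```

Generating the on* methods from a list keeps the eight hooks consistent and makes it cheap to add more later.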

Create a BaseScrapper Class

Create a BaseLoklakScraper class that all the other scrapers extend and that provides an easy, uniform API for scraper implementations.

Improve Quora Scraper

Short Description

Currently, a valid profile (name of user) needs to be provided as a query parameter to scrape the details of a profile, and the GET request to obtain the HTML of the user profile is sent when a QuoraLoklakScraper object is instantiated.

Required enhancement

Rather than requiring a valid profile (name of user), the scraper should take a query as input and run a profile search using https://www.quora.com/search?q=QUERY&type=profile. Profile links are obtained from that URL, and each profile is then scraped. Finally, the scraped profile data is aggregated into a list and returned.
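The flow described above (search, collect profile links, scrape each profile, aggregate) could look roughly like this. Only the search URL comes from the issue; fetchHtml, extractProfileLinks and scrapeProfile are hypothetical, injected helpers standing in for an HTTP client and an HTML parser.

```javascript
// Builds the profile-search URL given in the issue.
function buildSearchUrl(query) {
  return `https://www.quora.com/search?q=${encodeURIComponent(query)}&type=profile`;
}

// fetchHtml(url) -> Promise<string>, extractProfileLinks(html) -> string[],
// scrapeProfile(html) -> object. All three are assumptions for illustration.
async function scrapeQuoraProfiles(query, helpers) {
  const { fetchHtml, extractProfileLinks, scrapeProfile } = helpers;
  const searchHtml = await fetchHtml(buildSearchUrl(query));
  const links = extractProfileLinks(searchHtml);
  // Fetch and scrape every profile concurrently, then aggregate into one list.
  return Promise.all(
    links.map(async (link) => scrapeProfile(await fetchHtml(link)))
  );
}

const searchUrl = buildSearchUrl('open source');
```

Injecting the helpers keeps the sketch independent of any particular request library and makes the flow easy to test with stubs.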

Improve Reddit scraper

Issue type: Enhancement

Required Enhancement

The current scraper takes the query parameter from a command-line argument, and scraping is done when RedditLoklakScraper is instantiated. A new method should be created that takes the query parameter and provides the scraped data through a callback.
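A minimal sketch of that shape is shown below, assuming a Node-style callback(error, data). Only the class name comes from the issue; the scrape method name and the injected fetchPosts function are hypothetical stand-ins for the real request and parsing code.

```javascript
class RedditLoklakScraper {
  // fetchPosts is an injected, hypothetical function: (query) => Promise<Array>.
  constructor(fetchPosts) {
    // Nothing is scraped at construction time.
    this.fetchPosts = fetchPosts;
  }

  // Scrapes on demand and reports the result through callback(error, data).
  scrape(query, callback) {
    this.fetchPosts(query).then(
      (posts) => callback(null, { query, posts }),
      (error) => callback(error)
    );
  }
}

// Usage with a stub in place of the real network call.
const scraper = new RedditLoklakScraper((query) => Promise.resolve([`post about ${query}`]));
scraper.scrape('loklak', (error, data) => {
  if (error) {
    console.error(error);
    return;
  }
  console.log(data.posts.length);
});
```

Moving the work out of the constructor means one scraper object can serve many queries.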

Improve Github scraper

Issue type: Enhancement

Short Description

Currently, the Github scraper fetches data from the Github API synchronously.

Required Enhancement

The data should be fetched in an asynchronous way using request-promise-native. Along with that, the code should be converted to ES6 and the scraper should extend BaseLoklakScraper.

Add webpack config file to transpile and bundle

Create a webpack.config.js file that can be used to bundle third-party NodeJS modules, so that the bundled files can be used on Java platforms (loklak_server and loklak_wok_android). Also include Babel plugins in webpack.config.js for transpiling ES6.

StackOverflow Scraper

Hello! I am new here and want to contribute.
I was thinking of adding a StackOverflow scraper that takes a question as the query and returns related questions already asked on SO, along with their answers.
I can also add a user details scraper.
I will start working on it as soon as you let me know.

Linkedin profile scraper

Some public profiles on Linkedin can be scraped. Create a scraper that can scrape Linkedin profiles.

Add TravisCI

As of now, TravisCI should do the following:

  • Compile using Webpack to generate bundled files and push the bundled files to a new branch build.
  • Push the bundled twitter.js file to loklak_wok_android, so that it can be used by LiquidCore for scraping tweets.
