Crawler tests

I need to build a crawler, in PHP or another language.

This repo is a quick test of whether it can be done effectively in PHP or whether I need to switch languages. Below are the notes from my short R&D.

Initial path

A few URLs

I found the URLs below useful during my initial search.

Possible options

Tools

PHP Options

For the last few years I have worked mainly in PHP, so PHP is my preferred tool because it involves no big learning curve. I am open to Python and have professional Java experience from the past, but PHP will be my preference if it can provide decent performance. I believe PHP can handle this at high scale with PHP 7, several layers of caching, and multiple servers coordinated through a message queue.

Decision 1: Try PHP first. Again, this is just an initial experiment, so I'd like to give PHP the first chance.

Looking at the available PHP options

  • PHPCrawl is not an option due to its restrictive license (GPL). However, I'd like to read its code to see how it does things at a lower level. Maybe I will get a few ideas from there.

Since I found no other crawler option in PHP, I will probably need to write my own crawler. There is another reason: if the idea works out, the crawler will carry a lot of responsibility in the future. It will be the heart and soul of my application, so I do not want to be restricted by any third-party tool. I also want to learn how crawlers work.

Decision 2: Try a custom crawler. Look at other open source solutions, but at least attempt to build my own crawler. Maybe it could become a new open source crawler, or at least I'll learn something new :)

Preference: New open source project. (Get ideas from other open source projects.)

Open for: If a custom crawler takes too much time, I am open to other open source projects in PHP, Python and Java.

  • The PHP SimpleTest Scriptable Browser seems to be another piece worth looking at. It is not actually a crawler, but if I need to build a crawler from scratch, it could make reading web pages easier.
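To make the "custom crawler" idea from Decision 2 a little more concrete, here is a minimal sketch of the core loop such a crawler might have: pull links out of a page, then visit them breadth-first while remembering what was already seen. This is not code from this repo. The names `extractLinks`, `crawl` and the `$fetch` callback are hypothetical, the link resolver only handles absolute and root-relative URLs, and it assumes PHP's DOM extension is available.

```php
<?php
declare(strict_types=1);

/**
 * Extract absolute http(s) links from an HTML string.
 * Sketch only: handles absolute URLs and root-relative paths ("/about");
 * other forms (relative paths, mailto:, protocol-relative) are skipped.
 */
function extractLinks(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect libxml errors silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $base = parse_url($baseUrl);
    $origin = (is_array($base) && isset($base['scheme'], $base['host']))
        ? $base['scheme'] . '://' . $base['host']
        : null;

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href === '') {
            continue;
        }
        if (preg_match('#^https?://#i', $href) === 1) {
            $links[] = $href;
        } elseif ($origin !== null && $href[0] === '/' && substr($href, 0, 2) !== '//') {
            $links[] = $origin . $href;
        }
    }

    return array_values(array_unique($links));
}

/**
 * Breadth-first crawl starting at $startUrl.
 * $fetch is a hypothetical callback: it takes a URL and returns the page
 * HTML as a string, or null if the page could not be fetched.
 */
function crawl(string $startUrl, callable $fetch, int $maxPages = 10): array
{
    $queue = [$startUrl];
    $visited = [];

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if ($html === null) {
            continue;
        }
        foreach (extractLinks($html, $url) as $link) {
            if (!isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }

    return array_keys($visited);
}
```

Passing the fetcher in as a callback keeps the loop independent of how pages are downloaded, so `$fetch` could wrap `file_get_contents`, cURL, or a scriptable browser. In a multi-server setup, the in-memory `$queue` is the part a message queue would presumably replace.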

PHP1

The first experiment in PHP is described in php1.md.
