Autologin: Automatic login for web spiders

AutoLogin is a utility that makes it easier for web spiders to crawl websites that require login. Provide it with credentials and either a URL or the HTML source of a page (normally the homepage), and it will attempt to log in for you. The resulting cookies are returned for your spider to use.

The goal of Autologin is to make it easier for web spiders to crawl websites that require authentication without having to re-write login code for each website.

Autologin can be used as a library, on the command line, or as a service. You can use Autologin without generating HTTP requests, so you can drop it right into your spider without worrying about impacting rate limits.

Autologin is written in Python and requires only lxml and Flask. However, if you install Formasaurus (and you should), Autologin will use it automatically and performance will improve.

  • Features
  • Quickstart
  • Installation
  • Auth cookies from URL
  • Auth cookies from HTML
  • Login request
  • Extract login links
  • Command Line
  • Web Service

Features

  • Automatically find login forms and fields
  • Obtain authenticated cookies
  • Obtain form requests to submit from your own spider
  • Extract links to login pages
  • Use as a library with or without making HTTP requests
  • Command line client
  • Web service for testing your requests and cookies

Quickstart

Don't like reading documentation?

from autologin.autologin import AutoLogin

url = 'https://reddit.com'
username = 'foo'
password = 'bar'
al = AutoLogin()
cookies = al.auth_cookies_from_url(url, username, password)

You now have a cookiejar that you can use in your spider. Don't want a cookiejar?

cookies.__dict__

You now have a dictionary.

Installation

This is not (yet) registered on PyPI, so you must clone the repository and use setup.py to build and install:

$ git clone https://github.com/WalnutATiie/autologin.git
$ cd autologin
$ sudo pip install -r requirements.txt
$ python setup.py build
$ python setup.py install

Auth cookies from URL

This method makes an HTTP request to the URL using urllib, extracts the login form (if there is one), fills in the fields and submits the form. It then returns any cookies it has picked up.

cookies = al.auth_cookies_from_url(url, username, password)

With a proxy:

cookies = al.auth_cookies_from_url(url, username, password, proxy_type='http', proxy='http://192.168.0.1:8080')

Only HTTP and HTTPS proxies are supported.
Note that this method returns all cookies; they may be session cookies rather than authenticated cookies.
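
Because every cookie is returned, it can help to look at what came back before relying on it. The following is a minimal sketch, assuming the returned object can be iterated like a standard library `http.cookiejar.CookieJar` (this README does not confirm the exact type). The `make_cookie` and `cookie_summary` helpers are hypothetical names used only for illustration, and the jar is built by hand so that no login is performed.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain):
    """Build a stdlib Cookie by hand, standing in for one returned by a login."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookie_summary(jar):
    """Return a {name: value} mapping for every cookie in the jar."""
    return {cookie.name: cookie.value for cookie in jar}


# Assumption: a hand-built jar replaces the result of auth_cookies_from_url.
jar = CookieJar()
jar.set_cookie(make_cookie("sessionid", "abc123", "example.com"))
jar.set_cookie(make_cookie("theme", "dark", "example.com"))
summary = cookie_summary(jar)
```

Checking the summary for the cookie name that the target site uses for logged-in sessions is one way to tell an anonymous session from an authenticated one.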

Auth cookies from HTML

This method extracts the login form (if there is one), fills in the fields and submits the form. It then returns any cookies it has picked up.

cookies = al.auth_cookies_from_html(html_source, username, password, base_url=None)

Pass base_url so that a form URL can still be returned when the form action is empty. Note that this method returns all cookies; they may be session cookies rather than authenticated cookies.
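
To illustrate why a base URL matters when the form action is empty or relative, here is a small sketch using the standard library `urljoin`. It is not taken from Autologin's source; `resolve_form_url` is a hypothetical helper and the example.com URLs are placeholders.

```python
from urllib.parse import urljoin


def resolve_form_url(base_url, action):
    """Resolve a form's action attribute against the URL of the page it came from."""
    # An empty or missing action means the form posts back to the page itself.
    return urljoin(base_url, action or "")


empty = resolve_form_url("https://example.com/login", "")
relative = resolve_form_url("https://example.com/account/", "session")
rooted = resolve_form_url("https://example.com/account/", "/session")
```

Without a base URL, the first two cases would leave the spider with no absolute address to submit to.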

Login request

This method extracts the login form (if there is one), fills the fields and returns a dictionary with the form url and args for your spider to submit. No http requests are made.

request = al.login_request(html_source, username, password, base_url=None)

Pass base_url so that a form URL can still be returned when the form action is empty.
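
Once you have the form URL and arguments, submitting them from your own spider might look like the sketch below. It assumes the form is submitted with POST and uses a hand-written dictionary with hypothetical `url` and `args` keys; the actual key names returned by `login_request` are not documented here, so check the returned dictionary before relying on them. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_post(form_url, form_args):
    """Build (but do not send) a urllib POST request for a filled-in login form."""
    body = urlencode(form_args).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(form_url, data=body, headers=headers, method="POST")


# Assumption: stand-in for the dictionary returned by login_request.
login = {
    "url": "https://example.com/session",
    "args": {"user": "foo", "passwd": "bar"},
}
post = build_login_post(login["url"], login["args"])
```

Building the request separately from sending it lets the spider schedule the submission through its own downloader and rate limiting.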

Extract login links

This method extracts any login links that it can find in the source and returns a list.

login_links = al.extract_login_links(html_source)
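
For a rough idea of what finding login links involves, the sketch below scans anchor tags for login-like words using only the standard library. It is a simplified heuristic written for illustration and is not Autologin's implementation; `LoginLinkParser`, `find_login_links` and the `LOGIN_HINTS` keyword list are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: a small, hand-picked keyword list; real sites vary widely.
LOGIN_HINTS = ("login", "log in", "signin", "sign in")


class LoginLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose URL or link text looks login-related."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            haystack = (self._href + " " + "".join(self._text)).lower()
            if any(hint in haystack for hint in LOGIN_HINTS):
                self.links.append(urljoin(self.base_url, self._href))
            self._href = None


def find_login_links(html_source, base_url=""):
    """Return absolute URLs of links that look like they lead to a login page."""
    parser = LoginLinkParser(base_url)
    parser.feed(html_source)
    parser.close()
    return parser.links


sample = (
    '<a href="/about">About</a>'
    '<a href="/account/login">Members</a>'
    '<a href="/enter">Sign in</a>'
)
found = find_login_links(sample, "https://example.com/")
```

The sample matches one link by its URL and one by its visible text, which is why both are checked.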

Command Line

$ autologin
usage: autologin [-h] [--proxy PROXY] [--show-in-browser SHOW_IN_BROWSER]
                 username password url

Web Service

$ autologin-server
 * Running on http://127.0.0.1:8088/ (Press CTRL+C to quit)
 * Restarting with stat

Opening this URL in a browser will show you the AutoLogin UI, which can be used to test credentials and get a basic understanding of how the system works. The API endpoints are also documented there if you'd like to use AutoLogin as a service.

Contributors

Source code and bug tracker are on GitHub: https://github.com/TeamHG-Memex/autologin.

License

License is MIT.
