
gpipe43's Introduction

gpipe43 is a full-text RSS generator that can be hosted on Google App Engine. It uses regular expressions to find and format the full text of an article, or any other content you want.
Inspired by Yahoo Pipes and Feed43.
Yahoo Pipes, RIP.

Features

  • Supports multi-page articles.
  • Displays all images of an article's gallery.
  • Can append an article's comments.

Prepare

Simple quickstart

Edit /main/user_agents.py

  • Add your user-agent (UA) strings.
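For reference, here is a minimal sketch of a user-agent list and a request that uses it. It assumes /main/user_agents.py holds a plain list of strings. The name USER_AGENTS and the helper build_request are hypothetical. It also uses Python 3's urllib.request in place of the urllib2 module gpipe43 refers to.

```python
import random
import urllib.request

# Hypothetical contents of /main/user_agents.py: a plain list of UA strings.
# The name USER_AGENTS is an assumption, not necessarily gpipe43's own.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: build a request carrying a randomly chosen User-Agent."""
    ua = random.choice(USER_AGENTS)
    return urllib.request.Request(url, headers={"User-Agent": ua})
```

Pass the result to urllib.request.urlopen(...) as you would a plain URL. Rotating through several UA strings is a common way to look less like a single bot.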

Edit config.py

  • prjname: Name of your project on App Engine.
  • bucket_name: Name of the Cloud Storage bucket.
  • subdir4bg: Path of the crawler, which runs under http://[prjname].appspot.com/[subdir4bg]/[rssname]
  • subdir4rss: Path of your RSS feeds, which are served at http://[prjname].appspot.com/[subdir4rss]/[rssname]
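Here is a sketch of these four settings with made-up values. It also includes two illustrative helpers that show how the settings combine into the crawler and RSS URLs described above. The helpers are not part of gpipe43's config.py.

```python
# Hypothetical config.py values; replace them with your own.
prjname = "my-gpipe43"            # App Engine project name
bucket_name = "my-gpipe43-feeds"  # Cloud Storage bucket name
subdir4bg = "bg"                  # path prefix for the crawler
subdir4rss = "rss"                # path prefix for the generated feeds


# Illustrative helpers only (not in gpipe43): build the two URLs for a feed.
def crawler_url(rssname: str) -> str:
    """URL that starts the crawler for one feed."""
    return f"http://{prjname}.appspot.com/{subdir4bg}/{rssname}"


def rss_url(rssname: str) -> str:
    """URL where the finished feed is served."""
    return f"http://{prjname}.appspot.com/{subdir4rss}/{rssname}"
```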

Edit example.py and replace 'example' with your own RSS feed's name.

  • rssname: Name of the RSS feed.
  • siteurl: The website or RSS feed from which you want to generate a full-text RSS feed.
  • reg4site: Regex that finds the articles' URLs. Leave blank if siteurl is a feed.
  • reg4title: Regex for the title of an article. Leave blank if siteurl is a feed.
  • reg4pubdate: Regex for the publication date of an article. Leave blank if siteurl is a feed. The matched date must contain a date in '%Y-%m-%d' format; otherwise leave this blank.
  • reg4text: Regex for the main body of an article.
  • reg4comment: Regex for comments. Optional; can be left blank. You can also use this regex to find all the images of a gallery in the article.
  • reg4nextpage: Regex for the article's next page, if the article spans more than one page.
  • Anzahl: How many articles to generate. If there is more than one siteurl, this limit applies to EACH siteurl, not to all article URLs from all siteurls combined. 0 = no limit.
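To make the regex parameters concrete, here is a self-contained sketch that runs hypothetical patterns against made-up HTML. It assumes each pattern's first capture group holds the wanted value, which may differ from how gpipe43 actually applies them. The helpers first and article_urls are illustrative. Because article_urls works on a single siteurl's page, the Anzahl limit in this sketch applies per site, as described above.

```python
import re
from datetime import datetime
from typing import List

# Made-up listing page (siteurl) and article page, for illustration only.
SITE_HTML = """
<ul>
  <li><a class="post" href="/news/1.html">One</a></li>
  <li><a class="post" href="/news/2.html">Two</a></li>
  <li><a class="post" href="/news/3.html">Three</a></li>
</ul>
"""

ARTICLE_HTML = """
<h1 class="title">Hello gpipe43</h1>
<span class="date">2016-05-04 10:30</span>
<div class="body"><p>First paragraph.</p><p>Second.</p></div>
"""

# Hypothetical patterns; capture group 1 is assumed to hold the wanted value.
reg4site = r'<a class="post" href="([^"]+)"'
reg4title = r'<h1 class="title">(.*?)</h1>'
reg4pubdate = r'<span class="date">(\d{4}-\d{2}-\d{2})'
reg4text = r'<div class="body">(.*?)</div>'


def first(pattern: str, html: str) -> str:
    """Return capture group 1 of the first match, or '' if nothing matches."""
    m = re.search(pattern, html, re.S)  # re.S lets '.' span line breaks
    return m.group(1) if m else ""


def article_urls(html: str, anzahl: int) -> List[str]:
    """Collect article URLs from ONE siteurl; anzahl caps them (0 = no limit)."""
    urls = re.findall(reg4site, html)
    return urls if anzahl == 0 else urls[:anzahl]


title = first(reg4title, ARTICLE_HTML)
pubdate = datetime.strptime(first(reg4pubdate, ARTICLE_HTML), "%Y-%m-%d")
text = first(reg4text, ARTICLE_HTML)
```

You can test your own patterns this way against a saved copy of the target page before putting them in example.py.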

  • *encoding: Optional. chardet usually detects the right encoding, but not always (for example, it may recognize gb18030 as gb2312). To avoid errors from illegal characters, pages are decoded with the 'replace' option of the decode method, so replacement characters can appear in the generated feed. In that case, specify the website's encoding here. It only affects the main text.
  • rssgen.ausfuehren('use_urllib/use_urlfetch', 'st/mt', siteurl, reg4site, reg4title, reg4pubdate, reg4text, reg4comment, reg4nextpage, Anzahl): Generates an RSS feed from a website.
  • feed_fulltext.ausfuehren('use_urllib/use_urlfetch', siteurl, reg4nextpage, reg4text, reg4comment, Anzahl, rssname): Generates full text from an existing RSS feed.
    • use_urllib: Use urllib2, with UA
    • use_urlfetch: Use urlfetch, without UA
    • mt: Multi-threading
    • st: Single-threading
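The decoding strategy described for *encoding can be sketched as follows. The sketch uses the configured encoding when one is given and otherwise asks chardet. It then decodes with errors='replace'. The function name decode_page is hypothetical. The sketch also assumes a UTF-8 fallback when chardet is not installed or cannot decide; gpipe43 may behave differently in that case.

```python
from typing import Optional

try:
    import chardet  # third-party detector; used only if installed
except ImportError:
    chardet = None


def decode_page(raw: bytes, encoding: Optional[str] = None) -> str:
    """Hypothetical helper: decode page bytes, preferring an explicit encoding.

    Undecodable bytes become U+FFFD instead of raising an error,
    which is the 'replace' behaviour described above.
    """
    if encoding is None and chardet is not None:
        encoding = chardet.detect(raw).get("encoding")  # may still be None
    return raw.decode(encoding or "utf-8", errors="replace")
```

Specifying the encoding skips detection entirely, which avoids the misdetection case described above.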

Edit feed_list.py

  • Replace 'example' with your own RSS feed's name.

Edit app.yaml and cron.yaml

Optional

  • Edit ./main/Vorlage.xml and Vorlage_Error.xml to fill in the 'generator', 'webMaster' and 'copyright' elements.
  • If you only want to reformat an existing feed, see example_02.py, then add its URL and script to app.yaml. You don't need to add it to feed_list.py or cron.yaml, because that feed is not saved to Cloud Storage.

Test

dev_appserver.py [PATH_TO_YOUR_APP]/app.yaml

Start the crawler: http://localhost:8080/[subdir4bg]/[rssname]
When it has finished, check your RSS feed here: http://localhost:8080/[subdir4rss]/[rssname]
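Here is a small sketch for checking the result from Python instead of a browser. It assumes the following:

  • the dev server runs on localhost:8080;
  • the crawler request returns only once crawling has finished (if it does not, wait before fetching the feed);
  • the generated feed is RSS 2.0 (rss/channel/item).

The helpers run_local_check and count_items are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

BASE = "http://localhost:8080"  # default dev_appserver address (assumed)


def count_items(xml_bytes: bytes) -> int:
    """Return the number of <item> elements in an RSS 2.0 document."""
    root = ET.fromstring(xml_bytes)  # root is the <rss> element
    return len(root.findall("./channel/item"))


def run_local_check(subdir4bg: str, subdir4rss: str, rssname: str) -> int:
    """Trigger the crawler on the dev server, then fetch the feed and count its items."""
    with urllib.request.urlopen(f"{BASE}/{subdir4bg}/{rssname}") as resp:
        resp.read()  # assumes this returns once the crawl is done
    with urllib.request.urlopen(f"{BASE}/{subdir4rss}/{rssname}") as resp:
        return count_items(resp.read())
```

For example, call run_local_check with your own subdir4bg, subdir4rss and rssname values. If it returns 0, your regexes probably did not match anything.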

See official guide: Using the Local Development Server

Upload to App Engine

  • cd into your project directory

gcloud config set project PROJECT_NAME
gcloud app deploy app.yaml cron.yaml --version=VERSION_NUMBER

See official guide: Deploying a Python App

Examples

gpipe43's People

Contributors

boneflame


gpipe43's Issues

Need a full example.py for a beginner.

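Could you provide a complete example.py? I'd like to get my gapps running first, and then I will learn more gradually. Sorry for my bad English.
If a list could be set up to share the sources that people have already configured and got working, it would greatly broaden this project's user base. I'm one of those users: I can use it, but I can't program.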
