Comments (7)

alvinwoon commented on May 17, 2024

I am also looking for a way to follow links or parse the next page. Sometimes the first URL is not the page you want: for example, you may want to parse the first result of a search page rather than the search page itself.

from toapi.

elliotgao2 commented on May 17, 2024

@scottwoodall

There is a solution here:

api = Api(url)
app = api.server.app

@app.route('/post_page/')
def post_method():
    res = requests.post(url, data)  # You need to analyze the AJAX POST request that the source site makes.
    return item.parse(res.text)
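
On what data holds: it is the form payload that the source site's page submits, which can be read from the POST request in the browser's network tab. A minimal sketch using only the standard library, assuming a hypothetical endpoint and illustrative field values; the request is built but never sent:

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Hypothetical endpoint and form fields; copy the real ones from the
# POST request shown in the browser's network tab.
SEARCH_URL = "https://example.com/search"
form_fields = {"srchtype": "title", "srchtxt": "python"}

# Form-encode the payload the way an HTML form submission would.
body = urlencode(form_fields).encode("utf-8")

# Supplying a body makes urllib default to the POST method.
req = Request(SEARCH_URL, data=body)

print(req.get_method())      # POST
print(body.decode("utf-8"))  # srchtype=title&srchtxt=python
```

Passing the same dict as the data argument of requests.post produces the same form encoding, so the dict can be handed to the call above unchanged.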

elliotgao2 commented on May 17, 2024

@alvinwoon

This example could help you.

https://github.com/gaojiuli/toapi/blob/master/examples/hackernews_page.py

alvinwoon commented on May 17, 2024

@gaojiuli Thanks!

scottwoodall commented on May 17, 2024

@gaojiuli Where do data and item come from? I tried:

@app.route('/posts')
def post_method(*args, **kwargs):
    print(args)
    print(kwargs)

but they are empty.

Ehco1996 commented on May 17, 2024

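toapi's built-in fetch_page_source() method does not handle POST requests,

so we need to add a Flask route ourselves to get this behavior.

Here is a fairly detailed example.

Suppose I need to fetch the data at this URL with a POST request and then parse it the toapi way.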

  • Writing the items
from toapi import Item, XPath


class Search(Item):
    '''
    Parsed from the search results page:
    title, ID, link, summary
    '''
    title = XPath('//h3/a/text()')
    book_id = XPath('//h3/a/@href')
    url = XPath('//h3/a/@href')
    content = XPath('//p[2]/text()')

    def clean_title(self, title):
        return ''.join(title)

    def clean_book_id(self, book_id):
        return book_id.split('-')[1]

    def clean_url(self, url):
        # Drop the query string; also safe when the URL has no '?'
        return url.split('?', 1)[0]

    class Meta:
        source = XPath('//li[@class="pbw"]')
        # Leave route empty here to avoid registering the route twice
        route = {}
  • Registering the route
from toapi import Api
from items.search import Search
from settings import MySettings
import json
import requests


api = Api('', settings=MySettings)
api.register(Search)

@api.server.app.route('/search/<keyword>')
def search_page(keyword):
    '''
    91baby new-books forum
    search feature
    '''
    data = {
        'searchsel': 'forum',
        'mod': 'forum',
        'srchtype': 'title',
        'srchtxt': keyword,
    }
    r = requests.post(
        'http://91baby.mama.cn/search.php?searchsubmit=yes', data)
    r.encoding = 'utf8'
    html = r.text
    results = {}
    items = [Search]
    # Parse the page with toapi's own parsing method
    for item in items:
        parsed_item = api.parse_item(html, item)
        results[item.__name__] = parsed_item
    # Return JSON
    return api.server.app.response_class(
        response=json.dumps(results, ensure_ascii=False),
        status=200,
        mimetype='application/json'
    )

if __name__ == '__main__':
    api.serve()
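
The clean_* hooks in the Search item can be tried in isolation before wiring them into toapi. The sketch below rewrites them as plain functions; the sample href and title fragments are made up for illustration, and the real site's hrefs may be shaped differently.

```python
def clean_title(parts):
    # The item joins the value, which suggests a list of text fragments.
    return "".join(parts)


def clean_book_id(href):
    # Assumes hrefs shaped like "thread-<id>-...": take the second field.
    return href.split("-")[1]


def clean_url(href):
    # Keep everything before the first "?"; hrefs without one are unchanged.
    return href.split("?", 1)[0]


# Hypothetical sample values, not taken from the real site.
sample_href = "thread-12345-1-1.html?highlight=python"

print(clean_title(["Learning ", "Python"]))  # Learning Python
print(clean_book_id(sample_href))            # 12345
print(clean_url(sample_href))                # thread-12345-1-1.html
print(clean_url("thread-12345-1-1.html"))    # thread-12345-1-1.html
```

One detail worth knowing when stripping query strings: str.find returns -1 when the character is absent, so slicing at its result drops the last character of the string. Splitting on "?" avoids that case.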

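With the route registered, visiting http://127.0.0.1:5000/search/keyword returns the parsed result of the POST request.

Because toapi does not support this approach natively, its caching feature is unavailable here.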
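
Caching can still be added by hand around the POST-and-parse step. A minimal sketch, assuming a simple in-memory dict with a time-to-live; TTLCache and fake_fetch are hypothetical names for illustration and are not part of toapi's cache API:

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl seconds."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (time stored, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the fetch
        value = fetch(key)
        self._store[key] = (now, value)
        return value


calls = []


def fake_fetch(keyword):
    # Stand-in for the requests.post(...) + api.parse_item(...) step.
    calls.append(keyword)
    return {"Search": [f"result for {keyword}"]}


cache = TTLCache(ttl=60.0)
first = cache.get_or_fetch("python", fake_fetch)
second = cache.get_or_fetch("python", fake_fetch)
print(first == second, len(calls))  # True 1
```

Inside the search_page view, the body that posts and parses would become the fetch callable, keyed by keyword. A dict like this lives in one process only, so it would not be shared across multiple workers.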

howie6879 commented on May 17, 2024

Hi @Ehco1996
You can also use the cache yourself.
There is a document here:
