discogs_aio_spider's Introduction

An asynchronous crawler for Discogs, a foreign vinyl record information website, built with asyncio, httpx, Motor, aio-pika, and aioredis on Python 3.8.

Project flow chart

How to run?

Install Poetry

python3 -m pip install poetry

Initialize the environment

poetry init

Install dependencies

poetry install

Run

poetry run python main.py

Attention: step1 must generate its task queue first; otherwise step2 will fail.

Similarly, step3 must wait for step2 to generate its task queue. How the files call one another can be adjusted to suit actual business needs. (step1, step2, and step3 are aliases for functions.)
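The ordering constraint above can be illustrated with a minimal sketch. It assumes in-memory `asyncio.Queue` objects in place of the project's real message queues, and the bodies of `step1`, `step2`, and `step3` are hypothetical stand-ins, not the author's functions. The point is only that each stage is awaited to completion before the next one reads its queue.

```python
import asyncio


async def step1(seeds, queue):
    """Hypothetical first stage: enqueue one listing task per seed."""
    for seed in seeds:
        await queue.put(f"list:{seed}")


async def step2(in_queue, out_queue):
    """Hypothetical second stage: turn each listing task into a detail task."""
    while not in_queue.empty():
        task = await in_queue.get()
        await out_queue.put(task.replace("list:", "detail:"))


async def step3(in_queue):
    """Hypothetical third stage: collect the detail tasks."""
    results = []
    while not in_queue.empty():
        results.append(await in_queue.get())
    return results


async def run_pipeline(seeds):
    q1, q2 = asyncio.Queue(), asyncio.Queue()
    await step1(seeds, q1)   # must finish before step2 reads q1
    await step2(q1, q2)      # must finish before step3 reads q2
    return await step3(q2)


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["rock", "jazz"])))
```

If step2 were started before step1 had put anything on the queue, its loop in this sketch would exit immediately with nothing to do, which is one way the ordering can go wrong.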

About Code

The code uses some of the newer Python features, so it doubles as a way to learn them: typing, f-strings, pydantic, and dataclasses. It also shows how to write asynchronous programs, how the pieces of code schedule one another, how to use decorators, and so on.
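As a rough illustration of how these features fit together, here is a small sketch that combines a dataclass, type hints, an f-string, and a decorator around an async function. The `Release` class, the `retry` decorator, and `fetch_release` are made up for this example and are not taken from the project; pydantic is left out so the sketch needs only the standard library.

```python
import asyncio
import functools
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, List


@dataclass
class Release:
    """Hypothetical record type; the project's real models may differ."""
    title: str
    year: int
    genres: List[str] = field(default_factory=list)

    def label(self) -> str:
        return f"{self.title} ({self.year})"  # f-string formatting


def retry(times: int) -> Callable:
    """Decorator factory: retry an async function up to `times` attempts."""
    def decorator(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            last_error: Exception = RuntimeError("retry needs times >= 1")
            for _ in range(times):
                try:
                    return await func(*args, **kwargs)
                except Exception as exc:  # remember the error and try again
                    last_error = exc
            raise last_error
        return wrapper
    return decorator


calls = {"count": 0}  # tracks attempts so the simulated failure is visible


@retry(times=3)
async def fetch_release() -> Release:
    calls["count"] += 1
    if calls["count"] < 2:
        raise ConnectionError("simulated transient failure")
    return Release(title="Example Album", year=2000, genres=["Jazz"])
```

Calling `asyncio.run(fetch_release())` fails once inside the wrapper, retries, and then returns the `Release` instance.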

How the storage tools are used

  • Redis: deduplication
  • RabbitMQ: chosen for its message acknowledgement mechanism
  • MongoDB: data persistence
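The division of labour in the list above can be sketched without any external service. The example below is an assumption-laden stand-in: a Python set plays the role of the Redis deduplication set, a list plays the role of the RabbitMQ queue, and a dict plays the role of the MongoDB collection. The class and method names are hypothetical, and the project's real keys, queue names, and collections are not shown in this README.

```python
import hashlib
from typing import Callable, Dict, List, Set


def fingerprint(url: str) -> str:
    """Stable hash of a URL, used as the deduplication key."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


class InMemoryPipeline:
    def __init__(self) -> None:
        self.seen: Set[str] = set()        # stands in for a Redis set
        self.pending: List[str] = []       # stands in for a RabbitMQ queue
        self.stored: Dict[str, dict] = {}  # stands in for a MongoDB collection

    def enqueue(self, url: str) -> bool:
        """Queue a URL unless its fingerprint has been seen before."""
        key = fingerprint(url)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.pending.append(url)
        return True

    def process_one(self, parse: Callable[[str], dict]) -> bool:
        """Handle the oldest task; remove it from the queue only on success."""
        if not self.pending:
            return False
        url = self.pending[0]
        try:
            self.stored[url] = parse(url)  # persist the parsed result
        except Exception:
            return False                   # task stays queued, like an unacknowledged message
        self.pending.pop(0)                # analogous to acknowledging the message
        return True
```

Removing the task only after the result is stored mirrors the reason given for RabbitMQ: a task that fails partway is not lost and can be picked up again.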

Bug?

  • An error is raised when step2 and step3 are used inside step1, so multiple processes are used for now. A proper solution is still being investigated.
  • pydantic cannot completely replace dataclass; it raises errors in some places. Whether this can be resolved needs further investigation.
  • Some type annotations are still missing and will be added later.

Of course, my skill level is limited, and many areas still need optimizing. You are welcome to discuss them with me.

Contact Me

discogs_aio_spider's People

Contributors

cxapython, dependabot[bot]

discogs_aio_spider's Issues

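Found a small bug

image

I have recently been studying the code you wrote.
In the decorators and the code encapsulation, the logic is written very elegantly; you have clearly built your own libraries before.
You evidently had fun writing it in such a free-flowing style, but it has been hard going for me as the reader.
I also learned some new Python features, broadened my horizons, and incidentally discovered how weak my own skills are.
I have nearly finished reading the code, but I have never looked deeply into asynchronous programming, which is the core of the project. Still, this is a good opportunity
to dig into it and see whether I can find more of your bugs in the future.
:-)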

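File commit conventions

Regarding file commit conventions: I did not see a requirements file, and some generated cache files have been committed as well.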

I ran into a problem while using Httpx

I keep getting this error when I try to construct the request. After checking the official documentation, it appears to be caused by a missing required header, but the same request works when I debug it in the browser.

What could be causing this error?

`import httpx

def getMatch():
    header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36 Edg/99.0.1150.39'
    }
    client = httpx.Client(headers=header, http2=True, verify='riotgames.pem')
    url = 'some_url'
    request = httpx.Request("GET", url=url)
    r = client.send(request)
    print(r)

getMatch()`

httpx.LocalProtocolError: cannot receive data before headers
