fast-lianjia-crawler's People

Contributors

caoz

fast-lianjia-crawler's Issues

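Some questions about creating the database

After finding your code, I am using PostgreSQL for the first time. I picked up the basics from YouTube, but when I follow the README to create the database, the connection keeps failing with a password authentication error for the user. I would appreciate the chance to discuss this with you. Thanks.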

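Duplicate communities across business districts: the unique id on the communities table makes inserts fail

The same community can appear in more than one business district, but ids in the communities table must be unique, so the crawl failed as soon as it reached the second business district. This needs to be handled. Following the existing approach of deleting rows by business district id, I added a statement that deletes any existing row with the same community id before the community is inserted into the database. The change is shown below, in the update_db function at line 163 of Main.py.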

def update_db(db_session, biz_circle, communities):
    """
    Update community and business district information.
    """
    db_session.query(Community).filter(
        Community.biz_circle_id == biz_circle.id
    ).delete()

    for community_info in communities['list']:
        try:
            district_id = DISTRICT_MAP[community_info['district_name']]
            community = Community(biz_circle.city_id, district_id, biz_circle.id, community_info)
            
            db_session.query(Community).filter(
                Community.id == community.id
            ).delete()

            db_session.add(community)
        except Exception as e:
            # The returned data may be wrong or incomplete, e.g. a community whose listing
            # is no longer valid returns incomplete information.
            # Example: http://sz.lianjia.com/xiaoqu/2414168277659446
            logging.error('错误: 小区 id: {}; 错误信息: {}'.format(community_info['community_id'], repr(e)))

    biz_circle.communities_count = communities['count']
    biz_circle.communities_updated_at = datetime.now()

    db_session.commit()
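
The failure described in this issue is a primary-key collision, and the fix is a delete-before-insert. The following is only a minimal sketch of that idea, using the standard-library sqlite3 module instead of the project's PostgreSQL and SQLAlchemy setup. The simplified table layout and the `replace_community` helper are hypothetical and are not taken from the project.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified, hypothetical communities table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE communities ("
        "id INTEGER PRIMARY KEY, biz_circle_id INTEGER, name TEXT)"
    )
    return conn


def replace_community(conn, community_id, biz_circle_id, name):
    """Delete any existing row with the same id, then insert the new row."""
    conn.execute("DELETE FROM communities WHERE id = ?", (community_id,))
    conn.execute(
        "INSERT INTO communities (id, biz_circle_id, name) VALUES (?, ?, ?)",
        (community_id, biz_circle_id, name),
    )


conn = make_db()
conn.execute("INSERT INTO communities VALUES (1, 100, 'A')")

# Inserting the same id again for another business district violates the primary key.
try:
    conn.execute("INSERT INTO communities VALUES (1, 200, 'A')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Delete-before-insert succeeds; the row now belongs to the later district.
replace_community(conn, 1, 200, "A")
rows = conn.execute("SELECT id, biz_circle_id, name FROM communities").fetchall()
```

One consequence of this approach is that a shared community ends up attached to whichever business district was crawled last.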

Problem updating city information

python app/main.py 110000
2020-05-31 03:15:29,322 root[config] INFO: 使用配置文件 "config.json".
2020-05-31 03:15:29,323 root[config] WARNING: 配置文件不存在, 使用默认配置文件 "config.default.json".
2020-05-31 03:15:29,580 root[main] INFO: 初始化/更新城市信息... city_id=110000
Traceback (most recent call last):
File "app/main.py", line 209, in <module>
main()
File "app/main.py", line 20, in main
update_city(city_id)
File "app/main.py", line 30, in update_city
city_info = get_city_info(city_id)
File "app/main.py", line 73, in get_city_info
data = util.get_data(url, payload, method='POST')
File "C:\Users\lihon\PycharmProjects\Fast-LianJia-Crawler\app\util\__init__.py", line 27, in get_data
return parse_data(r)
File "C:\Users\lihon\PycharmProjects\Fast-LianJia-Crawler\app\util\__init__.py", line 35, in parse_data
raise Exception('请求出错了: ' + as_json['error'])
Exception: 请求出错了: 无效的请求

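Question about the token and app_id

Sorry, I couldn't find any contact details for you, such as an email address, so I'm opening an issue. I have two questions.

  1. I capture the app's traffic with Charles. Extracting the UA is no problem, but how did you obtain app_id and app_secret? The captured requests contain a great many fields and I can't find these two; app_id at least is not there. How did you work out that these two fields are needed?

  2. The same goes for obtaining the token: how did you figure out Lianjia's encryption scheme?

Was it all found by trial and error? I'm rather puzzled and hope the author can explain. Thanks.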

What is going on here? Is it a problem with my database? Why is it connecting to TCP port 5432? Which database does the project use?

config
{ "db_info": { "db": "lian-jia", "host": "localhost", "user": "root", "password": "123456" } }

Command and output
(py3) F:\Github\Fast-LianJia-Crawler>python app/main.py 110000
2019-04-26 18:28:27,388 root[config] INFO: 使用配置文件 "config.json".
2019-04-26 18:28:27,388 root[config] WARNING: 配置文件不存在, 使用默认配置文件 "config.default.json".
Traceback (most recent call last):
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\engine\base.py", line 2262, in _wrap_pool_connect
return fn()
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\base.py", line 363, in connect
return _ConnectionFairy._checkout(self)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\base.py", line 760, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\base.py", line 492, in checkout
rec = pool._do_get()
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\impl.py", line 139, in _do_get
self._dec_overflow()
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\util\langhelpers.py", line 68, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\util\compat.py", line 129, in reraise
raise value
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\impl.py", line 136, in _do_get
return self._create_connection()
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\base.py", line 308, in _create_connection
return _ConnectionRecord(self)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\base.py", line 437, in __init__
self.__connect(first_connect_check=True)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\base.py", line 639, in __connect
connection = pool.invoke_creator(self)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\engine\strategies.py", line 114, in connect
return dialect.connect(*cargs, **cparams)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\engine\default.py", line 453, in connect
return self.dbapi.connect(*cargs, **cparams)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\psycopg2\__init__.py", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: Connection refused (0x0000274D/10061)
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused (0x0000274D/10061)
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "app/main.py", line 11, in <module>
from lian_jia import City, District, BizCircle, Community
File "F:\Github\Fast-LianJia-Crawler\app\lian_jia\__init__.py", line 5, in <module>
Base.metadata.create_all(engine)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\sql\schema.py", line 4287, in create_all
ddl.SchemaGenerator, self, checkfirst=checkfirst, tables=tables
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\engine\base.py", line 2032, in _run_visitor
with self._optional_conn_ctx_manager(connection) as conn:
File "F:\Tools\Anaconda3\envs\py3\lib\contextlib.py", line 112, in __enter__
return next(self.gen)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\engine\base.py", line 2024, in _optional_conn_ctx_manager
with self._contextual_connect() as conn:
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\engine\base.py", line 2226, in _contextual_connect
self._wrap_pool_connect(self.pool.connect, None),
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\engine\base.py", line 2266, in _wrap_pool_connect
e, dialect, self
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\engine\base.py", line 1536, in _handle_dbapi_exception_noconnection
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\util\compat.py", line 383, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\util\compat.py", line 128, in reraise
raise value.with_traceback(tb)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\engine\base.py", line 2262, in _wrap_pool_connect
return fn()
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\base.py", line 363, in connect
return _ConnectionFairy._checkout(self)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\base.py", line 760, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\base.py", line 492, in checkout
rec = pool._do_get()
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\impl.py", line 139, in _do_get
self._dec_overflow()
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\util\langhelpers.py", line 68, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\util\compat.py", line 129, in reraise
raise value
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\impl.py", line 136, in _do_get
return self._create_connection()
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\base.py", line 308, in _create_connection
return _ConnectionRecord(self)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\base.py", line 437, in __init__
self.__connect(first_connect_check=True)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\pool\base.py", line 639, in __connect
connection = pool.invoke_creator(self)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\engine\strategies.py", line 114, in connect
return dialect.connect(*cargs, **cparams)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\sqlalchemy\engine\default.py", line 453, in connect
return self.dbapi.connect(*cargs, **cparams)
File "F:\Tools\Anaconda3\envs\py3\lib\site-packages\psycopg2\__init__.py", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not connect to server: Connection refused (0x0000274D/10061)
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused (0x0000274D/10061)
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?

(Background on this error at: http://sqlalche.me/e/e3q8)
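
The traceback is raised by psycopg2, a PostgreSQL driver, and 5432 is PostgreSQL's default port, so "Connection refused" suggests that no PostgreSQL server was listening on localhost when the script started. A quick way to check this before running the crawler might look like the sketch below. It uses only the standard-library socket module, and `port_is_open` is a hypothetical helper name that is not part of the project.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # 5432 is PostgreSQL's default port.
    print(port_is_open("localhost", 5432))
```

If this reports that the port is closed, the likely next step is to install or start PostgreSQL, or to point the host in db_info at a machine where it is running.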
