
python-craigslist's People

Contributors

alculquicondor, bschlenk, danielklim, debosmit-neogi, echentw, imdevinc, irahorecka, juliomalegria, mathieu-clement, neuromusic, onebrownsound, tweakdeveloper

python-craigslist's Issues

Was working in Python 2, getting import error in Python 3

I had the module working well in Python 2, but I want to use it in a script that needs to be in Python 3 and I keep getting an import error.

from craigslist import CraigslistHousing

returns "ImportError: cannot import name 'CraigslistHousing'"

There was some other module called "craigslist" rather than "python-craigslist" installed that I tried to remove. It's not showing up when I use "pip3 list" but I'm not sure if it's causing some sort of conflict.
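If a leftover module named craigslist is still on the import path, it can shadow python-craigslist. A minimal way to check which file Python would actually import is importlib.util.find_spec; the helper name locate_module below is hypothetical and not part of python-craigslist.

```python
import importlib.util

def locate_module(name):
    """Return the file that `import name` would load, or None if nothing is found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For a package this is the path of its __init__.py.
    return spec.origin

print(locate_module("craigslist"))
```

If the printed path does not point into the python-craigslist package, that other module is the likely cause of the ImportError.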

posted_today is always True

I found that the 'posted_today' filter always appeared to be 1, even when passed a value of 0. I believe this line in the __init__ of CraigslistBase is to blame (around line 91):

self.filters[filter['url_key']] = filter['value'] or value

I believe the problem is that filter['value'] (the default, 1) is truthy, so the 'or' expression always returns the default and the value I pass, 0, is ignored. I changed it to this and it appears to work:

if value is not None:
    self.filters[filter['url_key']] = value
else:
    self.filters[filter['url_key']] = filter['value']
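A small standalone sketch of why the original line misbehaves: with a truthy default, `default or value` never looks at value. Both function names are made up for illustration.

```python
def resolve_with_or(default, value):
    # Pattern from the original line: a truthy default short-circuits `or`.
    return default or value

def resolve_with_none_check(default, value):
    # Pattern from the fix: fall back to the default only when no value was passed.
    return value if value is not None else default

print(resolve_with_or(1, 0))             # 1 (the 0 is ignored)
print(resolve_with_none_check(1, 0))     # 0
print(resolve_with_none_check(1, None))  # 1
```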

Not a valid site error

None of your examples work; they give errors like:

'newyork' is not a valid site
Traceback (most recent call last):
  File "/Clients/xx/cl.py", line 3, in <module>
    cl_e = CraigslistEvents(site='newyork', filters={'free': True, 'food': True})
  File "/Library/Python/2.7/site-packages/craigslist/__init__.py", line 67, in __init__
    raise ValueError(msg)
ValueError: 'newyork' is not a valid site

housing_type filter

It would be useful to have a housing type filter. I have implemented this change simply by adding

'housing_type': {'url_key': 'housing_type', 'value': None}

to the extra_filters dictionary in the CraigslistHousing class. However, the types are designated by numbers rather than strings like 'condo' or 'house'. I was thinking of writing a function that would parse the filters dictionary so that the user could pass the argument

filters={'housing_type':['condo', 'house']}

and the parser would change it to

filters={'housing_type':[2, 6]}

which is what Craigslist needs.

Let me know if this sounds ok and I can send a pull request.
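A rough sketch of the proposed parser, assuming only the numeric codes given above (condo is 2, house is 6); the mapping is deliberately partial and the helper name translate_housing_type is hypothetical. Unknown names raise KeyError, which surfaces typos early, while values that are already numbers pass through unchanged.

```python
# Partial mapping taken from the issue; other housing types are not listed here.
HOUSING_TYPE_CODES = {'condo': 2, 'house': 6}

def translate_housing_type(filters):
    """Return a copy of filters with housing_type names replaced by numeric codes."""
    translated = dict(filters)
    names = translated.get('housing_type')
    if names is not None:
        translated['housing_type'] = [
            HOUSING_TYPE_CODES[name] if isinstance(name, str) else name
            for name in names
        ]
    return translated

print(translate_housing_type({'housing_type': ['condo', 'house']}))
# {'housing_type': [2, 6]}
```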

UnicodeEncodeError

Thank you for making this awesome package. I tried to run your example, and it fails with an error after a few seconds.

code:

from craigslist import CraigslistHousing
cl_h = CraigslistHousing(site='sfbay', area='sfc', category='roo',
                         filters={'max_price': 1200, 'private_room': True})
for result in cl_h.get_results(sort_by='newest', geotagged=True):
    print(result)

Output:

Traceback (most recent call last):
  File "/get_data_test.py", line 35, in <module>
    print (result)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 56-56: Non-BMP character not supported in Tk
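The message says Tk (the toolkit behind IDLE, assuming the script is run from IDLE) cannot display characters outside the Basic Multilingual Plane, such as many emoji in listing titles. One workaround, sketched here for Python 3 with a hypothetical helper name, is to replace those characters before printing; running the script from a regular terminal may also avoid the error.

```python
def replace_non_bmp(text, replacement="\ufffd"):
    """Replace characters above U+FFFF (outside the Basic Multilingual Plane)."""
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)

print(replace_non_bmp("Sunny room \U0001F31E near BART"))
```

In the loop above this would be used as print(replace_non_bmp(str(result))).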

Bug in example code

Here is the housing example:

from craigslist import CraigslistHousing

cl_h = CraigslistHousing(site='sfbay', area='sfc', category='roo',
                         filters={'max_price': 1200, 'private_room': True})

for result in cl_h.get_results(sort_by='newest', geotagged=True):
    print(result)

This was working about a week ago, but as of recently it throws an error (on the line with the for loop). The error is:
  File "/Library/Python/2.7/site-packages/craigslist/__init__.py", line 160, in get_results
    p_text = row.find('span', {'class': 'p'}).text
AttributeError: 'NoneType' object has no attribute 'text'

Thanks
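The traceback means row.find('span', {'class': 'p'}) returned None, which is what BeautifulSoup's find() does when nothing matches, so the page markup probably no longer contains that element. A defensive sketch (helper name hypothetical) that tolerates a missing element instead of crashing:

```python
def text_or_none(node):
    """Return node.text, or None if a find() call matched nothing."""
    return None if node is None else node.text

# Inside the parsing loop this would replace the failing line, e.g.:
# p_text = text_or_none(row.find('span', {'class': 'p'}))
```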

Locations, zip codes, and distance

When specifying a location, say New York, together with a zip code, say 90210, and a distance of 200 miles, how does Craigslist interpret the search, given that the location and the zip code contradict each other? Is the 200 miles measured from the zip code or from the location?

ImportError: No module named 'Queue'

  File "C:\Python34\lib\site-packages\craigslist\__init__.py", line 3, in <module>
    from Queue import Queue
ImportError: No module named 'Queue'

Looks like it's not Python3 compatible after all.
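The module was renamed from Queue in Python 2 to queue in Python 3. A common compatibility sketch that works on both (a generic pattern, not necessarily how python-craigslist resolved it):

```python
try:
    from queue import Queue  # Python 3 name
except ImportError:
    from Queue import Queue  # Python 2 name

jobs = Queue()
jobs.put("sfbay")
print(jobs.get())  # sfbay
```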

geotags

Hello, I set geotagged=True for housing, but every result has 'geotag': None. These listings usually have a map and an address underneath. Is this a Craigslist issue?

.show_filters() not working

>>> from craigslist import CraigslistJobs
>>> CraigslistJobs.show_filters()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'CraigslistJobs' has no attribute 'show_filters'

1.0.2

Upon releasing 1.0.2, did you get rid of the filter tree? I saw some kind of tree you had and was using it as a reference for which filters to apply.

I would appreciate it if you re-added it to the help or pointed me in the right direction.

Thanks a lot as always!

Using multiple sites

cl_h = CraigslistForSale(site='vancouver', 'toronto' area='', category='cto',

I want to be able to use multiple sites like the above: instead of searching only Vancouver, I want to search Toronto too.

How can I do this?

thanks
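As far as the examples here show, each search object takes a single site. One way to cover several sites is to run one search per site and chain the results; a sketch with a hypothetical helper search_sites, which passes the site plus any other keyword arguments to the class:

```python
from itertools import chain

def search_sites(search_cls, sites, **kwargs):
    """Chain results from one search per site.

    search_cls is assumed to be a python-craigslist class such as
    CraigslistForSale that accepts site=... plus other keyword arguments
    and exposes get_results().
    """
    return chain.from_iterable(
        search_cls(site=site, **kwargs).get_results() for site in sites
    )
```

For example, search_sites(CraigslistForSale, ['vancouver', 'toronto'], category='cto'). Results come back site by site rather than interleaved.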

Cant import into Jupyter Notebook

I was having an issue similar to @JRHutson's, but his uninstall/reinstall solution does not work for me. I am trying to import it into a Python 3 Jupyter Notebook, but it times out with a connection error.

Any ideas? Thanks :)

sites.py does not parse properly.

I assume this is due to an update on Craigslist's end. All that was needed was an extra '/' in rsplit.

!OLD
site = a.attrs['href'].rsplit('/', 1)[1].split('.')[0]

!NEW
site = a.attrs['href'].rsplit('//', 1)[1].split('.')[0]

Works now :)
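To see why the extra '/' helps, here is the parsing applied to an assumed href of the form https://sfbay.craigslist.org/ (the issue doesn't show the actual href). Taking the hostname with urllib.parse is a slightly sturdier alternative that doesn't depend on the trailing slash; site_from_href is a hypothetical name.

```python
from urllib.parse import urlparse

href = "https://sfbay.craigslist.org/"  # assumed href format

old = href.rsplit('/', 1)[1].split('.')[0]   # '' : only the text after the trailing '/'
new = href.rsplit('//', 1)[1].split('.')[0]  # 'sfbay'

def site_from_href(href):
    """Return the first label of the link's hostname, e.g. 'sfbay'."""
    host = urlparse(href).hostname or ""
    return host.split(".")[0]

print(repr(old), repr(new), site_from_href(href))  # '' 'sfbay' sfbay
```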

Craigslist Housing Attribute Error

It looks like Craigslist uses two attributes, 'data-id' and 'data-ids', interchangeably.
For example, for this request (http://washingtondc.craigslist.org/search/doc/apa?minAsk=3000&maxAsk=5000), half of the results have a 'data-id' link attribute and the other half have 'data-ids'.
This throws an AttributeError, so I'd suggest trying 'data-id' first and falling back to 'data-ids' when that raises the error.
Let me know if you need any more clarification. Thanks!
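A sketch of the suggested fallback, written against a plain mapping such as a BeautifulSoup tag's .attrs dict. It checks membership instead of catching an exception; whether data-ids always carries the same kind of value as data-id is not confirmed here, and get_post_id is a hypothetical name.

```python
def get_post_id(attrs):
    """Return 'data-id' if present, else 'data-ids', else None."""
    for key in ("data-id", "data-ids"):
        if key in attrs:
            return attrs[key]
    return None
```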

jsonsearch vs regularsearch?

Hey @juliomalegria, I was wondering if you've ever used the jsonsearch endpoint within craigslist and compared it to the regular search? I wasn't sure what the differences are but it seems a tad easier to parse.

Here's a quick script that runs on Python 3.5 and requires the requests library demonstrating the jsonsearch endpoint: https://gist.github.com/AlJohri/dc51918a65752099b2a8f4df5dba7f93

Here's an example of the jsonsearch endpoint that I'm referring to: https://washingtondc.craigslist.org/jsonsearch/apa?postal_code=20071&search_distance=1&sort=date&s=0
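The query string in that example can be rebuilt with urllib.parse.urlencode. A sketch that only builds the URL (no request is made), with a hypothetical helper name; the jsonsearch endpoint is undocumented, so its parameters may change or disappear.

```python
from urllib.parse import urlencode

def jsonsearch_url(site, category, **params):
    """Build a jsonsearch URL in the form shown above; params keep their given order."""
    return "https://{}.craigslist.org/jsonsearch/{}?{}".format(site, category, urlencode(params))

print(jsonsearch_url("washingtondc", "apa", postal_code=20071, search_distance=1, sort="date", s=0))
```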

API Broken

Is there something broken with the API? Whenever I "get_results", it returns nothing.

python3

After successfully installing the craigslist module with pip3, I get the following error with Python 3.4.3 on Ubuntu 14.04:

Traceback (most recent call last):
  File "nybits9.py", line 16, in <module>
    from craigslist import CraigslistHousing
  File "/usr/local/lib/python3.4/dist-packages/craigslist/__init__.py", line 3, in <module>
    from craigslist._search import search
  File "/usr/local/lib/python3.4/dist-packages/craigslist/_search/__init__.py", line 23
    params = {"s": offset, "sort": sort, **kwargs}
                                          ^
SyntaxError: invalid syntax

I would really appreciate any help

thank you
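The failing line uses dict unpacking inside a literal ({..., **kwargs}), which was added in Python 3.5 (PEP 448), so Python 3.4 rejects it as a syntax error. Upgrading Python is the simplest fix; alternatively, a 3.4-compatible equivalent of that line looks like this (the wrapper function is only for illustration):

```python
def build_params(offset, sort, **kwargs):
    # Same result as {"s": offset, "sort": sort, **kwargs} without PEP 448 syntax:
    # later keys win, so a duplicate key in kwargs overrides the earlier value.
    params = {"s": offset, "sort": sort}
    params.update(kwargs)
    return params
```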

Crash on include_details when post has been deleted

GET https://sfbay.craigslist.org/sfc/apa/d/san-francisco-open-sun-1245pm-130pm/7037808940.html
Response code: 200
Geotagging result ...
Adding details to result...
Traceback (most recent call last):
  [...]
  File ".../venv/lib/python3.6/site-packages/craigslist/__init__.py", line 245, in get_results
    self.include_details(result, detail_soup)
  File ".../venv/lib/python3.6/site-packages/craigslist/__init__.py", line 282, in include_details
    body_text = (getattr(e, 'text', e) for e in body
TypeError: 'NoneType' object is not iterable

The relevant page causing the crash contains the following HTML:

<section class="body">
<div id="userbody">
<div class="prevnext js-only">
<a class="prev">◀  prev </a>
<a class="backup" title="back to search"></a>
<a class="next"> next ▶ </a>
</div>
<span id="has_been_removed"></span>
<div class="removed">
<h2>
This posting has been deleted by its author.
    </h2>
</div>
</div>
</section>
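The page still returns 200, so the status code alone doesn't reveal the deletion, but the markup above contains a span with id="has_been_removed". A stdlib-only sketch that detects that marker before any detail parsing is attempted (class and function names are hypothetical, and the marker is assumed from this one page):

```python
from html.parser import HTMLParser

class RemovedPostDetector(HTMLParser):
    """Sets .removed when any tag carries id="has_been_removed"."""

    def __init__(self):
        super().__init__()
        self.removed = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if ("id", "has_been_removed") in attrs:
            self.removed = True

def is_removed_post(html):
    detector = RemovedPostDetector()
    detector.feed(html)
    detector.close()
    return detector.removed
```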

refactor to separate IO from parsing?

hi @juliomalegria, I was trying to use your library in some code I'm working on where I'm controlling much of the IO myself. I was having a little trouble because the IO and parsing logic is currently intertwined. The parts that I was hoping to import are:

  1. parsing the listing page
  2. getting the filters that are available on different types of sections
  3. getting the base url depending on the site (city) or section

The main reason for handling the IO myself is so I can handle caching, use whichever IO library I want (requests, aiohttp, etc.), perform threading or multiprocessing whenever and wherever I want to, and download and parse the actual post itself in addition to the geotagging data.

Is this something you'd be interested in doing? It's definitely not a simple change.
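A minimal sketch of the split being proposed: URL building as a pure function, with fetching and parsing passed in as callables so the caller owns the IO (caching, requests, aiohttp, threads). The URL layout follows the example in another issue (site.craigslist.org/search/area/category); all names here are hypothetical, not python-craigslist's API.

```python
def search_url(site, category, area=None):
    """Pure function: build https://<site>.craigslist.org/search[/<area>]/<category>."""
    parts = ["search", area, category] if area else ["search", category]
    return "https://{}.craigslist.org/{}".format(site, "/".join(parts))

def run_search(fetch, parse, site, category, area=None):
    """Glue code: the caller's fetch(url) -> html does the IO, parse(html) -> results."""
    return parse(fetch(search_url(site, category, area)))
```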

Count items returned from Craigslist locations shows tendencies for modulo 120

Hello Julio,

I noticed that python-craigslist tends to return post counts from Craigslist locations that are multiples of 120. To clarify, the maximum number of items returned from a Craigslist location for a category (e.g. apts/housing from santabarbara) is 3000. The 120 comes from the number of listings per page on Craigslist, so a category in one location has at most 25 pages (3000 / 120).

My speculation is that .get_results() stalls and exits when moving to the next listing page. I noticed this especially with geotagged=True. I am currently looking at apts/housing across the United States through CraigslistHousing. Please see the figure below, which outlines this observation (i.e. the high peaks):

[Figure: python-craigslist_post_freq]
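The page arithmetic behind the 120 and 3000 figures, assuming results are paged with an s offset in steps of 120 (the s parameter appears in the library's search params and in the jsonsearch URL). A count that lands exactly on a multiple of 120 is consistent with iteration stopping at a page boundary; the helper name is hypothetical.

```python
PAGE_SIZE = 120     # listings per results page, per the issue
MAX_RESULTS = 3000  # observed cap per site, area and category

# Offsets a paginated search would request: 0, 120, 240, ..., 2880.
offsets = list(range(0, MAX_RESULTS, PAGE_SIZE))
print(len(offsets), offsets[-1])  # 25 2880

def ends_on_page_boundary(count, page_size=PAGE_SIZE):
    """True when a non-zero result count is an exact multiple of the page size."""
    return count > 0 and count % page_size == 0
```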

Max get_results

I did some testing, and it seems like the maximum get_results per site, area, and category is 3000.

e.g.

from craigslist import CraigslistHousing
x = CraigslistHousing(site='sfbay', category='apa', area='eby')
y = list(x.get_results(sort_by='newest'))

print(len(y))  # 3000

Would you say Craigslist has an upper limit on posts per area in a certain category?

AttributeError: 'NoneType' object has no attribute 'text'

Love this Python application, thanks. It has been working great, but today I started getting this error. Any suggestions?

Traceback (most recent call last):
  File "./get_search.py", line 28, in <module>
    for x in gems.get_results(sort_by='newest', geotagged=True):
  File "/home/user47/craigslist_app/lib/python3.4/site-packages/craigslist/__init__.py", line 160, in get_results
    p_text = row.find('span', {'class': 'p'}).text
AttributeError: 'NoneType' object has no attribute 'text'

I'm having trouble running the example code :(

C:\Python27\python.exe "C:/Users/Bobby/PycharmProjects/untitled/complicated loops.py"
Traceback (most recent call last):
  File "C:/Users/Bobby/PycharmProjects/untitled/complicated loops.py", line 2, in <module>
    from craigslist import CraigslistHousing
  File "C:\Python27\lib\site-packages\craigslist\__init__.py", line 4, in <module>
    import requests
  File "C:\Python27\lib\site-packages\requests\__init__.py", line 58, in <module>
    from . import utils
  File "C:\Python27\lib\site-packages\requests\utils.py", line 25, in <module>
    from .compat import parse_http_list as parse_list_header
  File "C:\Python27\lib\site-packages\requests\compat.py", line 7, in <module>
    from .packages import chardet
  File "C:\Python27\lib\site-packages\requests\packages\__init__.py", line 3, in <module>
    from . import urllib3
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\__init__.py", line 10, in <module>
    from .connectionpool import (
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 37, in <module>
    from .request import RequestMethods
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\request.py", line 6, in <module>
    from .filepost import encode_multipart_formdata
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\filepost.py", line 8, in <module>
    from .fields import RequestField
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\fields.py", line 1, in <module>
    import email.utils
ImportError: No module named utils

Process finished with exit code 1

I installed the module via pip on my Windows computer, and it shows up in the list of modules when I run pip list. Do I have to run a setup file now, or something else?

AttributeError: 'NoneType' object has no attribute 'text'

This was run in an IPython notebook (Python 2.7.11)

/Users/megablanc/Library/Python/2.7/lib/python/site-packages/craigslist/__init__.pyc in get_results(self, limit, sort_by, geotagged)
    158                 price = row.find('span', {'class': 'price'})
    159                 where = row.find('small')
--> 160                 p_text = row.find('span', {'class': 'p'}).text
    161 
    162                 result = {'id': id,

AttributeError: 'NoneType' object has no attribute 'text'

Full code:

from craigslist import CraigslistHousing

cl = CraigslistHousing(site='boston', area='gbs', category='roo',
                         filters={'max_price': 2000, 'min_price': 500})

results = cl.get_results(sort_by='newest', geotagged=True, limit=20)
for result in results:
    print(result)

I get a couple results but eventually it crashes.

Bypassing Captcha

Can this wrapper bypass the captcha to get the contact information? Is there any way to bypass the captcha completely and get the contact info? I want to create a scraper.

No longer works?

As of today it looks like the wrapper no longer returns results. I've been running it for the past week and it's been fine until now. Is it an issue on my end?

Get fields used to filter by

Is there a way to get the fields the query filtered by in the result? For example, when searching for cars we can filter with min_miles and max_miles, but the result doesn't return the actual miles. Can this be done with the customize_result function instead? If so, could you add example usage to the README? I've looked at the source code, and from what I understand you have to set the custom_result_fields flag to True in CraigslistBase. It'd be nice to be able to do this after an instance has been created.

Results from wrong location

Easy to deal with by filtering externally, but might be worth thinking about searches with few results.

For housing, small markets tend to have few listings so searches like CraigslistHousing(site='stillwater', category='apa', filters={'posted_today': True}) will return housing from all over Oklahoma, as scraped from the 'similar listings' section.

This may be applicable for job or sale searches as well, where results to similar queries are shown when there are few or no results.
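A sketch of the external filtering mentioned above, assuming each result's 'url' (as in the result dict shown in another issue) points at the subdomain of the site that posted it, so listings pulled in from nearby sites can be dropped. from_site is a hypothetical name.

```python
from urllib.parse import urlparse

def from_site(results, site):
    """Keep results whose 'url' host is exactly <site>.craigslist.org."""
    expected_host = "{}.craigslist.org".format(site)
    return [r for r in results if urlparse(r["url"]).hostname == expected_host]
```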

Datetime question

This utility is awesome, thanks for all the work. I have a question about datetime. I was looking in the Rochester, MN area and received a result that looks like this:

{'name': u'QUIET BUILDING! CLOSE TO DOWNTOWN! BEST DEAL AROUND!', 'area': u'875ft2', 'url': u'https://rmn.craigslist.org/apa/d/rochester-quiet-building-close-to/6748094294.html', 'where': u'Rochester NW', 'price': u'$825', 'bedrooms': u'2', 'geotag': (44.036915, -92.464017), 'repost_of': u'6576850170', 'has_image': True, 'datetime': u'2018-12-12 16:15', 'has_map': True, 'id': u'6748094294'}

The datetime displayed is 2018-12-12. However, if I look at the listing based on its url, the date posted shows as: 2018-11-13. Is this something you've come across?

Not getting geotags?

It seems that all returned data under forsale has a geotag field of "null". Did craigslist change recently?

site.py doesn't work.

Hi,

I am very glad to find this package!
I'm currently looking for an apartment in the SoCal area, so it's really helpful for me.

By the way, when I tried your example it didn't work, because site.py couldn't get usable data from Craigslist. I commented out a line in __init__ [1], and now it works exactly as I expected. If you have time to revise it, this could be considered.

Thanks!
SJ

[1] line 66 (#raise ValueError(msg))

Reading the post description

Hi,
I was wondering if the module has functionality to get the description of a given post?
Thanks!

neighborhoods filter

Hello guys,

Any ideas how we can filter by neighborhood? It looks like it's not supported, correct?

Thanks
