juliomalegria / python-craigslist

Simple Craigslist wrapper

License: MIT No Attribution

Python 100.00%

python-craigslist's Issues

release new version with p.text fix

Can you publish a new version with this fix: ebd99b5

Was working in Python 2, getting import error in Python 3

I had the module working well in Python 2, but I want to use it in a script that needs to be in Python 3 and I keep getting an import error.

from craigslist import CraigslistHousing

returns "ImportError: cannot import name 'CraigslistHousing'"

There was some other module called "craigslist" rather than "python-craigslist" installed that I tried to remove. It's not showing up when I use "pip3 list" but I'm not sure if it's causing some sort of conflict.

call to `xrange` in init makes it incompatible with python3

https://github.com/juliomalegria/python-craigslist/blob/master/craigslist/__init__.py#L273
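
A common way to keep a call like this working on both interpreters is to alias the name once at import time. This is only a sketch, not the library's actual fix; `page_offsets` is a hypothetical helper used for illustration.

```python
# Python 2 has xrange; Python 3 only has range, which is already lazy.
try:
    xrange  # defined on Python 2 only
except NameError:
    xrange = range


def page_offsets(total, step):
    """Hypothetical helper: offsets 0, step, 2*step, ... below total."""
    return list(xrange(0, total, step))
```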

posted_today always is True

I found that the 'posted_today' filter always appeared to be 1, even when passed a value of 0. I believe this line in the init of CraigslistBase is to blame (around line 91):

self.filters[filter['url_key']] = filter['value'] or value

I believe that my desired value, 0, was causing the 'or' test to fail and using the default filter value of 1 in all cases. I changed it to this and it appears to work:

if value is not None:
    self.filters[filter['url_key']] = value
else:
    self.filters[filter['url_key']] = filter['value']
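
To see why the `or` form misbehaves: `a or b` returns `a` whenever `a` is truthy, so a truthy default always wins, and a falsy value such as 0 can never be distinguished from "not supplied". A small standalone illustration (the variable names and `resolve_filter` are made up for this sketch, not taken from the library):

```python
default = 1      # stand-in for filter['value']
user_value = 0   # what the caller passed for posted_today

buggy = default or user_value  # the caller's 0 is ignored


def resolve_filter(default, value):
    """Use the caller's value unless it was not supplied (None)."""
    return default if value is None else value


fixed = resolve_filter(default, user_value)
```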

Not a valid site error

None of your examples work; they give errors like:

'newyork' is not a valid site
Traceback (most recent call last):
  File "/Clients/xx/cl.py", line 3, in <module>
    cl_e = CraigslistEvents(site='newyork', filters={'free': True, 'food': True})
  File "/Library/Python/2.7/site-packages/craigslist/__init__.py", line 67, in __init__
    raise ValueError(msg)
ValueError: 'newyork' is not a valid site

housing_type filter

It would be useful to have a housing type filter. I have implemented this change simply by adding

'housing_type': {'url_key': 'housing_type', 'value': None}

to the extra_filters dictionary in the CraigslistHousing class. However, the types are designated by number and not by a string like 'condo' or 'house'. I was thinking of making a function that would parse the filters dictionary so that the user could give the argument

filters={'housing_type':['condo', 'house']}

and the parser would change it to

filters={'housing_type':[2, 6]}

which is what Craigslist needs.

Let me know if this sounds ok and I can send a pull request.
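
The parser described above could look roughly like this. It is a sketch only: `HOUSING_TYPE_CODES` and `parse_housing_types` are hypothetical names, and the mapping contains just the two codes mentioned in this issue.

```python
# Only the two codes given above are listed; the remaining
# housing types would have to be filled in by the implementer.
HOUSING_TYPE_CODES = {'condo': 2, 'house': 6}


def parse_housing_types(filters):
    """Return a copy of `filters` with housing_type names replaced by codes."""
    parsed = dict(filters)
    names = parsed.get('housing_type')
    if names is not None:
        parsed['housing_type'] = [HOUSING_TYPE_CODES[name] for name in names]
    return parsed
```

An unknown name raises `KeyError` here, which seems preferable to silently sending Craigslist a value it does not understand.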

UnicodeEncodeError

Thank you for making this awesome package. I tried to run your example and it returns with an error after a few seconds.

code:

from craigslist import CraigslistHousing
cl_h = CraigslistHousing(site='sfbay', area='sfc', category='roo',
                         filters={'max_price': 1200, 'private_room': True})
for result in cl_h.get_results(sort_by='newest', geotagged=True):
    print result

Output:

Traceback (most recent call last):
File "/get_data_test.py", line 35, in
print (result)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 56-56: Non-BMP character not supported in Tk

Bug in example code

Here is the housing example:

from craigslist import CraigslistHousing

cl_h = CraigslistHousing(site='sfbay', area='sfc', category='roo',
                         filters={'max_price': 1200, 'private_room': True})

for result in cl_h.get_results(sort_by='newest', geotagged=True):
    print result

This was working about a week ago, but recently it started throwing an error (on the line with the for loop). The error is:

  File "/Library/Python/2.7/site-packages/craigslist/__init__.py", line 160, in get_results
    p_text = row.find('span', {'class': 'p'}).text
AttributeError: 'NoneType' object has no attribute 'text'

Thanks
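
The traceback shows `.text` being read from a lookup that returned `None`. One defensive pattern is to check for `None` before touching the attribute. This is a sketch, not the fix that was shipped; `text_or_default` is a hypothetical helper and the `SimpleNamespace` object merely stands in for whatever `row.find(...)` returns.

```python
from types import SimpleNamespace


def text_or_default(node, default=''):
    """Return node.text, or `default` when the lookup found nothing."""
    return default if node is None else node.text


found = SimpleNamespace(text='pic map')  # stand-in for a matched element
missing = None                           # what find() gives when nothing matches
```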

small Python 3 typo in README - print result should be print(result)

appears three times.

CraigslistForSale does not actually support auto_transmission filter

I was reading through your code and saw that your README says auto_transmission is a filter you can pass to car searches; however, that option isn't available.

Can you insert a search query?

Perhaps I'm dense, but how do you simply add a search query?

Locations, zip codes, and distance

When specifying a location, say NewYork, and also a zip code, say 90210, and also a distance, 200 miles, how exactly is that interpreted by Craigslist, since it's a contradiction? Is it 200 miles from the zip code or from the location?

Searching for key words in a title

How do you search for key words in the title of an ad using the filter parameter?

ImportError: No module named 'Queue'

  File "C:\Python34\lib\site-packages\craigslist\__init__.py", line 3, in <module>
    from Queue import Queue
ImportError: No module named 'Queue'

Looks like it's not Python3 compatible after all.
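
The module was renamed from `Queue` to `queue` in Python 3, so the usual workaround is a guarded import. A minimal sketch (the queue contents are made up):

```python
try:
    from Queue import Queue  # Python 2 module name
except ImportError:
    from queue import Queue  # Python 3 module name

q = Queue()
q.put('listing-1')
first = q.get()
```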

geotags

Hello, I set geotagged = True for housing, but I receive all 'geotag': None. These websites usually have maps and an address underneath. Is this a CL issue?

.show_filters() not working

>>> from craigslist import CraigslistJobs
>>> CraigslistJobs.show_filters()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'CraigslistJobs' has no attribute 'show_filters'

1.0.2

After releasing 1.0.2, did you get rid of the filter tree you had? I saw some kind of tree and was using it as a reference for which filters to apply.

I would appreciate it if you re-added it to the help or pointed me in the right direction.

Thanks a lot, as always!

Using multiple sites

cl_h = CraigslistForSale(site='vancouver', 'toronto' area='', category='cto',

I want to be able to use multiple sites like the above: instead of just searching vancouver, I want to be able to search toronto too.

How can I do this?

thanks
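
Since each search object takes a single `site`, one way to cover several sites is to create one object per site and chain their results. The sketch below is an assumption about how this could be done, not a feature of the library: `search_sites` is a hypothetical helper, and `FakeSearch` stands in for a class such as CraigslistForSale so the example runs without network access.

```python
from itertools import chain


def search_sites(search_cls, sites, **kwargs):
    """Yield results from one search object per site, in the order given."""
    searches = (search_cls(site=site, **kwargs) for site in sites)
    return chain.from_iterable(s.get_results() for s in searches)


class FakeSearch:
    """Offline stand-in exposing the same two entry points."""

    def __init__(self, site, category=None):
        self.site = site
        self.category = category

    def get_results(self):
        yield {'site': self.site, 'category': self.category}


results = list(search_sites(FakeSearch, ['vancouver', 'toronto'], category='cto'))
```

With the real library you would presumably pass `CraigslistForSale` as `search_cls`.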

Can't import into Jupyter Notebook

I was having an issue similar to that of @JRHutson but his uninstall/reinstall solution does not work for me. I am attempting to import into a python 3 Jupyter Notebook but it times out with a connection error.

Any ideas? Thanks :)

sites.py does not parse properly.

Assuming due to an update in CL. All that was needed was an extra '/' in rsplit.

!OLD
site = a.attrs['href'].rsplit('/', 1)[1].split('.')[0]

!NEW
site = a.attrs['href'].rsplit('//', 1)[1].split('.')[0]

Works now :)
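
The difference between the two expressions is easiest to see on a concrete link. The href here is an assumed example (a site URL with a trailing slash), not something copied from the Craigslist page:

```python
href = 'https://newyork.craigslist.org/'  # assumed shape of the link

# Splitting on the last '/' leaves an empty string after the trailing slash.
old_site = href.rsplit('/', 1)[1].split('.')[0]

# Splitting on '//' keeps the host, whose first label is the site name.
new_site = href.rsplit('//', 1)[1].split('.')[0]
```

Both expressions give the same answer when the href has no trailing slash, which may be why the original code used to work.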

Craigslist Housing Attribute Error

It looks like craigslist has two ['data-id'] tags that are used interchangeably.
As an example, for this request (http://washingtondc.craigslist.org/search/doc/apa?minAsk=3000&maxAsk=5000), half of the results have a link attribute of "['data-id']" and the other half have a link attribute of "['data-ids']".
This throws an attribute error, so I'd suggest trying data-id first and, on an attribute error, falling back to data-ids.
Let me know if you need any more clarification. Thanks!

Not able to get a house for sale in austin

This isn't working. The README doesn't help you find a city other than San Francisco.

jsonsearch vs regularsearch?

Hey @juliomalegria, I was wondering if you've ever used the jsonsearch endpoint within craigslist and compared it to the regular search? I wasn't sure what the differences are but it seems a tad easier to parse.

Here's a quick script that runs on Python 3.5 and requires the requests library demonstrating the jsonsearch endpoint: https://gist.github.com/AlJohri/dc51918a65752099b2a8f4df5dba7f93

Here's an example of the jsonsearch endpoint that I'm referring to: https://washingtondc.craigslist.org/jsonsearch/apa?postal_code=20071&search_distance=1&sort=date&s=0

the url_key of min_price is just min_price not minAsk

'min_price': {'url_key': 'minAsk', 'value': None},
'max_price': {'url_key': 'maxAsk', 'value': None},

http://washingtondc.craigslist.org/search/apa?min_price=900&max_price=1000&availabilityMode=0
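
For illustration, this is how the filter table could map onto the query string once the `url_key` values match the URL above. It is a sketch under assumptions: `FILTERS` and `build_query` are hypothetical names that only mimic the structure quoted in this issue.

```python
from urllib.parse import urlencode

# Same shape as the entries quoted above, with the url_key values
# changed to the names that appear in the working URL.
FILTERS = {
    'min_price': {'url_key': 'min_price', 'value': None},
    'max_price': {'url_key': 'max_price', 'value': None},
}


def build_query(user_filters):
    """Translate filter names into URL keys and encode them."""
    params = {FILTERS[name]['url_key']: value
              for name, value in user_filters.items()}
    return urlencode(params)


query = build_query({'min_price': 900, 'max_price': 1000})
```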

API Broken

Is there something broken with the API? Whenever I "get_results", it returns nothing.

Unclear what filter values should be

Is the filter "bathrooms" supposed to be an integer or a list? Also confused about "query", is it a string or perhaps a dict?

python3

After successfully installing the craigslist module with pip3, I get the following error with Python 3.4.3 on Ubuntu 14.04:

Traceback (most recent call last):
  File "nybits9.py", line 16, in <module>
    from craigslist import CraigslistHousing
  File "/usr/local/lib/python3.4/dist-packages/craigslist/__init__.py", line 3, in <module>
    from craigslist._search import search
  File "/usr/local/lib/python3.4/dist-packages/craigslist/_search/__init__.py", line 23
    params = {"s": offset, "sort": sort, **kwargs}
                                          ^
SyntaxError: invalid syntax

I would really appreciate any help

Thank you
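
For context, `**` unpacking inside a dict literal was added in Python 3.5 (PEP 448), which is why 3.4 reports a syntax error on that line. An equivalent that also parses on older interpreters might look like this; `build_params` is a hypothetical name, not a function from the package in the traceback.

```python
def build_params(offset, sort, **kwargs):
    """Same result as {"s": offset, "sort": sort, **kwargs} on Python 3.5+."""
    params = {"s": offset, "sort": sort}
    params.update(kwargs)  # extra keyword arguments are merged in last
    return params
```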

Add "bundle_duplicates" base filter

It would be nice to add the &bundleDuplicates=1 option to the list of base filters.

Crash on include_details when post has been deleted

GET https://sfbay.craigslist.org/sfc/apa/d/san-francisco-open-sun-1245pm-130pm/7037808940.html
Response code: 200
Geotagging result ...
Adding details to result...
Traceback (most recent call last):
  [...]
  File ".../venv/lib/python3.6/site-packages/craigslist/__init__.py", line 245, in get_results
    self.include_details(result, detail_soup)
  File ".../venv/lib/python3.6/site-packages/craigslist/__init__.py", line 282, in include_details
    body_text = (getattr(e, 'text', e) for e in body
TypeError: 'NoneType' object is not iterable

The relevant page causing the crash contains the following HTML:

<section class="body">
<div id="userbody">
<div class="prevnext js-only">
<a class="prev">◀  prev </a>
<a class="backup" title="back to search">▲</a>
<a class="next"> next ▶ </a>
</div>
<span id="has_been_removed"></span>
<div class="removed">
<h2>
This posting has been deleted by its author.
    </h2>
</div>
</div>
</section>
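
A guard along these lines would avoid iterating over a missing body. This is a sketch based only on the traceback above: `body_lines` is a hypothetical helper, and it assumes `body` is `None` whenever the posting has been deleted.

```python
def body_lines(body):
    """Return the text of each child of `body`, or [] when there is no body."""
    if body is None:  # e.g. the posting was deleted by its author
        return []
    return [getattr(e, 'text', e) for e in body]
```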

refactor to separate IO from parsing?

Hi @juliomalegria, I was trying to use your library in some code I'm working on where I'm controlling much of the IO myself. I was having a little trouble because the IO and parsing logic are currently intertwined. The parts that I was hoping to import are:

parsing the listing page
getting the filters that are available on different types of sections
getting the base url depending on the site (city) or section

The main reason for handling the IO myself is so I can handle caching, use whichever IO library I want (requests, aiohttp, etc.), perform threading or multiprocessing whenever and wherever I want to, and download and parse the actual post itself in addition to the geotagging data.

Is this something you'd be interested in doing? It's definitely not a simple change.

Count items returned from Craigslist locations shows tendencies for modulo 120

Hello Julio,

I noticed that python-craigslist tends to return item counts from Craigslist locations that are multiples of 120. To clarify, the maximum number of items returned from a Craigslist location for a category (e.g. apts/housing from santabarbara) is 3000. The 120 comes from the number of listings per page on Craigslist, meaning a Craigslist category per location should have at most 25 pages.

My speculation is that the .get_results() method jams and exits when transitioning to a different listing page. I noticed this to be especially the case when geotagged=True. I am currently looking at apts/housing in the United States through CraigslistHousing. Please see figure below that outlines this observation (i.e. the high peaks):

Craigslist categories.

Where can one get a list of categories?

craigslist areas / categories reference json api

Instead of (or perhaps in addition to) using http://www.craigslist.org/about/sites, you can also use the craigslist reference json api: http://www.craigslist.org/about/reference

areas: http://reference.craigslist.org/Areas
categories: http://reference.craigslist.org/Categories

Max get_results

I did some testing, and it seems like the maximum get_results per site, area, and category is 3000.

e.g.
from craigslist import CraigslistHousing
x = CraigslistHousing(site='sfbay', category='apa', area='eby')
y = [i for i in x.get_results(sort_by='newest')]

print(len(y))  # 3000

Would you say Craigslist has an upper limit on posts per area in a certain category?

AttributeError: 'NoneType' object has no attribute 'text'

Love this Python application, thanks. It has been working great, but today, I started getting this error. Any suggestions:

Traceback (most recent call last):
  File "./get_search.py", line 28, in <module>
    for x in gems.get_results(sort_by='newest', geotagged=True):
  File "/home/user47/craigslist_app/lib/python3.4/site-packages/craigslist/__init__.py", line 160, in get_results
    p_text = row.find('span', {'class': 'p'}).text
AttributeError: 'NoneType' object has no attribute 'text'

I'm having trouble running the example code :(

C:\Python27\python.exe "C:/Users/Bobby/PycharmProjects/untitled/complicated loops.py"
Traceback (most recent call last):
File "C:/Users/Bobby/PycharmProjects/untitled/complicated loops.py", line 2, in
from craigslist import CraigslistHousing
File "C:\Python27\lib\site-packages\craigslist__init__.py", line 4, in
import requests
File "C:\Python27\lib\site-packages\requests__init__.py", line 58, in
from . import utils
File "C:\Python27\lib\site-packages\requests\utils.py", line 25, in
from .compat import parse_http_list as parse_list_header
File "C:\Python27\lib\site-packages\requests\compat.py", line 7, in
from .packages import chardet
File "C:\Python27\lib\site-packages\requests\packages__init_.py", line 3, in
from . import urllib3
File "C:\Python27\lib\site-packages\requests\packages\urllib3__init__.py", line 10, in
from .connectionpool import (
File "C:\Python27\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 37, in
from .request import RequestMethods
File "C:\Python27\lib\site-packages\requests\packages\urllib3\request.py", line 6, in
from .filepost import encode_multipart_formdata
File "C:\Python27\lib\site-packages\requests\packages\urllib3\filepost.py", line 8, in
from .fields import RequestField
File "C:\Python27\lib\site-packages\requests\packages\urllib3\fields.py", line 1, in
import email.utils
ImportError: No module named utils

Process finished with exit code 1

I installed the module via pip on my Windows computer. It is in the list of modules when I do pip list. Do I have to access a setup file now or what?

AttributeError: 'NoneType' object has no attribute 'text'

This was run in ipython notebook (python 2.7.11)

/Users/megablanc/Library/Python/2.7/lib/python/site-packages/craigslist/__init__.pyc in get_results(self, limit, sort_by, geotagged)
    158                 price = row.find('span', {'class': 'price'})
    159                 where = row.find('small')
--> 160                 p_text = row.find('span', {'class': 'p'}).text
    161 
    162                 result = {'id': id,

AttributeError: 'NoneType' object has no attribute 'text'

Full code:

from craigslist import CraigslistHousing

cl = CraigslistHousing(site='boston', area='gbs', category='roo',
                         filters={'max_price': 2000, 'min_price': 500})

results = cl.get_results(sort_by='newest', geotagged=True, limit=20)
for result in results:
    print result

I get a couple results but eventually it crashes.

Bypassing Captcha

Can this wrapper bypass captcha to get the contact information? Is there any way to completely bypass captcha and get the contact info? I want to create a scraper.

CraigslistForSale Category=mca

The motorcycle category is missing the min engine displacement and max engine displacement fields.

No longer works?

As of today it looks like the wrapper no longer returns results. I've been running it for the past week and it's been fine until now. Is it an issue on my end?

Get fields used to filter by

Is there a way to get the fields the query filtered by in the result? For example, when searching for cars we can filter with min_miles and max_miles, but the result doesn't return the actual miles. Can this be done with the customize_result function instead? If so, can you add example usage to the README? I've looked at the source code and, from what I understand, you have to change the custom_result_fields flag to true in CraigslistBase. It'd be nice to be able to do this once an instance has been created.

Results from wrong location

Easy to deal with by filtering externally, but might be worth thinking about searches with few results.

For housing, small markets tend to have few listings so searches like CraigslistHousing(site='stillwater', category='apa', filters={'posted_today': True}) will return housing from all over Oklahoma, as scraped from the 'similar listings' section.

This may be applicable for job or sale searches as well, where results to similar queries are shown when there are few or no results.
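
The external filtering mentioned above could be done on the result URL, assuming that nearby listings are hosted under a different site subdomain than the one searched. That assumption is untested here, `from_site` is a hypothetical helper, and the sample URLs are invented for the sketch.

```python
from urllib.parse import urlparse


def from_site(result, site):
    """True when the result's URL is hosted on <site>.craigslist.org."""
    host = urlparse(result['url']).hostname or ''
    return host.split('.')[0] == site


# Invented sample results: one local, one pulled in from a nearby site.
results = [
    {'url': 'https://stillwater.craigslist.org/apa/d/example/1.html'},
    {'url': 'https://tulsa.craigslist.org/apa/d/example/2.html'},
]
local = [r for r in results if from_site(r, 'stillwater')]
```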

Save print result to data set

Hi @juliomalegria, how can I save the results in a data format that I can then output?

thanks
Sophia
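
Since each result is a plain dict, one option is the standard library's `csv.DictWriter`. This is only a sketch: `results_to_csv` is a hypothetical helper, the sample row is invented, and the column names are picked from keys that appear in results elsewhere on this page.

```python
import csv
import io


def results_to_csv(results, fieldnames, stream):
    """Write the chosen fields of each result dict as one CSV row."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, extrasaction='ignore')
    writer.writeheader()
    for result in results:
        writer.writerow(result)


# Invented sample row; a real run would pass the output of get_results().
sample = [{'id': '1', 'name': 'Sunny room', 'price': '$825', 'geotag': None}]
buffer = io.StringIO()
results_to_csv(sample, ['id', 'name', 'price'], buffer)
```

To write a file instead of an in-memory buffer, pass a file opened with `open(path, 'w', newline='')` as `stream`.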

Datetime question

This utility is awesome, thanks for all the work. I have a question about datetime. I was looking in the Rochester, MN area, and received a result that looks like this:

{'name': u'QUIET BUILDING! CLOSE TO DOWNTOWN! BEST DEAL AROUND!', 'area': u'875ft2', 'url': u'https://rmn.craigslist.org/apa/d/rochester-quiet-building-close-to/6748094294.html', 'where': u'Rochester NW', 'price': u'$825', 'bedrooms': u'2', 'geotag': (44.036915, -92.464017), 'repost_of': u'6576850170', 'has_image': True, 'datetime': u'2018-12-12 16:15', 'has_map': True, 'id': u'6748094294'}

The datetime displayed is 2018-12-12. However, if I look at the listing based on its url, the date posted shows as: 2018-11-13. Is this something you've come across?

pages may have less than RESULTS_PER_REQUEST

if (total_so_far - start) < RESULTS_PER_REQUEST:
    break

Depending on the filters used, I'm seeing many pages with fewer than 100 results per page even when there are multiple pages. Here is an example url:

https://washingtondc.craigslist.org/search/apa?search_distance=1&postal=20071&availabilityMode=0

Run document.querySelectorAll("#sortable-results .row").length == 51 despite the top showing 1 to 100 of 1177.
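
One way around the short-page problem is to stop only when a page comes back empty, rather than when it has fewer rows than expected. The sketch below assumes the offset still advances by the nominal page size; `iter_rows`, `fetch_page` and the fake pages are all hypothetical.

```python
RESULTS_PER_REQUEST = 100  # nominal page size, per "1 to 100" above


def iter_rows(fetch_page):
    """Yield rows from successive pages, stopping only at an empty page."""
    start = 0
    while True:
        rows = fetch_page(start)
        if not rows:
            break
        for row in rows:
            yield row
        start += RESULTS_PER_REQUEST


# Fake pages: the first one is short (51 rows) but more pages follow.
PAGES = {0: list(range(51)), 100: list(range(100)), 200: list(range(7))}
rows = list(iter_rows(lambda start: PAGES.get(start, [])))
```

The cost is one extra request at the end to discover the empty page.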

Not getting geotags?

It seems that all returned data under forsale has a geotag field of "null". Did craigslist change recently?

site.py doesn't work.

Hi,

I am very glad to find this package!
I am currently trying to find an apartment in the SoCal area, so this package is really helpful for me.

Btw, when I tried your example, it unfortunately didn't work because site.py couldn't get usable data from craigslist, so I commented out a line in 'init' and now it works as expected [1]. If you have time to revise it, please consider this.

Thanks!
SJ

[1] line 66 (#raise ValueError(msg))

Reading the post description

Hi,
I was wondering if there was functionality to get the description of a given post within the module?
Thanks!

neighborhoods filter

Hello guys,

Any ideas how we can filter by neighborhood? It looks like this isn't supported, correct?