juliomalegria / python-craigslist
Simple Craigslist wrapper
License: MIT No Attribution
Can you publish a new version with this fix: ebd99b5
I had the module working well in Python 2, but I want to use it in a script that needs to be in Python 3 and I keep getting an import error.
from craigslist import CraigslistHousing
returns "ImportError: cannot import name 'CraigslistHousing'"
There was some other module called "craigslist" rather than "python-craigslist" installed that I tried to remove. It's not showing up when I use "pip3 list" but I'm not sure if it's causing some sort of conflict.
I found that the 'posted_today' filter always appeared to be 1, even when passed a value of 0. I believe this line in the init of CraigslistBase is to blame (around line 91):
self.filters[filter['url_key']] = filter['value'] or value
I believe the 'or' test never let my desired value, 0, through, so the default filter value of 1 was used in all cases. I changed it to this and it appears to work:
if value is not None:
    self.filters[filter['url_key']] = value
else:
    self.filters[filter['url_key']] = filter['value']
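The pitfall described above can be reproduced in isolation. This is only a minimal sketch, not the library's code: the helper names are hypothetical, and the default of 1 is taken from the report. With the default written first, a truthy default always wins; with the value written first, a falsy value such as 0 is discarded. An explicit None check avoids both.

```python
def default_first(value, default):
    # Mirrors "default or value": a truthy default always wins.
    return default or value


def value_first(value, default):
    # Mirrors "value or default": falsy values such as 0 are discarded.
    return value or default


def none_check(value, default):
    # Only a missing value (None) falls back to the default.
    return value if value is not None else default


print(default_first(0, 1))  # 1
print(value_first(0, 1))    # 1
print(none_check(0, 1))     # 0
print(none_check(None, 1))  # 1
```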
None of your examples work; they give errors like:
'newyork' is not a valid site
Traceback (most recent call last):
File "/Clients/xx/cl.py", line 3, in <module>
cl_e = CraigslistEvents(site='newyork', filters={'free': True, 'food': True})
File "/Library/Python/2.7/site-packages/craigslist/__init__.py", line 67, in __init__
raise ValueError(msg)
ValueError: 'newyork' is not a valid site
It would be useful to have a housing type filter. I have implemented this change simply by adding
'housing_type': {'url_key': 'housing_type', 'value': None}
to the extra_filters dictionary in the CraigslistHousing class. However, the types are designated by number and not by a string like 'condo' or 'house'. I was thinking of making a function that would parse the filters dictionary so that the user could give the argument
filters={'housing_type':['condo', 'house']}
and the parser would change it to
filters={'housing_type':[2, 6]}
which is what Craigslist needs.
Let me know if this sounds ok and I can send a pull request.
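A rough sketch of what such a parser might look like. The function name and the lookup table are hypothetical; only the two codes shown in the example above (condo as 2, house as 6) come from this report, and any further entries would need to be checked against Craigslist's own search form.

```python
# Hypothetical name-to-code table. Only these two codes appear in the
# report above; other housing types are deliberately left out.
HOUSING_TYPE_CODES = {'condo': 2, 'house': 6}


def translate_housing_types(filters):
    """Return a copy of filters with housing type names replaced by codes."""
    translated = dict(filters)
    names = translated.get('housing_type')
    if names is None:
        return translated
    if isinstance(names, str):
        names = [names]  # allow a single name as well as a list
    try:
        translated['housing_type'] = [HOUSING_TYPE_CODES[n] for n in names]
    except KeyError as exc:
        raise ValueError('unknown housing type: %s' % exc.args[0])
    return translated


print(translate_housing_types({'housing_type': ['condo', 'house']}))
# {'housing_type': [2, 6]}
```

Raising ValueError for an unknown name keeps a typo from silently turning into an unfiltered search.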
Thank you for making this awesome package. I tried to run your example and it returns with an error after a few seconds.
code:
from craigslist import CraigslistHousing
cl_h = CraigslistHousing(site='sfbay', area='sfc', category='roo',
                         filters={'max_price': 1200, 'private_room': True})
for result in cl_h.get_results(sort_by='newest', geotagged=True):
    print(result)
Output:
Traceback (most recent call last):
File "/get_data_test.py", line 35, in <module>
print (result)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 56-56: Non-BMP character not supported in Tk
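The error message points at the Tk-based console rather than at the wrapper: it cannot display characters outside the Basic Multilingual Plane (emoji, for example). One possible workaround, sketched here with a hypothetical helper name, is to replace those characters before printing. Python 3's built-in ascii() is another option if escaped output is acceptable.

```python
def bmp_only(text, replacement='\ufffd'):
    """Replace characters outside the Basic Multilingual Plane."""
    return ''.join(c if ord(c) <= 0xFFFF else replacement for c in text)


# Sample listing title containing an emoji (a non-BMP character).
sample = 'Sunny room \U0001F600 near BART'
cleaned = bmp_only(sample)
print(len(sample) == len(cleaned))  # True: one character swapped for another

# For a whole result dictionary, clean its printable form instead:
# print(bmp_only(repr(result)))
```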
Here is the housing example:
from craigslist import CraigslistHousing
cl_h = CraigslistHousing(site='sfbay', area='sfc', category='roo',
                         filters={'max_price': 1200, 'private_room': True})
for result in cl_h.get_results(sort_by='newest', geotagged=True):
    print result
This was working about a week ago, but as of recently it throws an error (on the line with the for loop). The error is:
File "/Library/Python/2.7/site-packages/craigslist/__init__.py", line 160, in get_results
p_text = row.find('span', {'class': 'p'}).text
AttributeError: 'NoneType' object has no attribute 'text'
Thanks
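The traceback says the lookup on line 160 returned None, most likely because the page markup no longer contains the expected element. A defensive sketch, with a hypothetical helper name and a stand-in class so it runs without BeautifulSoup, would guard the attribute access. It only stops the crash; the selector itself would still need updating to match the current markup.

```python
def text_or_none(node):
    """Return node.text, or None when the lookup found nothing."""
    return node.text if node is not None else None


class FakeNode:
    """Stand-in for a parsed HTML element, used only for this demo."""
    def __init__(self, text):
        self.text = text


print(text_or_none(FakeNode('pic map')))  # pic map
print(text_or_none(None))                 # None

# In the wrapper this could be applied as:
# p_text = text_or_none(row.find('span', {'class': 'p'}))
```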
appears three times.
I was reading through your code and saw that your readme says auto_transmission is a filter you can pass to car searches; however, that option isn't available.
Perhaps I'm dense, how do you simply add a search query?
When specifying a location, say New York, and also a zip code, say 90210, and also a distance, 200 miles, how exactly is that interpreted by Craigslist, since it's a contradiction? Is it 200 miles from the zip code or from the location?
How do you search for key words in the title of an ad using the filter parameter?
File "C:\Python34\lib\site-packages\craigslist\__init__.py", line 3, in <module>
from Queue import Queue
ImportError: No module named 'Queue'
Looks like it's not Python3 compatible after all.
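The module was renamed from Queue to queue in Python 3. A common compatibility pattern, shown here as a sketch rather than as the fix the project adopted, is to try the Python 3 name first and fall back to the Python 2 one.

```python
try:
    from queue import Queue  # Python 3 name
except ImportError:
    from Queue import Queue  # Python 2 name

# Quick check that the imported class behaves the same either way.
q = Queue()
q.put('sfbay')
print(q.get())  # sfbay
```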
Hello, I set geotagged = True for housing, but I receive all 'geotag': None. These websites usually have maps and an address underneath. Is this a CL issue?
>>> from craigslist import CraigslistJobs
>>> CraigslistJobs.show_filters()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: type object 'CraigslistJobs' has no attribute 'show_filters'
Upon releasing 1.0.2, did you get rid of the filter tree you had? I saw some type of tree and was using it as a reference for which filters to apply.
I would appreciate it if you re-added it to the help or pointed me in the right direction.
Thanks a lot, as always!
cl_h = CraigslistForSale(site='vancouver', 'toronto' area='', category='cto',
I want to be able to use multiple sites like the above, instead of just searching vancouver I want to be able to search toronto too.
how can i do this?
thanks
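Since the call above is not valid Python and each search object appears to take a single site, one way to cover several sites is to build one search per site and chain the results. The sketch below is an assumption-laden illustration: results_for_sites and FakeSearch are hypothetical names, and FakeSearch only stands in for a real class such as CraigslistForSale so the example runs offline.

```python
def results_for_sites(make_search, sites, **get_results_kwargs):
    """Yield (site, result) pairs, querying one site after another.

    make_search is a callable that builds a search object for one site,
    e.g. lambda site: CraigslistForSale(site=site, category='cto').
    """
    for site in sites:
        search = make_search(site)
        for result in search.get_results(**get_results_kwargs):
            yield site, result


class FakeSearch:
    """Stand-in for a real search class so the sketch runs offline."""
    def __init__(self, site):
        self.site = site

    def get_results(self, **kwargs):
        return iter([{'id': self.site + '-1'}, {'id': self.site + '-2'}])


for site, result in results_for_sites(FakeSearch, ['vancouver', 'toronto']):
    print(site, result['id'])
```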
I was having an issue similar to that of @JRHutson but his uninstall/reinstall solution does not work for me. I am attempting to import into a python 3 Jupyter Notebook but it times out with a connection error.
Any ideas? Thanks :)
Assuming due to an update in CL. All that was needed was an extra '/' in rsplit.
!OLD
site = a.attrs['href'].rsplit('/', 1)[1].split('.')[0]
!NEW
site = a.attrs['href'].rsplit('//', 1)[1].split('.')[0]
Works now :)
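The difference between the two lines can be checked on a sample link. The href below is an assumed shape (a trailing slash after the host), which is one plausible explanation for why the old split started returning an empty string; urlparse is shown as a third option that does not depend on where the slashes fall. Python 3 import shown.

```python
from urllib.parse import urlparse


def site_old(href):
    return href.rsplit('/', 1)[1].split('.')[0]


def site_new(href):
    return href.rsplit('//', 1)[1].split('.')[0]


def site_urlparse(href):
    # Take the first label of the host name, whatever the path looks like.
    return urlparse(href).netloc.split('.')[0]


href = 'https://sfbay.craigslist.org/'  # assumed link shape, not confirmed
print(repr(site_old(href)))  # ''
print(site_new(href))        # sfbay
print(site_urlparse(href))   # sfbay
```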
It looks like craigslist has two ['data-id'] tags that are used interchangeably.
As an example for this request (http://washingtondc.craigslist.org/search/doc/apa?minAsk=3000&maxAsk=5000), 1/2 of the results have a link attribute of "['data-id']" and the other half have a link attribute of "['data-ids']".
This throws an attribute error, so if i might suggest you could just try data-id except for attribute error and try data-ids.
Let me know if you need any more clarification. Thanks!
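The suggested fallback could look roughly like this. The helper name is hypothetical, and it works on a plain dictionary of attributes (such as the attrs mapping used elsewhere in these reports) so that a missing key is handled without relying on exceptions.

```python
def listing_id(attrs):
    """Return the value of 'data-id', falling back to 'data-ids'."""
    for key in ('data-id', 'data-ids'):
        if key in attrs:
            return attrs[key]
    return None


print(listing_id({'data-id': '123'}))       # 123
print(listing_id({'data-ids': '456,789'}))  # 456,789
print(listing_id({'href': '/apa/1.html'}))  # None
```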
This isn't working. The readme doesn't help you find a city other than San Francisco.
Hey @juliomalegria, I was wondering if you've ever used the jsonsearch endpoint within craigslist and compared it to the regular search? I wasn't sure what the differences are but it seems a tad easier to parse.
Here's a quick script that runs on Python 3.5 and requires the requests library demonstrating the jsonsearch endpoint: https://gist.github.com/AlJohri/dc51918a65752099b2a8f4df5dba7f93
Here's an example of the jsonsearch endpoint that I'm referring to: https://washingtondc.craigslist.org/jsonsearch/apa?postal_code=20071&search_distance=1&sort=date&s=0
'min_price': {'url_key': 'minAsk', 'value': None},
'max_price': {'url_key': 'maxAsk', 'value': None},
http://washingtondc.craigslist.org/search/apa?min_price=900&max_price=1000&availabilityMode=0
Is there something broken with the API? Whenever I "get_results", it returns nothing.
Is the filter "bathrooms" supposed to be an integer or a list? Also confused about "query", is it a string or perhaps a dict?
After successfully installing the craigslist module with pip3, I get the following error with Python 3.4.3 on Ubuntu 14.04:
Traceback (most recent call last):
File "nybits9.py", line 16, in <module>
from craigslist import CraigslistHousing
File "/usr/local/lib/python3.4/dist-packages/craigslist/__init__.py", line 3, in <module>
from craigslist._search import search
File "/usr/local/lib/python3.4/dist-packages/craigslist/_search/__init__.py", line 23
params = {"s": offset, "sort": sort, **kwargs}
^
SyntaxError: invalid syntax
I would really appreciate any help.
Thank you
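Unpacking with ** inside a dict literal was added in Python 3.5 (PEP 448), which is why Python 3.4 reports a syntax error on that line. A version that also parses on 3.4 can build the dictionary in two steps. This is a sketch with a hypothetical function name, not the package's code.

```python
def build_params(offset, sort, **kwargs):
    # Same result as {"s": offset, "sort": sort, **kwargs}, but valid on
    # Python versions older than 3.5. Later keys override earlier ones.
    params = {"s": offset, "sort": sort}
    params.update(kwargs)
    return params


params = build_params(0, "date", postal="20071")
print(params["s"], params["sort"], params["postal"])
```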
It would be nice to add the &bundleDuplicates=1 option to the list of base filters.
GET https://sfbay.craigslist.org/sfc/apa/d/san-francisco-open-sun-1245pm-130pm/7037808940.html
Response code: 200
Geotagging result ...
Adding details to result...
Traceback (most recent call last):
[...]
File ".../venv/lib/python3.6/site-packages/craigslist/__init__.py", line 245, in get_results
self.include_details(result, detail_soup)
File ".../venv/lib/python3.6/site-packages/craigslist/__init__.py", line 282, in include_details
body_text = (getattr(e, 'text', e) for e in body
TypeError: 'NoneType' object is not iterable
The relevant page causing the crash contains the following HTML:
<section class="body">
<div id="userbody">
<div class="prevnext js-only">
<a class="prev">◀ prev </a>
<a class="backup" title="back to search">▲</a>
<a class="next"> next ▶ </a>
</div>
<span id="has_been_removed"></span>
<div class="removed">
<h2>
This posting has been deleted by its author.
</h2>
</div>
</div>
</section>
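Going by the traceback, the body lookup comes back as None for a deleted posting, and iterating over it fails. A guarded version of the joining step might look like the sketch below; join_body_text is a hypothetical name, and the getattr expression is copied from the line shown in the traceback.

```python
def join_body_text(body):
    """Join the text of a posting body, or return None if there is no body."""
    if body is None:
        # Deleted or expired postings have no body element to read.
        return None
    return ''.join(getattr(e, 'text', e) for e in body).strip()


print(join_body_text(['  Two bedroom, ', 'close to transit.  ']))
print(join_body_text(None))  # None
```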
hi @juliomalegria, I was trying to use your library in some code I'm working on where I'm controlling much of the IO myself. I was having a little trouble because the IO and parsing logic is currently intertwined. The parts that I was hoping to import are:
The main reason for handling the IO myself is so I can handle caching, use whichever IO library I want (requests, aiohttp, etc.), perform threading or multiprocessing whenever and wherever I want to, and download and parse the actual post itself in addition to the geotagging data.
Is this something you'd be interested in doing? It's definitely not a simple change.
Hello Julio,
I noticed that python-craigslist tends to return posting counts that are multiples of 120 for Craigslist locations. To clarify, the maximum number of items returned from a Craigslist location for a category (e.g. apts/housing from santabarbara) is 3000. The 120 comes from the number of listings per page on Craigslist, meaning that a Craigslist category per location should have at most 25 pages.
My speculation is that the .get_results() method jams and exits when transitioning to a different listing page. I noticed this to be especially the case when geotagged=True. I am currently looking at apts/housing in the United States through CraigslistHousing. Please see the figure below that outlines this observation (i.e. the high peaks):
Where can one get a list of categories?
instead (or perhaps in addition to) using http://www.craigslist.org/about/sites, you can also use the craigslist reference json api: http://www.craigslist.org/about/reference
areas: http://reference.craigslist.org/Areas
categories: http://reference.craigslist.org/Categories
I did some testing, and it seems like the maximum get_results per site, area, and category is 3000.
e.g.
from craigslist import CraigslistHousing
x = CraigslistHousing(site='sfbay', category='apa', area='eby')
y = [i for i in x.get_results(sort_by='newest')]
print(len(y))  # 3000
Would you say Craigslist has an upper limit on posts per area in a certain category?
Love this Python application, thanks. It has been working great, but today I started getting this error. Any suggestions?
Traceback (most recent call last):
File "./get_search.py", line 28, in <module>
for x in gems.get_results(sort_by='newest', geotagged=True):
File "/home/user47/craigslist_app/lib/python3.4/site-packages/craigslist/__init__.py", line 160, in get_results
p_text = row.find('span', {'class': 'p'}).text
AttributeError: 'NoneType' object has no attribute 'text'
C:\Python27\python.exe "C:/Users/Bobby/PycharmProjects/untitled/complicated loops.py"
Traceback (most recent call last):
File "C:/Users/Bobby/PycharmProjects/untitled/complicated loops.py", line 2, in <module>
from craigslist import CraigslistHousing
File "C:\Python27\lib\site-packages\craigslist\__init__.py", line 4, in <module>
import requests
File "C:\Python27\lib\site-packages\requests\__init__.py", line 58, in <module>
from . import utils
File "C:\Python27\lib\site-packages\requests\utils.py", line 25, in <module>
from .compat import parse_http_list as parse_list_header
File "C:\Python27\lib\site-packages\requests\compat.py", line 7, in <module>
from .packages import chardet
File "C:\Python27\lib\site-packages\requests\packages\__init__.py", line 3, in <module>
from . import urllib3
File "C:\Python27\lib\site-packages\requests\packages\urllib3\__init__.py", line 10, in <module>
from .connectionpool import (
File "C:\Python27\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 37, in <module>
from .request import RequestMethods
File "C:\Python27\lib\site-packages\requests\packages\urllib3\request.py", line 6, in <module>
from .filepost import encode_multipart_formdata
File "C:\Python27\lib\site-packages\requests\packages\urllib3\filepost.py", line 8, in <module>
from .fields import RequestField
File "C:\Python27\lib\site-packages\requests\packages\urllib3\fields.py", line 1, in <module>
import email.utils
ImportError: No module named utils
Process finished with exit code 1
I installed the module via pip on my Windows computer. It is in the list of modules when I run pip list. Do I have to run a setup file now, or what?
This was run in ipython notebook (python 2.7.11)
/Users/megablanc/Library/Python/2.7/lib/python/site-packages/craigslist/__init__.pyc in get_results(self, limit, sort_by, geotagged)
158 price = row.find('span', {'class': 'price'})
159 where = row.find('small')
--> 160 p_text = row.find('span', {'class': 'p'}).text
161
162 result = {'id': id,
AttributeError: 'NoneType' object has no attribute 'text'
Full code:
from craigslist import CraigslistHousing
cl = CraigslistHousing(site='boston', area='gbs', category='roo',
                       filters={'max_price': 2000, 'min_price': 500})
results = cl.get_results(sort_by='newest', geotagged=True, limit=20)
for result in results:
    print result
I get a couple results but eventually it crashes.
Can this wrapper bypass the captcha to get the contact information? Is there any way to completely bypass the captcha and get the contact info? I want to create a scraper.
The motorcycle category is missing the min engine displacement and max engine displacement fields.
As of today it looks like the wrapper no longer returns results. I've been running it for the past week and it's been fine until now. Is it an issue on my end?
Is there a way to get the fields the query filtered by in the result? For example, when searching for cars we can filter with min_miles and max_miles, however the result doesn't return the actual miles. Can this be done with the customize_result function instead? If so, can you add example usage to the readme? I've looked at the source code and, from what I understand, you have to change the custom_result_fields flag to true in CraigslistBase. It'd be nice to be able to do this once an instance has been created.
Easy to deal with by filtering externally, but might be worth thinking about searches with few results.
For housing, small markets tend to have few listings so searches like CraigslistHousing(site='stillwater', category='apa', filters={'posted_today': True}) will return housing from all over Oklahoma, as scraped from the 'similar listings' section.
This may be applicable for job or sale searches as well, where results to similar queries are shown when there are few or no results.
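The external filtering mentioned above could be sketched as follows. It assumes each result dictionary carries a 'url' key whose host name starts with the site's subdomain, which matches result dictionaries quoted in other reports here but is not guaranteed by the wrapper. The function name is hypothetical and the import is the Python 3 one.

```python
from urllib.parse import urlparse


def only_from_site(results, site):
    """Yield only results whose URL host starts with the given site name."""
    for result in results:
        host = urlparse(result['url']).netloc
        if host.split('.')[0] == site:
            yield result


# Made-up sample data: one local listing and one "similar listing".
sample = [
    {'id': '1', 'url': 'https://stillwater.craigslist.org/apa/d/a/1.html'},
    {'id': '2', 'url': 'https://tulsa.craigslist.org/apa/d/b/2.html'},
]
print([r['id'] for r in only_from_site(sample, 'stillwater')])  # ['1']
```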
Hi @juliomalegria, how can I save the results in a data format that I can then output?
thanks
Sophia
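Since each result appears to be a plain dictionary, one straightforward option is to write the results to CSV with the standard library. This is a sketch under assumptions: the function name is hypothetical, and the column names are taken from result dictionaries quoted elsewhere in these reports, so they may not match every category. Python 3 shown.

```python
import csv
import io


def write_results_csv(results, fileobj, fieldnames=None):
    """Write result dictionaries to fileobj as CSV and return the row count."""
    if fieldnames is None:
        fieldnames = ['id', 'name', 'price', 'where', 'datetime', 'url']
    # Keys not listed in fieldnames are ignored; missing keys become empty.
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames,
                            extrasaction='ignore')
    writer.writeheader()
    count = 0
    for result in results:
        writer.writerow(result)
        count += 1
    return count


# Demo with an in-memory buffer and a made-up result.
buf = io.StringIO()
write_results_csv([{'id': '1', 'name': 'Quiet room', 'price': '$825'}], buf)
print(buf.getvalue())
```

To write a real file, open it with open('results.csv', 'w', newline='') and pass the generator returned by get_results() as the first argument.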
This utility is awesome, thanks for all the work. Question about datetime: I was looking in the Rochester, MN area, and received a result that looks like this:
{'name': u'QUIET BUILDING! CLOSE TO DOWNTOWN! BEST DEAL AROUND!', 'area': u'875ft2', 'url': u'https://rmn.craigslist.org/apa/d/rochester-quiet-building-close-to/6748094294.html', 'where': u'Rochester NW', 'price': u'$825', 'bedrooms': u'2', 'geotag': (44.036915, -92.464017), 'repost_of': u'6576850170', 'has_image': True, 'datetime': u'2018-12-12 16:15', 'has_map': True, 'id': u'6748094294'}
The datetime displayed is 2018-12-12. However, if I look at the listing based on its url, the date posted shows as: 2018-11-13. Is this something you've come across?
if (total_so_far - start) < RESULTS_PER_REQUEST:
break
Depending on the filters used, I'm seeing many pages with fewer than 100 results per page even when there are multiple pages. Here is an example URL:
https://washingtondc.craigslist.org/search/apa?search_distance=1&postal=20071&availabilityMode=0
Running document.querySelectorAll("#sortable-results .row").length in the browser console gives 51, despite the top of the page showing "1 to 100 of 1177".
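One way around the early exit is to stop on the reported total instead of on the size of the current page. The sketch below is hypothetical throughout: fetch_page stands in for whatever downloads and parses one page, and it assumes the offset advances by a fixed page size even when a page shows fewer rows, which has not been verified against Craigslist.

```python
def iter_all_rows(fetch_page, page_size=100):
    """Yield rows from every page until the reported total is covered.

    fetch_page(start) is a hypothetical callable returning (rows, total),
    where total is the count shown at the top of the results page.
    """
    start = 0
    while True:
        rows, total = fetch_page(start)
        for row in rows:
            yield row
        start += page_size
        # Stop on an empty page or once the offset passes the total,
        # not merely because one page came back short.
        if not rows or start >= total:
            break


def fake_fetch_page(start):
    # Pages holding 51, 100 and 50 rows, with a reported total of 250.
    sizes = {0: 51, 100: 100, 200: 50}
    return ['row'] * sizes.get(start, 0), 250


print(len(list(iter_all_rows(fake_fetch_page))))  # 201
```

With the original check, the loop in this example would have stopped after the first page of 51 rows.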
It seems that all returned data under forsale has a geotag field of "null". Did craigslist change recently?
Hi,
I am very glad to find this package!
Because now I am trying to find an apartment in SoCal area, this package is really helpful for me.
Btw, when I try to use your example, unfortunately it doesn't work because site.py couldn't get decent data from craigslist, so I just commented out a line in 'init' and now it works as I expected [1]. If you have time to revise it, this could be considered.
Thanks!
SJ
[1] line 66 (#raise ValueError(msg))
Hi,
I was wondering if there was functionality to get the description of a given post within the module?
Thanks!
Hello guys,
Any ideas how we can filter by neighborhood? It looks like this is not supported, correct?
Thanks