brozeph / node-craigslist
Node driver for searching Craigslist.com listings
License: MIT License
Would it be possible to run some code that reveals the contact info and returns it?
To get this to work for Toronto, I had to change the code like so:
baseHost = '.craigslist.ca',
Maybe a flag in options to allow this to be set in a custom manner?
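To sketch what such an option could look like (the `buildHost` helper below is purely illustrative, not an actual node-craigslist function):

```javascript
// Hypothetical sketch: a configurable baseHost would let the client build
// request hosts for non-.org TLDs like Canada's .craigslist.ca.
function buildHost(city, baseHost = '.craigslist.org') {
  return `${city}${baseHost}`;
}

console.log(buildHost('toronto', '.craigslist.ca')); // toronto.craigslist.ca
console.log(buildHost('seattle'));                   // seattle.craigslist.org
```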
I can search for items fine, but when I use the price filters they are completely ignored. Is this happening for anyone else?
Running the #search example in the README returns an error for too many redirects.
Tried adjusting the city and the search query but the redirect limit error is always returned.
reproduced in node versions:
v15.12.0
v14.16.0
var
  craigslist = require('node-craigslist'),
  client = new craigslist.Client({
    city : 'seattle'
  });

client
  .search('xbox one')
  .then((listings) => {
    // play with listings here...
    listings.forEach((listing) => console.log(listing));
  })
  .catch((err) => {
    console.error(err);
  });
throws:
Error: maximum redirect limit exceeded
at ClientRequest.<anonymous> (/mnt/c/Users/mbmcm/example/node_modules/reqlib/dist/index.js:429:29)
at Object.onceWrapper (node:events:476:26)
at ClientRequest.emit (node:events:369:20)
at HTTPParser.parserOnIncomingClient [as onIncoming] (node:_http_client:636:27)
at HTTPParser.parserOnHeadersComplete (node:_http_common:129:17)
at Socket.socketOnData (node:_http_client:502:22)
at Socket.emit (node:events:369:20)
at addChunk (node:internal/streams/readable:313:12)
at readableAddChunk (node:internal/streams/readable:288:9)
at Socket.Readable.push (node:internal/streams/readable:227:10) {
options: {
hostname: 'seattle.craigslist.org',
method: 'GET',
path: '/',
maxRedirectCount: 5,
maxRetryCount: 3,
timeout: 60000,
headers: { 'Content-Type': 'application/json', 'Content-Length': 0 },
[Symbol(context)]: URLContext {
flags: 400,
scheme: 'https:',
username: '',
password: '',
host: 'seattle.craigslist.org',
port: null,
path: [Array],
query: null,
fragment: null
},
[Symbol(query)]: URLSearchParams {}
},
state: {
data: '',
failover: { index: 0, values: [] },
redirects: [ [URL], [URL], [URL], [URL], [URL] ],
tries: 1,
headers: { location: 'https://seattle.craigslist.org/' },
statusCode: 301
}
}
the redirects:
[
URL {
href: 'https://seattle.craigslist.org/search/sss?sort=rel&query=xbox%20one',
origin: 'https://seattle.craigslist.org',
protocol: 'https:',
username: '',
password: '',
host: 'seattle.craigslist.org',
hostname: 'seattle.craigslist.org',
port: '',
pathname: '/search/sss',
search: '?sort=rel&query=xbox%20one',
searchParams: URLSearchParams { 'sort' => 'rel', 'query' => 'xbox one' },
hash: ''
},
URL {
href: 'https://seattle.craigslist.org/',
origin: 'https://seattle.craigslist.org',
protocol: 'https:',
username: '',
password: '',
host: 'seattle.craigslist.org',
hostname: 'seattle.craigslist.org',
port: '',
pathname: '/',
search: '',
searchParams: URLSearchParams {},
hash: ''
},
URL {
href: 'https://seattle.craigslist.org/',
origin: 'https://seattle.craigslist.org',
protocol: 'https:',
username: '',
password: '',
host: 'seattle.craigslist.org',
hostname: 'seattle.craigslist.org',
port: '',
pathname: '/',
search: '',
searchParams: URLSearchParams {},
hash: ''
},
URL {
href: 'https://seattle.craigslist.org/',
origin: 'https://seattle.craigslist.org',
protocol: 'https:',
username: '',
password: '',
host: 'seattle.craigslist.org',
hostname: 'seattle.craigslist.org',
port: '',
pathname: '/',
search: '',
searchParams: URLSearchParams {},
hash: ''
},
URL {
href: 'https://seattle.craigslist.org/',
origin: 'https://seattle.craigslist.org',
protocol: 'https:',
username: '',
password: '',
host: 'seattle.craigslist.org',
hostname: 'seattle.craigslist.org',
port: '',
pathname: '/',
search: '',
searchParams: URLSearchParams {},
hash: ''
}
]
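The dump above shows the same URL repeated in the redirects array, which suggests the server keeps 301-ing back to itself. As a rough sketch (not part of reqlib), a client could detect that kind of loop and fail fast instead of exhausting the redirect limit:

```javascript
// Sketch: report a redirect loop as soon as a URL repeats, rather than
// retrying until maxRedirectCount is exhausted. Purely illustrative.
function isRedirectLoop(redirectUrls) {
  const seen = new Set();
  for (const url of redirectUrls) {
    if (seen.has(url)) return true;
    seen.add(url);
  }
  return false;
}
```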
Hi, a quick question: is there any possibility to extend the length of search results? Right now it's only 120, so it looks like it's only searching the first page on craigslist.
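For reference, craigslist's search pages appear to paginate with an `s` offset query parameter in steps of 120; building the URL for a later page might look like this sketch (hypothetical helper, not part of the library):

```javascript
// Build a search URL for a given results page, assuming craigslist's
// `s` offset parameter and 120 results per page.
function searchPageUrl(city, query, page = 0) {
  const params = new URLSearchParams({ query, s: String(page * 120) });
  return `https://${city}.craigslist.org/search/sss?${params}`;
}

console.log(searchPageUrl('seattle', 'xbox one', 1));
// https://seattle.craigslist.org/search/sss?query=xbox+one&s=120
```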
I'm using your library to build out a small React app that will act as a wrapper for the craigslist client. When calling search() or list() on the client object, I get a TypeError: req.setTimeout is not a function that originates from req.setTimeout(options.timeout, req.abort); at line 306 in web.js. I am not sure what the cause of this could be, as http and https are dependencies of this package (i.e. the object should be there).
I'm getting a 400 or higher status code on my call to the list(options) method:
(node:2037) UnhandledPromiseRejectionWarning: Error: HTTP error received
at IncomingMessage.<anonymous> (/Users/j/PersonalProjects/node_modules/reqlib/dist/index.js:515:29)
at IncomingMessage.emit (events.js:203:15)
at endReadableNT (_stream_readable.js:1145:12)
at process._tickCallback (internal/process/next_tick.js:63:19)
(node:2037) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:2037) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Process finished with exit code 0
https://travis-ci.org/brozeph/node-craigslist/jobs/278743585
Error: getaddrinfo ENOTFOUND seattle.craigslist.orghttps seattle.craigslist.orghttps:443
Somehow the url property has a duplicate url inside, for example: https://atlanta.craigslist.orghttps://atlanta.craigslist.org/atl/cto/d/2006-mercedes-benz-e350-sedan/6277839728.html
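As a hedged client-side workaround until the parsing is fixed, a doubled origin could be stripped after the fact (illustrative helper, not part of the library):

```javascript
// Strip a doubled origin prefix like "https://xhttps://x/..." back to one copy;
// URLs without the duplication are returned unchanged.
function fixListingUrl(url) {
  const match = url.match(/^(https?:\/\/[^/]+)\1(.*)$/);
  return match ? match[1] + match[2] : url;
}

console.log(fixListingUrl(
  'https://atlanta.craigslist.orghttps://atlanta.craigslist.org/atl/cto/d/2006-mercedes-benz-e350-sedan/6277839728.html'
));
// https://atlanta.craigslist.org/atl/cto/d/2006-mercedes-benz-e350-sedan/6277839728.html
```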
If I have a search query in Chicago and Craigslist displays listings in nearby cities, then the url is wrong. It ends up being something like:
https://chicago.craigslist.com//rockford.craigslist.com/....
It should instead point directly at the nearby city's own domain.
Error: maximum redirect limit exceeded
{ host: 'vancouver.craigslist.org',
method: 'GET',
path: '//vancouver.craigslist.ca/vancouver.craigslist.ca/vancouver.craigslist.ca/vancouver.craigslist.ca/vancouver.craigslist.ca/search/sss?sort=rel&query=xbox',
pathname: '//vancouver.craigslist.ca/vancouver.craigslist.ca/vancouver.craigslist.ca/vancouver.craigslist.ca/vancouver.craigslist.ca/search/sss',
rawStream: false,
secure: true },
It appears to happen on only some domains, not all.
Can I pass request options to search()? i.e. { proxy: 'http://xxxx' }. It would be easy, since you're using the request http lib, to merge custom request options into your defaults.
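To sketch the merging idea (the option names here are hypothetical examples, not the library's actual API):

```javascript
// Shallow-merge caller overrides on top of default request options, with
// headers merged one level deeper so callers can add headers without
// clobbering the defaults.
function mergeRequestOptions(defaults, overrides = {}) {
  return {
    ...defaults,
    ...overrides,
    headers: { ...defaults.headers, ...overrides.headers }
  };
}

const merged = mergeRequestOptions(
  {
    hostname: 'seattle.craigslist.org',
    timeout: 60000,
    headers: { 'Content-Type': 'application/json' }
  },
  { proxy: 'http://xxxx', headers: { 'User-Agent': 'my-app' } }
);
```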
I am getting an error while trying to search using the client, when using the following example code from the documentation:
var
  craigslist = require('node-craigslist'),
  client = new craigslist.Client({
    city : 'seattle'
  });

client
  .search('xbox one')
  .then((listings) => {
    // play with listings here...
    listings.forEach((listing) => console.log(listing));
  })
  .catch((err) => {
    console.error(err);
  });
Are there any work-arounds for this?
Looks like Craigslist has changed their markup since the last time this was updated. I'm working on a PR for the changes.
Here is an example element from the new markup:
<li class="result-row" data-pid="5860176271">
<a href="/see/ctd/5860176271.html" class="result-image gallery" data-ids="1:00P0P_3IlMCFImstr,1:00Q0Q_2yT6oiXMXMm,1:00q0q_acQKZKPZrAw,1:00w0w_kLyz6zFzOtT,1:01010_9DjwOoUDpJq,1:00707_6ylrUdl3pnR,1:01414_4qnNxdSbTP1,1:00g0g_8PXG6lYQNhL,1:01717_iFWPkVwo1B7,1:00M0M_8doB1KRfskj,1:00M0M_aQFi2kHTwk4,1:00000_jbQc0aR0CfQ,1:00j0j_4HP8wRGtHfR,1:00g0g_aRJpJdVrWYW,1:00d0d_4uReTVs8kDj,1:00t0t_4IK8FnVWYJW,1:00909_kwyv7hM4uGa,1:00101_6n4ALF8ZmQx,1:00L0L_1LplADcLFqk,1:00i0i_hGWKB4DH2UN,1:01313_3Nvvxpu7u4e,1:00909_3fMOGJ7JQwD,1:00j0j_eP5r5fWHLDF,1:00000_95ycGWFZvkP"></a>
<p class="result-info">
<span class="icon icon-star" role="button">
<span class="screen-reader-text">favorite this post</span>
</span>
<time class="result-date" datetime="2016-11-03 17:40" title="Thu 03 Nov 05:40:41 PM">Nov 3</time>
<a href="/see/ctd/5860176271.html" data-id="5860176271" class="result-title hdrlnk">2006 *GMC Sierra* 1500 Denali AWD - Clean Carfax History! 2006 GMC Sie</a>
<span class="result-meta">
<span class="result-hood"> (*GMC* *Sierra*)</span>
<span class="result-tags">
pic
<span class="maptag" data-pid="5860176271">map</span>
</span>
<span class="banish icon icon-trash" role="button">
<span class="screen-reader-text">hide this posting</span>
</span>
<span class="unbanish icon icon-trash red" role="button" aria-hidden="true"></span>
<a href="#" class="restore-link">
<span class="restore-narrow-text">restore</span>
<span class="restore-wide-text">restore this posting</span>
</a>
</span>
</p>
</li>
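To illustrate which fields the updated parser would need to pull out of a result-row, here is a rough string-matching sketch; it is for illustration only, and a proper HTML parser (e.g. cheerio) is the right tool for real parsing:

```javascript
// Rough sketch: extract a few fields from the new result-row markup with
// regular expressions. Returns undefined for any field that isn't present.
function extractListing(html) {
  const grab = (re) => (html.match(re) || [])[1];
  return {
    pid: grab(/data-pid="(\d+)"/),
    url: grab(/href="([^"]+\.html)"/),
    date: grab(/datetime="([^"]+)"/),
    title: grab(/class="result-title hdrlnk">([^<]+)</)
  };
}
```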
I'm getting this error message from every record I pull.
For example, when I receive a listing from New Orleans that was supposed to be from Mobile, it returns a URL like this:
http://mobile.craigslist.com/http//neworleans.craigslist.com
...
Is it possible to add contact info (i.e. email) to the search API results?
From a search() call, .price comes back doubled: $60$60
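A hedged client-side workaround (hypothetical helper; the real fix belongs in the price parsing) is to collapse the doubled string:

```javascript
// Collapse a doubled price string like "$60$60" to "$60"; anything that
// isn't an exact doubling is returned unchanged.
function normalizePrice(price) {
  if (typeof price !== 'string' || price.length % 2 !== 0) return price;
  const half = price.slice(0, price.length / 2);
  return price === half + half ? half : price;
}

console.log(normalizePrice('$60$60')); // $60
console.log(normalizePrice('$600'));   // $600
```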
This package is awesome!
Unfortunately, all my results appear cached after the first request. I've verified this by running the same search on craigslist in the browser. In the browser I see new listings followed by listings that match up with my stale results.
I am using nocache: true
My guess is that craigslist is sending me the same results over and over.
I have copied my code into a repl and was able to pull updated results, but again after the first pull, all subsequent pulls brought in the same data.
Here is a link to that repl:
repl
Here is a link to the same search on craigslist:
CL link
Note: if you test this, run my repl once, check that it matches craigslist, wait a few minutes for more results to be added to craigslist, then refresh both the repl and craigslist. The results will now be out of sync.
I was hoping sending requests through a proxy would allow me to get fresh results every time.
I noticed from this issue that it was once possible to use a proxy. proxy issue
Is it possible to get the proxy option back or is there an alternative solution to this issue?
Thank you
Looks like craigslist now uses max_price in the query params instead.
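If the filters did move to min_price/max_price query params as suggested above, building a filtered search URL might look like this sketch (hypothetical helper, not the library's code):

```javascript
// Build a search URL with the newer min_price / max_price query parameters,
// only adding each filter when the caller supplies it.
function priceFilteredUrl(city, query, { minPrice, maxPrice } = {}) {
  const params = new URLSearchParams({ query });
  if (minPrice != null) params.set('min_price', String(minPrice));
  if (maxPrice != null) params.set('max_price', String(maxPrice));
  return `https://${city}.craigslist.org/search/sss?${params}`;
}

console.log(priceFilteredUrl('seattle', 'xbox one', { maxPrice: 200 }));
// https://seattle.craigslist.org/search/sss?query=xbox+one&max_price=200
```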