patneedham / google-it
command line Google search and save to JSON
Searching for 'amazon hq2' shows a 'Top stories' section at the top, which should be included in the googleIt results.
A '${search term} on Twitter' section also appears underneath, but with a title and link like other search results. That indicates the 'snippet' property of googleIt results might need an array of results (each with its own title, link, and snippet), instead of always being plain text.
Details
Version: 1.2.0
NPM version: 6.10.3
Node version: 12.8.0
Bug Info
As the title says, when I search for something using google-it, I sometimes get links without titles in the response object.
[
  {
    title: undefined,
    link: <some link>
  }
]
Why does this happen?
Hi,
I'm using google-it in my project and running a lot of queries in a short time. After a while I got this error:
Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot. Why did this happen?
How can I get past it?
Thanks.
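One common way to reduce how often this check is triggered is to space requests out and retry with exponential backoff. The sketch below is not part of google-it: withRetry and its retries/baseDelayMs options are hypothetical names, and it assumes the wrapped call rejects when a request fails. Backoff does not bypass the check; it only lowers the request rate.

```javascript
// Hypothetical helper, not part of google-it: retry an async task with
// exponential backoff between attempts.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(task, { retries = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Wait 1x, 2x, 4x, ... the base delay before the next attempt.
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

With google-it installed, a call could then be wrapped as withRetry(() => googleIt({ query: 'coffee' })).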
Hi, thanks for a great tool!
I found that the search results don't include Google ads, which are important for my use case. Is it possible to add an option for that?
In my use case, I would like to be able to reuse the results of an existing request to Google to feed into google-it for parsing. (My tool would be more efficient if it could use the HTML response from Google that has already been received for parsing.) Thanks!
// This code does not work at: https://npm.runkit.com/google
var googleIt = require("google-it");
googleIt({'query': 'coffee'}).then(results => {
  console.log(results); // results is always []
}).catch(e => {
  console.log(e); // no error is ever caught
});
Is there a way to do pagination with this module?
Is there a way to disable console output when making a request?
For example:
const google = require('google-it');
google({ query: query }); // this should return the title, the link and the description
// I was wondering if there's a way to disable that, like...
google({ query: query, 'disable-console': true });
Hi,
Thanks for creating such a useful script. The URLs-only feature saves me a lot of time.
Google search may block an IP or show a robot CAPTCHA when the tool is used frequently to grab thousands of URLs, so proxy support would be very useful. Something like this:
$ google-it --query="keywords" --only-urls --limit=100 --proxy=proxies.txt --proxytype=socks4 (this is preferred)
It would also help to accept a text file of queries, for example:
$ google-it --query=list.txt --only-urls --limit=100 --proxy=proxies.txt --proxytype=https
$ google-it --query=list.txt --only-urls --limit=1000 --proxy=proxies.txt --proxytype=socks4 -o result.txt
It would be greatly appreciated if you could add these features.
Best Regards
can't change language
Version of google-it
Versions of npm and node
yarn 1.22.5
node v12.16.0
Describe the bug
The returned output has a ,, in place of the search criteria used, for instance "handling assets". If either of these words is found, or a similar one (like asset), a double ,, is seen in its place. (This happens because of undefined values from the mapping when there are further children.)
Expected behavior
To have the actual data
Screenshots
I am using google-it for a Discord bot to query our docs site for the convenience of users.
Notice in the image the text ,, which is where the match for the search criteria would be.
Additional context
Diving into the code, I see an issue in the getSnippet(elem) method.
function getSnippet(elem) {
  return elem.children.map(function (child) {
    if (!child.data) {
      return child.children.map(function (c) {
        return c.data;
      });
    }
    return child.data;
  }).join('');
}
The return c.data; is undefined where the matching data should be. But the c var has children (c.children), and when expanded, c.children[0].data does have the matching search criteria. I suppose Google has done it this way so they can highlight the search criteria, but google-it is failing to pick it up.
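One possible fix, sketched under the assumption that every node has the shape described above (text in data, nested nodes in children), is to collect text recursively so that highlighted terms nested one level deeper are not lost. getText is a hypothetical helper name, and this is not the library's actual patch.

```javascript
// Hypothetical helper: recursively gather the text of a node and all of
// its descendants, whatever the nesting depth.
function getText(node) {
  if (typeof node.data === 'string') {
    return node.data;
  }
  return (node.children || []).map(getText).join('');
}

// Same signature as the original getSnippet, but nested children are
// flattened into one string instead of producing undefined entries.
function getSnippet(elem) {
  return (elem.children || []).map(getText).join('');
}
```

Because every level is joined into a string before it reaches the outer join(''), no nested arrays are left to be stringified with commas.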
As you can see by this screenshot:
Version of google-it
-- [email protected]
Versions of npm and node
6.14.13
v14.17.0
Describe the bug
Cannot use it programmatically. Require command throws errors.
Welcome to Node.js v14.17.0.
Type ".help" for more information.
> const g = require('google-it')
Uncaught:
Error: UNKNOWN: unknown error, lstat 'C:\test\node_modules\htmlparser2\lib'
at Object.realpathSync (fs.js:1796:7)
at toRealPath (internal/modules/cjs/loader.js:349:13)
at tryFile (internal/modules/cjs/loader.js:345:10)
at tryPackage (internal/modules/cjs/loader.js:301:16)
at Function.Module._findPath (internal/modules/cjs/loader.js:521:18)
at Function.Module._resolveFilename (internal/modules/cjs/loader.js:872:27)
at Function.Module._load (internal/modules/cjs/loader.js:730:27)
at Module.require (internal/modules/cjs/loader.js:957:19)
at require (internal/modules/cjs/helpers.js:88:18) {
errno: -4094,
syscall: 'lstat',
code: 'UNKNOWN',
path: 'C:\\test\\node_modules\\htmlparser2\\lib'
}
> console.log(g)
Uncaught ReferenceError: g is not defined
Expected behavior
Require should instantiate g without errors.
Version of google-it
If you have it installed globally, please run npm list google-it -g in your command prompt of choice, and that should result in an output that looks like this:
If you don't have it installed globally, simply remove the -g flag from that command.
Versions of npm and node
Please run npm --version and node --version and paste in the results.
npm --version
9.5.0
node -v
v19.7.0
Describe the bug
A clear and concise description of what the bug is. At a minimum, please include how you used the google-it library (either on command line or within a node program).
Not working on ubuntu (I can access google using the same IP address )
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
Add any other context about the problem here.
ubuntu@mregyptian ~> google-it --query="Latvian unicorn"
⠙ Loading results
Error: Error in response: statusCode 429. To see the raw response object, please include the 'diagnostics: true' as part of the options object (or -d if using command line)
at /usr/local/lib/node_modules/google-it/lib/googleIt.js:267:16
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
⠸ Loading results⏎ ubuntu@mregyptian ~ [SIGINT]> google-it --query="Latvian unicorn" -d
⠹ Loading results
ubuntu@mregyptian ~> google-it -d --query="Latvian unicorn"
⠼ Loading results
ubuntu@mregyptian ~>
ubuntu@mregyptian ~> google-it -d --query="Latvian unicorn"
⠹ Loading results
ubuntu@mregyptian ~> google-it --query="Latvian unicorn"
⠸ Loading results
Error: Error in response: statusCode 429. To see the raw response object, please include the 'diagnostics: true' as part of the options object (or -d if using command line)
at /usr/local/lib/node_modules/google-it/lib/googleIt.js:267:16
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
⠙ Loading results⏎
According to this comment #65 (comment), the "gl" parameter was used, but I tried it and it's not working. Or am I doing something wrong?
googleIt({'query': 'covfefe irony', 'gl':'fr'}).then(results => {
  console.log(results)
}).catch(e => {
  console.log(e)
})
Version of google-it
Versions of npm and node
Node: v14.15.3
Yarn: 1.22.4
Describe the bug
I have tested this on Mac and windows.
google-it --query="Obama"
Hi,
Looks like google-it has stopped working in the last 2 days. I tried it on a number of devices and internet connections; no results are returned, but no error messages are thrown.
Steps to reproduce:
Thanks,
C
Version of google-it
1.4.2
Versions of npm and node
npm: 6.13.4
node: 11.6.0
Describe the bug
Use: As an imported package (i.e. not command line)
The main function googleIt does not throw errors when the response from google has status code !== 200. Instead, it returns an empty array. This is because the function googleIt uses the function getResponseBody, but getResponseBody does not throw an error when this occurs. I believe this is due to the way the "request" library works, in that the request library does not set the error parameter in the callback.
Expected behavior
To be able to catch errors where status code !== 200, it would be better to check the status code and reject accordingly. For example, in the callback of the request call of getResponseBody:
if (response.statusCode !== 200) {
  reject(response);
}
This way, with a response code like 429 "Too Many Requests", receiving the response in the catch handler would allow for logging and reporting.
I am not sure what effects this change would have on other code in the library and in dependants. It could be a breaking change.
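A slightly fuller sketch of that idea follows. It assumes a request-style callback with the signature (error, response, body); bodyOrThrow is a hypothetical helper name, not an existing google-it function. It throws an Error that keeps the raw response attached, so a catch handler can log the status code.

```javascript
// Hypothetical helper: return the body for a 200 response, otherwise
// throw an Error that carries the raw response for logging.
function bodyOrThrow(error, response, body) {
  if (error) {
    throw error;
  }
  if (!response || response.statusCode !== 200) {
    const code = response ? response.statusCode : 'unknown';
    const err = new Error(`Error in response: statusCode ${code}`);
    err.response = response;
    throw err;
  }
  return body;
}

// Intended use inside a promise-wrapped request callback:
//   (error, response, body) => {
//     try { resolve(bodyOrThrow(error, response, body)); }
//     catch (err) { reject(err); }
//   }
```

Rejecting with an Error rather than the bare response keeps stack traces and messages consistent for callers, at the cost of being a behavior change for code that currently relies on receiving an empty array.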
Screenshots
N/A
Additional context
N/A
🚨 You need to enable Continuous Integration on Greenkeeper branches of this repository. 🚨
To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.
Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet.
We recommend using:
If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.
Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please click the 'fix repo' button on account.greenkeeper.io.
The devDependency eslint-config-amex was updated from 11.1.0 to 11.2.0.
This version is covered by your current version range and after updating it in your project the build failed.
eslint-config-amex is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.
The new version differs by 10 commits.
662aad7 chore(release): 11.2.0 [skip ci]
89f41ec chore(release): enable semantic-release
a3afc72 feat(rules): no-extra-parens (#9)
68e6d48 feat(index): disable jsx-one-expression-per-line (#8)
4dbe9cf chore(lockfile-lint): add lockfile-lint script
58322cf chore(npm): add package lock file (#5)
0f45501 Merge pull request #4 from americanexpress/update-readme-image
359ccf0 docs(readme): center image
9cc0d24 docs(readme): update layout and security file (#3)
63d2840 chore(commitlint): add commitlint and husky (#2)
See the full diff
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
Currently the maximum number of results per query is 10. Is there any way to increase that to 1000+ for the same query?
Hello, only 6 results get saved to the JSON. How can I make it save all of them? When I search manually there are about 3,500 results.
I have a problem where google-it fails about 50% of the time.
After investigating, I believe the following is the root cause.
Compare the following HTML patterns:
success:
https://github.com/ryudenx/google-it_fail/blob/master/takeda_success.html
fail:
https://github.com/ryudenx/google-it_fail/blob/master/takeda_fail.html
These have different HTML format/design.
Despite the same URL, same User-Agent, same IP, Google seems to return different HTML.
If possible, could you please fix this problem?
Or could you support the failing HTML format?
Code to reproduce:
const { setTimeout } = require('timers/promises');
const googleIt = require('google-it')
async function main() {
  while (true) {
    googleIt({ 'query': "takeda", 'no-display': true, 'limit': 10 }).then(results => {
      // check
      if (results.length > 1) {
        console.log("success")
      } else {
        console.log("fail")
      }
    }).catch(e => {
      console.log(e)
    })
    await setTimeout(5000);
  }
}
main()
log (example):
$ node ./test.js
success
success
success
fail
success
fail
fail
Thank you for your cooperation.
Hello,
Is it possible to use additional search parameters, for example &tbs=qdr:d?
Latest version.
-h saves the correct HTML file, but there are no visible results.
Probably a selector issue? US Google.
Version of google-it
1.6.2
Versions of npm and node
Please run npm --version and node --version and paste in the results.
% npm --version
7.10.0
% node --version
v16.0.0
Describe the bug
The result titles and links are out of sync with this query:
% ./node_modules/.bin/google-it --query="spotify music vs apple"
⠙ Loading results
Apple Music vs. Spotify: The best music streaming service for you ...
https://www.soundguys.com/apple-music-vs-spotify-36833/?sa=X&ved=2ahUKEwi19PKWrqTwAhXQFIgKHZBFCQUQ9QF6BAgIEAI
While ,Apple Music, may have more content, Spotify's music, catalog is still extensive with over 50 million ,songs,, with around 40,000 being added everyday. Unlike ,Apple Music,, ,Spotify, also offers podcasts, with over 700,000 currently on the platform.Mar. 23, 2021
Apple Music vs Spotify (2021) - YouTube
https://www.cnet.com/news/apple-music-versus-spotify-best-music-podcasts-streaming-service-price-catalog-features-plans-compared/
Aug. 3, 2020, · ,Spotify says it has a catalog of over 50 million songs while Apple Music tops 60 million. Both offer early access to certain albums from time to time ...
Apple Music vs. Spotify | Digital Trends
https://www.youtube.com/watch?v=l4QZej2Ae4A
Mar. 27, 2021, · ,The battle of the kings of music streaming services is here. Today we will be comparing Apple ...Duration: ,14:47,,Posted: ,Mar. 27, 2021
As you can see, the second result is from YouTube but the link is not.
Expected behavior
The titles and the links should be in sync.
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
Add any other context about the problem here.
google-it version: 1.6.2
node version: 14.15.3
npm version: 6.14.9
Describe the bug
Result title is always undefined
Expected behavior
Should return the correct result title
Screenshot covers all the steps from installation and then running the query.
If you need more information, please let me know.
Version of google-it
1.6.2
Versions of npm and node
npm 7.6.3
node v12.4.0
Describe the bug
Attempting to use options in code, as follows:
const googleIt = require('google-it')
const options = {
  'limit': 3
};
googleIt({options, 'query': 'covfefe irony'}).then(results => {
  // access to results object here
}).catch(e => {
  // any possible errors that might have occurred (like no Internet connection)
})
Expected behavior
I expect only the first three results to be returned and displayed in the console. Instead the default 10 appears.
Additional context
I have also tried other options such as 'no-display': true and those listed in optionsDefinitions.js, to no avail (e.g. 'no-display': true continues to display the results in the console). Oddly, the only option that seems to work is 'proxy:' (as per the readme example), and that is not one of the options listed in optionsDefinitions.js.
I have found, however, that putting the options directly in the command path does work, e.g.:
googleIt({'limit': 3, 'no-display': true, 'query': 'covfefe irony'}).then(results => {
  // access to results object here
}).catch(e => {
  // any possible errors that might have occurred (like no Internet connection)
})
The code behaves as expected when written in this form. My guess is that it has something to do with how the options are unpacked inside the googleIt function.
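The difference can be reproduced with plain JavaScript, independent of google-it: {options, 'query': ...} is property shorthand for {options: options, 'query': ...}, so limit ends up at config.options.limit rather than config.limit. That matches the observation that top-level keys work. That 'proxy' works when nested suggests the options key may be passed on to the underlying HTTP request, but that is an inference from the behavior described here, not something confirmed. The variable names below are illustrative.

```javascript
const options = { limit: 3, 'no-display': true };

// Property shorthand: the settings are nested under an `options` key.
const nested = { options, query: 'covfefe irony' };
console.log(nested.limit); // undefined
console.log(nested.options.limit); // 3

// Spread syntax: the settings are copied to the top level.
const merged = { ...options, query: 'covfefe irony' };
console.log(merged.limit); // 3
```

Passing the merged object is equivalent to writing the options inline, as in the working example above.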
Hello,
Is it possible to change the language of the results, for example to force a specific language?
Google made an update that randomizes the div class names, which breaks all the selectors. Is there any foolproof way to get results? Different versions of Google, such as Russia's mobile site, also use a different structure for results, so results won't be parsed correctly and consistently.
You can see here that every div class g is randomized by Google to prevent scraping.
Version of google-it
@1.6.3
Versions of npm and node
npm: 8.11.0 || node: v17.9.1
Describe the bug
Result titles always return undefined for me.
I would like to be able to programmatically access search stats via google-it (basically retrieving and interpreting the "Page 4 of about 37,700,000 results (0.67 seconds)" text), especially the result count and search time. The div id is "resultStats". Thanks!
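As a sketch of the interpreting half of this request, a regular expression can pull the count and time out of text shaped like the example above. It assumes an English-locale string with comma thousands separators; parseResultStats is a hypothetical name, not a google-it function, and retrieving the text from the page is not covered here.

```javascript
// Hypothetical helper: parse text such as
// "Page 4 of about 37,700,000 results (0.67 seconds)".
function parseResultStats(text) {
  const pattern = /(?:Page (\d+) of )?(?:about )?([\d,]+) results? \(([\d.]+) seconds\)/i;
  const match = pattern.exec(text);
  if (!match) {
    return null; // the text does not have the expected shape
  }
  return {
    page: match[1] ? Number(match[1]) : null, // null when no page prefix
    resultCount: Number(match[2].replace(/,/g, '')),
    seconds: Number(match[3]),
  };
}

console.log(parseResultStats('Page 4 of about 37,700,000 results (0.67 seconds)'));
// { page: 4, resultCount: 37700000, seconds: 0.67 }
```

Other locales format this line differently (separators, wording), so the pattern would need to be adapted or made configurable.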
I was using this for quite a while, but it suddenly stopped giving me an array when I request the results. Please fix.
Version of google-it
[email protected]
Versions of npm and node
npm: 6.11.3
node: 8.11.3
Describe the bug
When I run the command google-it --query="test" it only returns undefined.
Expected behavior
Google search results for test
Version of google-it
1.6.1
Versions of npm and node
npm - 6.14.6
node - v12.18.4
Describe the bug
I was using this module in a NodeJS project (discord.js specifically) and I faced this error:
Error in response: statusCode 429. To see the raw response object, please include the 'diagnostics: true' as part of the options object (or -d if using command line)
Expected behavior
It should have given me results according to the query specified but it didn't.
Additional context
I also enabled "diagnostics" to see the raw data, but it didn't give me anything. I hope this information is sufficient.
Hi!
I have been using this library lately and love it ❤️
First of all, thank you for creating this library 👍
When I use this library, I sometimes want to ignore certain domains that don't show useful information.
I think this request is similar to the motivation behind the stackoverflow-github-only option.
One thing we have to consider is where to save the ignore-list data.
I think ~/.config would be a good place, but I don't know whether it's the best.
May I ask what you think about that? 😉
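Wherever the list ends up being stored, the filtering step itself could look roughly like the sketch below. It assumes results are objects with a link property, as in the output shown in other reports; filterIgnoredDomains is a hypothetical name and not an existing google-it option.

```javascript
// Hypothetical helper: drop results whose link points at an ignored
// domain or at any of its subdomains.
function filterIgnoredDomains(results, ignoredDomains) {
  const ignored = ignoredDomains.map((domain) => domain.toLowerCase());
  return results.filter((result) => {
    let host;
    try {
      host = new URL(result.link).hostname.toLowerCase();
    } catch (err) {
      return true; // keep entries whose link cannot be parsed
    }
    return !ignored.some((domain) => host === domain || host.endsWith(`.${domain}`));
  });
}
```

Matching on the parsed hostname rather than on a substring of the URL avoids dropping unrelated sites whose names merely contain an ignored domain.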