google / safebrowsing Goto Github PK

Safe Browsing API Go Client

License: Apache License 2.0

Go 91.58% CSS 4.86% HTML 0.54% JavaScript 1.42% Shell 1.60%

safebrowsing's Introduction

Reference Implementation for the Usage of Google Safe Browsing APIs (v4)

The safebrowsing Go package can be used with the Google Safe Browsing APIs (v4) to access the Google Safe Browsing lists of unsafe web resources. Inside the cmd sub-directory, you can find two programs: sblookup and sbserver. The sbserver program creates a proxy local server to check URLs and a URL redirector to redirect users to a warning page for unsafe URLs. The sblookup program is a command line service that can also be used to check URLs.

This README.md is a quickstart guide on how to build, deploy, and use the safebrowsing Go package. It can be used out-of-the-box. The GoDoc and API documentation provide more details on fine tuning the parameters if desired.

Setup

To use the safebrowsing Go package you must obtain an API key from the Google Developer Console. For more information, see the Get Started section of the Google Safe Browsing APIs (v4) documentation.

How to Build

To download and install from the source, run the following command:

go get github.com/google/safebrowsing

The programs below execute from your $GOPATH/bin folder. Add that to your $PATH for convenience:

export PATH=$PATH:$GOPATH/bin

Proxy Server

The sbserver server binary runs a Safe Browsing API lookup proxy that allows users to check URLs via a simple JSON API.

Once the Go environment is setup, run the following command with your API key:
```
go get github.com/google/safebrowsing/cmd/sbserver
sbserver -apikey $APIKEY
```
With the default settings this will start a local server at 127.0.0.1:8080.

The server also uses an URL redirector (listening on /r) to show an interstitial for anything marked unsafe.
If the URL is safe, the client is automatically redirected to the target. Else, an interstitial warning page is shown as recommended by Safe Browsing.
Try these URLs:

127.0.0.1:8080/r?url=http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/
127.0.0.1:8080/r?url=http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/SOCIAL_ENGINEERING/URL/
127.0.0.1:8080/r?url=http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/UNWANTED_SOFTWARE/URL/
127.0.0.1:8080/r?url=http://www.google.com/

The server also has a lightweight implementation of the API v4 threatMatches endpoint.
To use the local proxy server to check a URL, send a POST request to 127.0.0.1:8080/v4/threatMatches:find with the following JSON body:

{
	"threatInfo": {
		"threatTypes":      ["UNWANTED_SOFTWARE", "MALWARE"],
		"platformTypes":    ["ANY_PLATFORM"],
		"threatEntryTypes": ["URL"],
		"threatEntries": [
			{"url": "google.com"},
			{"url": "http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/"}
		]
	}
}

Refer to the Google Safe Browsing APIs (v4) for the format of the JSON request.

Command-Line Lookup

The sblookup command-line binary is another example of how the Go Safe Browsing library can be used to protect users from unsafe URLs. This command-line tool filters unsafe URLs piped via STDIN. Example usage:

$ go get github.com/google/safebrowsing/cmd/sblookup
$ echo "http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/" | sblookup -apikey=$APIKEY
  Unsafe URL found:  http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/ [{testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/ {MALWARE ANY_PLATFORM URL}}]

Safe Browsing System Test

To perform an end-to-end test on the package with the Safe Browsing backend, run the following command:

go test github.com/google/safebrowsing -v -run TestSafeBrowser -apikey $APIKEY

safebrowsing's People

Contributors

Stargazers

Watchers

Forkers

mustafamoneer alphara milena7650 curtisz redlotus-corp askabelin ezoic runt18 075kg rajnmithun jahatdurhaka haoduotnt orangetell t-group cnaize noelutz peterkinalex colonelxc lsdlites cpu insanitybit cloudflare phuongvietvu0306 alexxnica 3d24rd0 jroyal nimbleo romanl-g acidburn0zzz gzhousc fperezf adamsenterprises pombredanne yoshipaulbrophy iconnor zalgo2462 devopsmi xsavikx alexmccormack primmus andrewnyaisonga leanhe99 open-ch littlehann felixcui6 intelligentia-llc un3qual ksmaheshkumar cretz vishnuvaradaraj tusharbv kthierrychristian 0xfinch teamnsrg munyola seilamuk sysujayce garigonzalez bujarto hilalisa jordan-wright jcmarchi bhanditz qroc ingmarstein zvelo dalavancloud baganda07 ericlaw1979 wajika jwest75674 nopjmp bernadene1 restaurant-wan backwardn aculler mmjanloo saintxak alexwoz grrrdog acasella1984 nanaluz b7acksec tubai1988 kizooo5 cyb3renigm4 neotim markochenn delvion jhonnyalejandrocarrillogonzalez isabella232 kprobinson suryatmodulus thelazarusnetwork maubertin tsailiting manny27nyc golangleague rabimba cyberflamego

safebrowsing's Issues

Some URLs thare are known to firefox/chrome, but are not detected by the sblookup

Example:

$  echo "https://mypaypal.accounts.loginpage.memekjandaperawan[.]com/main-index.php" | sblookup -apikey=$apikey
safebrowsing: 2018/10/20 09:58:12 database.go:111: no database file specified
safebrowsing: 2018/10/20 09:58:15 database.go:389: database is now healthy
safebrowsing: 2018/10/20 09:58:15 safebrowser.go:551: Next update in 30m17s
Safe URL: https://mypaypal.accounts.loginpage.memekjandaperawan.com/main-index.php

But

https://transparencyreport.google.com/safe-browsing/search?url=https:%2F%2Fmypaypal.accounts.loginpage.memekjandaperawan.com%2Fmain-index.php&hl=de

shows that this page is unsafe. I assume that the data sources differ. I that the expected behavour?

Can you plz clarify the URL format that needs to be sent in the query?

In the example of the API HTTP request URL entries in the query body, the URLs are represented in two different format (e.g. the second entry is with http:// and ends with /, while the first one without http://, https:// and does not end with /.

Does adding http://, https:// or ending the URL with / make difference? I tested with and without them and there seems no difference. But I wonder, how this is the case while the hashes with or without them is different. Can you plz briefly clarify?

Here is the example in your home page that I am referring to:


"threatEntries": [
			{"url": "google.com"},
			{"url": "http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/"}
		]

How are Multiple Encoded URIs Handled?

Hello. The Safe Browsing documentation states repeatedly percent-unescape the URL until it has no more percent-escapes. Looking at the code, I see that you do this by searching an input string for the % character and then checking if the 2 characters that immediately follow it are hexadecimal characters, you determine that it is encoded and proceed to decode it.

My question is, how does this work if one has a URL that is possibly encoded multiple times? Take the following example:

Assume the string 26%2524 makes up a URI component. It is an encoding of the string 26%24. Tracing the code, this will decode it twice to 26$. Eventually, after you encode the entire URL again, you will only get 26%24 and that is what the hash that is used in the database and server lookup computed for.

Can this ever lead to a possible false negative because the original URL in its correctly encoded/decoded form is not checked? If so, are there any recommendations on how to handle this scenario?

Thanks in advance!

Which database is being used to persist the hashes ?

In the code i see that the hashes are being persisted to file and on startup the db is being populated from previously existing file.
What are these files ? Can they be normal text files ?
I'm not familiar with go that's why i'm facing trouble understanding the code.

Database issue ?

safebrowsing: 2017/09/27 03:09:00 database.go:114: database loaded is stale
safebrowsing: 2017/09/27 03:09:01 database.go:186: ListUpdate failure: safebrowsing: unexpected server response code: 400

safebrowsing: 2017/09/27 03:39:01 safebrowser.go:502: background threat list update
safebrowsing: 2017/09/27 03:39:01 database.go:186: ListUpdate failure: safebrowsing: unexpected server response code: 400
safebrowsing: 2017/09/27 03:39:25 safebrowser.go:365: inconsistent database: safebrowsing: unexpected server response code: 400

Also tried with fresh docker and got following errors

safebrowsing: 2017/09/27 07:48:52 database.go:116: load failure: open /root/go/sbdatabase: no such file or directory
safebrowsing: 2017/09/27 07:48:53 database.go:222: ListUpdate failure (1): safebrowsing: unexpected server response code: 400
safebrowsing: 2017/09/27 07:48:53 safebrowser.go:532: Next update in 24m58.104047896s
safebrowsing: 2017/09/27 07:50:45 safebrowser.go:400: inconsistent database: safebrowsing: unexpected server response code: 400

update failure: safebrowsing: invalid compression type

Started to receive error since few hours. In logs:

safebrowsing: 2017/07/18 12:36:48 database.go:110: no database file specified
safebrowsing: 2017/07/18 12:36:48 database.go:255: update failure: safebrowsing: invalid compression type
Starting server at localhost:8086
safebrowsing: 2017/07/18 12:36:48 safebrowser.go:532: Next update in 30m17s
safebrowsing: 2017/07/18 12:41:02 safebrowser.go:400: inconsistent database: safebrowsing: invalid compression type
safebrowsing: 2017/07/18 12:46:06 safebrowser.go:400: inconsistent database: safebrowsing: invalid compression type
safebrowsing: 2017/07/18 12:51:10 safebrowser.go:400: inconsistent database: safebrowsing: invalid compression type
safebrowsing: 2017/07/18 12:56:13 safebrowser.go:400: inconsistent database: safebrowsing: invalid compression type
safebrowsing: 2017/07/18 13:01:17 safebrowser.go:400: inconsistent database: safebrowsing: invalid compression type
safebrowsing: 2017/07/18 13:06:21 safebrowser.go:400: inconsistent database: safebrowsing: invalid compression type
safebrowsing: 2017/07/18 13:07:05 database.go:241: Server requested next update in 30m25.588s
safebrowsing: 2017/07/18 13:07:05 database.go:255: update failure: safebrowsing: invalid compression type
safebrowsing: 2017/07/18 13:07:05 safebrowser.go:532: Next update in 30m25.588s
safebrowsing: 2017/07/18 13:11:24 safebrowser.go:400: inconsistent database: safebrowsing: invalid compression type
safebrowsing: 2017/07/18 13:16:28 safebrowser.go:400: inconsistent database: safebrowsing: invalid compression type

Database Update Failure Modes & Retry Duration

Hi there!

I've been working through integrating this library into Boulder to replace our usage of the legacy v3 API for checking domain names prior to issuing a certificate. I thought it would be valuable to provide some feedback of things we've noticed so far. There are some rough edges that perhaps you have solutions for!

In the process of integration testing with a mock server I've noticed the same behaviour as @smoya in #37 - If the initial /v4/threatListUpdates:fetch call done after constructing a safebrowsing client results in error the library is unable to do any lookups until the situation is rectified (with the DefaultUpdatePeriod this is no sooner than 30 minutes later). This could also be caused by a failed update on refresh requests.

It seems like there should be a separate "knob" to twist for a retry period when an error occurs. If we change DefaultUpdatePeriod to attempt to recover more quickly we will also produce unnecessarily frequent lookups after the error is resolved.

It's also not ideal that the library isn't able to communicate when it is ready to provide lookups with a healthy database. We can work around this by performing our own polling on the client's Status function looking for a nil return value for the error return parameter but ideally there would be a way
for the integrating system to block on, or be notified of, when the database is healthy and the client is capable of doing lookups.

Thoughts? Suggestions?

Thanks for all your work on this project!

Will this library be compatible with the new Web Risks API?

Now that we need to move to the new API, will this library/server implementation still work?

https://cloud.google.com/web-risk/

Why is the API marking Chinese cdns as unsafe while online/browser lookup aren't ?

Hi guys,

Thanks for this amazing tool.

As previously mentioned in this issue, many folks are seeing discrepancies between the client and safe browsing-enabled browsers.

I specifically observed that a huge of amount of cdns/apis of Chinese services are marked as unsafe by the api client. Examples include "hlsa-akm.douyucdn.cn", "gslb.miaopai.com/", and "switch.pcfg.cache.wpscdn.cn". These sites are marked as unsafe by the API. However, the transparency report says it is SAFE.

A sample query result from the API.

{
  "matches": [
    {
      "threatType": "UNWANTED_SOFTWARE",
      "platformType": "ANDROID",
      "threat": {
        "url": "gslb.miaopai.com/"
      },
      "cacheDuration": "300s",
      "threatEntryType": "URL"
    },
    {
      "threatType": "UNWANTED_SOFTWARE",
      "platformType": "ANDROID",
      "threat": {
        "url": "switch.pcfg.cache.wpscdn.cn"
      },
      "cacheDuration": "300s",
      "threatEntryType": "URL"
    }
  ]
}

Any ideas on what's up? I am also curious why these sites are considered "UNWANTED_SOFTWARE" in "ANDROID" only. Thanks

How to make a unit test on my machine?

I mean how to run the unit test?

rate limit the API call

It would be nice to have rate limiter before making a call to API if it is in the local cache.

api.go is using DefaultClient with no timeout

https://golang.org/pkg/net/http/#Client

api.go is using DefaultClient with default timeout 0, meaning no limit. If there is something wrong with the server, the client can hang forever for waiting the server to respond.

LookupURLs first call returns bad data

Code for reproduce:

for i := 0; i < 3; i++ {
	threats, _ := sb.LookupURLs([]string{
		"http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/",
		"http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/",
		"http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/",
		"http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/",
		"https://google.com",
	})
	for _, threat := range threats {
		println(len(threat))
	}
	println("-")
}

Actual result:

curl example in sbserver comments

Minor issue with "Example usage" curl command in comments of /cmd/sbserver/main.go. The url being evaluated should be appended with [PLATFORM]/[THREAT TYPE]/[THEAT ENTRY TYPE]. Working example:

$ curl \
  -H "Content-Type: application/json" \
  -X POST -d '{
      "threatInfo": {
          "threatTypes":      ["UNWANTED_SOFTWARE", "MALWARE"],
          "platformTypes":    ["ANY_PLATFORM"],
          "threatEntryTypes": ["URL"],
          "threatEntries": [
              {"url": "http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/"}
          ]
      }
  }' \
  localhost:8080/v4/threatMatches:find

invalid compression type

Since 17.6. 12:00 AM CET all requests are answered with 500 server error:

safebrowsing: 2017/07/18 13:08:16 safebrowser.go:372: inconsistent database: safebrowsing: invalid compression type

Test URL:

curl localhost:8221/r?url=heise.de

We used the Safe Browsing API Go Client in a docker container without changes and any problems since May 2017.

Use context for timeouts

In #59, a config option was added to set a default timeout for the HTTP client. However, there may be a problem: ListUpdate and HashLookup get the same timeout, but ListUpdate probably takes a lot longer in most cases and is not in the blocking path.

Instead, it would be nice to have a variant of LookupURLs that took a context, and would pass that through to HashLookup, and from there to the HTTP client so that the caller could set a timeout with more granularity.

sblookup network failure always returns "Safe URL"

If running sblookup without network connectivity, URL check will always return "Safe URL". Changing line 91 to make if statement above to if-else (and adding a 'no network' message above it) ensures failed lookup always returns "Unsafe URL", which is the more secure default.

Question about google safe browsing api?

https://developers.google.com/safe-browsing/v4/reference/rest/v4/threatLists/list does not contain any lists of the SUBRESOURCE threat type. Just wanted to confirm if it is going to be made available anytime soon?

What does Google rely on to classify the web-page?

Hi~
We can get three kinds of results from sbserver and they are MALWARE-ANY_PLATFORM-URL, UNWANTED_SOFTWARE-ANY_PLATFORM-URL, SOCIAL_ENGINEER-ANY_PLATFORM-URL.
So, how does Google server classify the web-page into these three types?

Thx

Back-off mode

According to the docs:

Clients that receive an unsuccessful HTTP response (that is, any HTTP status code other than 200 OK) must enter back-off mode. Once in back-off mode, clients must wait the computed time duration before they can issue another request to the server.

Let's suppose we receive a 500 error from the Update API:

api.go returns the error on ListUpdate method.
database.go sets that error on the database doing db.setError in the Update method.

It should then enter in back-off mode but it seems is not (at least i can not find it in the code).

Apart from that, the application remains useless until the next Update (30 mins), which process could also return another 500 error and so on. There is any solution to this?

Thanks

Persistence Database

Sorry buy I have problemas setting a database file.

What is the mean ?

"DBPath is a path to a persistent database file. If empty, SafeBrowser operates in a non-persistent manner. This means that blacklist results will not be cached beyond the lifetime of the SafeBrowser object."

Execution line:

./sbserver -apikey xxxxxxxxxxxxxxxxxxxxxx -db [?]

What is the correct parameter [?] expected by sbserver?

Http Proxy Support

Hi,
could you please add http-proxy support into the sbserver?
I'm going to deploy the sbserver in a server that can access internet only through a http-proxy.
Thank you!

project scope and other questions

Thanks for the hard work on this project.

I wanted to know the scope of this project. Specifically, with regards to support for Safe Browsing API.

Does it support all features that Safe Browsing API provides? Also, are database updates and any maintenance etc taken care of by sbserver automatically? In other words, is there anything that I must take care of in application/client side myself?

I was not sure what is meant by "The server also has a lightweight implementation of the API v4 threatMatches endpoint."

Also, how do I configure the server and where are local database files stored. I guess, more details in documentation might be helpful for new users.

Persistence Database

Sorry buy I have problemas setting a database file.

What is the mean ?

Execution line:

./sbserver -apikey xxxxxxxxxxxxxxxxxxxxxx -db [?]

What is the correct parameter [?] expected by sbserver?

How to compute prefix hashes?

Hi,

I'm interested in how the prefix hashes are computed.
The API docs says "the client needs to compute the hash prefix for each full-length SHA256 hash".
I've been reading this Go code for a while now but can't figure out how you compute the hash prefixes.

Any help would be appreciated!

Thanks!

URLs without trailing slash are not detected as unsafe

This was detected as we were testing a Java implementation of the Update API (v4). The Go client has the same issue

~$ echo "http://malware.testing.google.test/testing/malware" | sblookup -apikey=$GOOGLEAPIKEY
safebrowsing: 2016/09/29 23:32:01 database.go:100: no database file specified
safebrowsing: 2016/09/29 23:32:06 database.go:291: database is now healthy
Safe URL: http://malware.testing.google.test/testing/malware
~$ echo "http://malware.testing.google.test/testing/malware/" | sblookup -apikey=$GOOGLEAPIKEY
safebrowsing: 2016/09/29 23:32:15 database.go:100: no database file specified
safebrowsing: 2016/09/29 23:32:20 database.go:291: database is now healthy
Unsafe URL: [{malware.testing.google.test/testing/malware/ {MALWARE ANY_PLATFORM URL}}]

Yet "Safe Browsing Site Status" status page shows both https://www.google.com/transparencyreport/safebrowsing/diagnostic/#url=http://malware.testing.google.test/testing/malware

as Dangerous.

panic in go arm

I am trying to run the sblookup on arm.

echo "http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/" | ./sblookup -apikey=<redacted>
safebrowsing: 2017/04/28 18:27:49 database.go:106: no database file specified
safebrowsing: 2017/04/28 18:28:05 database.go:336: database is now healthy
safebrowsing: 2017/04/28 18:28:05 safebrowser.go:504: Next update in 30m17s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x4 pc=0x1265c]

goroutine 1 [running]:
sync/atomic.addUint64(0x1077cddc, 0x1, 0x0, 0x0, 0x0)
	/usr/local/go/src/sync/atomic/64bit_arm.go:31 +0x4c
github.com/google/safebrowsing.(*SafeBrowser).LookupURLs(0x1077cd80, 0x10745e78, 0x1, 0x1, 0x108ba4b0, 0x43, 0x0, 0x0, 0x0)
	/home/pi/go/src/github.com/google/safebrowsing/safebrowser.go:407 +0x3c8
main.main()
	/home/pi/go/src/github.com/google/safebrowsing/cmd/sblookup/main.go:95 +0x22c

Looks like an issue. Am I right? Is this not supported on arm?

Stop database update

Hi,
Is there any way I can prevent the sbserver from updating the local threat list database?
I am trying to use the safebrowsing Update API to label a dataset of web browsing history. The dataset is fixed (there will be no new entries), therefor I don't need an updated threat list.
Thanks

503 error, API outage?

Hey, we started receiving a bunch of 503 responses this afternoon, is the API experiencing any outages?

thanks!

It says in your status codes that 503 means the service is unavailable, is this potentially something on my end? Or are we experiencing issues that are expected?

Request frequency specifications not enforced

According to the documentation:

Both the fullHashes.find response and threatListUpdates.fetch response have a minimumWaitDuration field that clients must obey.

And

Automatic back-off applies to both the fullHashes.find response and threatListUpdates.fetch response.

This issue was first raised in #33, and marked as resolved in #49, however this only addresses the database update, and not fullHashes.find.

On a different note, enforcing minimumWaitDuration seems reasonable for both methods, but would back-off mode make sense for fullHashes.find? eg. do we really want to prevent any further hash confirmation for at least 15 minutes for a single 4xx or 5xx response?

All lookups fail in ~1s between database expiring & updating.

Hi again folks 👋

First-off: Thanks again for the great library! We've been using it in production for Let's Encrypt since the end of July and it has suited our needs perfectly! Kudos to the team working on this and the folks behind the scenes :-) We really appreciate your work!

I've noticed in the past few days that there appears to be a small window between when *database.Status() marks the DB stale and when the database is updated in which all lookups will fail due to the DB being marked stale with messages like:

inconsistent database: safebrowsing: threat list is stale

Here is one concrete example from our logs:

2017-08-13T21:26:38.571064+00:00 safebrowsing: 2017/08/13 21:26:38 safebrowser.go:537: background threat list updated
2017-08-13T21:26:38.571408+00:00 safebrowsing: 2017/08/13 21:26:38 safebrowser.go:532: Next update in 30m29.012s
2017-08-13T21:57:09.490334+00:00 safebrowsing: 2017/08/13 21:57:09 safebrowser.go:400: inconsistent database: safebrowsing: threat list is stale
<snipped a bunch of occurences>
2017-08-13T21:57:09.797071+00:00 safebrowsing: 2017/08/13 21:57:09 safebrowser.go:400: inconsistent database: safebrowsing: threat list is stale
2017-08-13T21:57:10.283417+00:00 safebrowsing: 2017/08/13 21:57:10 safebrowser.go:537: background threat list updated

In one case we saw ~422 lookups fail within the ~1 second window between the database being marked stale and updating. Eek!

The point at which the database is next scheduled to be updated after a successful update is computed in *database.Update() by adding db.configUpdatePeriod to time.Duration(rand.Int31n(60)-30)*time.Second. It looks like *database.Status() decides if the DB is stale by comparing now().Sub(db.last) with (db.config.UpdatePeriod + jitter). In this case jitter is always a fixed 30 * time.Second.

It seems like this strategy doesn't offer enough grace time to account for having to update the database. Especially if we imagine that the initial database update request may fail and a backoff would occur.

Is it possible to schedule the database update ahead of the point at which it will expire by a larger grace period (e.g. 5m?). I believe this would minimize the chances that lookups would fail due to an out-of-date database that is microseconds away from being updated. I'm open to other work arounds as well!

Thanks again,

Guide for developing & running locally

New to go and would like to get this running locally but having some issues. Once I downloaded all dependencies and tried go run main.go in the cmd/sbserver directory I'm getting the following error:

main.go:202:2: use of internal package not allowed

Some guidance for how to get this running would be greatly appreciated.

How can I trigger a fake Malware API response from my test suites?

Is there currently no way to dummy-trigger a harmful-URL response? So I can build Safe Browsing into my test suites?

I have an E2E tests suite, and I'd like to have the E2E test browser type a dummy "bad" URL, which my server sends to Safe Browsing API, and the API replies that the URL is harmful. So I can test my code that deals with harmful URL responses.

For example, Safe Browsing could return "It's harmful & dangerous! For Windows! Malware!" for all URLs like https://safebrowsing-malware-windows.example.com.

The Google Docs (https://developers.google.com/safe-browsing/v4/) doesn't mention any way to do this (that I can find). And in this GitHub repo I also didn't find any way to do that, or GitHub issue about that. This makes me surprised, because, reasonably, everyone who uses this API should want to make such dummy requsets from their test suites?

(I think there was, with API v3, but I cannot find those docs any longer)

Non-persistent mode expects database to be present

Hi,

I am trying to use this configuration to create a client

mykey := os.Getenv("SB_KEY")
conf := &sb.Config{
		APIKey:    mykey,
		Logger:    os.Stdout,
		ThreatLists: []sb.ThreatDescriptor{
			sb.ThreatDescriptor{
				ThreatType:   sb.ThreatType_Malware,
				PlatformType: sb.PlatformType_AllPlatforms,
			},
		},
	}
client, err := sb.NewSafeBrowser(*conf)
...
threats, err := client.LookupURLs([]string{url})
...

When running tests I get this error

safebrowsing: 2017/01/04 14:53:06 database.go:186: ListUpdate failure: safebrowsing: unexpected server response code: 404
safebrowsing: 2017/01/04 14:53:06 safebrowser.go:365: inconsistent database: safebrowsing: unexpected server response code: 404

And looking at the source it seems that it will try to ListUpdate() every request.
What is the right way of running the lookup like documented here?

Thank You

Inconsistency when repeating the same query

I've been checking a set of 10 k of URLs multiple times.
When I post whole set to /v4/threatMatches:find at once, sbserver returned safebrowsing: unexpected server response code: 400 in most of cases.
However, if I then asked only for the first 1 k of the set, it worked, then the first 2 k of the set, it worked... then whole set at once it worked.

To handle this, I chunked the whole set to subsets by 100 items – that mostly worked. Nevertheless, sometimes few items (about 150) are wrongly indicated as being safe. It seems these are always the same URLs (at least hXXp://www.ww.kuplon.cz/darkovy-poukaz was always one of the erroneously skipped URLs). Since I couln't identify what is the élément déclencheur, what is the reason this small subset is sometimes skipped, I'll be rather experimenting with https://github.com/afilipovich/gglsbl .

I post it here as well for future reference but since I don't have any additional information nor evidence I think you may close the issue as a won't fix.

Support go1.6

After receiving the #63 (compression type) error, I tried to do a fresh

go get github.com/google/safebrowsing/cmd/sblookup

whcih results in an

package context: unrecognized import path "context" (import path does not begin with hostname)

sbserver memory usage question

Is it normal for sbserver to use so much memory? On startup it eats away 455MiB

newton:~ bgv$ sbserver -db ~/tmp/sbserver.db -apikey XXX
safebrowsing: 2016/06/08 13:16:09 database.go:96: load failure: open /Users/bgv/tmp/sbserver.db: no such file or directory
safebrowsing: 2016/06/08 13:16:13 database.go:259: database is now healthy
Starting server at localhost:8080

newton:~ bgv$ ps -o pcpu=,rss=,vsz= -p 5307
  0.0 465968 573526448

Few db updates later it's already allocated whooping 1.4GiB

newton:~ bgv$ ps -o pcpu=,rss=,vsz= -p 5307
  0.0 1534480 573526448

project question / adding third-party threats

Hello, Thank you very much for putting this project together!
Is it possible to add and remove your own blocked URLs using the local hash db?
I want to also keep getting the updates from SB-API but add in our own URLs to block.

update API doesn't check minimumWaitDuration

https://developers.google.com/safe-browsing/v4/request-frequency#minimum-wait-duration
According to the spec,

Both the fullHashes.find response and threatListUpdates.fetch response have a minimumWaitDuration field that clients must obey.

If a lot of clients are sending requests and the server has to call excessive api.HashLookup(), will the server result in receiving a response with minimumWaitDuration?

I think the server should check if the response contain minimumWaitDuration.

will sending excessive "fullHashes.find" cause receiving minimumWaitDuration in response?

can the system alert for adult sites?

Expose URL Parsing to Library Users

Hello!

As part of a user-safety project, I'm integrating google/safebrowsing into my employer's chat product as a means of filtering out harmful links. One difficulty we've run into is that our link parsing doesn't exactly line up with the library's. Specifically, we are using https://github.com/mvdan/xurls to find links in a chat message, and we then pass all of those links into the SafeBrowser.LookupURLS method. This has resulted in some infrequent, but regular edge-cases where the link parsing regex provided by xurlshas a more expansive definition of a link than this library, leading to a safebrowser : invalid path errors.

I think the best, and easiest remediation for this would be if this library exposed a simple IsURL function, or provided an ExtractUrls method which would pull out all of the URLs in a string. That way, we could either filter out those URL-like strings which would cause an error, or just use this library's idea of what a URL is throughout our codebase.

Regards,
Ore Babarinsa.

only ThreatEntry.Url may be set

I receive this "only ThreatEntry.Url may be set" when I try to scan a large set of urls, like 20,000. There is nothing wrong with the format of the json file I post. If I limit the number of urls to about 1,100ish, everything works fine. It's like it doesn't recognize the set as being a list URLs but maybe url hashes or something.

safebrowsing api query result is not consistent with chrome browser?

When I use safebrowsing api to query the url "http://0.99730.von1.com/", query result is safe url, but in chrome, query result is unsafe.

I use sblookup cmd to check the url:

echo "http://0.99730.von1.com/" | sblookup -apikey=$MY_API_KEY
safebrowsing: 2017/06/02 18:15:10 database.go:106: no database file specified
safebrowsing: 2017/06/02 18:15:11 database.go:229: Server requested next update in 30m25.5s
safebrowsing: 2017/06/02 18:15:11 database.go:336: database is now healthy
Safe URL: http://0.99730.von1.com/

client should make multiple requests when >500 hash prefixes are queried

The Safe Browsing API has a limit of 500 hash-prefixes per request (see update api). The server returns an error when that limit is exceeded (#87).

This client should batch the requests and reassemble.

Cripto

Questions about Google safebrowsing in China

I am using safebrowsing in a project. However, the server hosting the database and the safebrowsing lists are not accessible in China due to governmont regulation. I have tried to access the server via public proxy servers, but they are blocked with a HTTP 403 response.
Do you know how can I get the database, safebrowsing lists and update the database periodly in chinese internet environment?

sbserver differs from online/browser lookup?

I have noticed that the sbserver returns an empty response for some urls while Chrome browser and online lookup tool ( https://www.google.com/transparencyreport/safebrowsing/diagnostic/ ) does return a correct danger response. I have looked and the server is updating its list. Anyone know what is happening?

A sample url for which this happens.

http://www.precision-mouldings.com/.ls/.https:/.www.paypal.co.uk/uk.web.apps.mpp.home.sign.in.country.a.GB.locale.a.en.GB-6546refhs8ehgf8-890b7fefut9546954543ds867hgf9-1egey3ds4820435t546ggc-u4ydstgu5438gjksssGB/plmgeo.php

API outage? inconsistent database: unexpected server 503

Hi folks 👋,

Between 13:07:00 pm UTC and 13:34:00 UTC today we began seeing 503 errors returned by the Safebrowsing v4 API, causing traces from the database Status and a cached err:

safebrowser.go:400: inconsistent database: safebrowsing: unexpected server response code: 503

A couple questions:

Is there a status page specifically for the Google Safe Browsing API? I wasn't able to find one with some quick searching. Maybe there's a way to find this status in the Google developers panel somewhere? Pointers appreciated!
Can you confirm that this was a temporary API event? Should we expect that the root cause has since been resolved?

Thanks!

Review requests for GitHub repositories

Apologies that this is probably not the correct place to ask, but I couldn't find another contact point.

When a website gets flagged by Google SafeBrowsing as containing malware, normally the site owner can clean up their site and submit a request for a recheck, or just wait for a periodic rescan of their site. My understanding is that the requests are facilitated through a Google Search Console account. However, in the case of a GitHub repository being incorrectly marked as containing malicious files, there is no way for that project owner to sign up for a Google Search Console account for just that project (it must point at a domain or subdomain).

I'm a project contributor to al-Khaser, which is a tool designed to be used by malware analysts in order to see how susceptible their virtualised or sandboxed analysis environments are to detection. Another use of the tool is for testing the efficacy of anti-virus heuristics that look for such VM/debug detection tricks; put simply, the point is for our tool to be detected as malicious even though it is not.

For a while we offered binary releases, but Google SafeBrowsing has picked up these binaries and (quite understandably) marked the repository as potentially malicious. A few months ago we removed the release binaries from the repository in the hope that this would resolve the issue, but the warning remains. I suspect that this is because the release binaries are still available in the commit history. Understandably we cannot remove them at this point.

Is there someone we can contact in order to whitelist the repository?

How to query a historical snapshot?

Hi, sorry if this question outside the issues scope but I will appreciate your help as I could not find answers elesewhere. I used this project to run the Google safe browsing DB and make queries locally using Ubunutu. My terminal shows me lines of updates such as:


 safebrowsing: 2019/06/28 12:01:09 database.go:111: no database file specified
safebrowsing: 2019/06/28 12:01:10 database.go:243: Server requested next update in 30m24.304s
safebrowsing: 2019/06/28 12:01:11 database.go:389: database is now healthy
safebrowsing: 2019/06/28 12:01:11 safebrowser.go:557: Next update in 30m24.304s
Starting server at localhost:8080

I assume that the DB automatically receive updates. Plz correct me if I'm wrong and if I need to do anything to keep it up to date.

The second issue, if I use this project, can I get a snapshot of older specific date? e.g. the DB two months ago? Plz let me know if there is anyway to query the DB on old date.