ngx_http_google_filter_module's Introduction

Nginx Module for Google

Description

ngx_http_google_filter_module is a filter module that makes a Google mirror much easier to deploy.
Regular expressions, URI locations and other complex configurations are already built in.
As a native nginx module, it handles cookies, gstatic sources and redirections efficiently.
Let's see how easy it is to set up a Google mirror.

location / {
  google on;
}

What? Are you kidding me?
Yes, it's just that simple!

Dependencies

  1. pcre regular expression support
  2. ngx_http_proxy_module backend proxy support
  3. ngx_http_substitutions_filter_module multiple substitutions support

Installation

Download sources first
#
# download the newest source
# @see http://nginx.org/en/download.html
#
wget http://nginx.org/download/nginx-1.7.8.tar.gz

#
# clone ngx_http_google_filter_module
# @see https://github.com/cuber/ngx_http_google_filter_module
#
git clone https://github.com/cuber/ngx_http_google_filter_module

#
# clone ngx_http_substitutions_filter_module
# @see https://github.com/yaoweibin/ngx_http_substitutions_filter_module
#
git clone https://github.com/yaoweibin/ngx_http_substitutions_filter_module
Brand new installation
#
# configure nginx with your own options
# replace </path/to/> with your real path
#
./configure \
  <your configuration> \
  --add-module=</path/to/>ngx_http_google_filter_module \
  --add-module=</path/to/>ngx_http_substitutions_filter_module
Migrate from an existing distribution
#
# get the configuration of the existing nginx
# replace </path/to/> with your real path
#
</path/to/>nginx -V
> nginx version: nginx/<version>
> built by gcc 4.x.x
> configure arguments: <configuration>

#
# download the same version of nginx source
# @see http://nginx.org/en/download.html
# replace <version> with your nginx version
#
wget http://nginx.org/download/nginx-<version>.tar.gz
  
#
# configure nginx
# replace <configuration> with your nginx configuration
# replace </path/to/> with your real path
#
./configure \
  <configuration> \
  --add-module=</path/to/>ngx_http_google_filter_module \
  --add-module=</path/to/>ngx_http_substitutions_filter_module
#
# if any libraries are missing, install them with your package manager
#   e.g. apt-get, pacman, yum ...
#

Usage

Basic Configuration

A resolver is required so that nginx can resolve Google's domain names.

server {
  # ... part of server configuration
  resolver 8.8.8.8;
  location / {
    google on;
  }
  # ...
}
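Many deployments serve the mirror over HTTPS. The following is a minimal sketch using only standard nginx directives, not an official recipe from this module. It assumes nginx was configured with --with-http_ssl_module and that you already have a certificate; g.example.com and the certificate paths are placeholders. The second server block is an optional redirect from plain http to https.

```nginx
# Sketch only: g.example.com and the certificate paths are placeholders.
# Requires an nginx built with --with-http_ssl_module.
server {
  listen 443 ssl;
  server_name g.example.com;

  ssl_certificate     /path/to/g.example.com.crt;
  ssl_certificate_key /path/to/g.example.com.key;

  # resolver used to look up the backend domains
  resolver 8.8.8.8;

  location / {
    google on;
  }
}

# Optional: send plain-http visitors to the https site.
server {
  listen 80;
  server_name g.example.com;
  return 301 https://$host$request_uri;
}
```

If you only serve plain http, the Basic Configuration block above is enough; the redirect block is only useful once the https server is in place.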
Google Scholar

google_scholar depends on google, so it cannot be used on its own.
Google Scholar has since migrated from http to https and supports ncr (no country redirect), so a Google Scholar TLD no longer needs to be set.

location / {
  google on;
  google_scholar on;
}
Google Language

The default language can be set with google_language; if it is not set, zh-CN is used.

location / {
  google on;
  google_scholar on;
  # set language to German
  google_language de; 
}

Supported languages are listed below.

ar    -> Arabic
bg    -> Bulgarian
ca    -> Catalan
zh-CN -> Chinese (Simplified)
zh-TW -> Chinese (Traditional)
hr    -> Croatian
cs    -> Czech
da    -> Danish
nl    -> Dutch
en    -> English
tl    -> Filipino
fi    -> Finnish
fr    -> French
de    -> German
el    -> Greek
iw    -> Hebrew
hi    -> Hindi
hu    -> Hungarian
id    -> Indonesian
it    -> Italian
ja    -> Japanese
ko    -> Korean
lv    -> Latvian
lt    -> Lithuanian
no    -> Norwegian
fa    -> Persian
pl    -> Polish
pt-BR -> Portuguese (Brazil)
pt-PT -> Portuguese (Portugal)
ro    -> Romanian
ru    -> Russian
sr    -> Serbian
sk    -> Slovak
sl    -> Slovenian
es    -> Spanish
sv    -> Swedish
th    -> Thai
tr    -> Turkish
uk    -> Ukrainian
vi    -> Vietnamese
Spider Exclusion

Search engine spiders are not allowed to crawl the Google mirror.
The default robots.txt below is already built in.

User-agent: *
Disallow: /

If google_robots_allow is set to on, this robots.txt is replaced with Google's own.

  #...
  location / {
    google on;
    google_robots_allow on;
  }
  #...
Upstreaming

An upstream block can help you avoid the cost of name resolution, reduce the chance of being flagged by Google's robot detection, and proxy through specific servers.

upstream www.google.com {
  server 173.194.38.1:443;
  server 173.194.38.2:443;
  server 173.194.38.3:443;
  server 173.194.38.4:443;
}
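The upstream block goes in the http context, next to your server blocks, not inside a server or location. A minimal sketch of the placement follows; g.example.com is a placeholder and the IPs are taken from the example above. It assumes, as the example's naming suggests, that the upstream name must match the Google domain being proxied: nginx's proxy_pass looks for an upstream group with that host name before falling back to the resolver.

```nginx
http {
  # ... other http-level settings (include mime.types, etc.)

  # Upstream groups live in the http context. The name should match the
  # proxied domain (assumption based on the example above).
  upstream www.google.com {
    server 173.194.38.1:443;   # 443 because the module talks https by default
    server 173.194.38.2:443;
  }

  server {
    listen 80;
    server_name g.example.com;   # placeholder
    resolver 8.8.8.8;            # kept for any domains without an upstream

    location / {
      google on;
    }
  }
}
```

Note that an nginx upstream server entry without a port defaults to port 80, which does not match the module's default https backend unless that domain is also listed in google_ssl_off.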
Proxy Protocol

By default, the proxy uses https to communicate with backend servers.
You can use google_ssl_off to force some domains to fall back to http.
This is useful if you want to proxy some domains through another gateway that has no SSL certificate.

#
# e.g.
# I want to proxy the domain 'www.google.com' like this:
# vps(hk) -> vps(us) -> google
#

#
# configuration of vps(hk)
#
server {
  # ...
  location / {
    google on;
    google_ssl_off "www.google.com";
  }
  # ...
}

upstream www.google.com {
  server < ip of vps(us) >:80;
}

#
# configuration of vps(us)
#
server {
  listen 80;
  server_name www.google.com;
  # ...
  location / {
    proxy_pass https://www.google.com;
  }
  # ...
}

Copyright & License

All code is under the same license as Nginx.
Copyright (C) 2014 by Cube.

ngx_http_google_filter_module's People

Contributors

0x1997, cuber, gitsrc, tq5124, wzyboy

ngx_http_google_filter_module's Issues

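With only SSL enabled, clicking a search result redirects over plain (non-SSL) http, causing a 404 error

Please test with https://g.ideavirgin.com.

If the search result is an https page, the redirect uses https; if the search result is an http page, the redirect uses http, which leads to a 404 error.

My nginx configuration is as follows: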


#user  nobody;
worker_processes  1;

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;

#pid        logs/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       mime.types;
    default_type  application/octet-stream;

    #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
    #                  '$status $body_bytes_sent "$http_referer" '
    #                  '"$http_user_agent" "$http_x_forwarded_for"';

    #access_log  logs/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    #gzip  on;

    #server {
    #    listen       80;
    #    server_name  localhost;

    #    #charset koi8-r;

    #    #access_log  logs/host.access.log  main;

    #    location / {
    #        root   html;
    #        index  index.html index.htm;
    #    }

    #    #error_page  404              /404.html;

    #    # redirect server error pages to the static page /50x.html
    #    #
    #    error_page   500 502 503 504  /50x.html;
    #    location = /50x.html {
    #        root   html;
    #    }

    #    # proxy the PHP scripts to Apache listening on 127.0.0.1:80
    #    #
    #    #location ~ \.php$ {
    #    #    proxy_pass   http://127.0.0.1;
    #    #}

    #    # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
    #    #
    #    #location ~ \.php$ {
    #    #    root           html;
    #    #    fastcgi_pass   127.0.0.1:9000;
    #    #    fastcgi_index  index.php;
    #    #    fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
    #    #    include        fastcgi_params;
    #    #}

    #    # deny access to .htaccess files, if Apache's document root
    #    # concurs with nginx's one
    #    #
    #    #location ~ /\.ht {
    #    #    deny  all;
    #    #}
    #}


    # another virtual host using mix of IP-, name-, and port-based configuration
    #
    #server {
    #    listen       8000;
    #    listen       somename:8080;
    #    server_name  somename  alias  another.alias;

    #    location / {
    #        root   html;
    #        index  index.html index.htm;
    #    }
    #}


    # HTTPS server

    server {
        listen       443 ssl;
        server_name  g.ideavirgin.com;

        ssl_certificate      /etc/pki/tls/certs/g.ideavirgin.com.crt;
        ssl_certificate_key  /etc/pki/tls/private/g.ideavirgin.com.key;

        #ssl_session_cache    shared:SSL:1m;
        #ssl_session_timeout  5m;

        #ssl_ciphers  HIGH:!aNULL:!MD5;
        #ssl_prefer_server_ciphers  on;

        resolver 8.8.8.8;
        location / {
            google on;
        }
    }

}

I followed the steps exactly, but in the end the page that opens is "Welcome to nginx!"

The page says:


Welcome to nginx!

If you see this page, the nginx web server is successfully installed and working. Further configuration is required.

For online documentation and support please refer to nginx.org.
Commercial support is available at nginx.com.

Thank you for using nginx.


Why is this?

On CentOS, nginx frequently fails and only occasionally works

[alert] 22169#0: *15 socket() failed (97: Address family not supported by protocol) while connecting to upstream, client: 116.234.73.109, server: localhost, request: "GET /search?newwindow=1&site=&source=hp&q=%E6%9F%A5%E7%9C%8Bcentos+log&btnG=Google+%E6%90%9C%E7%B4%A2 HTTP/1.1", upstream: "https://[2404:6800:4005:80a::2004]:443/search?newwindow=1&site=&source=hp&q=%E6%9F%A5%E7%9C%8Bcentos+log&btnG=Google+%E6%90%9C%E7%B4%A2", host: "203.88.160.49", referrer: "http://203.88.160.49/"

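Suggestion: make it easier to understand

I have several VPSes in Hong Kong on the CN2 line, but my English is really poor. We are all ** people, so please make a Chinese version as a favor to people like me who have the resources but not the education! The people in the State Council's censorship department probably have excellent English; by raising the bar, you are blocking us rather than them.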

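Some browsers and visitors report too many redirects

Strangely, my configuration is all normal. Chrome on my phone and computer works, and Safari on my computer works too. But Safari on my phone and the Maxthon browser on my computer don't; they keep reporting too many redirects. Also, many friends tell me they can't open the site! How do I fix this? https://s.ets.cc/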

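Spent an afternoon and an evening on it and still couldn't get it working

Whatever I try, the site is inaccessible. The certificate is self-generated; neither port 80 nor 443 works, and I get either a 502 or a 403. Proxying Baidu, on the other hand, works.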

nginx build options

nginx version: nginx/1.7.8
built by gcc 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
TLS SNI support enabled
configure arguments: --user=www --group=www --prefix=/usr/local/nginx --with-http_stub_status_module --with-http_ssl_module --with-http_gzip_static_module --with-ipv6 --with-http_sub_module --with-pcre=/home/simon/soft/lnmp1.1/pcre-8.36 --add-module=/home/simon/soft/lnmp1.1/ngx_http_google_filter_module --add-module=/home/simon/soft/lnmp1.1/ngx_http_substitutions_filter_module

nginx configuration

server {
    listen       80;
    server_name  g.xxx.net;
    # ssl on;
    # ssl_certificate /usr/local/nginx/conf/vhost/ssl/server.crt;
    # ssl_certificate_key /usr/local/nginx/conf/vhost/ssl/server.key;
    location / {
        google on;
        # proxy_pass http://www.baidu.com;
    }
}


Could you share a few more Google IPs?

upstream www.google.com {
  server 173.194.38.1:443;
  server 173.194.38.2:443;
  server 173.194.38.3:443;
  server 173.194.38.4:443;
  server 123.205.251.100;
}
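Could you share a few more Google IPs? I'm not sure whether listing an IP on port 80 like this works.
Also, after finishing the configuration, the site won't open on mobile, although it works on PC.
And when I search for the same query, g.wen.lu and my g.cniot.info return different results. Any advice?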

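Problem after adding allow/deny

Testing with the author's module succeeded, but after adding
allow x.x.x.x;
deny all;
the default 403 page shown to denied IPs becomes garbled.

Access from allowed IPs works fine.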

Sharing: removing Google redirection | Worth integrating? If so, please add an option to disable it

location / {
    # https://github.com/dangoakachan/Remove-Google-Redirection
    sub_filter      </html>
        '<script>(function(window){"use strict";function injectFunction(func){var ele=document.createElement("script");var s=document.getElementsByTagName("script")[0];ele.type="text/javascript";ele.textContent="("+func+")();";s.parentNode.insertBefore(ele,s)}function disableURLRewrite(){function inject_init(){Object.defineProperty(window,"rwt",{value:function(){return true},writable:false,configurable:false})}injectFunction(inject_init)}function cleanTheLink(a){if(a.dataset["cleaned"]==1)return;var need_clean=false;var result=/\/(?:url|imgres).*[&?](?:url|q|imgurl)=([^&]+)/i.exec(a.href);if(result){need_clean=true;a.href=result[1]}var val=a.getAttribute("onmousedown")||"";if(val.indexOf("return rwt(")!=-1){need_clean=true;a.removeAttribute("onmousedown")}var cls=a.className||"";if(cls.indexOf("irc_")!=-1)need_clean=true;if(need_clean){var clone=a.cloneNode(true);a.parentNode.replaceChild(clone,a);clone.dataset["cleaned"]=1}}function main(){disableURLRewrite();document.addEventListener("mouseover",function(event){var a=event.target,depth=1;while(a&&a.tagName!="A"&&depth-->0)a=a.parentNode;if(a&&a.tagName=="A")cleanTheLink(a)},true)}main()})(window);</script></html>';
    sub_filter_once on;
  }

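What is the request flow?

Hello,
I don't understand how the flow works.
Does it fetch the page first and then serve it? Does a crawler go through the IPs one by one? Is it slow?

The scarcest resource is IPs, right? When they get blocked after a while, do they have to be added manually?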

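Firefox 35 cannot display the page

Yesterday I set it up on a VPS, using a free StartSSL certificate.

Since StartSSL includes a non-www domain by default, I configured a separate vhost for the non-www name.

The reverse proxy only listens for server_name www.xxx.xx on port 443.

Here is the problem: https://xxx.xx is accessible in Firefox, and the page displays.

Only the reverse-proxied https://www.xxx.xx immediately shows a Corrupted Content Error in Firefox.

Other browsers such as UC and Chrome work fine.

I really can't figure out the cause; please help.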

# nginx -V
nginx version: nginx/1.7.9
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)
TLS SNI support enabled

SSL configuration

#SSL Certificate 
ssl_certificate /etc/nginx/certs/xxx.xx.crt;
ssl_certificate_key /etc/nginx/certs/xxx.xx.key;
#TLS only
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
#SSL Session Cache
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
#OCSP stapling
ssl_stapling on; 
ssl_stapling_verify on; 
ssl_trusted_certificate /etc/nginx/certs/xxx.xx.crt;
resolver 8.8.8.8;
#Disable Beast Attacks
ssl_prefer_server_ciphers on;
#ssl_ciphers HIGH:!aNULL:!MD5:!DSS:!RC4;
#Stronger DHE Parameters
ssl_dhparam /etc/nginx/certs/dh.pem;
#HSTS
add_header Strict-Transport-Security "max-age=31536000";

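About the "400 bad request" error on the first visit

Me again. After clearing cookies, the first visit to the site returns "400 bad request", and the address bar shows "http:://domain.name/%00". Every visit after the first works normally.

I enabled the nginx log and found this entry: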

2015/05/15 17:27:12 [info] 17257#0: *1 client sent invalid request while reading client request line, client: 113.87.235.183, server: localhost, request: "GET /%00 HTTP/1.1"

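Delay in image search

First, many thanks to cuber for a great tool.
With this module, a Google image search at first loads only part of the results.
After roughly 30 seconds to a minute, scrolling down loads many more rows.
Then it stalls again, and more results load only after another wait.
Turning gzip, proxy and cache on or off in the http block makes no difference.
What could be causing this?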

Recompiled nginx but still can't reach Google

Help: I recompiled nginx and configured it according to the README, but it still doesn't work. Details below:

# /usr/local/nginx/sbin/nginx -V
nginx version: nginx/1.7.8
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)
configure arguments: --prefix=/usr/local/nginx/ --with-http_stub_status_module --add-module=../ngx_http_google_filter_module-master/ --add-module=../ngx_http_substitutions_filter_module-master/

Configuration file:


server {
    resolver 8.8.8.8;
    listen 80;
    server_name google.linuxsogood.org;

    access_log logs/google.acc.log;
    location / {
        google on;
    #    google_scholar on;
    #    proxy_pass https://www.google.com;
    }
}

Opening it in the 360 browser gives a 502, which is really weird...
Is something wrong with my configuration?

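Advice on reading the source code

I'm a C beginner and new to nginx.
I'd like to read this project's source code; where should I start?
What background knowledge do I need?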

ngx_http_google_util.c:157: error: format '%lu'

@zxy mentioned via v2ex.com

CentOS 6 32-bit

/root/ngx_module/ngx_http_google_filter_module/src/ngx_http_google_util.c: In function 'ngx_http_google_debug':
/root/ngx_module/ngx_http_google_filter_module/src/ngx_http_google_util.c:157: error: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ngx_uint_t'
make[1]: *** [objs/addon/src/ngx_http_google_util.o] Error 1

It doesn't work

After setting it up, the server returns "504 badgetway".

failed: /root/ssl/g4w.me.crt

nginx: [emerg] BIO_new_file("/root/ssl/g4w.me.crt") failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/root/ssl/g4w.me.crt','r') error:2006D080:BIO routines:BIO_new_file:no such file)
Following your steps, nginx -t throws the error above. Do I need to download a certificate? Where can I download it? Thanks~

Help for non-programmers needed.

Can you make a tutorial for non-programmers that briefly explains how the project works and shows us how to use it (graphically)?
I think that would help more people access your project.
Thank you : )

Error while compiling

-bash: --add-module=../ngx_http_substitutions_filter_module: No such file or directory

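State clearly on proxied pages that this is a proxy

**** 11:07:57
I can't believe I only just learned that wen.lu is a Google domain

Given that more and more people like this, who don't know what is really going on, have been showing up lately, I think proxied pages should clearly state that they are a proxy. For example, the "简体中文" (Simplified Chinese) label at the bottom right of the Google logo could be changed to "代理" (Proxy).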

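Could you write a script that performs the installation directly?

As a newbie who doesn't know how to use nginx, your tutorial makes my head spin. If you could provide a script that supports simple one-click configuration and installation, like shadowsocks does, I think it would spread really well.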
