Comments (17)
Proxy support for NcbiImporter was added in version 0.1.122.
To enable it, configure three properties in your conf/gluetools-config.xml, adding them under the <properties> XML element.
Here's an example of how the config properties would look:
<!-- HTTP proxy config example -->
<property>
<name>gluetools.core.http.proxy.enabled</name>
<value>true</value>
</property>
<property>
<name>gluetools.core.http.proxy.host</name>
<value>45.232.52.23</value>
</property>
<property>
<name>gluetools.core.http.proxy.port</name>
<value>3128</value>
</property>
from gluetools.
Unfortunately, no luck here. I downloaded the newer versions (BTW, with two versions of the .jar in /lib, gluetools.sh uses the older version). The end of my conf/gluetools-config.xml looks like:
<!-- Cayenne -->
<property>
<name>cayenne.querycache.size</name>
<value>30000</value>
</property>
<!-- HTTP proxy config -->
<property>
<name>gluetools.core.http.proxy.enabled</name>
<value>true</value>
</property>
<property>
<name>gluetools.core.http.proxy.host</name>
<value>158.119.150.18</value>
</property>
<property>
<name>gluetools.core.http.proxy.port</name>
<value>8080</value>
</property>
</properties>
</gluetools>
This corresponds to my environment variables and browser settings for our proxy: 158.119.150.18:8080. The GLUE commands that don't work as expected:
GLUE version 0.1.122
Mode path: /
GLUE> project hev
OK
Mode path: /project/hev
GLUE> module ncbiHevImporter
OK
Mode path: /project/hev/module/ncbiHevImporter
GLUE> preview --detailed
Error: I/O error during eSearch: Connection timed out (Connection timed out)
Cause: Connection timed out (Connection timed out)
Mode path: /project/hev/module/ncbiHevImporter
The report from tcptrack:
158.119.178.158:36688 158.119.150.18:8080 ESTABLISHED 2m 0 B/s
158.119.178.158:36828 158.119.150.18:8080 ESTABLISHED 22s 0 B/s
158.119.178.158:36140 158.119.150.18:8080 ESTABLISHED 59s 0 B/s
WRT debugging, if I use these settings with an incorrect proxy address:
<property>
<name>gluetools.core.http.proxy.host</name>
<value>158.119.150.19</value>
</property>
I get:
GLUE> preview --detailed
Error: I/O error during eSearch: Connect to 158.119.150.19:8080 [/158.119.150.19] failed: Connection refused (Connection refused)
Cause: Connect to 158.119.150.19:8080 [/158.119.150.19] failed: Connection refused (Connection refused)
Cause: Connection refused (Connection refused)
Mode path: /project/hev/module/ncbiHevImporter
and
158.119.178.158:36140 158.119.150.18:8080 RESET 0s 0 B/s
And if I switch the proxy off in the config:
<property>
<name>gluetools.core.http.proxy.enabled</name>
<value>false</value>
</property>
I eventually get (trying import instead of preview):
GLUE> import --detailed
Error: I/O error during eSearch: Connection timed out (Connection timed out)
Cause: Connection timed out (Connection timed out)
Mode path: /project/hev/module/ncbiHevImporter
and
158.119.178.158:56684 130.14.29.110:443 SYN_SENT 2s 0 B/s
So it looks like your modification is behaving as expected WRT redirection to the proxy, but the objective still isn't achieved. Is it also listening for responses via the proxy?
from gluetools.
I just tested BioPython from Python:
from Bio import Entrez
handle = Entrez.esearch(db="nuccore", term="Hepatitis E", field="title", rettype='xml')
print(Entrez.read(handle)[u'QueryTranslation'])
gives
hepatitis e[Title]
and proxy address activity:
158.119.178.158:37124 158.119.150.18:8080 ESTABLISHED 13s 0 B/s
158.119.178.158:36140 158.119.150.18:8080 ESTABLISHED 27s 0 B/s
158.119.178.158:37212 158.119.150.18:8080 CLOSING 20s 0 B/s
I believe Python looks to the environment variables http_proxy and https_proxy, which are:
~/VRD/gluetools$ echo $http_proxy
http://158.119.150.18:8080/
~/VRD/gluetools$ echo $https_proxy
https://158.119.150.18:8080/
from gluetools.
OK, I am guessing the situation is this:
NCBI requires HTTPS to be used to access its API, although this was not always the case, so if you investigate this issue you may see people talking about accessing NCBI via plain HTTP.
BioPython Entrez functionality uses Python's urllib which will pick up proxy settings from environment variables. The endpoint (NCBI) is HTTPS, therefore urllib will use the proxy defined in https_proxy.
You could test this hypothesis by messing with the https_proxy environment variable -- this should break BioPython Entrez functionality. Conversely, messing with http_proxy should have no effect on BioPython Entrez.
In your case (and this is not universally true) the HTTPS proxy in your local site itself uses HTTPS with the client, hence the fact that https://... is the protocol.
So, in the latest release 0.1.123, I've changed GLUE so that you set the HTTPS proxy in gluetools-config.xml like this:
<!-- HTTPS proxy config example -->
<property>
<name>gluetools.core.https.proxy.enabled</name>
<value>true</value>
</property>
<property>
<name>gluetools.core.https.proxy.url</name>
<value>https://123.45.67.89:8080</value>
</property>
So now you can set this up in a similar way to the https_proxy environment variable. The Ncbi importer will use the protocol you configure in the URL.
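To illustrate what "use the protocol you configure in the URL" could mean in practice, here is a minimal Java sketch that splits a proxy URL of this form into scheme, host and port. The class ProxyUrlSketch and its default-port fallback are hypothetical and for illustration only; this is not GLUE's actual implementation.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper (not part of GLUE): splits a proxy URL into its parts. */
final class ProxyUrlSketch {
    final String scheme;
    final String host;
    final int port;

    private ProxyUrlSketch(String scheme, String host, int port) {
        this.scheme = scheme;
        this.host = host;
        this.port = port;
    }

    /** Parses values such as "https://123.45.67.89:8080" or "http://proxy.example:3128/". */
    static ProxyUrlSketch parse(String proxyUrl) {
        URI uri = URI.create(proxyUrl);
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            throw new IllegalArgumentException("Proxy URL needs a scheme and a host: " + proxyUrl);
        }
        scheme = scheme.toLowerCase(Locale.ROOT);
        int port = uri.getPort();
        if (port == -1) {
            // Assumption for this sketch: fall back to the scheme's conventional port.
            port = scheme.equals("https") ? 443 : 80;
        }
        return new ProxyUrlSketch(scheme, host, port);
    }

    public static void main(String[] args) {
        ProxyUrlSketch proxy = parse("https://123.45.67.89:8080");
        System.out.println(proxy.scheme + " " + proxy.host + " " + proxy.port);
    }
}
```

The scheme recovered here is what decides whether the client talks plain HTTP or TLS to the proxy itself, which is separate from the HTTPS used for the NCBI endpoint.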
PS I have also updated gluetools.sh -- if it finds multiple jars in the lib directory it throws an error. Updated script version here:
https://github.com/giffordlabcvr/gluetools/blob/master/gluetools-core/gluetools/bin/gluetools.sh
from gluetools.
Success! Thanks.
from gluetools.
Almost . . . the first search returned the table of GI numbers, status etc. But a subsequent call of the same command has problems:
GLUE> preview --detailed
Error: I/O error during eSearch: Remote host closed connection during handshake
Cause: Remote host closed connection during handshake
Cause: SSL peer shut down incorrectly
Mode path: /project/enterovirus/module/enterovirusCuratedNcbiImporter
GLUE> preview --detailed
Error: I/O error during eSearch: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Cause: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Cause: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Cause: unable to find valid certification path to requested target
Mode path: /project/enterovirus/module/enterovirusCuratedNcbiImporter
Same result for preview and import --preview.
from gluetools.
Quitting GLUE, then rerunning GLUE, opening the project, etc. returns the second error above.
from gluetools.
This is because we're trying to connect to your proxy via SSL, and Java's SSL layer does not recognise the certificate of your proxy server, probably because it is self-signed. To fix this, you would have to install your proxy server's certificate in the JRE using something like this method:
https://www.grim.se/guide/jre-cert
from gluetools.
Grabbing the HTTPS proxy's certificate with
echo -n | openssl s_client -connect 158.119.150.18:443 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/local_https_proxy.cert
and examining it with
openssl x509 -in /tmp/local_https_proxy.cert -text
Gives: Issuer: C=US, O=DigiCert Inc, OU=www.digicert.com
and I did not have to add any certificates to the browser to get it working through the proxy - so I don't think the HTTPS proxy is using a self-signed certificate.
With my GLUE (ver. 0.1.131) config settings as
<property>
<name>gluetools.core.https.proxy.enabled</name>
<value>true</value>
</property>
<property>
<name>gluetools.core.https.proxy.url</name>
<value>https://158.119.150.18:443</value>
</property>
with and without the above certificate in the key store, manually added with:
CERT="/tmp/local_https_proxy.cert"
CERTALIAS="local_https_proxy"
sudo keytool -import \
-trustcacerts \
-keystore /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts \
-storepass changeit \
-noprompt \
-alias $CERTALIAS \
-file $CERT
I get this error message from within GLUE
Error: I/O error during eSearch: Host name '158.119.150.18' does not match the certificate subject provided by the peer (CN=*.phe.gov.uk, O=Public Health England, L=London, ST=England, C=GB)
Cause: Host name '158.119.150.18' does not match the certificate subject provided by the peer (CN=*.phe.gov.uk, O=Public Health England, L=London, ST=England, C=GB)
I tried an independent test of the local Java's ability to deal with the proxy server over SSL/HTTPS using "SSLPoke" code here which returned:
java -cp ./ SSLPoke 158.119.150.18 443
Successfully connected
So it looks like the HTTPS certificate on my computer, my Java install and this https proxy server are all in order. Is GLUE applying an overly stringent check on matching the domain name *phe.gov.uk against the proxy's IP address? Do I need to supply a full domain name for this proxy server to match the certificate instead of an IP address? I suspect there is an internal domain name that doesn't end in *phe.gov.uk if any for this proxy . . .
from gluetools.
Yes, this proves that the GLUE stack is establishing the correct SSL certificate, and the verification step is failing at the hostname match step.
This is something which the Apache HttpComponents library (which GLUE uses) applies strictly by default. However it seems to be quite configurable so we can probably get it to skip this step if necessary.
It is possible that your proxy server has an internal domain name which ends in .phe.gov.uk. If so you can find it out using this unix command:
% host 158.119.150.18
If that's the case then you can just apply this in the GLUE config. If not then I can investigate switching off the strict hostname match step.
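For anyone wondering why the IP address fails while a host name under phe.gov.uk might pass, here is a deliberately simplified Java sketch of wildcard matching against the certificate's CN. The class HostnameMatchSketch is hypothetical; the real check in Apache HttpComponents is more involved (it also considers subject alternative names and treats IP addresses specially), so treat this only as an illustration of the idea.

```java
import java.util.Locale;

/** Hypothetical, simplified illustration of wildcard hostname matching. */
final class HostnameMatchSketch {

    /**
     * Returns true if host matches pattern, where a leading "*." in the
     * pattern stands for exactly one left-most DNS label.
     */
    static boolean matches(String host, String pattern) {
        String h = host.toLowerCase(Locale.ROOT);
        String p = pattern.toLowerCase(Locale.ROOT);
        if (!p.startsWith("*.")) {
            return h.equals(p); // no wildcard: require an exact match
        }
        String suffix = p.substring(1); // e.g. ".phe.gov.uk"
        if (!h.endsWith(suffix)) {
            return false;
        }
        String label = h.substring(0, h.length() - suffix.length());
        // The wildcard must cover one non-empty label without further dots.
        return !label.isEmpty() && !label.contains(".");
    }

    public static void main(String[] args) {
        String cn = "*.phe.gov.uk";
        System.out.println(matches("158.119.150.18", cn));
        System.out.println(matches("tmgcol001.phe.gov.uk", cn));
    }
}
```

Under this simplified rule the bare IP address can never satisfy a CN of *.phe.gov.uk, whereas any single-label host name under phe.gov.uk would.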
from gluetools.
Indeed there is an appropriate looking host name. I'll test shortly.
from gluetools.
This could be progress!
With https://tmgcol001.phe.gov.uk:443 as the value for gluetools.core.https.proxy.url I get:
GLUE> preview --detailed
Error: Protocol error during eSearch: HTTP/1.1 400 Bad Request ( The data is invalid. )
Mode path: /project/enterovirus/module/enterovirusCuratedNcbiImporter
Maybe we've actually made contact with NCBI at this point?
from gluetools.
Yes, we may have made contact. Something, possibly the proxy or eSearch, is responding with the 400 error code, maybe because the URL or the search query is wrong?
I think we will have to dig into the response to find out more, possibly by adding some more logging in GLUE.
from gluetools.
I strongly suspect this is an interaction between the PHE proxy and ApacheHttpClient rather than an NCBI issue but let's try to confirm that.
I've created a minimal Java program, minimalProxyTest.zip, which attempts to connect to an endpoint optionally via an SSL proxy.
To run it, unzip the attached file and do
% java -jar minimalProxyTest.jar request.properties
It will try to connect to an HTTP endpoint, optionally via an SSL proxy, and if that works, it will output the request details to stdout. It uses the same HTTP java libraries which GLUE uses.
You can twiddle where it tries to connect to and other details using the request.properties file (the '#' character comments out a line).
So from your end it will be informative to first confirm that you can reproduce the HTTP 400 error connecting to NCBI via the proxy. If there's no 400 error then there's some difference between the GLUE setup and the minimal program, which we will need to identify.
Assuming the error is reproduced we could then test for example if we can use the program to GET https://www.google.com via the proxy. If so then the NCBI request is a factor. If not then it's purely something to do with ApacheHttpClient and the proxy. If that does turn out to be the case, I found this thing which supposedly helps debug proxy issues: https://www.charlesproxy.com/.
I have also included the Java source for reference, if you want to build it I can help with that.
from gluetools.
BTW I also released GLUE 0.1.132, which adds some logging at FINEST level, concerning the request sent to NCBI, and the response, if there's an error.
from gluetools.
I believe this is finally fixed in GLUE 1.1.38.
The issue was actually this:
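-- Connection to NCBI / HTTPS via proxy is enabled by these settings in the gluetools-config.xml file:
<property>
<name>gluetools.core.https.proxy.enabled</name>
<value>true</value>
</property>
<property>
<name>gluetools.core.https.proxy.url</name>
<value>-your HTTPS proxy URL -</value>
</property>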
-- This worked in terms of retrieving data from NCBI
-- However, the XML document returned from NCBI contains a DTD reference with its own URL.
-- The XML parser within GLUE then tries to resolve this DTD reference via the network connection. This network connection is unaware of the need to use a web proxy. Hence, the "connection refused" behaviour.
So the fix was to disable the remote lookup of the DTD reference in GLUE's XML parser.
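For readers hitting the same problem in their own code, one common way to stop a Java DOM parser from fetching an external DTD is the Xerces load-external-dtd feature, sketched below. This assumes the JDK's built-in Xerces-based parser; the class NoExternalDtdSketch, the sample document and the DTD URL (a placeholder under the reserved .invalid domain) are made up for illustration, and this is not necessarily how GLUE itself implements the fix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical sketch: parse XML without resolving its external DTD. */
final class NoExternalDtdSketch {
    // Xerces feature controlling whether a non-validating parser loads the external DTD.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Made-up document; the DTD URL is a placeholder that cannot be resolved.
    static final String SAMPLE =
            "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE eSearchResult SYSTEM \"http://dtd.invalid/eSearch.dtd\">"
            + "<eSearchResult><Count>3</Count></eSearchResult>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            // Without this, the parser may open its own direct (non-proxied) connection for the DTD.
            factory.setFeature(LOAD_EXTERNAL_DTD, false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

With the feature switched off, the DOCTYPE declaration is still read but the parser makes no network request for it, so the proxy settings no longer matter at the parsing stage.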
from gluetools.
Thanks Josh - I hope to have time to revisit this later in the summer
from gluetools.