googlecloudplatform / gcp-token-broker
License: Apache License 2.0
The broker-connector.sh init action already restarts the metastore service on master nodes. However, it seems that Hive queries still fail unless one manually restarts the metastore again after SSH'ing into the master node. We should verify that the metastore restarts properly as part of the init action execution.
The current versions of token-broker (both Java and Python) support Cloud Datastore. It would be great for token-broker to also support JDBC out of the box.
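A JDBC-style backend would amount to a relational session store behind a small interface. Below is a minimal sketch of what that could look like, using sqlite3 from the standard library as a stand-in for any JDBC-reachable database; the class, method, and column names are assumptions for illustration, not the broker's actual API.

```python
import sqlite3

# Hypothetical relational session backend; sqlite3 stands in for any
# JDBC-reachable database (Postgres, MySQL, ...).
class RelationalSessionBackend:
    def __init__(self, dsn=":memory:"):
        self.conn = sqlite3.connect(dsn)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "  id TEXT PRIMARY KEY,"
            "  owner TEXT NOT NULL,"
            "  expires_at INTEGER NOT NULL)"
        )

    def create(self, session_id, owner, expires_at):
        # Parameterized query; commits via the connection context manager.
        with self.conn:
            self.conn.execute(
                "INSERT INTO sessions (id, owner, expires_at) VALUES (?, ?, ?)",
                (session_id, owner, expires_at),
            )

    def get(self, session_id):
        return self.conn.execute(
            "SELECT id, owner, expires_at FROM sessions WHERE id = ?",
            (session_id,),
        ).fetchone()

backend = RelationalSessionBackend()
backend.create("abc123", "alice@EXAMPLE.COM", 1700000000)
print(backend.get("abc123"))
```

The same table shape would map directly onto JDBC plus a connection-pool library on the Java side.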
IIUC, the Broker-KDC firewall setting is no longer required since KDC and Broker don't communicate in the new design. Can you confirm this?
(See gcp-token-broker/terraform/broker.tf, line 43 in 281a0a2.)
(See gcp-token-broker/apps/authorizer/app.py, line 85 in c5551ed.)
I think we should have TLS support in the authorizer server, since there can be deployment scenarios where the authorizer is not shielded by a private network.
We could probably make TLS enabled by default, and turn it off only when an explicit flag is provided.
Thoughts?
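A secure-by-default flag could look like the sketch below, assuming the authorizer's Python entry point. The flag name (--disable-tls) and certificate file names are assumptions for illustration, not the authorizer's actual CLI.

```python
import argparse
import ssl

# Sketch: TLS on by default, explicit opt-out flag.
def parse_args(argv):
    parser = argparse.ArgumentParser(description="Authorizer server")
    parser.add_argument("--disable-tls", action="store_true",
                        help="Serve plain HTTP (only safe behind a private network)")
    parser.add_argument("--tls-cert", default="authorizer.crt")
    parser.add_argument("--tls-key", default="authorizer.key")
    return parser.parse_args(argv)

def make_ssl_context(args):
    if args.disable_tls:
        return None  # plain HTTP
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=args.tls_cert, keyfile=args.tls_key)
    return context

args = parse_args([])    # no flags given: TLS stays on by default
print(args.disable_tls)  # False
```

The resulting SSLContext can then be handed to whatever WSGI server the authorizer runs under.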
[WARNING] /XXX/gcp-token-broker/apps/broker/src/main/java/com/google/cloud/broker/authentication/SpnegoAuthenticator.java:[33,39] sun.security.krb5.internal.ktab.KeyTab is internal proprietary API and may be removed in a future release
[WARNING] /XXX/gcp-token-broker/apps/broker/src/main/java/com/google/cloud/broker/authentication/SpnegoAuthenticator.java:[34,39] sun.security.krb5.internal.ktab.KeyTabEntry is internal proprietary API and may be removed in a future release
[WARNING] /XXX/gcp-token-broker/apps/broker/src/main/java/com/google/cloud/broker/authentication/SpnegoAuthenticator.java:[52,17] sun.security.krb5.internal.ktab.KeyTab is internal proprietary API and may be removed in a future release
[WARNING] /XXX/gcp-token-broker/apps/broker/src/main/java/com/google/cloud/broker/authentication/SpnegoAuthenticator.java:[52,33] sun.security.krb5.internal.ktab.KeyTab is internal proprietary API and may be removed in a future release
[WARNING] /XXX/gcp-token-broker/apps/broker/src/main/java/com/google/cloud/broker/authentication/SpnegoAuthenticator.java:[53,17] sun.security.krb5.internal.ktab.KeyTabEntry is internal proprietary API and may be removed in a future release
[WARNING] /XXX/gcp-token-broker/apps/broker/src/main/java/com/google/cloud/broker/authentication/SpnegoAuthenticator.java:[63,22] sun.security.krb5.internal.ktab.KeyTabEntry is internal proprietary API and may be removed in a future release
"Test proxy user authentication" in the tutorial did not work for me right away. The reason was that I had enabled the GCE API before 9/6/2018, so the internal DNS resolution on my node was different, as per our docs. On my node the internal DNS name resolves as VM_NAME.c.PROJECT_ID.internal, while the deployment assumes VM_NAME.ZONE.c.PROJECT_ID.internal.
After manually changing the proxy in broker-server.conf to use the correct hostname as it resolved with the internal DNS and redeploying the broker it worked like a charm. Maybe this is something that should be added to the documentation.
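Since both internal DNS forms are possible depending on when the GCE API was enabled, the docs (or the deployment itself) could enumerate both candidates. A small sketch, with an illustrative function name:

```python
# Compute both possible GCE internal DNS names for a VM: the zonal form
# used by projects with the GCE API enabled after 9/6/2018, and the
# older global form.
def internal_dns_candidates(vm_name, zone, project_id):
    return {
        "zonal": f"{vm_name}.{zone}.c.{project_id}.internal",
        "global": f"{vm_name}.c.{project_id}.internal",
    }

print(internal_dns_candidates("broker-vm", "us-central1-a", "my-project"))
```

At deploy time, comparing these against what socket.getfqdn() reports on the node would reveal which form to put in broker-server.conf.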
Investigate whether health check probes can be added to the broker application by using the grpc-health-probe project. See also this blog post.
It seems the broker-connector currently reads the broker's TLS certificate directly from core-site.xml. Can we add an option to load the certificate from a file instead?
If sessions aren't explicitly cleaned up (e.g. if the Yarn Resource Manager fails to call the "cancel" method), expired sessions remain in the database. Those sessions could be cleaned up by running a clean-up command on a regular basis with a cron job, similar to what Django provides:
https://docs.djangoproject.com/en/dev/topics/http/sessions/#clearing-the-session-store
https://docs.djangoproject.com/en/dev/ref/django-admin/#django-admin-clearsessions
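A clearsessions-style command could be as simple as the sketch below. The sessions table schema and column names here are assumptions, and sqlite3 stands in for whatever database backend the broker is configured with.

```python
import sqlite3
import time

# Delete all sessions whose expiry is in the past; returns how many
# rows were removed, so the cron job's output is auditable.
def clear_expired_sessions(conn, now=None):
    now = int(now if now is not None else time.time())
    with conn:
        cursor = conn.execute("DELETE FROM sessions WHERE expires_at < ?", (now,))
    return cursor.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, expires_at INTEGER)")
conn.execute("INSERT INTO sessions VALUES ('old', 100), ('live', 9999999999)")
print(clear_expired_sessions(conn, now=1000))  # 1  (removes only 'old')
```

A crontab entry invoking this hourly (e.g. `0 * * * * broker-clearsessions`, command name hypothetical) would keep the table bounded.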
BROKER_REALM can be removed completely since it's not used anywhere. ORIGIN_REALM can be removed from terraform/broker.tf since the broker app doesn't use it; the latter is still needed for the demo setup.
I got an error while trying to run the gcs-connector today. The corresponding stack-trace is pasted below.
Currently, the message (i.e. the stack trace) doesn't clearly specify what exactly went wrong: either there was an issue with the Kerberos token, or the GetAccessTokenRequest failed. Also, if we are swallowing the GSSException, then I think we should at least log it.
What do you think?
user@box:~/broker$ hadoop --config hadoop-conf/ fs -ls gs://some-sample-bucket
-ls: Fatal internal error
java.lang.RuntimeException: User is not logged-in with Kerberos or cannot authenticate with the broker.
at com.google.cloud.broker.hadoop.fs.BrokerAccessTokenProvider.refresh(BrokerAccessTokenProvider.java:99)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.CredentialFromAccessTokenProviderClassFactory$GoogleCredentialWithAccessTokenProvider.executeRefreshToken(CredentialFromAccessTokenProviderClassFactory.java:66)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:494)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.auth.oauth2.Credential.intercept(Credential.java:217)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:897)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:499)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getBucket(GoogleCloudStorageImpl.java:1889)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1846)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfoInternal(GoogleCloudStorageFileSystem.java:1118)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfo(GoogleCloudStorageFileSystem.java:1109)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1133)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:285)
at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1656)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.globInternal(GoogleHadoopFileSystemBase.java:1382)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.globStatus(GoogleHadoopFileSystemBase.java:1280)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.globStatus(GoogleHadoopFileSystemBase.java:1241)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:102)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
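The fix amounts to logging the underlying error and chaining it as the cause of the generic one, rather than swallowing it. A sketch of the pattern in Python (the connector itself is Java, where the equivalent is passing the GSSException as the cause of the RuntimeException); GssError and refresh_access_token below are stand-ins, not the connector's actual names:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("broker-connector")

class GssError(Exception):
    """Stand-in for the Java GSSException."""

def refresh_access_token():
    try:
        raise GssError("No valid credentials provided")  # simulated failure
    except GssError as e:
        # Log the swallowed exception with its traceback...
        log.exception("SPNEGO negotiation failed; cannot call GetAccessToken")
        # ...and chain it, so the generic error still carries the real cause.
        raise RuntimeError(
            "User is not logged-in with Kerberos or "
            "cannot authenticate with the broker."
        ) from e

try:
    refresh_access_token()
except RuntimeError as e:
    print(type(e.__cause__).__name__)  # GssError
```

With the cause chained, the stack trace above would immediately show whether the Kerberos step or the broker RPC was at fault.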
I don't think the authorizer depends on redis, so we can remove this.
The purpose of this proposal is to reduce the load on the token broker by implementing the following new caching strategy:
(e.g. /users/bob/.encrypted-access-token-[JOB-ID]). Encrypting the access tokens on HDFS reduces the risk in case the HDFS files are exfiltrated. Also, since each job uses a different random encryption key, a rogue job wouldn't be able to steal another job's access token.
This implementation means that individual tasks would never have to communicate with the TB directly, drastically reducing load on the TB server. Only the client and the Yarn master would send a few requests to create, renew, and delete the TB session.
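The flow can be sketched as below: a fresh random key per job, an encrypted blob written to a per-job path, and decryption by tasks that hold the key. The XOR keystream here is a toy placeholder for a real cipher (e.g. AES-GCM); it and the path layout are illustrative assumptions from the proposal, not implemented behavior, and the cipher must not be used in production.

```python
import hashlib
import secrets

def _keystream(key, n):
    # Toy SHA-256 counter keystream -- placeholder for a real cipher.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt_token(job_key, token):
    data = token.encode()
    return bytes(a ^ b for a, b in zip(data, _keystream(job_key, len(data))))

def decrypt_token(job_key, blob):
    return bytes(a ^ b for a, b in zip(blob, _keystream(job_key, len(blob)))).decode()

def cache_path(user, job_id):
    # Per-job cache location on HDFS, as described in the proposal.
    return f"/users/{user}/.encrypted-access-token-{job_id}"

job_key = secrets.token_bytes(32)  # fresh random key per job
blob = encrypt_token(job_key, "ya29.sample-access-token")
print(cache_path("bob", "job-42"))           # /users/bob/.encrypted-access-token-job-42
print(decrypt_token(job_key, blob))          # ya29.sample-access-token
```

A task without job_key gets only ciphertext from HDFS, which is the property the proposal relies on.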
Here it seems that the realm of the logged-in user is being used to create the service principal of token-broker.
It might not be the case that both of them are part of the same realm, in which case this will fail.
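One way to avoid the cross-realm failure is to build the broker's service principal from an explicitly configured realm rather than reusing the logged-in user's realm. A sketch with illustrative names:

```python
# Build the broker's service principal from configuration, so it does
# not inherit the realm of whoever happens to be logged in.
def broker_principal(service, host, broker_realm):
    return f"{service}/{host}@{broker_realm}"

# A user in CORP.EXAMPLE.COM can still target a broker in another realm:
user_principal = "alice@CORP.EXAMPLE.COM"
print(broker_principal("broker", "broker.example.com", "BROKER.EXAMPLE.COM"))
```

The broker realm would come from a config setting rather than being derived from the Kerberos login.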
I suspect that the maximum header size is too low in the broker server. Currently, while creating the server we are not configuring maxInboundMetadataSize, so it defaults to 8 KB, which I think is too small, since the Kerberos token (added on the connector side, as seen here) may exceed that limit.
We need to figure out what the appropriate size should be.