If no endpoint is provided via .gitconfig or <code cl

setting Content-Disposition to <code class="notransla

Use remote-based endpoint,about git-lfs/git-lfs

Comments (29)

rubyist commented on May 23, 2024

Here are some more thoughts about how this can work. Along with using a remote-based endpoint, there's also the issue of moving alambic onto githubusercontent.com. We want to avoid passing around a user's basic auth credentials across the domains.

Here is the way git media and alambic currently work:

When GETing media, the client contacts media.github.com (alambic) directly, passing its BasicAuth credentials. Alambic then makes an API call to .com to verify the action is OK. The BasicAuth credentials are passed on from alambic to .com.

PUTting (new) media up works in a similar way. The client contacts media.github.com (alambic) directly, passing its BasicAuth credentials. Alambic then makes an API call to .com which will verify the action and create some initial metadata stored by .com. After the data is pushed up to S3, alambic makes another API call to .com to store the rest of the metadata. In both instances, the BasicAuth credentials are passed on from alambic to .com.

Moving things across domains, we could go about this a different way:

Here, the client initially contacts .com via the remote-based media url (e.g. https://github.com/github/rubyistmedia2/info/media). The client passes along its BasicAuth credentials. .com will verify this and return a 307 with a url to alambic (now on media.githubusercontent.com) along with a token. The client then uses this token to proceed with alambic as normal.

PUTting (new) media will work similarly. However, we'll start with (perhaps) an OPTIONS request rather than a straight PUT. This avoids uploading all the data .com which has no need for it. In this instance, .com can not only verify and generate a token but it can also create the initial metadata entry, saving a later call from alambic into the .com API. The client can then PUT to alambic as before, using the token for authentication.

Challenges

We need to generate a token (or other kind of message) from .com that can be passed to alambic and verified by alambic as being legit and having come from .com. This isn't difficult to do, but we want to make sure we do it right.
On PUT we need to authenticate alambic's sendMeta call into .com. We have all the data at this point, but we do not any longer have the user's BasicAuth credentials. We could use the generated token, making the assumption that .com can store and later validate those tokens. Or we could use a higher level mechanism to ensure the link between alambic and .com is secure and authenticated (I'm in favor of this one, for all cases).

I'd also really love for some security folks to put 👀 on this to make sure everything is above board.

/cc @github/git-media @github/security

from git-lfs.

technoweenie commented on May 23, 2024

It'd be really nice to keep Alambic at github.com. The protocol is much simpler from the client's perspective. The user experience of not having to setup Git Media urls and multiple passwords will be much better for new Git Media users. We can do several things to keep the user content from rendering in browsers:

Wrap content in an archive format, like tar.
Require a custom user agent.
Require a custom Accept header.

This is similar to how the Git client interacts with user content on github.com.

from git-lfs.

rubyist commented on May 23, 2024

I'm totally down with keeping it under github.com. The client protocol is a bit simper, but also because it's already working that way. So who or what is behind the decision to keep or move alambic? How can we move that discussion forward? Currently my risks are either building on a protocol that is going to change or building a protocol that isn't going to be used.

All of @technoweenie's proposals are much easier to implement than the token above, in fact we already require a specific Accept header. What are the risks in moving that way?

from git-lfs.

btoews commented on May 23, 2024

I don't have a problem with the git-media alambic endpoint being mounter under github.com if we're taking the precautions that @technoweenie mentions.

from git-lfs.

rubyist commented on May 23, 2024

💥 I'll move in that direction then. Thanks!

from git-lfs.

jbarnette commented on May 23, 2024

🤘

from git-lfs.

btoews commented on May 23, 2024

setting Content-Disposition to attachment would add some more protection against the Flash shit we've seen too.

from git-lfs.

gregose commented on May 23, 2024

The user experience of not having to setup Git Media urls and multiple passwords will be much better for new Git Media users.

So credentials stored for github.com will automatically be sent to media.github.com, but not media.githubusercontent.com?

from git-lfs.

gregose commented on May 23, 2024

setting Content-Disposition to attachment would add some more protection against the Flash shit we've seen too.

We should also set a tar specific or binary content type and the no sniff header.

from git-lfs.

rubyist commented on May 23, 2024

In terms of the git media client pushing and pulling data from alambic, we explicitly forward the Authorization header when alambic makes api calls into .com on the user's behalf.

What I'm working on right now is the git media client interaction with alambic, and it seems to me like the security concern is loading media content in the browser. Is it infeasible for the git media client to use media.github.com and the browser to use media.githubusercontent.com?

I think I am maybe unclear about what the risks are in terms of the git media client with using github.com.

from git-lfs.

gregose commented on May 23, 2024

I think I am maybe unclear about what the risks are in terms of the git media client with using github.com.

We have been bitten in the past by browsers and plugins, flash specifically, doing unexpected things and disregarding headers, parsing unexpected content as flash, and introducing cross-site scripting. By moving all user controlled raw data off of *.github.com we can avoid edge cases that we haven't thought of yet. The current risk is low, but higher than only serving these on a non github.com domain. Consider it defense-in-depth and that we want to avoid any potential future flash shit shows.

Along with using a remote-based endpoint, there's also the issue of moving alambic onto githubusercontent.com. We want to avoid passing around a user's basic auth credentials across the domains.

I don't see how having alambic on media.github.com vs media.githubusercontent.com makes any difference in passing around auth creds. Why the architecture needs to be drastically different? Creds will have to be passed for either media.github.com or media.githubusercontent.com regardless. I may be missing something obvious here.

from git-lfs.

rubyist commented on May 23, 2024

Looking through my notes, I think @technoweenie's concern was sending any auth credentials to the cdn. I'm not familiar enough with fastly to know the risks there, so he might be the better one to comment on that.

So my understanding of this so far is:

git media client we shouldn't have to really change anything with how this is currently working. There is no browser involved in this process, just a command line tool interacting with alambic. If git media blobs are served to the git media client via a cdn, there is perhaps an issue with auth headers hitting the cdn(?).

browsers when we render a wad of git media data in the browser, we need to take the precautions outlined above.

from git-lfs.

technoweenie commented on May 23, 2024

So, I may have been using media.github.com and media.githubusercontent.com interchangeably. I really just mean any non github.com url. I developed this originally with media.github.com (before we were on the guc domains), and find it more pleasant to type.

Forcing git media to use this separate domain makes the experience worse for newcomers and implementers:

Users have to know how to configure the media URL on their repositories, instead of deriving it from the Git HTTPS remote.
Users have to enter their password for github.com AND media.githubusercontent.com (or whatever url we use).
The spec is more complicated. Clients and servers have to deal with a convoluted redirection/token passing system.

We're trying to make this a natural extension of Git. I don't want GitHub's implementation to be clunky. It's not the end of the world though.

Though I guess we can build a prototype browser plugin exhibiting this flaw, and then submit it to any competitor to force them to use an alternate url too

from git-lfs.

technoweenie commented on May 23, 2024

Looking through my notes, I think @technoweenie's concern was sending any auth credentials to the cdn.

That's a good point. I see we're setting up cloud.githubusercontent.com, so we can use that for CDN content too. Then that frees up media.githubusercontent.com for just git media, directly to our servers.

from git-lfs.

technoweenie commented on May 23, 2024

@rubyist: The git media spec, and probably CLI client will be open source. So even though git-media is a CLI client, we're assuming that an attacker would try to use it for hacks through rogue flash/browser plugins.

from git-lfs.

rubyist commented on May 23, 2024

@technoweenie ah, that's a valid point 👍

from git-lfs.

gregose commented on May 23, 2024

Chatted with @technoweenie.

Clarification / summary as I understand wrt @github/security:

Currently using a domain that is not github.com requires a user to configure a media endpoint and configure credentials for that domain. This is true if we use media.github.com or whatever.githubusercontent.com and is not ideal from a user experience.

The ideal case is to have the media endpoint be something like https://github.com/foo/bar/info/media. There are two ways we can go about this.

Have Alambic sit directly at https://github.com/foo/bar/info/media. Proxy basic auth creds as shown in the first example. Con: raw data will come from github.com. Pro: creds will only be passed by the user to github.com. In talking with @mastahyeti this is very similar to how we proxy git currently and the con technically exists there as well.
Have Alambic sit on media.something. Creds will first be sent to github.com/foo/whatever and then auth will be delegated to media.something using an expiring auth token. Pro: raw data is served from media.something. Con: we need to implement remote auth token between the two. Protocol is more complex w/ redirects, etc.

If we can completely control the content and ensure browser UAs / plugins will never parse it as executable content (flash, javascript, etc) I think the pros of 1. are worth it. It is no worse than our current Git HTTPS proxy (which we should review for similar concerns).

from git-lfs.

gregose commented on May 23, 2024

FYI, tar will not be a good format for this. The first 100 bytes of the tar file are the filename of the first file in the archive and would be user controlled. ASCII encoded flash files would be able to be embedded as the filename in a tar file.

from git-lfs.

technoweenie commented on May 23, 2024

Git Media refers to files by content addressable sha-256 OIDs, so this isn't a concern. Also, the tar format is only for the network calls, so we can set the filename to whatever we want.

I'm open to any archive format that's suitable for streaming large files. I thought about zip, but I don't know how we'd protect against tiny zips that can expand to massive files...

from git-lfs.

gregose commented on May 23, 2024

@technoweenie it doesn't really matter with regards to security as long as the initial bytes are not user controlled. We could just add a static header to the file and would be fine with us.

from git-lfs.

technoweenie commented on May 23, 2024

Sure, I was hoping to pick something standard. We could just add a header like:

0014 git media v1
{content}

It could be built with:

# write_header(res.body, "git media v1")
def write_header(io, header)
  full = " #{header}\n"
  prefix = full.bytesize.to_s.rjust(4, "0")
  io.write "#{prefix}#{full}"
end

Git's HTTPS server does something similar, for each line. Basically we split the packfile into chunks, prefixed with a 4 byte size in hex.

https://gist.github.com/technoweenie/4b6cf6d714e30e090ed0#file-server-go-L147-L174
https://gist.github.com/technoweenie/4b6cf6d714e30e090ed0#file-server-go-L61-L78

That's pretty detailed, and makes it really difficult to calculate the content-length header (not sure if we care though). Ideally the Git Media client downloads the file, and then verifies it matches by checking its SHA-256 signature against the oid in the URL.

Or for that matter, can we just start with a "0" or null byte?

from git-lfs.

btoews commented on May 23, 2024

Looks like the magic numbers for Flash are FWS, CWS, and ZWS. So long as we control the first byte, we're probably safe. Still, I'd rather have something more than just a single byte. I'd feel okay with a static string like git media v1 plus a newline.

from git-lfs.

btoews commented on May 23, 2024

Here's the magic file for Flash (from the file command)

#------------------------------------------------------------------------------
# $File: flash,v 1.10 2014/03/06 16:07:24 christos Exp $
# flash:    file(1) magic for Macromedia Flash file format
#
# See
#
#   http://www.macromedia.com/software/flash/open/
#   http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/\
#   en/devnet/swf/pdf/swf-file-format-spec.pdf page 27
#
0   string      FWS     Macromedia Flash data,
>3  byte        x       version %d
!:mime  application/x-shockwave-flash
0   string      CWS     Macromedia Flash data (compressed),
!:mime  application/x-shockwave-flash
>3  byte        x       version %d
0   string      ZWS     Macromedia Flash data (lzma compressed),
!:mime  application/x-shockwave-flash
>3  byte        x       version %d
# From: Cal Peake <[email protected]>
0   string      FLV     Macromedia Flash Video
!:mime  video/x-flv

#
# Yosu Gomez
0       string AGD2\xbe\xb8\xbb\xcd\x00 Macromedia Freehand 7 Document
0       string AGD3\xbe\xb8\xbb\xcc\x00 Macromedia Freehand 8 Document
# From Dave Wilson
0   string AGD4\xbe\xb8\xbb\xcb\x00 Macromedia Freehand 9 Document

from git-lfs.

technoweenie commented on May 23, 2024

@mastahyeti Cool. Having some numerical header means we can change that header without breaking parsers. They just know to read the first 4 bytes, and then the next x bytes, before getting to the actual content.

from git-lfs.

btoews commented on May 23, 2024

👍

from git-lfs.

rubyist commented on May 23, 2024

The custom header is in place for GETing media.

Next, we're intending to do an OPTIONS call before a PUT to avoid sending all the data to alambic only to be denied. Currently, the PUT handler is what does the authentication check (via the sendMeta calls into the API). I think we have two paths we can take here:

Leave PUT just the way it is, and OPTIONS serves as kind of a courtesy method that a client SHOULD do to avoid using unnecessary bandwidth.
Modify PUT so that a client MUST do an OPTIONS first. The OPTIONS handler can pass back a token that alambic tracks in order to prevent a PUT without OPTIONS.

I've started coding against the first option, but moving to the second is easy if there are strong opinions on this. We should just make sure the OPTIONS call doesn't create any metaresources on .com in case the client never sends the PUT.

We're also going on the assumption that we're going to have lb3 haproxy rewrite the .com info/media urls to go directly to alambic rather than having .com being involved with redirects or verification. If anyone has any objections to that, please chime in. /cc @github/ops

from git-lfs.

technoweenie commented on May 23, 2024

I'm 👍 on idea 1. I think we only verify that the authentication works, and that the user has PUSH access to the repository. No meta resources need to be created.

from git-lfs.

rubyist commented on May 23, 2024

We totally did this.

from git-lfs.

technoweenie commented on May 23, 2024

@gregose, @mastahyeti: thanks so much for your help and feedback on this. Let us know if you want to do a security review now that this is all up and running on github.com (behind a feature flag!).

from git-lfs.

Use remote-based endpoint about git-lfs HOT 29 CLOSED

Comments (29)

Challenges

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent