Conversation
| @@ -0,0 +1,194 @@ | |||
| # SSH protocol proposal | |||
|
|
|||
| We'd like to implement a protocol for Git LFS that uses SSH protocol | |||
There was a problem hiding this comment.
I think that it was good that you mentioned in your commit message that this proposal is an aspiration, and not a promise that we will implement it, or a requirement of the same to anyone else. Do you think that it would be worthwhile to mention that here, too?
| ## A more usable approach | ||
|
|
||
| Git already has some places we can look for inspiration. Its SSH protocol is | ||
| based on the Git native protocol, which is based on the pkt-line scheme. |
There was a problem hiding this comment.
I don't think that this is related to your proposal at all, but we do have a good implementation of pkt-line in package git, thanks to @larsxschneider's work during the Git 2.11-era push towards filter.lfs.process.
I guess all of that is to say: this makes implementing a pkt-line-based protocol less work for us, since we don't have to first implementation serialization over that protocol.
But, one thing we definitely don't have is an implementation that will provide us convenient data structures over protocol v2 concepts, so I think that the work here lays there. Perhaps pkt-line and this proposed new implementation should reside in a separate repository?
There was a problem hiding this comment.
It's possible. I don't know how many people are going to find pkt-line work in Go super useful; I expect all of the people who are going to be doing that work are implementers of Git-related things, either client or server side. It may be useful to start work and then split things out if we find a logical unit that works on its own.
|
|
||
| To initiate a connection, Git LFS should run the folllowing command: | ||
|
|
||
| $ ssh [{user}@]{server} git-lfs-transfer {path} {operation} |
There was a problem hiding this comment.
Good, this matches with my expectation of what we'd want, since the program that we execute today over SSH is called git-lfs-authenticate. I'm glad that we're keeping the git-lfs- prefix, and the same order of arguments.
docs/proposals/ssh_adapter.md
Outdated
| $ ssh [{user}@]{server} git-lfs-transfer {path} {operation} | ||
|
|
||
| If authentication fails, or some other connection error occurs, errors will be | ||
| read fronm standard error and displayed to the user. The operation may be |
docs/proposals/ssh_adapter.md
Outdated
|
|
||
| If authentication fails, or some other connection error occurs, errors will be | ||
| read fronm standard error and displayed to the user. The operation may be | ||
| `upload`, `download`, or `meta` (for items which are not transfers). |
There was a problem hiding this comment.
Can you explain what meta is?
There was a problem hiding this comment.
Yeah, I had intended to use it for locking and other cases where we're performing operations on the repo that are not strictly related to uploads or downloads, but that's probably not a good name. If anyone has a better idea, I'm all ears.
There was a problem hiding this comment.
Hm, perhaps we could get rid of meta entirely, and specify locking instead, allowing any further extensions to be negotiated during the initial connection?
There was a problem hiding this comment.
I agree we should have an operation for locking instead. It may be useful to have a way to discover capabilities without having to commit to an operation, but I don't feel strongly about it either way.
docs/proposals/ssh_adapter.md
Outdated
| json-message = 1*PKT-LINE(json-data) | ||
| ``` | ||
|
|
||
| It is recommended to specify a `size` argument, which is the number of bytes in |
There was a problem hiding this comment.
I think that this is good. What do you think about making this a requirement so that implementers don't have to decipher between the two?
There was a problem hiding this comment.
Sure. I was thinking since we had the pkt-line encoding of the data that would be like chunked transfer encoding, and the size could be omitted in that case, but I agree that in pretty much every case, we're going to know the size (or, like with JSON, can trivially compute it), so it doesn't make sense to omit it.
docs/proposals/ssh_adapter.md
Outdated
| status-json-response = status-command | ||
| *arguments | ||
| delim-pkt | ||
| json-message |
There was a problem hiding this comment.
Same comment applies here about json-message vs spelling the protocol out in detail over the wire.
docs/proposals/ssh_adapter.md
Outdated
| json-message | ||
| flush-pkt | ||
| status-command = PKT-LINE("status " http-status-code LF) | ||
| http-status-code = 3DIGIT |
There was a problem hiding this comment.
Nice, I really like this. I think that this is an appropriate place for us to make a transition between the wire protocol and the old-style HTTP protocol. I think that this also jives with @PastelMobileSuit's current work, and makes our lfsapi implementation last longer.
There was a problem hiding this comment.
Yeah, I didn't want to try to define what error codes might occur in this situation. The HTTP RFC folks are much more experienced than me in defining potential failure cases. We can always expose a human-readable error in the JSON blob or other data.
| The `http-status-code` portion of the response is an HTTP status code, identical | ||
| to those used if the request is made over HTTP. | ||
|
|
||
| ### Downloads |
There was a problem hiding this comment.
Ah, ignore my previous comments made about unifying the batch protocol into its individual components over the wire. If my understanding is correct, that is exactly what you aim to do here. In this case, what do you think about getting rid of the batch protocol entirely for the wire protocol?
I think that there is certainly an appealing quality about introducing something (even though we know that it will be deprecated eventually) that is designed to ease the burden of implementers. (At this point, I'm requesting 👀 from @git-lfs/implementers.)
But, I'm worried that this is a deferred cost with high interest that allows us to put off the harder change, and makes the hard transition even harder. What do you think?
There was a problem hiding this comment.
As I mentioned above, I think the batch protocol is useful to determine whether we need to upload objects at all. Since we'd like to avoid uploading unneeded objects, we should maintain the batch protocol or specify that the client has to verify objects before uploading them. The latter potentially means a large number of round trips, which is bad for high-latency connections. Maybe that's not very much of a concern for Git LFS, which kind of assumes that a stable, reasonably performant connection is available, but I can imagine people traveling on airplanes being unhappy about that.
I'm fine simply specifying that one should use verify-object if you think that's best. If so, then I'll simply drop the id and token arguments as well.
There was a problem hiding this comment.
I think leaving it as-is is fine, though I'll defer to your judgement in the final case.
|
✨ This looks good to me! I don't have any feedback outside of what @ttaylorr has already raised |
There has been a significant amount of interest in a purely SSH-based protocol for Git LFS. Add a proposal for an SSH-based protocol based on Git's pkt-line and protocol v2, while still mapping cleanly onto HTTP for ease of implementation on the server-side. Document the basic upload and download protocol, as well as locking. Coalesce the lock listing and lock verification endpoints in the JSON API into one that provides both features. Take care that we allow paths and refs to contain arbitrary byte sequences, including spaces, by always providing them in an unambiguous way at the end of a pkt-line.
|
@bk2204 @ttaylorr This seems like a pretty huge deal! Just to confirm, does Git LFS officially work over SSH now? If not, do you have a timeline for what version of Git will include these changes? And even more importantly, do you know when code hosting providers (particularly GitHub, BitBucket, GitLab, and/or Azure DevOps) will support these changes? Thank you both again for your work on this much-needed feature! |
|
Sorry, no, Git LFS doesn't yet work completely over SSH. This is the protocol documentation proposal that outlines how it is supposed to work when implemented. I do plan to spend some time working on the actual implementation in the near future, but it's not earmarked for a particular release at the moment. There is some discussion about doing a 3.0.0 release when these changes finally merge, but no final decisions have been made. |
|
Okay, I didn't look through the Files Changed, I see now that this is just specification stuff. I saw "SSH" and "merged" before and got really excited 😛 Good luck implementing! |
There has been a significant amount of interest in a purely SSH-based protocol for Git LFS. Add a proposal for an SSH-based protocol based on Git's pkt-line and protocol v2, while still mapping cleanly onto HTTP for ease of implementation on the server-side. Implementers will likely have all of this machinery (either now or in the future) available, so this should be fairly easy to implement.
This is merely a proposal, not a commitment to implement. Implementers who prefer to use HTTP may continue to do so. Not specified yet is a technique to discover the SSH-based protocol, although it will probably be overloaded onto
git-lfs-authenticate.Comments and suggestions welcome.
/cc @git-lfs/core
/cc #1044