Skip to content

RFC: SSH protocol#3290

Merged
bk2204 merged 1 commit intogit-lfs:masterfrom
bk2204:ssh-protocol
Oct 10, 2018
Merged

RFC: SSH protocol#3290
bk2204 merged 1 commit intogit-lfs:masterfrom
bk2204:ssh-protocol

Conversation

@bk2204
Copy link
Copy Markdown
Member

@bk2204 bk2204 commented Sep 27, 2018

There has been a significant amount of interest in a purely SSH-based protocol for Git LFS. Add a proposal for an SSH-based protocol based on Git's pkt-line and protocol v2, while still mapping cleanly onto HTTP for ease of implementation on the server-side. Implementers will likely have all of this machinery (either now or in the future) available, so this should be fairly easy to implement.

This is merely a proposal, not a commitment to implement. Implementers who prefer to use HTTP may continue to do so. Not specified yet is a technique to discover the SSH-based protocol, although it will probably be overloaded onto git-lfs-authenticate.

Comments and suggestions welcome.

/cc @git-lfs/core
/cc #1044

Copy link
Copy Markdown
Contributor

@ttaylorr ttaylorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @bk2204. I have a few comments to get the discussion going below:

@@ -0,0 +1,194 @@
# SSH protocol proposal

We'd like to implement a protocol for Git LFS that uses SSH protocol
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it was good that you mentioned in your commit message that this proposal is an aspiration, and not a promise that we will implement it, or a requirement of the same to anyone else. Do you think that it would be worthwhile to mention that here, too?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can do that.

## A more usable approach

Git already has some places we can look for inspiration. Its SSH protocol is
based on the Git native protocol, which is based on the pkt-line scheme.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this is related to your proposal at all, but we do have a good implementation of pkt-line in package git, thanks to @larsxschneider's work during the Git 2.11-era push towards filter.lfs.process.

I guess all of that is to say: this makes implementing a pkt-line-based protocol less work for us, since we don't have to first implementation serialization over that protocol.

But, one thing we definitely don't have is an implementation that will provide us convenient data structures over protocol v2 concepts, so I think that the work here lays there. Perhaps pkt-line and this proposed new implementation should reside in a separate repository?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible. I don't know how many people are going to find pkt-line work in Go super useful; I expect all of the people who are going to be doing that work are implementers of Git-related things, either client or server side. It may be useful to start work and then split things out if we find a logical unit that works on its own.


To initiate a connection, Git LFS should run the folllowing command:

$ ssh [{user}@]{server} git-lfs-transfer {path} {operation}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, this matches with my expectation of what we'd want, since the program that we execute today over SSH is called git-lfs-authenticate. I'm glad that we're keeping the git-lfs- prefix, and the same order of arguments.

$ ssh [{user}@]{server} git-lfs-transfer {path} {operation}

If authentication fails, or some other connection error occurs, errors will be
read fronm standard error and displayed to the user. The operation may be
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: fronm.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.


If authentication fails, or some other connection error occurs, errors will be
read fronm standard error and displayed to the user. The operation may be
`upload`, `download`, or `meta` (for items which are not transfers).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what meta is?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had intended to use it for locking and other cases where we're performing operations on the repo that are not strictly related to uploads or downloads, but that's probably not a good name. If anyone has a better idea, I'm all ears.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, perhaps we could get rid of meta entirely, and specify locking instead, allowing any further extensions to be negotiated during the initial connection?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should have an operation for locking instead. It may be useful to have a way to discover capabilities without having to commit to an operation, but I don't feel strongly about it either way.

json-message = 1*PKT-LINE(json-data)
```

It is recommended to specify a `size` argument, which is the number of bytes in
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is good. What do you think about making this a requirement so that implementers don't have to decipher between the two?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I was thinking since we had the pkt-line encoding of the data that would be like chunked transfer encoding, and the size could be omitted in that case, but I agree that in pretty much every case, we're going to know the size (or, like with JSON, can trivially compute it), so it doesn't make sense to omit it.

status-json-response = status-command
*arguments
delim-pkt
json-message
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment applies here about json-message vs spelling the protocol out in detail over the wire.

json-message
flush-pkt
status-command = PKT-LINE("status " http-status-code LF)
http-status-code = 3DIGIT
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I really like this. I think that this is an appropriate place for us to make a transition between the wire protocol and the old-style HTTP protocol. I think that this also jives with @PastelMobileSuit's current work, and makes our lfsapi implementation last longer.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't want to try to define what error codes might occur in this situation. The HTTP RFC folks are much more experienced than me in defining potential failure cases. We can always expose a human-readable error in the JSON blob or other data.

The `http-status-code` portion of the response is an HTTP status code, identical
to those used if the request is made over HTTP.

### Downloads
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ignore my previous comments made about unifying the batch protocol into its individual components over the wire. If my understanding is correct, that is exactly what you aim to do here. In this case, what do you think about getting rid of the batch protocol entirely for the wire protocol?

I think that there is certainly an appealing quality about introducing something (even though we know that it will be deprecated eventually) that is designed to ease the burden of implementers. (At this point, I'm requesting 👀 from @git-lfs/implementers.)

But, I'm worried that this is a deferred cost with high interest that allows us to put off the harder change, and makes the hard transition even harder. What do you think?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned above, I think the batch protocol is useful to determine whether we need to upload objects at all. Since we'd like to avoid uploading unneeded objects, we should maintain the batch protocol or specify that the client has to verify objects before uploading them. The latter potentially means a large number of round trips, which is bad for high-latency connections. Maybe that's not very much of a concern for Git LFS, which kind of assumes that a stable, reasonably performant connection is available, but I can imagine people traveling on airplanes being unhappy about that.

I'm fine simply specifying that one should use verify-object if you think that's best. If so, then I'll simply drop the id and token arguments as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think leaving it as-is is fine, though I'll defer to your judgement in the final case.

@PastelMobileSuit
Copy link
Copy Markdown
Contributor

PastelMobileSuit commented Sep 28, 2018

✨ This looks good to me! I don't have any feedback outside of what @ttaylorr has already raised

There has been a significant amount of interest in a purely SSH-based
protocol for Git LFS.  Add a proposal for an SSH-based protocol based on
Git's pkt-line and protocol v2, while still mapping cleanly onto HTTP
for ease of implementation on the server-side.

Document the basic upload and download protocol, as well as locking.
Coalesce the lock listing and lock verification endpoints in the JSON
API into one that provides both features.

Take care that we allow paths and refs to contain arbitrary byte
sequences, including spaces, by always providing them in an unambiguous
way at the end of a pkt-line.
Copy link
Copy Markdown
Contributor

@ttaylorr ttaylorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@bk2204 bk2204 merged commit 45c4568 into git-lfs:master Oct 10, 2018
@bk2204 bk2204 deleted the ssh-protocol branch October 10, 2018 14:52
@Rabadash8820
Copy link
Copy Markdown

Rabadash8820 commented Nov 25, 2018

@bk2204 @ttaylorr This seems like a pretty huge deal! Just to confirm, does Git LFS officially work over SSH now? If not, do you have a timeline for what version of Git will include these changes? And even more importantly, do you know when code hosting providers (particularly GitHub, BitBucket, GitLab, and/or Azure DevOps) will support these changes?

Thank you both again for your work on this much-needed feature!

@bk2204
Copy link
Copy Markdown
Member Author

bk2204 commented Nov 26, 2018

Sorry, no, Git LFS doesn't yet work completely over SSH. This is the protocol documentation proposal that outlines how it is supposed to work when implemented. I do plan to spend some time working on the actual implementation in the near future, but it's not earmarked for a particular release at the moment.

There is some discussion about doing a 3.0.0 release when these changes finally merge, but no final decisions have been made.

@ghost
Copy link
Copy Markdown

ghost commented Nov 26, 2018

Okay, I didn't look through the Files Changed, I see now that this is just specification stuff. I saw "SSH" and "merged" before and got really excited 😛 Good luck implementing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants