The following requirements informed the specification of this protocol.
Automate keyboard-interactive authentication. We're motivated in the first place by the observation that the general SSH userauth method ‘keyboard-interactive’ (defined in [RFC4256]) can be used for many kinds of challenge/response or one-time-password styles of authentication, and in more than one of those, the necessary responses might be obtained from an auxiliary network connection, such as an HTTPS transaction. So it's useful if a user doesn't have to manually copy-type or copy-paste from their web browser into their SSH client, but instead, the process can be automated.
Be able to pass prompts on to the user. On the other hand, some userauth methods can be only partially automated; some of the server's prompts might still require human input. Also, the plugin automating the authentication might need to ask its own questions that are not provided by the SSH server. (For example, ‘please enter the master key that the real response will be generated by hashing’.) So after the plugin intercepts the server's questions, it needs to be able to ask its own questions of the user, which may or may not be the same questions sent by the server.
Allow automatic generation of the username. Sometimes, the authentication method comes with a mechanism for discovering the username to be used in the SSH login. So the plugin has to start up early enough that the client hasn't committed to a username yet.
Future expansion route to other SSH userauth flavours. The initial motivation for this protocol is specific to keyboard-interactive. But other SSH authentication methods exist, and they may also benefit from automation in future. We're making no attempt here to predict what those methods might be or how they might be automated, but we do need to leave a space where they can be slotted in later if necessary.
Minimal information loss. Keyboard-interactive prompts and replies should be passed to and from the plugin in a form as close as possible to the way they look on the wire in SSH itself. Therefore, the protocol resembles SSH in its data formats and marshalling (instead of, for example, translating from SSH binary packet style to another well-known format such as JSON, which would introduce edge cases in character encoding).
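To make the "close to the wire" idea concrete, here is a minimal Python sketch of SSH-style marshalling as defined in RFC 4251: a uint32 is 4 bytes big-endian, a string is a uint32 length followed by that many raw bytes, and a boolean is a single byte. The sketch applies these to the prompt layout RFC 4256 uses in SSH_MSG_USERAUTH_INFO_REQUEST (name, instruction, language tag, prompt count, then each prompt with its echo flag), leaving out the leading message-type byte. All function and class names are hypothetical, and this is not the plugin protocol's actual message format; it only illustrates that prompt text can be carried as opaque bytes, so no character-set conversion (and none of its edge cases) ever happens.

```python
import struct


def put_uint32(n: int) -> bytes:
    # RFC 4251 uint32: 4 bytes, network (big-endian) byte order.
    return struct.pack(">I", n)


def put_string(s: bytes) -> bytes:
    # RFC 4251 string: uint32 length, then the raw bytes, uninterpreted.
    return put_uint32(len(s)) + s


def put_bool(b: bool) -> bytes:
    # RFC 4251 boolean: a single byte, 0 for false and 1 for true.
    return b"\x01" if b else b"\x00"


class Reader:
    """Hypothetical cursor over a marshalled byte buffer."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def get(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise ValueError("truncated data")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def get_uint32(self) -> int:
        return struct.unpack(">I", self.get(4))[0]

    def get_string(self) -> bytes:
        return self.get(self.get_uint32())

    def get_bool(self) -> bool:
        # Any nonzero byte counts as true.
        return self.get(1) != b"\x00"


def encode_prompts(name: bytes, instruction: bytes, prompts):
    # Layout modelled on RFC 4256's info request, minus the type byte.
    out = put_string(name) + put_string(instruction) + put_string(b"")
    out += put_uint32(len(prompts))
    for text, echo in prompts:
        out += put_string(text) + put_bool(echo)
    return out


def decode_prompts(data: bytes):
    r = Reader(data)
    name = r.get_string()
    instruction = r.get_string()
    r.get_string()  # language tag, carried but ignored here
    count = r.get_uint32()
    prompts = [(r.get_string(), r.get_bool()) for _ in range(count)]
    if r.pos != len(data):
        raise ValueError("trailing data")
    return name, instruction, prompts
```

Because prompts stay as bytes end to end, even a byte sequence that is not valid UTF-8 survives a round trip unchanged.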
Half-duplex. Simultaneously trying to read one I/O stream and write another adds a lot of complexity to software. It becomes necessary to have an organised event loop containing select or WaitForMultipleObjects or similar, which can invoke the handler for whichever event happens soonest. There's no need to add that complexity in an application like this, which isn't transferring large amounts of bulk data or multiplexing unrelated activities. So, to keep life simple for plugin authors, we set the ground rule that it must always be 100% clear which side is supposed to be sending a message next. That way, the plugin can be written as sequential code progressing through the protocol, making simple read and write calls to receive or send each message.
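Under the half-duplex rule, a plugin needs nothing more than blocking reads and writes. The following is a minimal Python sketch, assuming a hypothetical framing in which each message is a big-endian uint32 length followed by the payload, and assuming the simplest possible conversation shape, where the plugin only ever answers the client's messages one for one. The real protocol's message types and turn order are not modelled here; run_plugin, handle and the other names are illustrative only.

```python
import io
import struct


def read_exact(stream, n: int) -> bytes:
    # Blocking read of exactly n bytes; no select() or event loop needed.
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended in the middle of a message")
        buf += chunk
    return buf


def read_message(stream):
    # Returns None on a clean EOF between messages (peer finished).
    header = stream.read(4)
    if header == b"":
        return None
    if len(header) < 4:
        header += read_exact(stream, 4 - len(header))
    (length,) = struct.unpack(">I", header)
    return read_exact(stream, length)


def write_message(stream, payload: bytes) -> None:
    stream.write(struct.pack(">I", len(payload)) + payload)
    stream.flush()


def run_plugin(inp, out, handle) -> None:
    # Strict alternation: it is always clear whose turn it is, so the
    # plugin is plain sequential code: read one message, write one reply.
    while True:
        request = read_message(inp)
        if request is None:
            return
        write_message(out, handle(request))
```

In a real plugin, inp and out would typically be sys.stdin.buffer and sys.stdout.buffer; in-memory io.BytesIO objects work equally well for testing, since the code only ever calls read, write and flush.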
Communicate success/failure, to facilitate caching in the plugin. A plugin might want to cache recently used data for next time, but only in the case where authentication using that data was actually successful. So the client has to tell the plugin what the outcome was, if it's known. (But this is best-effort only. Obviously the plugin cannot depend on hearing the answer, because any IPC protocol at all carries the risk that the other end might crash or be killed by things outside its control.)