A lot of PuTTY's code is written in a style that looks structurally rather like an object-oriented language, in spite of PuTTY being a pure C program.
For example, there's a single data type called ssh_hash, which is an abstraction of a secure hash function, and a bunch of functions called things like ssh_hash_foo that do things with those data types. But in fact, PuTTY supports many different hash functions, and each one has to provide its own implementation of those functions.
In C++ terms, this is rather like having a single abstract base class, and multiple concrete subclasses of it, each of which fills in all the pure virtual methods in a way that's compatible with the data fields of the subclass. The implementation is more or less the same, as well: in C, we do explicitly in the source code what the C++ compiler will be doing behind the scenes at compile time.
But perhaps a closer analogy in functional terms is the Rust concept of a ‘trait’, or the Java idea of an ‘interface’. C++ supports a multi-level hierarchy of inheritance, whereas PuTTY's system – like traits or interfaces – has only two levels, one describing a generic object of a type (e.g. a hash function) and another describing a specific implementation of that type (e.g. SHA-256).
The PuTTY code base has a standard idiom for doing this in C, as follows.
Firstly, we define two struct types for our trait. One of them describes a particular kind of implementation of that trait, and it's full of (mostly) function pointers. The other describes a specific instance of an implementation of that trait, and it will contain a pointer to a const instance of the first type. For example:
typedef struct MyAbstraction MyAbstraction;
typedef struct MyAbstractionVtable MyAbstractionVtable;

struct MyAbstractionVtable {
    MyAbstraction *(*new)(const MyAbstractionVtable *vt);
    void (*free)(MyAbstraction *);
    void (*modify)(MyAbstraction *, unsigned some_parameter);
    unsigned (*query)(MyAbstraction *, unsigned some_parameter);
};

struct MyAbstraction {
    const MyAbstractionVtable *vt;
};
Here, we imagine that MyAbstraction might be some kind of object that contains mutable state. The associated vtable structure shows what operations you can perform on a MyAbstraction: you can create one (dynamically allocated), free one you already have, or call the example methods ‘modify’ (to change the state of the object in some way) and ‘query’ (to return some value derived from the object's current state).
(In most cases, the vtable structure has a name ending in ‘vtable’. But for historical reasons a lot of the crypto primitives that use this scheme – ciphers, hash functions, public key methods and so on – instead have names ending in ‘alg’, on the basis that the primitives they implement are often referred to as ‘encryption algorithms’, ‘hash algorithms’ and so forth.)
Now, to define a concrete instance of this trait, you'd define a struct that contains a MyAbstraction field, plus any other data it might need:
typedef struct MyImplementation MyImplementation;
struct MyImplementation {
    unsigned internal_data[16];
    SomeOtherType *dynamic_subthing;

    MyAbstraction myabs;
};
Next, you'd implement all the necessary methods for that implementation of the trait, in this kind of style:
static MyAbstraction *myimpl_new(const MyAbstractionVtable *vt)
{
    MyImplementation *impl = snew(MyImplementation);
    memset(impl, 0, sizeof(*impl));
    impl->dynamic_subthing = allocate_some_other_type();
    impl->myabs.vt = vt;
    return &impl->myabs;
}

static void myimpl_free(MyAbstraction *myabs)
{
    MyImplementation *impl = container_of(myabs, MyImplementation, myabs);
    free_other_type(impl->dynamic_subthing);
    sfree(impl);
}

static void myimpl_modify(MyAbstraction *myabs, unsigned param)
{
    MyImplementation *impl = container_of(myabs, MyImplementation, myabs);
    impl->internal_data[param] += do_something_with(impl->dynamic_subthing);
}

static unsigned myimpl_query(MyAbstraction *myabs, unsigned param)
{
    MyImplementation *impl = container_of(myabs, MyImplementation, myabs);
    return impl->internal_data[param];
}
Having defined those methods, now we can define a const instance of the vtable structure containing pointers to them:
const MyAbstractionVtable MyImplementation_vt = {
    .new = myimpl_new,
    .free = myimpl_free,
    .modify = myimpl_modify,
    .query = myimpl_query,
};
In principle, this is all you need. Client code can construct a new instance of a particular implementation of MyAbstraction by digging out the new method from the vtable and calling it (with the vtable itself as a parameter), which returns a MyAbstraction * pointer that identifies a newly created instance, in which the vt field will contain a pointer to the same vtable structure you passed in. And once you have an instance object, say MyAbstraction *myabs, you can dig out one of the other method pointers from the vtable it points to, and call that, passing the object itself as a parameter.
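As a rough illustration of what that unwrapped style looks like at a call site, here is a self-contained sketch. The Counter implementation, its Counter_vt vtable, the COUNTER_FROM helper macro and the raw_calls_demo function are all hypothetical names invented for this example, and plain malloc and free stand in for the snew and sfree used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct MyAbstraction MyAbstraction;
typedef struct MyAbstractionVtable MyAbstractionVtable;

struct MyAbstractionVtable {
    MyAbstraction *(*new)(const MyAbstractionVtable *vt);
    void (*free)(MyAbstraction *);
    void (*modify)(MyAbstraction *, unsigned some_parameter);
    unsigned (*query)(MyAbstraction *, unsigned some_parameter);
};
struct MyAbstraction {
    const MyAbstractionVtable *vt;
};

/* Hypothetical implementation: a single running total. */
typedef struct Counter {
    unsigned total;
    MyAbstraction myabs;
} Counter;

/* Hypothetical helper: recover the Counter from its myabs field. */
#define COUNTER_FROM(p) \
    ((Counter *)((char *)(p) - offsetof(Counter, myabs)))

static MyAbstraction *counter_new(const MyAbstractionVtable *vt)
{
    Counter *c = malloc(sizeof(*c));
    assert(c != NULL);
    c->total = 0;
    c->myabs.vt = vt;
    return &c->myabs;
}
static void counter_free(MyAbstraction *myabs)
{ free(COUNTER_FROM(myabs)); }
static void counter_modify(MyAbstraction *myabs, unsigned param)
{ COUNTER_FROM(myabs)->total += param; }
static unsigned counter_query(MyAbstraction *myabs, unsigned param)
{ return COUNTER_FROM(myabs)->total * param; }

static const MyAbstractionVtable Counter_vt = {
    .new = counter_new,
    .free = counter_free,
    .modify = counter_modify,
    .query = counter_query,
};

/* Every call goes through the vtable by hand, with no wrappers. */
unsigned raw_calls_demo(void)
{
    const MyAbstractionVtable *vt = &Counter_vt;
    MyAbstraction *myabs = vt->new(vt);
    myabs->vt->modify(myabs, 10);
    myabs->vt->modify(myabs, 5);
    unsigned result = myabs->vt->query(myabs, 2);
    myabs->vt->free(myabs);
    return result;
}
```

Each call has to mention the object twice, once to find the vtable and once as the argument, which is the repetition that the wrapper functions hide.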
But in fact, we don't do that, because it looks pretty ugly at all the call sites. Instead, what we generally do in this code base is to write a set of static inline wrapper functions in the same header file that defined the MyAbstraction structure types, like this:
static inline MyAbstraction *myabs_new(const MyAbstractionVtable *vt)
{ return vt->new(vt); }
static inline void myabs_free(MyAbstraction *myabs)
{ myabs->vt->free(myabs); }
static inline void myabs_modify(MyAbstraction *myabs, unsigned param)
{ myabs->vt->modify(myabs, param); }
static inline unsigned myabs_query(MyAbstraction *myabs, unsigned param)
{ return myabs->vt->query(myabs, param); }
And now call sites can use those reasonably clean-looking wrapper functions, and shouldn't ever have to directly refer to the vt field inside any myabs object they're holding. For example, you might write something like this:
MyAbstraction *myabs = myabs_new(&MyImplementation_vt);
myabs_modify(myabs, 10);
unsigned output = myabs_query(myabs, 2);
myabs_free(myabs);
and then all this code can use a different implementation of the same abstraction just by changing which vtable pointer it passes in on the first line.
Some things to note about this system:
The implementation type ‘MyImplementation’ contains the abstraction type (‘MyAbstraction’) as one of its fields. But that field is not necessarily at the start of the structure. So you can't just cast pointers back and forth between the two types. Instead:
You ‘up-cast’ from implementation to abstraction by taking the address of the MyAbstraction field. You can see the example new method above doing this, returning &impl->myabs. All new methods do this on return.
Going in the other direction, each method that is passed a generic MyAbstraction *myabs parameter has to recover a pointer to the specific implementation type MyImplementation *impl. The idiom for doing that is to use the ‘container_of’ macro, also seen in the Linux kernel code. Generally, container_of(p, Type, field) says: ‘I'm confident that the pointer value ‘p’ is pointing to the field called ‘field’ within a larger struct of type Type. Please return me the pointer to the containing structure.’ So in this case, we take the ‘myabs’ pointer passed to the function, and ‘down-cast’ it into a pointer to the larger and more specific structure type MyImplementation, by adjusting the pointer value based on the offset within that structure of the field called ‘myabs’.
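To make the pointer adjustment concrete, the following sketch defines a container_of macro in terms of the standard offsetof. This definition is an assumption written for illustration, and may not match the one in PuTTY's own headers; the Wrapper and Inner types and the two functions are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition for this sketch: step back from the field's address
 * by the field's offset within the containing structure. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Inner {
    int value;
} Inner;

typedef struct Wrapper {
    int tag;        /* comes first, so 'inner' is at a nonzero offset */
    Inner inner;
} Wrapper;

/* Given only a pointer to the embedded field, recover the tag stored
 * in the structure that contains it. */
int inner_to_tag(Inner *in)
{
    Wrapper *w = container_of(in, Wrapper, inner);
    return w->tag;
}

/* Up-cast by taking the field's address, then down-cast back again. */
int round_trip_demo(void)
{
    Wrapper w = { .tag = 42, .inner = { .value = 7 } };
    Inner *in = &w.inner;
    assert(container_of(in, Wrapper, inner) == &w);
    return inner_to_tag(in);
}
```

Because the field is deliberately not first in the structure, a plain cast from Inner * to Wrapper * would give the wrong address here, whereas the offset-based adjustment does not care where the field sits.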
This system is flexible enough to permit ‘multiple inheritance’, or rather, multiple implementation: having one object type implement more than one trait. For example, the ProxySocket type implements both the Socket trait and the Plug trait that connects to it, because it has to act as an adapter between another instance of each of those types.
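A minimal sketch of one object implementing two traits might look like this. The Sink and Source traits and the Buffer type are hypothetical, invented for this example; they are not the real Socket and Plug definitions, and the container_of definition is an assumed one.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition of container_of, for illustration only. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Two unrelated hypothetical traits, each with its own vtable. */
typedef struct Sink Sink;
typedef struct SinkVtable {
    void (*put)(Sink *, int value);
} SinkVtable;
struct Sink { const SinkVtable *vt; };

typedef struct Source Source;
typedef struct SourceVtable {
    int (*get)(Source *);
} SourceVtable;
struct Source { const SourceVtable *vt; };

/* One object implementing both: it has one field per trait. */
typedef struct Buffer {
    int stored;
    Sink sink;
    Source source;
} Buffer;

static void buffer_put(Sink *sink, int value)
{
    Buffer *b = container_of(sink, Buffer, sink);
    b->stored = value;
}
static int buffer_get(Source *source)
{
    Buffer *b = container_of(source, Buffer, source);
    return b->stored;
}

static const SinkVtable Buffer_sinkvt = { .put = buffer_put };
static const SourceVtable Buffer_sourcevt = { .get = buffer_get };

void buffer_init(Buffer *b)
{
    assert(b != NULL);
    b->stored = 0;
    b->sink.vt = &Buffer_sinkvt;
    b->source.vt = &Buffer_sourcevt;
}

/* Clients hold only a Sink * and a Source *, yet both reach one Buffer. */
int two_traits_demo(void)
{
    Buffer b;
    buffer_init(&b);
    Sink *sink = &b.sink;
    Source *source = &b.source;
    sink->vt->put(sink, 99);
    return source->vt->get(source);
}
```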
It's also perfectly possible to have the same object implement the same trait in two different ways. At the time of writing this I can't think of any case where we actually do this, but a theoretical example might be if you needed to support a trait like Comparable in two ways that sorted by different criteria. There would be no difficulty doing this in the PuTTY system: simply have your implementation struct contain two (or more) fields of the same abstraction type. The fields will have different names, which makes it easy to explicitly specify which one you're returning a pointer to during up-casting, or which one you're down-casting from using container_of. And then both sets of implementation methods can recover a pointer to the same containing structure.
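Since no real example exists in the code base, here is a purely hypothetical sketch of the idea: a Ranked trait (standing in for something like Comparable) that a Person type implements twice, once per sort criterion. All of these names, and the container_of definition, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition of container_of, for illustration only. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Hypothetical trait: anything that can report a numeric rank. */
typedef struct Ranked Ranked;
typedef struct RankedVtable {
    int (*rank)(Ranked *);
} RankedVtable;
struct Ranked { const RankedVtable *vt; };

/* Two fields of the same abstraction type, with different names. */
typedef struct Person {
    int age, height_cm;
    Ranked by_age;
    Ranked by_height;
} Person;

/* Each method down-casts from the field it belongs to. */
static int person_rank_by_age(Ranked *r)
{ return container_of(r, Person, by_age)->age; }
static int person_rank_by_height(Ranked *r)
{ return container_of(r, Person, by_height)->height_cm; }

static const RankedVtable Person_by_age_vt = { .rank = person_rank_by_age };
static const RankedVtable Person_by_height_vt = { .rank = person_rank_by_height };

void person_init(Person *p, int age, int height_cm)
{
    assert(p != NULL);
    p->age = age;
    p->height_cm = height_cm;
    p->by_age.vt = &Person_by_age_vt;
    p->by_height.vt = &Person_by_height_vt;
}

/* Generic client code: it knows only about the Ranked trait.
 * Returns -1, 0 or 1 in the usual comparison style. */
int ranked_compare(Ranked *a, Ranked *b)
{
    int ra = a->vt->rank(a), rb = b->vt->rank(b);
    return (ra > rb) - (ra < rb);
}

/* Returns 1 if the two orderings disagree in the expected way. */
int same_trait_twice_demo(void)
{
    Person alice, bob;
    person_init(&alice, 30, 160);
    person_init(&bob, 25, 180);
    int by_age = ranked_compare(&alice.by_age, &bob.by_age);
    int by_height = ranked_compare(&alice.by_height, &bob.by_height);
    return by_age == 1 && by_height == -1;
}
```

The caller chooses the ordering simply by choosing which field's address to hand to ranked_compare.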
pubkey_bits method in ssh_keyalg.
A vtable will often include a new method that allocates and returns a new object. You can think of it as a ‘virtual constructor’ – another concept C++ doesn't have. (However, not all types need one of these: see below.)
The effect of all of this is that you can make other pieces of code able to use any instance of one of these types, by passing it an actual vtable as a parameter. For example, the hash_simple function takes an ssh_hashalg vtable pointer specifying any hash algorithm you like, and internally, it creates an object of that type, uses it, and frees it. In C++, you'd probably do this using a template, which would mean you had multiple specialisations of hash_simple – and then it would be much more difficult to decide at run time which one you needed to use. Here, hash_simple is still just one function, and you can decide as late as you like which vtable to pass to it.
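The shape of such a function can be sketched with a toy analogue. Everything here is hypothetical: the Checksum trait, the checksum_sum and checksum_xor vtables and the checksum_simple function are invented for the example and are not the real ssh_hashalg or hash_simple definitions. The two toy algorithms share one data type purely to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed definition of container_of, for illustration only. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Checksum Checksum;
typedef struct ChecksumVtable ChecksumVtable;
struct ChecksumVtable {
    Checksum *(*new)(const ChecksumVtable *vt);
    void (*update)(Checksum *, const unsigned char *data, size_t len);
    unsigned (*final)(Checksum *);   /* returns the result, frees the object */
};
struct Checksum { const ChecksumVtable *vt; };

typedef struct Acc {
    unsigned acc;
    Checksum cs;
} Acc;

static Checksum *acc_new(const ChecksumVtable *vt)
{
    Acc *a = malloc(sizeof(*a));
    assert(a != NULL);
    a->acc = 0;
    a->cs.vt = vt;
    return &a->cs;
}
static void sum_update(Checksum *cs, const unsigned char *data, size_t len)
{
    Acc *a = container_of(cs, Acc, cs);
    for (size_t i = 0; i < len; i++)
        a->acc += data[i];
}
static void xor_update(Checksum *cs, const unsigned char *data, size_t len)
{
    Acc *a = container_of(cs, Acc, cs);
    for (size_t i = 0; i < len; i++)
        a->acc ^= data[i];
}
static unsigned acc_final(Checksum *cs)
{
    Acc *a = container_of(cs, Acc, cs);
    unsigned result = a->acc;
    free(a);
    return result;
}

const ChecksumVtable checksum_sum = {
    .new = acc_new, .update = sum_update, .final = acc_final,
};
const ChecksumVtable checksum_xor = {
    .new = acc_new, .update = xor_update, .final = acc_final,
};

/* One ordinary function that works with whichever vtable it is given:
 * it creates an object, uses it, and disposes of it. */
unsigned checksum_simple(const ChecksumVtable *vt,
                         const void *data, size_t len)
{
    Checksum *cs = vt->new(vt);
    cs->vt->update(cs, data, len);
    return cs->vt->final(cs);
}
```

The choice of algorithm is an ordinary run-time value, so a caller could equally well pick the vtable pointer from a table or from configuration before calling checksum_simple.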
BinaryPacketProtocol has lots of these.
With a crypto primitive like a hash algorithm, the constructor call looks the same for every implementing type, so it makes sense to have a standardised virtual constructor in the vtable and an ssh_hash_new wrapper function which can make an instance of whatever vtable you pass it. And then you make all the vtable objects themselves globally visible throughout the source code, so that any module can call (for example) ssh_hash_new(&ssh_sha256).
But with other kinds of object, the constructor for each implementing type has to take a different set of parameters. For example, implementations of Socket are not generally interchangeable at construction time, because constructing different kinds of socket requires totally different kinds of address parameter. In that situation, it makes more sense to keep the vtable structure itself private to the implementing source file, and instead, publish an ordinary constructing function that allocates and returns an instance of that particular subtype, taking whatever parameters are appropriate to that subtype.
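Here is a small sketch of that arrangement, using a hypothetical Shape trait instead of Socket. The vtables are static, as they would be if each implementation lived in its own source file, and the only public entry points are the constructors rect_new and square_new, which take different parameters. All names, and the container_of definition, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed definition of container_of, for illustration only. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Public header part: the trait has no 'new' method in its vtable. */
typedef struct Shape Shape;
typedef struct ShapeVtable {
    unsigned (*area)(Shape *);
    void (*free)(Shape *);
} ShapeVtable;
struct Shape { const ShapeVtable *vt; };

static inline unsigned shape_area(Shape *s) { return s->vt->area(s); }
static inline void shape_free(Shape *s) { s->vt->free(s); }

/* What would be rect.c: private vtable, public constructor. */
typedef struct Rect { unsigned w, h; Shape shape; } Rect;
static unsigned rect_area(Shape *s)
{
    Rect *r = container_of(s, Rect, shape);
    return r->w * r->h;
}
static void rect_free(Shape *s) { free(container_of(s, Rect, shape)); }
static const ShapeVtable Rect_vt = { .area = rect_area, .free = rect_free };

Shape *rect_new(unsigned w, unsigned h)
{
    Rect *r = malloc(sizeof(*r));
    assert(r != NULL);
    r->w = w;
    r->h = h;
    r->shape.vt = &Rect_vt;     /* vtable pointer is hard-coded here */
    return &r->shape;
}

/* What would be square.c: a different parameter list. */
typedef struct Square { unsigned side; Shape shape; } Square;
static unsigned square_area(Shape *s)
{
    Square *q = container_of(s, Square, shape);
    return q->side * q->side;
}
static void square_free(Shape *s) { free(container_of(s, Square, shape)); }
static const ShapeVtable Square_vt = { .area = square_area, .free = square_free };

Shape *square_new(unsigned side)
{
    Square *q = malloc(sizeof(*q));
    assert(q != NULL);
    q->side = side;
    q->shape.vt = &Square_vt;
    return &q->shape;
}

/* After construction, the two objects are used interchangeably. */
unsigned shapes_demo(void)
{
    Shape *shapes[2] = { rect_new(3, 4), square_new(5) };
    unsigned total = 0;
    for (size_t i = 0; i < 2; i++) {
        total += shape_area(shapes[i]);
        shape_free(shapes[i]);
    }
    return total;
}
```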
ssh_compression_alg contains methods to create, use and free ssh_compressor and ssh_decompressor objects, which are not interchangeable – but putting their methods in the same vtable means that it's easy to create a matching pair of objects that are compatible with each other.
Passing the vtable itself as an argument to the new method is not compulsory: if a given new implementation is only used by a single vtable, then that function can simply hard-code the vtable pointer that it writes into the object it constructs. But passing the vtable is more flexible, because it allows a single constructor function to be shared between multiple slightly different object types. For example, SHA-384 and SHA-512 share the same new method and the same implementation data type, because they're very nearly the same hash algorithm – but a couple of the other methods in their vtables are different, because the ‘reset’ function has to set up the initial algorithm state differently, and the ‘digest’ method has to write out a different amount of data.
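The sharing described here can be sketched with a toy analogue. This is not the real SHA-384 or SHA-512 code: the Tally trait and its two vtables are hypothetical, and the choice to have the shared constructor call the vtable's reset method is an assumption of this sketch. What it shows is one new function and one data type serving two vtables that differ only in how the initial state is set up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed definition of container_of, for illustration only. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Tally Tally;
typedef struct TallyVtable TallyVtable;
struct TallyVtable {
    Tally *(*new)(const TallyVtable *vt);
    void (*reset)(Tally *);
    void (*add)(Tally *, unsigned amount);
    unsigned (*get)(Tally *);
    void (*free)(Tally *);
};
struct Tally { const TallyVtable *vt; };

/* One implementation data type, shared by both variants. */
typedef struct TallyImpl {
    unsigned total;
    Tally tally;
} TallyImpl;

/* Shared constructor: it stores whichever vtable it was given, then lets
 * that vtable's own reset method set up the initial state. */
static Tally *tally_new(const TallyVtable *vt)
{
    TallyImpl *impl = malloc(sizeof(*impl));
    assert(impl != NULL);
    impl->tally.vt = vt;
    vt->reset(&impl->tally);
    return &impl->tally;
}
static void tally_add(Tally *t, unsigned amount)
{ container_of(t, TallyImpl, tally)->total += amount; }
static unsigned tally_get(Tally *t)
{ return container_of(t, TallyImpl, tally)->total; }
static void tally_free(Tally *t)
{ free(container_of(t, TallyImpl, tally)); }

/* The only methods that differ between the two variants. */
static void reset_from_zero(Tally *t)
{ container_of(t, TallyImpl, tally)->total = 0; }
static void reset_from_hundred(Tally *t)
{ container_of(t, TallyImpl, tally)->total = 100; }

const TallyVtable tally_zero = {
    .new = tally_new, .reset = reset_from_zero,
    .add = tally_add, .get = tally_get, .free = tally_free,
};
const TallyVtable tally_hundred = {
    .new = tally_new, .reset = reset_from_hundred,
    .add = tally_add, .get = tally_get, .free = tally_free,
};

/* The same client code, run against either variant. */
unsigned tally_demo(const TallyVtable *vt)
{
    Tally *t = vt->new(vt);
    t->vt->add(t, 5);
    unsigned result = t->vt->get(t);
    t->vt->free(t);
    return result;
}
```

If tally_new hard-coded one vtable pointer instead of taking it as a parameter, the second variant would need its own copy of the constructor.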
One practical advantage of having the myabs_foo family of inline wrapper functions in the header file is that if you change your mind later about whether the vtable needs to be passed to new, you only have to update the myabs_new wrapper, and then the existing call sites won't need changing.
sesschan type that handles the server side of an SSH terminal session will sometimes transform in mid-lifetime into an SCP or SFTP file-transfer channel in this way, at the point where the client sends an ‘exec’ or ‘subsystem’ request that indicates that that's what it wants to do with the channel.
This concept would be difficult to arrange in C++. In Rust, it wouldn't even make sense, because objects implementing a trait don't contain a vtable pointer at all – instead, the ‘trait object’ type (identifying a specific instance of some implementation of a given trait) consists of a pair of pointers, one to the object itself and one to the vtable. In that model, the only way you could make an existing object turn into a different implementation of the trait would be to know where all the pointers to it were stored elsewhere in the program, and persuade all their owners to rewrite them.
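One way to picture a mid-lifetime transformation, given that in this system the vtable pointer lives inside the object, is for the object to overwrite its own vt field. The sketch below assumes that reading. The Chan trait, the Session type and the request codes are hypothetical and far simpler than the real sesschan code; the point is only that pointers held by other code stay valid across the change.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition of container_of, for illustration only. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Chan Chan;
typedef struct ChanVtable {
    int (*handle)(Chan *, int request);
} ChanVtable;
struct Chan { const ChanVtable *vt; };

typedef struct Session {
    int transfers;
    Chan chan;
} Session;

enum { REQ_DATA, REQ_START_TRANSFER };
enum { REPLY_TERMINAL = 1, REPLY_TRANSFER = 2 };

/* Behaviour after the transformation. */
static int transfer_handle(Chan *chan, int request)
{
    Session *s = container_of(chan, Session, chan);
    (void)request;
    s->transfers++;
    return REPLY_TRANSFER;
}
static const ChanVtable Transfer_vt = { .handle = transfer_handle };

/* Behaviour before it: on the right request, the object rewrites its
 * own vtable pointer and passes the request to its new self. */
static int terminal_handle(Chan *chan, int request)
{
    if (request == REQ_START_TRANSFER) {
        chan->vt = &Transfer_vt;
        return chan->vt->handle(chan, request);
    }
    return REPLY_TERMINAL;
}
static const ChanVtable Terminal_vt = { .handle = terminal_handle };

/* The client keeps using the same Chan pointer throughout. */
int transform_demo(void)
{
    Session s = { .transfers = 0, .chan = { .vt = &Terminal_vt } };
    Chan *chan = &s.chan;
    int r1 = chan->vt->handle(chan, REQ_DATA);
    int r2 = chan->vt->handle(chan, REQ_START_TRANSFER);
    int r3 = chan->vt->handle(chan, REQ_DATA);
    assert(s.transfers == 2);
    return r1 * 100 + r2 * 10 + r3;
}
```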
ssh_sha256_sw and ssh_sha256_hw, each of which has its own data layout and its own implementations of all the methods; and then there's a top-level vtable ssh_sha256, which only provides the ‘new’ method, and implements it by calling the ‘new’ method on one or other of the subtypes depending on what it finds out about the machine it's running on. That top-level selector vtable is nearly always the one used by client code. (Except for the test suite, which has to instantiate both of the subtypes in order to make sure they both pass the tests.)
As a result, the top-level selector vtable ssh_sha256 doesn't need to implement any method that takes an ssh_hash * parameter, because no ssh_hash object is ever constructed whose vt field points to &ssh_sha256: they all point to one of the other two full implementation vtables.
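The selector arrangement can be sketched with a toy analogue. None of this is the real SHA-256 code: the Hasher trait, the toyhash, toyhash_sw and toyhash_hw vtables and the hw_available flag are hypothetical, and both toy implementations reuse one struct purely for brevity, whereas the text above says the real ones each have their own data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed definition of container_of, for illustration only. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Hasher Hasher;
typedef struct HasherVtable HasherVtable;
struct HasherVtable {
    Hasher *(*new)(const HasherVtable *vt);
    unsigned (*digest)(Hasher *, unsigned input);
    void (*free)(Hasher *);
};
struct Hasher { const HasherVtable *vt; };

typedef struct Plain { Hasher hasher; } Plain;

/* Hypothetical stand-in for detecting CPU features at run time. */
int hw_available = 0;

static Hasher *plain_new(const HasherVtable *vt)
{
    Plain *p = malloc(sizeof(*p));
    assert(p != NULL);
    p->hasher.vt = vt;
    return &p->hasher;
}
static void plain_free(Hasher *h)
{ free(container_of(h, Plain, hasher)); }

/* Two ways of computing the same toy function, 31 * x + 7. */
static unsigned sw_digest(Hasher *h, unsigned x)
{ (void)h; return x * 31u + 7u; }
static unsigned hw_digest(Hasher *h, unsigned x)
{ (void)h; return (x << 5) - x + 7u; }

const HasherVtable toyhash_sw = {
    .new = plain_new, .digest = sw_digest, .free = plain_free,
};
const HasherVtable toyhash_hw = {
    .new = plain_new, .digest = hw_digest, .free = plain_free,
};

/* The selector's only method: delegate to a full implementation, so the
 * returned object's vt never points at the selector itself. */
static Hasher *select_new(const HasherVtable *vt)
{
    (void)vt;
    const HasherVtable *real = hw_available ? &toyhash_hw : &toyhash_sw;
    return real->new(real);
}

/* digest and free are left null: no object ever reaches them this way. */
const HasherVtable toyhash = { .new = select_new };

/* Returns 1 if the selector produced a software-backed object that
 * computes the expected value. */
int selector_demo(void)
{
    Hasher *h = toyhash.new(&toyhash);
    int ok = (h->vt == &toyhash_sw) && (h->vt->digest(h, 2) == 69);
    h->vt->free(h);
    return ok;
}
```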