This article is mainly intended to document my own learning so I can reference the details in the future when I forget them. I am not a POSIX expert.
If you want the tl;dr spoiler to the title: Some example code that I first found on the web when I looked at how to handle socket control messages is incorrect because it doesn't take into account the possibility of denial of service via file descriptor table pollution. Not blindly using example code you don't understand is generally a good rule of thumb. You can skip to the section on leaking file descriptors to just read about this if you wish.
I have very recently learnt Zig. It is a fun little language that fits in your head. It feels like a better C. I was looking at reimplementing the Wayland protocol in Zig. This led me to look into how the Wayland protocol is implemented and, more relevantly, how it uses local Unix sockets for communication.
Some of the messages that the Wayland protocol specifies require passing file descriptors between client and server.
Wayland achieves this by sending and receiving control messages over the socket, specifically the SCM_RIGHTS ancillary message.
You cannot serialize file descriptors as integers then send them over a socket because this is nonsensical: file descriptors are process-unique identifiers; they are handles that reference your process's file descriptor table. In order to transfer an entry from your file descriptor table into another process's, you require the intervention of the kernel. Thus, the kernel has to interpret cmsgs and do the appropriate transfer and translation of the fds.
As I had never dealt with cmsgs before, there were a few questions that immediately came to mind and caused me to stall here until I was satisfied:
Let us answer the third question first — How are cmsgs represented?
This is pretty easy to answer by looking at the Linux source code.
struct user_msghdr {
void __user *msg_name; /* ptr to socket address structure */
int msg_namelen; /* size of socket address structure */
struct iovec __user *msg_iov; /* scatter/gather array */
__kernel_size_t msg_iovlen; /* # elements in msg_iov */
void __user *msg_control; /* ancillary data */
__kernel_size_t msg_controllen; /* ancillary data buffer length */
unsigned int msg_flags; /* flags on received message */
};
User programs send and receive control messages via sendmsg and recvmsg. These functions take a pointer to a struct msghdr (struct user_msghdr above is the kernel's view of the same layout).
Control messages are also known as ancillary data.
This struct allows one to specify a buffer containing the control messages using msg_control and msg_controllen.
The associated control messages are concatenated together to fill this buffer.
struct cmsghdr {
__kernel_size_t cmsg_len; /* data byte count, including hdr */
int cmsg_level; /* originating protocol */
int cmsg_type; /* protocol-specific type */
};
Each control message has a standard header which specifies the size of the message and the type of message. The payload of the control message appears immediately after the header.
#define CMSG_ALIGN(len) ( ((len)+sizeof(long)-1) & ~(sizeof(long)-1) )
#define CMSG_DATA(cmsg) \
((void *)(cmsg) + sizeof(struct cmsghdr))
#define CMSG_USER_DATA(cmsg) \
((void __user *)(cmsg) + sizeof(struct cmsghdr))
#define CMSG_SPACE(len) (sizeof(struct cmsghdr) + CMSG_ALIGN(len))
#define CMSG_LEN(len) (sizeof(struct cmsghdr) + (len))
Each control message is padded to the alignment of a long (see CMSG_SPACE).
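To make the padding concrete, here is a small sketch that computes how many bytes CMSG_SPACE adds on top of CMSG_LEN for a payload of file descriptors. The helper name cmsg_padding is my own invention for illustration, and the exact numbers depend on the platform ABI.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of any API): number of padding bytes
 * that CMSG_SPACE adds on top of CMSG_LEN for a payload of nfds
 * file descriptors. */
static size_t cmsg_padding(size_t nfds)
{
    size_t payload = nfds * sizeof(int);
    return CMSG_SPACE(payload) - CMSG_LEN(payload);
}
```

On a typical 64-bit Linux system, struct cmsghdr is 16 bytes, so CMSG_LEN(sizeof(int)) is 20 while CMSG_SPACE(sizeof(int)) is 24: the four bytes of padding are exactly enough room for a second int.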
struct cmsghdr *cmsg;
for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
if (!CMSG_OK(msg, cmsg)) goto error;
/* ... process cmsg ... */
}
The macros CMSG_FIRSTHDR, CMSG_NXTHDR and CMSG_OK are provided to iterate over cmsgs.
These macros walk over the cmsgs by advancing by cmsg_len, rounded up with CMSG_ALIGN.
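The loop above is written against the kernel's macros. As far as I can tell CMSG_OK is kernel-internal, so a userspace version of the same walk relies on CMSG_FIRSTHDR and CMSG_NXTHDR returning NULL at the end of the buffer. A minimal sketch, where count_cmsgs is a name made up for this example:

```c
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: count the control messages attached to a
 * msghdr by walking them with the standard userspace macros.
 * The loop ends when CMSG_NXTHDR runs off the end of msg_control. */
static int count_cmsgs(struct msghdr *msg)
{
    int n = 0;
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c))
        n++;
    return n;
}
```

When building a control buffer by hand, it is worth zeroing it first, since some CMSG_NXTHDR implementations peek at the next header's cmsg_len before you have filled it in.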
When sending a control message, we fill a buffer up with cmsgs, each of size CMSG_SPACE(len), and give this to msghdr::msg_control, which is then sent off by passing it to sendmsg.
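As a rough illustration of the sending side, the sketch below packs a handful of descriptors into a single SCM_RIGHTS message and sends them along with one byte of data. The function name send_fds and the cap MAX_SEND_FDS are assumptions of mine, not anything Wayland or the kernel defines.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_SEND_FDS 8 /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: send one data byte together with nfds
 * descriptors in a single SCM_RIGHTS control message. Returns the
 * sendmsg result, or -1 if nfds is out of range. */
static ssize_t send_fds(int sock, const int *fds, size_t nfds)
{
    /* The union keeps the buffer suitably aligned for struct cmsghdr. */
    union {
        char buf[CMSG_SPACE(sizeof(int) * MAX_SEND_FDS)];
        struct cmsghdr align;
    } ctrl;
    char byte = 'x'; /* ancillary data should travel with at least one data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    if (nfds == 0 || nfds > MAX_SEND_FDS)
        return -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = CMSG_SPACE(sizeof(int) * nfds);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int) * nfds); /* header plus payload, no padding */
    memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * nfds);

    return sendmsg(sock, &msg, 0);
}
```

Note the split between the two macros: cmsg_len uses CMSG_LEN, while the buffer size handed to msg_controllen uses CMSG_SPACE.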
When receiving a control message, we provide recvmsg with a buffer via msghdr::msg_control that is at least large enough to receive the cmsgs we expect.
The fact we were dealing with a stream stumped me the most. How do we associate these effectively out-of-band control messages with specific bytes in the data stream? I wanted to be able to write straightforward code that would be able to handle data at the same time as its associated control messages. Were there any guarantees that help with this?
It turns out that Linux's behavior exposes the underlying packet-based reality somewhat.
The control message is attached to the data bytes that were sent in the same sendmsg call.
However, such packets may be split by the kernel, for example, if one is doing multiple reads that are smaller than the buffer originally sent.
When this happens, the control message will be associated with the first split packet of the range.
Interestingly, if a range of bytes contains a control message, Linux will stop a read early. This occurs whenever your remote sends a control message, even if you are not expecting to receive one. This means you should never have to read more than one batch of control messages at once.
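The early stop can be observed with a socketpair. The sketch below queues "ab" with a descriptor attached, then "cd" as plain data, and asks for four bytes in one recvmsg call. Both function names are hypothetical, and the result reflects Linux behavior as I understand it rather than anything POSIX promises.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send len bytes; if fd >= 0, attach it as an
 * SCM_RIGHTS control message. */
static ssize_t send_bytes(int sock, const char *data, size_t len, int fd)
{
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctrl;
    struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
    struct msghdr msg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    if (fd >= 0) {
        struct cmsghdr *cmsg;
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof(ctrl.buf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    }
    return sendmsg(sock, &msg, 0);
}

/* Queue "ab" (carrying a descriptor) followed by "cd" (plain data) and
 * report how many bytes a single recvmsg with a 4-byte buffer returns.
 * Returns -1 on any failure. */
static ssize_t first_read_size(void)
{
    int sv[2];
    char data[4];
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctrl;
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg;
    struct cmsghdr *c;
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (send_bytes(sv[0], "ab", 2, sv[0]) == 2 &&
        send_bytes(sv[0], "cd", 2, -1) == 2) {
        memset(&ctrl, 0, sizeof(ctrl));
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof(ctrl.buf);
        n = recvmsg(sv[1], &msg, 0);
        if (n >= 0) {
            /* Close whatever descriptors arrived so the demo leaks nothing. */
            for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
                size_t i, nfds;
                if (c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
                    continue;
                nfds = (c->cmsg_len - CMSG_LEN(0)) / sizeof(int);
                for (i = 0; i < nfds; i++) {
                    int fd;
                    memcpy(&fd, CMSG_DATA(c) + i * sizeof(int), sizeof(fd));
                    close(fd);
                }
            }
        }
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

On Linux I would expect first_read_size() to return 2 rather than 4: the read stops after the bytes that carried the descriptor, even though "cd" is already queued.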
Unfortunately, Wayland does not make use of the above guarantee at all. Wayland specifies that control messages can turn up at any point during the data stream. To quote the Wayland protocol documentation: (emphasis mine)
The protocol does not specify the exact position of the ancillary data in the stream, except that the order of file descriptors is the same as the order of messages and fd arguments within messages on the wire.
In particular, it means that any byte of the stream, even the message header, may carry the ancillary data with file descriptors.
Clients and compositors should queue incoming data until they have whole messages to process, as file descriptors may arrive earlier or later than the corresponding data bytes.
If you look at how the wire protocol is implemented in libwayland, there is indeed no guarantee that the file descriptors are sent with the associated data bytes.
This means (for Wayland at least) we are forced to buffer data and fds until whole messages are available to parse.
Yay.
One final question to ask is if there are any pitfalls to watch out for when passing file descriptors over Unix sockets?
Here we'll focus on one specific Unix socket control message: the SCM_RIGHTS message.
There are several pitfalls to be aware of when sending SCM_RIGHTS messages:
If you do not provide a sufficiently large buffer to receive control messages, the list of file descriptors will be truncated.
This means that you may receive fewer file descriptors than you expect; the missing file descriptors are lost to you.
Linux closes any such truncated file descriptors for you, so they won't pollute your fd table.
Linux also specifies an upper limit on the number of file descriptors you can send at once (SCM_MAX_FD), so if you want to guarantee you receive all file descriptors, you can provide a buffer large enough for that many.
For Wayland specifically, if you wish to ensure compatibility with libwayland, this means you need to be able to receive at least 28 file descriptors, since that is the maximum number it will send at once.
This is perhaps an application of Postel's law.
I would also recommend sending far fewer than this at once to ensure compatibility with other implementations.
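Truncation does not have to be silent: recvmsg sets MSG_CTRUNC in msg_flags when control data was discarded for lack of buffer space. Here is a minimal sketch of checking for it. The function name and its return convention are my own choices for illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: receive one data byte plus control messages
 * into ctrl (ctrl_len bytes, suitably aligned for struct cmsghdr).
 * Returns 1 if the kernel flagged truncated control data, 0 if not,
 * and -1 if recvmsg failed. On success *ctrl_used holds the number
 * of control bytes that were filled in. */
static int recv_check_truncated(int sock, void *ctrl, size_t ctrl_len,
                                size_t *ctrl_used)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = ctrl_len;

    if (recvmsg(sock, &msg, 0) < 0)
        return -1;

    /* recvmsg rewrites msg_controllen to the amount actually used. */
    *ctrl_used = msg.msg_controllen;
    return (msg.msg_flags & MSG_CTRUNC) ? 1 : 0;
}
```

A receiver that sees a truncated result still owns whatever descriptors did fit in the buffer, and should treat the missing ones as a protocol error.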
A random snippet of example code that I found on the internet follows (one of the first hits on Google):
static int do_recvmsg(int sock) {
struct msghdr msg;
struct cmsghdr *cmsghdr;
struct iovec iov[1];
FILE *fp;
ssize_t nbytes;
int i, *p;
char buf[CMSG_SPACE(sizeof(int))], c;
iov[0].iov_base = &c;
iov[0].iov_len = sizeof(c);
memset(buf, 0x0d, sizeof(buf));
cmsghdr = (struct cmsghdr *)buf;
cmsghdr->cmsg_len = CMSG_LEN(sizeof(int));
cmsghdr->cmsg_level = SOL_SOCKET;
cmsghdr->cmsg_type = SCM_RIGHTS;
msg.msg_name = NULL;
msg.msg_namelen = 0;
msg.msg_iov = iov;
msg.msg_iovlen = sizeof(iov) / sizeof(iov[0]);
msg.msg_control = cmsghdr;
msg.msg_controllen = CMSG_LEN(sizeof(int));
msg.msg_flags = 0;
nbytes = recvmsg(sock, &msg, 0);
if (nbytes == -1)
return (1);
p = (int *)CMSG_DATA(buf);
printf("recvmsg: %d\n", *p);
fp = fdopen(*p, "w");
fprintf(fp, "OK\n");
fclose(fp);
return (0);
}
This code allocates CMSG_LEN(sizeof(int)) space for a control message, expecting to receive a single int.
It then assumes that when it receives a message, it has received exactly one file descriptor.
There are multiple issues with this code.
The first and most obvious mistake this code makes is failing to check for the presence of control messages at all. It writes a cmsghdr into the buffer (which will be overwritten by the recvmsg call), never checks how much control data was actually received (msg_controllen), and never inspects the cmsghdr that may be present in the buffer.
The second error is more subtle: It only expects to receive one file descriptor.
This is a mistake.
Recall that CMSG_SPACE aligns the length to the size of a long.
On a 64-bit system, this means you can fit two file descriptors into that buffer.
If you expect only one, you will leak the other file descriptor.
A misbehaving remote can exhaust your file descriptor table in this manner.
This error is unfortunately common. You can see examples of this on Stack Overflow, on GitHub, and in blog posts.
(The last example is subtle: The example code does check for exactly one file descriptor, but would end up leaking two file descriptors if the remote sends two, since it closes neither.)
What you should do in this instance is to read cmsghdr::cmsg_len and calculate how many file descriptors you have received.
You should also iterate over the buffer to deal with all present cmsgs.
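Putting that advice together, here is one way a receiver might look. This is a sketch under my own assumptions rather than a definitive implementation: recv_all_fds and its parameters are made-up names, and the buffer is sized for the 28 descriptors mentioned earlier.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: receive one data byte and collect every
 * descriptor from every SCM_RIGHTS message into out[0..max_out).
 * Descriptors beyond max_out are closed rather than leaked.
 * Returns the number of descriptors stored, or -1 if recvmsg failed. */
static int recv_all_fds(int sock, int *out, int max_out)
{
    union {
        char buf[CMSG_SPACE(sizeof(int) * 28)]; /* room for 28 descriptors */
        struct cmsghdr align;
    } ctrl;
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *c;
    int stored = 0;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) < 0)
        return -1;

    /* Walk every control message, not just the first one. */
    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        size_t i, nfds;

        if (c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
            continue;
        /* Derive the descriptor count from cmsg_len instead of assuming one. */
        nfds = (c->cmsg_len - CMSG_LEN(0)) / sizeof(int);
        for (i = 0; i < nfds; i++) {
            int fd;
            memcpy(&fd, CMSG_DATA(c) + i * sizeof(int), sizeof(fd));
            if (stored < max_out)
                out[stored++] = fd;
            else
                close(fd); /* unexpected extra descriptor: do not leak it */
        }
    }
    return stored;
}
```

A caller expecting a single descriptor would pass max_out as 1; anything extra the remote sneaks in is closed on the spot instead of accumulating in the fd table.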
This was a fun detour into an area of Unix I was previously unfamiliar with. I hope you learnt something.