~yosh@unix.dog

some C habits I employ for the modern day

despite it being the first “proper” programming language I learned–by reading K&R front-to-back no less–I don’t write C too terribly often nowadays. playing resonite has gotten me into writing a load of C# for modding the game, and most of what I do day-to-day is automating the tedium on my computer, which gets delegated to shell or python because of all the existing infrastructure.

alas, every now and then, something arises where I have to or just want to write some C (or C++). sometimes it’s when I need to make some bindings for a library; sometimes it’s to fill a niche of a language/architecture gap. it also remains as my favorite language to prototype stuff in, though I’m not quite sure why.

in any case, C is an interesting language without much standardization on the whole “style” or “practices” part. most other languages have very clear “this is the best way to use X” messages, either subtly embedded in the syntax itself or through “official” documentation channels. C doesn’t have an official documentation channel, nor does it have syntax or standard library constructs that encourage one particular way of doing things. from this, there’s a bunch of inconsistencies in how people do things, and–especially in the early days of the language and standard library–the landscape and general practice is quite error prone. as such, I’ve developed my own habits when writing C, usually picked up from blog posts, writing C# or rust, or just out of perfectionist brain.

I’m not saying you should write stuff this way, nor am I claiming it is the best way to write C all the time. I break some of these practices when working with embedded systems or when I’m writing things to be as fast as they can possibly be. but it is the baseline I tend to start with for most projects, and if I don’t write it down, I’ll never be consistent with it.

the basics

I usually use C23 for my new C projects. when contributing to other projects, I of course use their revision, but C23 enables a fair amount of the things possible in this post, so it’s what I stick with for projects that aren’t trying to target absolute maximum portability or embedded architectures (i.e. I’d only care about GCC, clang, and maybe MSVC). almost every platform, including any POSIX-compliant one, has CHAR_BIT set to 8, so I like to make it explicitly clear that this is what the project is for by putting this in there:

#include <limits.h>

#if CHAR_BIT != 8
	#error "CHAR_BIT != 8"
#endif

something very small that I liked a lot when learning rust was its short way of referring to fixed-length types. combine that with chris wellons’ other typedefs and I end up having all these typedefs in my projects:

#include <stddef.h>
#include <stdint.h>

typedef uint8_t   u8;
typedef int8_t    i8;
typedef uint16_t  u16;
typedef int16_t   i16;
typedef uint32_t  u32;
typedef int32_t   i32;
typedef uint64_t  u64;
typedef int64_t   i64;
typedef float     f32;
typedef double    f64;
typedef uintptr_t uptr;
typedef ptrdiff_t isize;
typedef size_t    usize;

you may notice that the byte and b32 from wellons’ post are missing here. as said before, when employing this style, I don’t care for systems where char isn’t 8 bits, so the distinction between u8 and char doesn’t mean anything to me here. additionally, the intent of whether the buffer is used as “raw” memory chunks versus a meaningful u8 is pretty clear from the code that it gets used in, so I’m not worried about confusing intent with it.

b32 is missing because I just use the C23 bool type. if I’m stuck on an older revision that is at least C99, I use stdbool.h and bool. its semantics are familiar to me already, and it just feels more right to use for, well, booleans.

I’ve long been employing the length+data string struct. if there was one thing I could go back in time to change about the C language, it would be the removal of the null-terminated string. alas, we do not live in the perfect world, so I work with this for now:

typedef struct {
	// includes null terminator (for bad functions)
	u8 *data;
	// excludes null terminator (for copying memory)
	isize len;
} String;

along with some functions to initialize from an existing cstr or buffer, copy the string safely, etc.
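the helpers themselves aren't shown here, so the following is only a rough sketch of what they could look like. the names string_from_buf, string_from_cstr, string_copy, and string_free are made up for illustration, and the sketch assumes heap allocation via malloc, which a real project may well avoid:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint8_t   u8;
typedef ptrdiff_t isize;

typedef struct {
	// includes null terminator (for bad functions)
	u8 *data;
	// excludes null terminator (for copying memory)
	isize len;
} String;

// hypothetical helper: copy `len` bytes of `buf` into a fresh String.
// returns a String with data == NULL on bad input or failed allocation.
String string_from_buf(const void *buf, isize len)
{
	String s = {0};
	if (buf == NULL || len < 0) {
		return s;
	}
	u8 *data = malloc((size_t)len + 1);
	if (data == NULL) {
		return s;
	}
	memcpy(data, buf, (size_t)len);
	data[len] = '\0'; // terminator is stored but not counted in len
	s.data = data;
	s.len = len;
	return s;
}

// hypothetical helper: build a String from a null-terminated C string.
String string_from_cstr(const char *cstr)
{
	if (cstr == NULL) {
		return (String){0};
	}
	return string_from_buf(cstr, (isize)strlen(cstr));
}

// hypothetical helper: deep copy, so the two Strings never share memory.
String string_copy(String src)
{
	return string_from_buf(src.data, src.len);
}

// hypothetical helper: release the buffer and reset the struct.
void string_free(String *s)
{
	free(s->data);
	s->data = NULL;
	s->len = 0;
}
```

every constructor funnels through string_from_buf, so the "data is always null-terminated, len never counts the terminator" rule only has to be gotten right in one place.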

parsing, not validating

I think one of the most eye-opening blog posts I read when getting into programming initially was the evergreen parse, don’t validate post. if you haven’t read it yourself, I highly encourage you to do so. the TL;DR of it is to work with your language’s type system to the fullest extent, making function signatures take very strict types that can only really be created through trusted interfaces that you provide. in doing so, you get to have an incredibly robust API with very clear, usually compile-time indications when something goes wrong.

I was thinking about how to apply this sort of philosophy to C for a bit–I knew that the most flexible and strong part of C’s comparatively weak type system is the humble struct–but I didn’t know how to go about the whole “can only really be created through trusted interfaces” bit and promptly put it out of my mind when working with C. a few years later, I saw someone link it again on the web, and curiously looked up to see if anyone else tried applying it to C. this took me to lelanthran’s blog post that talks about it, where I learned that you can in fact create arbitrary opaque types and thus properly employ this style.

I was going to embed an in-progress rewrite I’m doing of blsm that employs this principle, but I accidentally overwrote most of the code via a mishap with syncthing. oops. so, you’ll just have to trust me when I say that I actually do this nowadays for non-playground projects :)
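in place of the lost code, here is a minimal sketch of the opaque-type idea. this is not the blsm code; the Username type, its rules (1 to 32 characters, lowercase letters and digits only), and every function name are invented for the example. in a real project the first part would live in a header and the struct definition in a .c file:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- username.h: everything a caller gets to see ---- */

// opaque: callers can hold a pointer, but cannot build or edit one
typedef struct Username Username;

// the only way to obtain a Username; returns NULL if `raw` is invalid
Username *username_parse(const char *raw);
const char *username_cstr(const Username *name);
void username_free(Username *name);

/* ---- username.c: private to the implementation ---- */

enum { USERNAME_MAX = 32 };

struct Username {
	char text[USERNAME_MAX + 1];
};

Username *username_parse(const char *raw)
{
	if (raw == NULL) {
		return NULL;
	}
	size_t len = strlen(raw);
	if (len == 0 || len > USERNAME_MAX) {
		return NULL;
	}
	for (size_t i = 0; i < len; i++) {
		char c = raw[i];
		int lower = (c >= 'a' && c <= 'z');
		int digit = (c >= '0' && c <= '9');
		if (!lower && !digit) {
			return NULL;
		}
	}
	Username *name = malloc(sizeof *name);
	if (name == NULL) {
		return NULL;
	}
	memcpy(name->text, raw, len + 1); // len + 1 copies the terminator too
	return name;
}

// safe to skip re-checking here: a Username only exists if parsing passed
const char *username_cstr(const Username *name)
{
	return name->text;
}

void username_free(Username *name)
{
	free(name);
}
```

since the struct body is hidden from callers, any function taking a `const Username *` can assume the checks in username_parse already happened.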

tuples!

one of the more exciting changes of C23 is the explicit standardization that tagged types (struct, union, enum) with the same name and contents are completely compatible with one another. oftentimes, I wish to return multiple values from a function, but feel as if there’s not much semantic need to assign explicit names to the structure or the members within. enter the humble tuple:

#define Tuple2(T1, T2)           \
	struct Tuple2_##T1##_##T2 {  \
		T1 a;                    \
		T2 b;                    \
	}

others may use different names for the members–I’ve seen _0, _1, etc. to roughly mirror rust’s semantics, but I find that a little awkward. the english alphabet is more than familiar to me, so it’s what I’ve stuck to.

unfortunately, as one might have noticed, this cool feature gets severely limited by the fact that this does not apply to anonymous tagged types (I’d personally consider “untagged” tagged types with the same content to be compatible, but I guess WG14 disagrees). as such, every instance of this feature requires an actual name to be bound to the type. this means that pointers will throw a wrench into it:

$ cc kmp.c
In file included from kmp.c:5:
kmp.c: In function 'main':
kmp.c:29:20: error: pasting "*" and "_" does not give a valid preprocessing token
   29 |         Tuple2(char*, int) buffer;
      |                    ^
../common/tuple.h:5:25: note: in definition of macro 'Tuple2'
    5 |         struct Tuple2_##T1##_##T2 {  \
      |                         ^~

this can be worked around by either making a typedef for the pointer or by having the user provide an explicit name for the struct. neither is exactly the most ergonomic, so I don’t find myself using this feature as much as I wish.
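both workarounds look roughly like this. Tuple2Named, cstr, and the two function names are assumptions made for the sketch, not something from an existing header:

```c
#include <assert.h>
#include <string.h>

#define Tuple2(T1, T2)           \
	struct Tuple2_##T1##_##T2 {  \
		T1 a;                    \
		T2 b;                    \
	}

// hypothetical variant: the caller supplies the tag suffix by hand
#define Tuple2Named(NAME, T1, T2) \
	struct Tuple2_##NAME {        \
		T1 a;                     \
		T2 b;                     \
	}

// workaround 1: give the pointer type a name that is a single token
typedef const char *cstr;

int typedef_workaround(void)
{
	// expands to struct Tuple2_cstr_int
	Tuple2(cstr, int) buffer = { "abc", 0 };
	buffer.b = (int)strlen(buffer.a);
	return buffer.b;
}

// workaround 2: keep the pointer type as written, name the struct explicitly
int named_workaround(void)
{
	// expands to struct Tuple2_str_and_len
	Tuple2Named(str_and_len, const char *, int) buffer = { "hello", 0 };
	buffer.b = (int)strlen(buffer.a);
	return buffer.b;
}
```

the typedef keeps the call site short but hides the pointer, while the named variant keeps the pointer visible at the cost of one more argument to keep consistent across uses.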

in any case, for tuples specifically, I’m usually not storing a pointer in them. when it gets to that point, I probably want an actual struct with semantic meaning instead.

returning results

with the idea of trying to encode as much as possible in the type system itself, I’m a pretty big fan of sum types when it makes sense for the language to have them. naturally, I try to employ using them in C.

in the absence of proper language support, “sum types” are just structs with discipline. these structs will usually be used for public-facing apis and/or have semantic meaning in their own right, so I’m more comfortable making them a typedef as opposed to trying to fuddle in a “result” type into this whole equation. maybe this will change if C ever allows anonymous tagged types to be compatible, but for now it’s what I have to work with.

the structures generally follow this pattern:

typedef enum {
	...
} ErrorCode;

typedef struct {
	char *val;
} SafeBuffer;

typedef struct {
	bool ok;
	union {
		SafeBuffer *val;
		ErrorCode err;
	};
} MaybeBuffer;

along with functions to handle printing error messages/etc. for the various error codes that can be returned. the idea is for the caller to always check the ok upon return. you could probably add on a “match” function to specifically handle this without the ability to actually introspect on the struct like before, but that’s incredibly verbose and poor ergonomically in my opinion–this is all compensation for the lack of proper sum types anyway.

when these result types are combined with the previous “parse, don’t validate” approach, I find that I don’t actually dread handling errors in C. if I encounter a function that I know “parses” something, I know it’s going to return a Maybe struct and check ok. if I’m implementing a function that uses one of those SafeBuffer types, I know that there are certain invariants that I can assume to be true, since that type could have only been created through the parsing function.
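to make the pattern concrete, here is a small sketch that swaps the buffer for a port number. the Port and MaybePort types, the error codes, and the function names are all invented for the example; only the ok + union layout comes from the structures above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum {
	ERR_EMPTY,
	ERR_NOT_A_NUMBER,
	ERR_OUT_OF_RANGE,
} ErrorCode;

// invariant (assumed for this sketch): value is in 1..65535
typedef struct {
	uint16_t value;
} Port;

typedef struct {
	bool ok;
	union {
		Port val;      // meaningful only when ok is true
		ErrorCode err; // meaningful only when ok is false
	};
} MaybePort;

const char *error_message(ErrorCode err)
{
	switch (err) {
	case ERR_EMPTY:        return "input is empty";
	case ERR_NOT_A_NUMBER: return "input is not a decimal number";
	case ERR_OUT_OF_RANGE: return "port must be between 1 and 65535";
	}
	return "unknown error";
}

// the "parser": the only place a Port is ever constructed
MaybePort port_parse(const char *text)
{
	if (text == NULL || text[0] == '\0') {
		return (MaybePort){ .ok = false, .err = ERR_EMPTY };
	}
	long n = 0;
	for (const char *p = text; *p != '\0'; p++) {
		if (*p < '0' || *p > '9') {
			return (MaybePort){ .ok = false, .err = ERR_NOT_A_NUMBER };
		}
		n = n * 10 + (*p - '0');
		if (n > 65535) { // bail early, so n can never overflow
			return (MaybePort){ .ok = false, .err = ERR_OUT_OF_RANGE };
		}
	}
	if (n < 1) {
		return (MaybePort){ .ok = false, .err = ERR_OUT_OF_RANGE };
	}
	return (MaybePort){ .ok = true, .val = { (uint16_t)n } };
}

// caller side: check ok first, then touch exactly one union member
int port_or_zero(const char *text)
{
	MaybePort r = port_parse(text);
	if (!r.ok) {
		fprintf(stderr, "bad port: %s\n", error_message(r.err));
		return 0;
	}
	return r.val.value;
}
```

nothing stops a caller from reading val without checking ok, which is the "discipline" part; the struct only makes the intended flow obvious.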

it sacrifices a bit of brevity (since C lacks all the nice functional syntax sugar that makes this concise) for much nicer flowing (and safer!) programs.

on dynamic memory management

I don’t personally do things that require dynamic memory management in C often, so I don’t have many practices for it. I know that wellons & co. have been really liking the arena, and I’d probably like it too if I actually used the heap often. but I don’t, so I have nothing to say.

if I find myself needing a bunch of dynamic memory allocations and lifetime management, I will simply start using another language–usually rust or C#. but I also don’t find myself programming that kind of stuff often in general, so it’s really just something I need to explore more as a whole.

anything else?

there isn’t much else that I really feel the need to mention. I check the documentation for any external functions I use–because there’s always a wrench that can hide in man–and try to check any nontrivial accesses or funky stuff that might need to be done. maybe I’ll try to go about making a sort of “slice” type to make that stuff safer, but for now, nothing’s bitten me just yet.

I hope this can get anyone to try thinking about their own C style. I love this language a lot of the time, yet hate it many other times. at least the limitations can make for some fun problems to solve and safeguards to create :)
