Github repository
Essentially, I'm annoyed at many languages (I hate them all equally).
C is really damn good for an old language, but having to write an additional
standard library on top of libc gets annoying really quickly.
Rust is a good modern language; however, interacting with C code manually is pretty bad.
The borrow checker, in my opinion, is not a good solution to memory safety.
C++ needs a LOT of linters/rules to be a readable and maintainable language, and it is really ugly.
Odin is really cool, one of my favourites right now, but the Go way of handling errors, where you don't have
to check for errors, is bad. Again, in my opinion.
Zig is really annoying with all the "control" you get over the code, and it uses LLVM (as does Rust), which makes
compilation times really slow.
So that's why I want to make my own language. No LLVM, and just my personal preferences in a language.
I'm going to write it in C, since using a barebones language makes it easier to self-host the language, and
that's it.
Before any development can begin, we need to decide on the build system to use.
As for the build system, I'm really drawn to Tsoding's nob.h.
It's really simple (being a single header file), really extensible, cross-platform, and is just a pleasure to
use.
So I'm just going to go with it.
To build our current project (a single main.c file), we just do this in nob.c (our build script):
// This is an stb-style library, so this #define pulls in the library's implementation
#define NOB_IMPLEMENTATION
#define NOB_STRIP_PREFIX
#include "nob.h"
// We have to take the arguments so that the build system can recompile itself when we change it
int main(int argc, char** argv) {
    NOB_GO_REBUILD_URSELF(argc, argv);
    Cmd c = {0};
    if (!mkdir_if_not_exists("build")) return 1;
    cmd_append(&c, "cc", "src/main.c", "-o", "build/bongc", "-Wall", "-Wextra");
    // Exit with a non-zero status when the compiler fails
    if (!cmd_run_sync_and_reset(&c)) return 1;
    return 0;
}
NOTE: I will not document every change to this script. To see them, you can go here.
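#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
    uint8_t* buffer;   // backing memory
    size_t capacity;   // total size of buffer in bytes
    size_t used;       // bytes handed out so far
} Arena;
// Allocate a zero-initialized arena of `size` bytes
Arena arena_new(size_t size) {
    Arena a = {0};
    a.capacity = size;
    a.buffer = calloc(size, sizeof(uint8_t));
    assert(a.buffer && "Ran out of RAM?");
    return a;
}
// Bump-allocate `size` bytes (no alignment is applied)
void* arena_alloc(Arena* a, size_t size) {
    assert(a->buffer);
    // Written as a subtraction so the check cannot overflow;
    // <= lets an allocation fill the arena exactly
    assert(size <= a->capacity - a->used);
    void* buf = a->buffer + a->used;
    a->used += size;
    return buf;
}
// Release the backing memory and reset the arena to an empty state
void arena_free(Arena* a) {
    assert(a->buffer);
    free(a->buffer);
    memset(a, 0, sizeof(Arena));
}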
I just use assertions instead of error handling, since if we run out of memory, what can we do?
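One thing the arena above does not do is align its allocations: after a 1-byte allocation, the next pointer can end up misaligned for something like a uint64_t. A minimal sketch of an aligned variant follows. Note that `arena_alloc_aligned` is a hypothetical helper made up for this illustration (it is not in the repository), and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t* buffer;
    size_t capacity;
    size_t used;
} Arena;

// Hypothetical helper (an assumption, not part of the original code):
// like arena_alloc, but the returned pointer is a multiple of `align`,
// which must be a power of two.
void* arena_alloc_aligned(Arena* a, size_t size, size_t align) {
    assert(a->buffer);
    assert(align != 0 && (align & (align - 1)) == 0);

    // Round the current address up to the next multiple of `align`.
    uintptr_t base = (uintptr_t)a->buffer;
    uintptr_t cur = base + a->used;
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t start = (size_t)(aligned - base);

    // Written as subtractions so the bounds checks cannot overflow.
    assert(start <= a->capacity);
    assert(size <= a->capacity - start);

    a->used = start + size;
    return a->buffer + start;
}
```

With this, calling `arena_alloc_aligned(&a, sizeof(uint64_t), 8)` right after a 1-byte allocation skips the padding bytes instead of handing back an odd address. The cost is a few wasted bytes per allocation, which seems like a fair trade for an arena.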
Before we can extract the tokens, we have to read the file into memory.
Since nob requires jumping through some hoops to use a custom allocator, and I don't want to make
our allocator global (to be more explicit), I'll implement the famous da_append(...) macro myself and have it accept an arena.
I'll also add the String and StringView primitives to make strings easier in C. To explain them: they are just strings that are not null-terminated.
String is an owned string (like Rust's String) and StringView is a view (like &str); it's immutable.
To ease reporting errors in the future, I'll also define a struct for a source file (a String with the source contents, and the name of the file).
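Roughly, these three types could look like the sketch below. The field names and the `sv_*` helpers are illustrative assumptions, not necessarily the exact code in the repository.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Owned, growable string. NOT null-terminated, so the length is stored.
typedef struct {
    char* items;
    size_t count;
    size_t capacity;
} String;

// Borrowed, read-only view into bytes owned by someone else.
typedef struct {
    char const* data;
    size_t count;
} StringView;

// A source file: its name plus its contents, kept together for error reports.
typedef struct {
    StringView name;
    String contents;
} SourceFile;

// Hypothetical helper: view over a C string (excludes the trailing '\0').
StringView sv_from_cstr(char const* cstr) {
    StringView sv = { cstr, strlen(cstr) };
    return sv;
}

// Hypothetical helper: view over the bytes currently held by a String.
StringView sv_from_string(String const* s) {
    StringView sv = { s->items, s->count };
    return sv;
}

// Hypothetical helper: compare by length and bytes, because strcmp
// would need a terminator that these strings do not have.
bool sv_eq(StringView a, StringView b) {
    return a.count == b.count &&
           (a.count == 0 || memcmp(a.data, b.data, a.count) == 0);
}
```

Because nothing here relies on a terminator, a view can point into the middle of the source contents without copying, which is handy later for things like identifiers.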
When it comes to lexing/tokenizing source code, the most common way to represent the results
of this process is a tagged union.
Because I just want to lay out the structure of the lexer, I'm going to implement a single token kind: an unsigned integer.
typedef enum {
    TT_NUMBER
} TokenType;
typedef struct {
    TokenType type;
    size_t offset;
    union {
        uint64_t number;
    };
} Token;
OK, so after some bickering with C, we have our own generic da_push macro for dynamic arrays.
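For reference, an arena-backed push macro can be sketched along these lines. This is a sketch under assumptions, not the macro from the repository: it assumes the array struct has `items`, `count` and `capacity` fields (the nob convention), and it uses a simplified stand-in for `arena_alloc` that rounds offsets up to 16 bytes so the elements stay aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t* buffer;
    size_t capacity;
    size_t used;
} Arena;

// Simplified stand-in for the arena allocator (an assumption for this sketch):
// bump allocation with the offset rounded up to a multiple of 16.
static void* arena_alloc(Arena* a, size_t size) {
    assert(a->buffer);
    size_t start = (a->used + 15) & ~(size_t)15;
    assert(start <= a->capacity && size <= a->capacity - start);
    a->used = start + size;
    return a->buffer + start;
}

// Works with any struct that has `items`, `count` and `capacity` fields.
// When the array is full, a bigger block is taken from the arena and the old
// elements are copied over; the old block simply stays behind in the arena.
#define da_push(arena, da, item)                                              \
    do {                                                                      \
        if ((da)->count >= (da)->capacity) {                                  \
            size_t da_new_cap_ = (da)->capacity ? (da)->capacity * 2 : 8;     \
            void* da_new_items_ =                                             \
                arena_alloc((arena), da_new_cap_ * sizeof(*(da)->items));     \
            if ((da)->count > 0)                                              \
                memcpy(da_new_items_, (da)->items,                            \
                       (da)->count * sizeof(*(da)->items));                   \
            (da)->items = da_new_items_;                                      \
            (da)->capacity = da_new_cap_;                                     \
        }                                                                     \
        (da)->items[(da)->count++] = (item);                                  \
    } while (0)

// Example array type; the name is made up for this sketch.
typedef struct {
    uint64_t* items;
    size_t count;
    size_t capacity;
} Numbers;
```

Usage would look like `Numbers xs = {0}; da_push(&arena, &xs, 42);`. Growing by doubling wastes the old blocks, but since the whole arena is freed at once that is usually acceptable.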
Time to actually lex code.
This is our lexer state:
typedef struct {
    SourceFile const* source;
    size_t pos;
} Lexer;
The `SourceFile` keeps track of the actual content and name of the file that we are processing.
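To give an idea of where this is heading, here is a minimal sketch of lexing unsigned integers with that state. The `lexer_next` function and the `is_digit`/`is_space` helpers are hypothetical names for this illustration, and `SourceFile` is reduced to a simplified stand-in (plain pointers plus a length) so the block is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Simplified stand-in for the real SourceFile (an assumption for this sketch).
typedef struct {
    char const* name;
    char const* contents;  // not null-terminated, so the length is stored
    size_t length;
} SourceFile;

typedef enum {
    TT_NUMBER
} TokenType;

typedef struct {
    TokenType type;
    size_t offset;         // byte offset of the token in the source
    union {
        uint64_t number;
    };
} Token;

typedef struct {
    SourceFile const* source;
    size_t pos;
} Lexer;

static bool is_digit(char c) { return c >= '0' && c <= '9'; }
static bool is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Hypothetical API: returns true and fills *out when a number was read.
// Returns false at end of input, on an unknown character, or on overflow.
bool lexer_next(Lexer* l, Token* out) {
    char const* src = l->source->contents;
    size_t len = l->source->length;

    // Skip whitespace between tokens.
    while (l->pos < len && is_space(src[l->pos])) l->pos++;
    if (l->pos >= len || !is_digit(src[l->pos])) return false;

    out->type = TT_NUMBER;
    out->offset = l->pos;

    uint64_t value = 0;
    while (l->pos < len && is_digit(src[l->pos])) {
        uint64_t digit = (uint64_t)(src[l->pos] - '0');
        // Reject literals that do not fit in 64 bits.
        if (value > (UINT64_MAX - digit) / 10) return false;
        value = value * 10 + digit;
        l->pos++;
    }
    out->number = value;
    return true;
}
```

Calling `lexer_next` in a loop until it returns false yields one `TT_NUMBER` token per integer, each carrying the offset where it started, which is what error messages will need later.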