Bong

GitHub repository
Skip my rambling

Motivation

Essentially I'm annoyed at many languages (I hate them all equally).
C is really damn good for an old language, but having to write an additional standard library on top of libc gets annoying really quickly.
Rust is a good modern language, but interacting with C code manually is pretty bad. The borrow checker, in my opinion, is not a good solution to memory safety.
C++ needs a LOT of linters/rules to be a readable and maintainable language, and it is really ugly.
Odin is really cool, one of my favourites right now, but the Go way of handling errors, where you don't have to check for them, is bad. Again, in my opinion.
Zig is really annoying with all the "control" you get over the code, and it uses LLVM (as does Rust), which makes compilation times really slow.
So that's why I want to make my own language: no LLVM, just my personal preferences in a language. I'm going to write it in C, since using a barebones language makes it easier to self-host the language later, and that's it.

Day 1 - Some bookkeeping

Commits covered

Before any development can begin, we need to decide on the build system to use.
I'm really drawn to Tsoding's nob.h. It's really simple (being a single header file), really extensible, cross-platform, and just a pleasure to use. So I'm just going to go with it.
To build our current project (a single main.c file), we just do this in nob.c (our build script):

        
// This is an stb-style library, so this #define pulls in the implementation of the library
#define NOB_IMPLEMENTATION
#define NOB_STRIP_PREFIX
#include "nob.h"

// We have to take the arguments so that the build system can recompile itself when we change it
int main(int argc, char** argv) {
    NOB_GO_REBUILD_URSELF(argc, argv);
    Cmd c = {0};
    if (!mkdir_if_not_exists("build")) return 1;
    cmd_append(&c, "cc", "src/main.c", "-o", "build/bongc", "-Wall", "-Wextra");
    // main returns an exit code, so failure must be non-zero ("return false" would report success)
    if (!cmd_run_sync_and_reset(&c)) return 1;
    return 0;
}
        
    
NOTE: I will not document every change to this script. To see them, you can go here.
Yes, that's it. Just build nob.c to an executable and run it. When the source (nob.c) is changed, it will automatically rebuild itself and run the new version of the script. No "configuration" step. Just build and run.
We will need dynamic memory allocation, so we will have to implement an arena allocator. Why do I want to have arenas?
  1. It makes lifetime management easier: you can make lots of tiny allocations and keep that region in cache (since you will hit it so many times).
  2. Freeing trees (such as ASTs) is way easier, since you don't have to do a recursive traversal.
The entire interface to an arena is this:
        
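#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t* buffer;
    size_t capacity;
    size_t used;
} Arena;

// Allocates one zeroed block of `size` bytes up front
Arena arena_new(size_t size) {
    Arena a = {0};
    a.capacity = size;
    a.buffer = calloc(size, sizeof(uint8_t));
    assert(a.buffer && "Ran out of RAM?");
    return a;
}

// Bump allocation: hands out the next `size` bytes of the block
void* arena_alloc(Arena* a, size_t size) {
    assert(a->buffer);
    // <= so the last byte is usable; phrased as a subtraction so it cannot overflow
    assert(size <= a->capacity - a->used);
    void* buf = a->buffer + a->used;
    a->used += size;
    return buf;
}

// Releases everything that was allocated from the arena in one go
void arena_free(Arena* a) {
    assert(a->buffer);
    free(a->buffer);
    memset(a, 0, sizeof(Arena));
}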
        
    
I just do assertions instead of error handling, since if we run out of memory, what can we do?
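One thing the arena interface above does not deal with is alignment: `arena_alloc` hands out whatever byte comes next, which is fine for characters but can misalign wider types such as `uint64_t` or pointers. Here is a minimal sketch of one way to handle that. The name `arena_alloc_aligned` and its `align` parameter are hypothetical and not taken from the repo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t* buffer;
    size_t capacity;
    size_t used;
} Arena;

// Hypothetical helper (not from the repo): like arena_alloc, but rounds the
// start of the allocation up to `align`, which must be a power of two.
void* arena_alloc_aligned(Arena* a, size_t size, size_t align) {
    assert(a->buffer);
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t base = (uintptr_t)a->buffer;
    uintptr_t current = base + a->used;
    // Round up to the next multiple of `align`
    uintptr_t aligned = (current + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t start = (size_t)(aligned - base);

    // Same out-of-memory policy as the rest of the arena: just assert
    assert(start <= a->capacity && size <= a->capacity - start);
    a->used = start + size;
    return a->buffer + start;
}
```

With this, a call such as `arena_alloc_aligned(&arena, sizeof(uint64_t), _Alignof(uint64_t))` would skip any padding bytes left over by earlier odd-sized allocations. The padding is wasted, which is the usual trade-off for a bump allocator.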
It's getting late, so this is all I did on day 1. Holy productivity...

Day 2 - Take a look at some tokens

Commits covered

Before we can extract the tokens, we have to read the file into memory. nob requires jumping through some hoops to use a custom allocator, and I don't want to make our allocator global (to be more explicit), so I'll implement the famous da_append(...) macro myself and have it accept an arena. I'll also add the String and StringView primitives to make strings easier in C. They are just non-null-terminated strings: String is an owned string (like Rust's String) and StringView is an immutable view (like &str). To ease reporting errors in the future, I'll also define a struct for a source file (a String with the source contents, plus the name of the file).
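To make the idea concrete, here is a rough sketch of what an arena-backed append macro and the two string types could look like. It is not the code from the repo: the macro name `arena_da_append`, the field names (borrowed from nob's `items`/`count`/`capacity` convention), and the `string_view` helper are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t* buffer; size_t capacity; size_t used; } Arena;

static void* arena_alloc(Arena* a, size_t size) {
    assert(a->buffer && size <= a->capacity - a->used);
    void* p = a->buffer + a->used;
    a->used += size;
    return p;
}

// Immutable, non-null-terminated view into bytes owned by someone else
typedef struct { const char* data; size_t len; } StringView;

// Owned, growable, non-null-terminated string
typedef struct { char* items; size_t count; size_t capacity; } String;

// Hypothetical arena-backed append: grows by doubling, copies the old items
// into a fresh arena block and abandons the old one (it is reclaimed when the
// whole arena is freed). Works for any struct with items/count/capacity.
// Note: arena_alloc here does not align, so this sketch is only safe for
// byte-sized elements.
#define arena_da_append(arena, da, item)                                         \
    do {                                                                         \
        if ((da)->count >= (da)->capacity) {                                     \
            size_t new_cap_ = (da)->capacity ? (da)->capacity * 2 : 8;           \
            void* new_items_ =                                                   \
                arena_alloc((arena), new_cap_ * sizeof(*(da)->items));           \
            if ((da)->count)                                                     \
                memcpy(new_items_, (da)->items,                                  \
                       (da)->count * sizeof(*(da)->items));                      \
            (da)->items = new_items_;                                            \
            (da)->capacity = new_cap_;                                           \
        }                                                                        \
        (da)->items[(da)->count++] = (item);                                     \
    } while (0)

// Borrow the contents of an owned String without copying
static StringView string_view(const String* s) {
    StringView v = { s->items, s->count };
    return v;
}
```

Passing the arena explicitly at every call site is noisier than a global allocator, but it keeps it obvious which arena owns which array.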
When it comes to lexing/tokenizing source code, the most common way to represent the results of this process is a tagged union.
Because I just want to lay out the structure of the lexer, I'm going to implement a single token kind - an unsigned integer.

        
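typedef enum {
    TT_NUMBER
} TokenType;

typedef struct {
    TokenType type;   // the tag: says which union member is valid
    size_t offset;    // byte offset of the token within the source file
    union {           // anonymous union (C11)
        uint64_t number;
    };
} Token;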
        
    
Ok, so after some bickering with C, we have our own generic da_push macro for dynamic arrays. Time to actually lex code. This is our lexer state:
        
typedef struct {
    SourceFile const* source;
    size_t pos;
} Lexer;
        
    
The `SourceFile` keeps track of the actual content and name of the file that we are processing.
Since our current task is so simple, we can just skip whitespace (since it is insignificant, looking at you, Python...). This logic is described in this code snippet. Then we just do the lexing, which can be viewed in the repo now :)
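For readers who don't want to click through, here is a small self-contained sketch of the two steps: skipping whitespace, then lexing an unsigned integer. It is an approximation and not the repo's code. The `Source` struct is a simplified stand-in for `SourceFile`, and `skip_whitespace` and `lexer_next` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TT_NUMBER } TokenType;

typedef struct {
    TokenType type;
    size_t offset;
    union { uint64_t number; };
} Token;

// Simplified stand-in for SourceFile: just the bytes and their length
typedef struct { const char* text; size_t len; } Source;

typedef struct { const Source* source; size_t pos; } Lexer;

static void skip_whitespace(Lexer* l) {
    while (l->pos < l->source->len &&
           isspace((unsigned char)l->source->text[l->pos]))
        l->pos++;
}

// Returns true and fills *out when a number was lexed; returns false at end
// of input or on a character this sketch does not understand.
// Overflow of the 64-bit value is not checked here.
static bool lexer_next(Lexer* l, Token* out) {
    skip_whitespace(l);
    if (l->pos >= l->source->len) return false;
    if (!isdigit((unsigned char)l->source->text[l->pos])) return false;

    out->type = TT_NUMBER;
    out->offset = l->pos;
    out->number = 0;
    while (l->pos < l->source->len &&
           isdigit((unsigned char)l->source->text[l->pos])) {
        out->number = out->number * 10 + (uint64_t)(l->source->text[l->pos] - '0');
        l->pos++;
    }
    return true;
}
```

Calling `lexer_next` in a loop until it returns false yields every number in the input together with its byte offset.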

Day 3 - Discussing parsing trees, and error reporting

Ok, first I just implemented a printer, which is not that complicated (Source).
And now we need operators, of which I'll just go with `+` and `-`. Since they are single characters, lexing them is really easy (Source).
I hope you appreciate my web dev skillz (surely)
Time for a parser (weeeeeeeeeeeeeeee). This is the part I hate, just because operator precedence feels like magic, even though I understand it. We essentially just put the operators with "higher binding power" lower in the tree. The helping hand is the Crafting Interpreters book by Robert Nystrom; if you want to build a language, that book is a MUST have. The book is paid, so I don't know the legality of providing a snippet of the BNF; however, such a BNF can be derived by hand (which I did in my prior projects, when I didn't know about the book).
What I forgot about the lexer is that we need to report errors. I was already keeping track of the byte offset of the token within the file, but I wasn't tracking the length and the origin of the token, so I need to fix that up real quick.
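As an illustration of why offset, length, and origin together are enough for error reporting, here is a hedged sketch that turns them into a `file:line:column` message. None of these names come from the repo: `Source`, `Span`, `LineCol`, `span_line_col`, and `format_error` are hypothetical, and the `.bong` file extension is only a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Simplified stand-in for SourceFile
typedef struct { const char* name; const char* text; size_t len; } Source;

// Where a token came from: which file, where it starts, how long it is
typedef struct { const Source* origin; size_t offset; size_t length; } Span;

typedef struct { size_t line; size_t column; } LineCol;

// Walks the source up to the span's offset, counting newlines (1-based)
static LineCol span_line_col(Span s) {
    LineCol lc = { 1, 1 };
    for (size_t i = 0; i < s.offset && i < s.origin->len; i++) {
        if (s.origin->text[i] == '\n') {
            lc.line++;
            lc.column = 1;
        } else {
            lc.column++;
        }
    }
    return lc;
}

// Writes "name:line:column: error: msg 'offending text'" into buf
static int format_error(char* buf, size_t cap, Span s, const char* msg) {
    LineCol lc = span_line_col(s);
    return snprintf(buf, cap, "%s:%zu:%zu: error: %s '%.*s'",
                    s.origin->name, lc.line, lc.column, msg,
                    (int)s.length, s.origin->text + s.offset);
}
```

Recomputing the line and column on demand keeps the token small, at the cost of a linear scan that only happens when an error is actually reported.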
Ok, so C was a bit more of a beast than I expected, but I pulled through.

Day 4 - Expressions

Time to implement some expression parsing. Bong will be expression based (no distinction between statements and expressions): everything, including if, while, and let, will be an expression. There was a slight hiccup while I wrapped my head around the way data flows during the parsing of binary expressions, but I did implement the most basic thing (+ and -).
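The data flow for `+` and `-` can be sketched roughly like this: parse one operand, then keep folding "operator, operand" pairs into a new node whose left child is everything parsed so far. This is a simplified illustration and not the repo's parser. The token and node types, the fixed node pool (standing in for an arena), and the names `parse_primary`, `parse_expr`, and `eval` are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TT_NUMBER, TT_PLUS, TT_MINUS, TT_EOF } TokenType;
typedef struct { TokenType type; uint64_t number; } Token;

typedef enum { EXPR_NUMBER, EXPR_BINARY } ExprKind;
typedef struct Expr Expr;
struct Expr {
    ExprKind kind;
    uint64_t number;   // valid for EXPR_NUMBER
    TokenType op;      // valid for EXPR_BINARY
    Expr* lhs;         // valid for EXPR_BINARY
    Expr* rhs;         // valid for EXPR_BINARY
};

typedef struct {
    const Token* tokens;   // must be terminated by a TT_EOF token
    size_t pos;
    Expr nodes[64];        // fixed pool standing in for an arena
    size_t node_count;
} Parser;

static Expr* new_expr(Parser* p, ExprKind kind) {
    assert(p->node_count < 64);
    Expr* e = &p->nodes[p->node_count++];
    *e = (Expr){0};
    e->kind = kind;
    return e;
}

static Expr* parse_primary(Parser* p) {
    Token t = p->tokens[p->pos];
    assert(t.type == TT_NUMBER && "expected a number");
    p->pos++;
    Expr* e = new_expr(p, EXPR_NUMBER);
    e->number = t.number;
    return e;
}

// expr := primary (('+' | '-') primary)*
// The tree built so far becomes the left child of each new operator node,
// which is what makes + and - left-associative.
static Expr* parse_expr(Parser* p) {
    Expr* lhs = parse_primary(p);
    while (p->tokens[p->pos].type == TT_PLUS ||
           p->tokens[p->pos].type == TT_MINUS) {
        TokenType op = p->tokens[p->pos++].type;
        Expr* rhs = parse_primary(p);
        Expr* bin = new_expr(p, EXPR_BINARY);
        bin->op = op;
        bin->lhs = lhs;
        bin->rhs = rhs;
        lhs = bin;
    }
    return lhs;
}

// Tiny evaluator, only here to check the shape of the tree
static uint64_t eval(const Expr* e) {
    if (e->kind == EXPR_NUMBER) return e->number;
    uint64_t l = eval(e->lhs), r = eval(e->rhs);
    return e->op == TT_PLUS ? l + r : l - r;
}
```

For `10 - 4 + 3` this produces `(10 - 4) + 3`, so the `-` node sits below the `+` node in the tree. Operators with a higher binding power would get their own, deeper parsing level (or a binding-power table) so that they end up lower in the tree still.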