Recreating an old Visualizer I swear I saw

I have vague memories of a visualizer that I will likely never encounter again without going looking for it. It was a waveform display, but it had these rectangles that would ride the wave like boats, not really like surfing. Now, you might be wondering why I'm doing this when my library creates, at best, video.

Well, I'm not even going to recreate that much of it; handling a discrete waveform is always a massive pain, so instead we'll cheese it with the combined magic of img-genner and of limits.

(require :img-genner)
#|------------------------------------------------------------------------------
 | The idea here is that the rectangles ripple along a sine wave, rotating with
 | it as it moves.
 |
 | To determine the angle the derivative of the wave must be known at the
 | center of the rectangles, and that must be turned into an angle
 |----------------------------------------------------------------------------|#

(defvar *rectangles* (loop for i from 0 below 24
                           collect(make-instance 'img-genner:rectangle
                                                 :width 10 :height 10
                                                 :origin
                                                 (img-genner:point (* 30.0 i) 240.0))))

With that in mind we can implement our numerical differentiation code and the slope-to-angle conversion function. Normally, I wouldn't recommend using defconstant unless you're really sure that the constant won't change.

(defconstant +epsilon+ 0.0001)
(defun derive-at-point(function x)
  (/ (-
      (funcall function (+ x +epsilon+))
      (funcall function x))
     +epsilon+))

(defun slope-to-angle(rise-over-run)
  (atan rise-over-run))

Then we have to write the wave and coordinate conversion functions. We need the coordinate conversion because feeding raw pixel coordinates from 0 to 640 straight into trigonometric functions doesn't give very useful results.

(defun wave-1(x)
  (+ (sin (* 2.0 x))
     (sin (/ x 3))
     (sin x)))

(defun xcoord(x time)
  (+ (/ x 10.0) time))

Then we need to be able to update each rectangle for a given time and with a given wave function.

#|
 | Move the rectangles to match the wave's height at that point
 |#
(defun match-wave-height(func time rectangle)
  (let* ((origin (slot-value rectangle 'img-genner:origin))
         (x (xcoord (aref origin 0) time)))
    (setf (aref origin 1) (+ 240 (* (funcall func x) 10)))
    )
  )
#|
 | This does the same for the angles
 |#
(defun match-wave-angle(func time rectangle)
  (let* ((origin (slot-value rectangle 'img-genner:origin))
         (x (xcoord (aref origin 0) time))
         (dy (derive-at-point func x)))
    (setf (slot-value rectangle 'img-genner:rotation) (slope-to-angle dy))
    )
  )
; And this runs both
(defun update-rectangles(func time)
  (loop for i in *rectangles*
        do(match-wave-height func time i)
        do(match-wave-angle func time i))
  )

Then we need to handle the streams and the drawing of it all.
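
This post leans on a couple of definitions from the other img-genner posts on this blog: the *ffmpeg-writer* process and reset-image. As a reminder, here is a minimal sketch of both, assuming the same png_pipe invocation used elsewhere here (the output file name is arbitrary):

(defparameter *ffmpeg-writer*
  (uiop:launch-program "ffmpeg -f png_pipe -i - -y -b:v 1M waves.webm"
                       :input :stream))
; Zero every pixel so each frame starts from a black image
(defun reset-image(image)
  (loop for i from 0 below (array-total-size image)
        do(setf (row-major-aref image i) 0)))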

(defparameter *image* (img-genner:make-image 640 480))
(defun draw-rectangles()
  (loop for i in *rectangles*
        do(img-genner:fill-shape i *image* (img-genner:static-color-stroker (img-genner:rgb 255 0 0)))))

(defun draw-wave(func time)
  (loop for ox from 0 below 640
        for x = (+ time (/ ox 10.0))
        for y = (- 480 (+ 240 (* 10 (funcall func x))))
                ; This is a cheap way to do a thick line poorly
                do(loop for off from 0 below 2
                        do(setf (aref *image* (floor (+ y off)) ox 2) 255))
        ))

(loop for time from 0.0 by (/ 1.0 25.0)
      for frame from 0
      repeat 1000
      do(update-rectangles #'wave-1 time)
      do(reset-image *image*)
      do(draw-rectangles)
      do(draw-wave #'wave-1 time)
      do(img-genner:save-image *image* (uiop:process-info-input *ffmpeg-writer*))
      do(print frame)
      )

(uiop:close-streams *ffmpeg-writer*)
(uiop:wait-process *ffmpeg-writer*)

And with that we get the above. There's a lot of room for improvement.

img-genner: Switching away from cl-png

cl-png is a long-lived package that mostly works, but it has a single flaw: it relies on the presence of a C library, and on the implementation being able to get at it.

It grabs the source and tries to build it, which is one reason I don't much care for it. As much as I don't mind C, I'd rather not force Common Lisp programmers to interact with that whole can of worms, and if that is its behavior on Windows, there may simply not be a suitable C compiler available.

In the place of cl-png will slide pngload and zpng. zpng is a much more capable interface and pngload is remarkably fast. Both are likely heavier than libpng, but, ultimately, the thing that takes time in this library is the rendering and pixel manipulation itself.

In the face of the demands that requiring a C compiler introduces, it's not unreasonable to trade a little bit of performance somewhere non-critical for ease of distribution.

The switch was remarkably painless for pngload, but zpng had a few differences that proved a tad inconvenient for me in particular:

  1. The format was different from the one I had chosen.
    In cl-png the format used for images was a 3-dimensional array, whereas zpng uses a 1-dimensional array.

    I use a 3-dimensional array because it maps more closely to how I think about images.
  2. zpng has an accessor to obtain the 3-dimensional array, but it lacks a setter (which is probably all well and fair, as the translation could be troublesome on the other end).
    What would've made this easier for me would've been if zpng had provided a conversion function (maybe it did and I missed it), but writing it myself wasn't terribly difficult in the end; a sketch of roughly what that looks like is below.
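
Here is a minimal sketch of that kind of conversion, not the library's actual code: copying a flat, row-major RGB buffer (the sort zpng exposes) into the (height width channels) array img-genner works with. The function name and argument order here are my own invention.

; Hypothetical helper: flat row-major octets -> (height width channels) array.
; Row-major order of the 3-dimensional array matches the flat buffer, so a
; straight element-by-element copy is enough.
(defun flat-to-3d (flat width height &optional (channels 3))
  (let ((result (make-array (list height width channels)
                            :element-type '(unsigned-byte 8))))
    (dotimes (i (array-total-size result) result)
      (setf (row-major-aref result i) (aref flat i)))))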

Rescaling Video Using img-genner

There's not much of a reason to do this because ffmpeg has a much faster and nicer scaler, but maybe we want to write common lisp to handle an effect. When I introduced img-genner, I wasn't sure that this was the direction to take, but now it seems appropriate, if nothing else.

You will most likely want to consult the installation instructions if you do not yet have img-genner set up.

So let's start out with the following to import the libraries we need, as well as start the ffmpeg processes.

(ql:quickload :img-genner)
(ql:quickload :pngload)
(defparameter *ffmpeg-reader* (uiop:launch-program "ffmpeg -i wander.mp4 -c:v png -pix_fmt rgb24 -f image2pipe -" :output :stream))
(defparameter *ffmpeg-writer* (uiop:launch-program "ffmpeg -f png_pipe -i - -y -b:v 1M hello4.webm" :input :stream))

We need pngload because it handles reading from the pipe correctly, whereas the libpng binding I have been using does not seem to. I chose to output a webm because it can provide playback sooner, but of course there's no reason you can't use whatever funky codec you prefer.

Okay, so overall, the flow should look like this:

graph LR;
  a(*ffmpeg-reader*)
  b(*ffmpeg-writer*)
  a --> scaler
  scaler --> b

Hmm, that's not very interesting though, so let's add the implicit stuff and a neat function I recently wrote.

graph LR;
  a(*ffmpeg-reader*) --> decode(pngload:decode) --> data(pngload:data) --> scaler;
  scaler --> colorizer(colorize-naive);
  colorizer --> b(*ffmpeg-writer*);
  b --> a;

There, that's an excuse to use this library!

In Common Lisp I couldn't remember whether there's a way to check for EOF explicitly, so instead I just handle the end-of-file condition to terminate the loop.

(handler-case
    (loop with i = 0
          with colors = (loop repeat 16 collect (img-genner:get-random-color))
          for input = (pngload:data (pngload:load-stream (uiop:process-output *ffmpeg-reader*)))
          for output = (img-genner:colorize-naive (img-genner:upscale-image-by-factor input 0.5 0.5) colors)
          do(img-genner:save-image output (uiop:process-info-input *ffmpeg-writer*)))
  (end-of-file () nil))

Before you run the code like this though, you'll need to add this at the end:

(uiop:close-streams *ffmpeg-reader*)
(uiop:wait-process *ffmpeg-reader*)
(uiop:close-streams *ffmpeg-writer*)
(uiop:wait-process *ffmpeg-writer*)

This closes the input stream, which causes ffmpeg to finish as gracefully as possible, and then collects its exit code so that the process can be cleaned up.

Okay, but that's not very efficient. For one thing, it's fairly slow, and it's also parallelizable. You can use the pcall library that img-genner already depends on to make it faster. However, this requires some significant changes to work well: we need to turn the work into tasks and then wait for them in order.

We want to load as many frames as possible at once, but we have a limited amount of memory, and if you're using SBCL it may be a rather painful experience once you hit the heap limit. So we can't load the entire video at once unless it is very small, or you have a monstrous amount of memory and have configured your Lisp to use it.

(loop ...
      for jobs = (loop with memory-used = 0
                       for input = (handler-case
                                       (pngload:data (pngload:load-stream (uiop:process-output *ffmpeg-reader*)))
                                     (end-of-file () '()))
                       while input
                       do(incf memory-used (array-total-size input))
                       until (> memory-used 20000000)
                       collect (let ((input input) (i (incf i)))
                                 (pcall:pexec
                                   (prog1 (img-genner:colorize-naive (img-genner:upscale-image-by-factor input 0.75 0.75) colors)
                                     (print i)))))
      while jobs
      do(loop for i in jobs
              do(img-genner:save-image (pcall:join i) (uiop:process-info-input *ffmpeg-writer*))))

This is a very substantial change to the control flow, as you can see. One important thing to note: in pexec calls you need to make a closure over any variable that changes and that you use inside the pexec, otherwise it may be modified mid-use, and, quite simply, that's not ideal.
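
As a minimal illustration of that point, separate from the code above: loop typically mutates a single binding for x, so without the inner let every task would see whatever value x happens to hold when the task finally runs.

(let ((tasks (loop for x from 0 below 4
                   collect (let ((x x)) ; rebind to capture the current value
                             (pcall:pexec (* x x))))))
  (mapcar #'pcall:join tasks)) ; => (0 1 4 9)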

This is what could be called a "Bulk Sequential Process" design. While it uses all the processors on your system effectively, the utilization tends to have ups and downs unless it is dispatched very carefully. In this case in particular, even if we did everything perfectly, we're just communicating with ffmpeg over a pipe, and there's a lot of overhead there. Even then, it is possible to outpace the encoder ffmpeg uses, even from Common Lisp (this is more likely with AVIF or VP8/VP9 than with H.264/H.265 in my experience), and during those times your program is sitting around doing nothing. While ffmpeg might be keeping your computer busy then, it likely as not won't be able to parallelize well on images of a useful size, and for things with unpredictable runtimes it may very well be a bottleneck.

Now, theoretically, it should be possible to also splice the audio from the first file into the second with something like a named pipe, or, failing that, by just exporting the audio on its own and then using it in the encoding command, but that sounds like a project in itself.
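
For what it's worth, here is a rough, untested sketch of the second approach. It assumes the audio has already been extracted once, with something like ffmpeg -i wander.mp4 -vn -c:a libvorbis audio.ogg (the file name is made up), and that the writer is launched with that file as a second input in place of the definition above:

; Hypothetical: feed the pre-extracted audio to the writer as a second input
; and map the piped video stream plus that audio stream into the output.
(defparameter *ffmpeg-writer*
  (uiop:launch-program
   "ffmpeg -f png_pipe -i - -i audio.ogg -map 0:v -map 1:a -c:a copy -y -b:v 1M hello4.webm"
   :input :stream))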

Animating direct to mp4 with img-genner

Two posts ago I started talking about img-genner, guiding my authorship of the library by using it. Today I made a change that allows for what I think are exciting possibilities.

One of the best uses I've found for this library has been generating animations; surprisingly, I've started to take a liking to my own library (oh dear, I've started warming to my own work, whatever will I do with my non-existent objectivity?).

Today, I learned something very useful from the ffmpeg-user mailing list: it is possible to just concatenate png files and pipe them to ffmpeg. This gives img-genner an easy way to generate animations directly, without having to save the png files and turn them into video as a separate step.

(ql:quickload :img-genner)
(defparameter *ffmpeg-format* "ffmpeg -f png_pipe -i - ~a ~a")
(defparameter *ffmpeg-input* (uiop:launch-program (format nil *ffmpeg-format* "-deadline best" "movie.webm") :input :stream))
(defparameter *image* (img-genner:make-image 640 480))

(defparameter *circles* nil)

(defun reset-image(image)
  ;Taken from the previous post
  (loop for i from 0 below (array-total-size image)
        do(setf (row-major-aref image i) 0)))
(defun bound(pos min-x min-y max-x max-y)
  ;As is this
  (setf (aref pos 0) (if (> (aref pos 0) max-x) min-x (if (< (aref pos 0) min-x) max-x (aref pos 0)))
        (aref pos 1) (if (> (aref pos 1) max-y) min-y (if (< (aref pos 1) min-y) max-y (aref pos 1)))))
(defun tick()
  (reset-image *image*)
  (loop for (i . (dx  dy c)) in *circles*
        do(img-genner:move-by i dx dy)
        do(bound (slot-value i 'img-genner:origin) 0.0 0.0 640.0 480.0)
        do(img-genner:rotate-by i (- (random 1.5) 0.75))
        do(img-genner:fill-shape i *image* (img-genner:static-color-stroker c))))
(loop repeat 20
      do(push
         (cons (img-genner:make-ellipse (random 640.0) (random 640.0) (+ 10.0 (random 10.0)) (+ 10.0 (random 10.0)))
               (list (1- (random 2.0)) (1- (random 2.0)) (img-genner:get-random-color)))
         *circles*))
(loop repeat 1000
      do(tick)
      (img-genner:save-image *image* (uiop:process-info-input *ffmpeg-input*)))

(uiop:close-streams *ffmpeg-input*)
(uiop:wait-process *ffmpeg-input*)

You will notice the return of some code from the last post, but this version avoids the use of hashtables.

The important part is the uiop:launch-program call. It creates a process that runs asynchronously, that is, it doesn't block, and :input :stream means you can obtain the process's input stream. Obtaining the stream is done by calling uiop:process-info-input on the ffmpeg-input structure.

The next step will be reading from ffmpeg, but I'll leave that for my next post.

Drawing with img-genner

I talked about img-genner's glitch functions last time. Now you get to see the messier part of the library; as of January 15th, 2022, it's the part that needs refactoring and reconsideration. For one thing, it is enormously picky in terms of types, so much so that much of the code broke regularly with only slightly wrong input. In addition, since I was not yet entirely aware of how the Common Lisp Object System (CLOS) worked, the design does not flow nicely with its mechanics.

I need to warn you that I have a low opinion of this code and am looking to rewrite it: it doesn't have a consistent interface and too much of it is hidden away from the user. So this is part tutorial, part self-directed criticism, and not a very well organized blog post.

So, is there much value in it? Well, I tend to think so, but I suppose that I might be biased. What works well: the polygon code, the line code, and the ellipse code, which, when they're not complaining that the array you passed in isn't explicitly typed, are quite functional.

Let's start with another file with something simple. Let's get a spray of ellipses.

(ql:quickload :img-genner)
(defparameter *image* (img-genner:make-image 800 800))
(defparameter *circles* nil)
(loop for i from 0 to 10
      do(push (img-genner:make-ellipse (random 800) (* i 20) 20.0 10.0) *circles*))
(loop for i in *circles*
      do(img-genner:fill-shape i *image* (img-genner:static-color-stroker (img-genner:get-random-color))))
(img-genner:save-image *image* "ellipses.png")
A spray of various colors of ellipses.

In this, make-ellipse is a helper function designed to save the user from specifying everything necessary in a make-instance call. static-color-stroker creates a stroker, which is similar to a brush in the various graphics systems that probably decided deliberately to call it anything other than a stroker. In the case of static-color-stroker, it creates a closure that just captures the desired color value; the definition may be seen below.

(defun static-color-stroker(color)
  (lambda (i x y frac)
    (declare (ignore frac)
             (type fixnum x y)
             (dynamic-extent color))
    (set-pixel i x y color)
    ))

Other strokers are available: gradient-color-stroker creates a gradient over the value of frac, and radial-gradient-stroker takes two colors, a coordinate in the image, and a max-radius parameter that specifies the distance over which the first color shifts to the second.
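
As a quick sketch of what using one of these might look like; the argument order below is a guess from the description above rather than the library's documented signature, so treat it as illustrative only.

; Guessed call shape: (radial-gradient-stroker color1 color2 center-x center-y max-radius)
(img-genner:fill-shape
 (img-genner:make-ellipse 400.0 400.0 60.0 60.0)
 *image*
 (img-genner:radial-gradient-stroker (img-genner:rgb 255 0 0)
                                     (img-genner:rgb 0 0 255)
                                     400 400 60))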

There are of course other varieties of shapes, such as polygon and convex-polygon (though they should convert between each other automatically), and they have different interfaces.

We can add (img-genner:fill-shape (img-genner:make-regular-polygon 400.0 400.0 5 30.0) *image* (img-genner:static-color-stroker (img-genner:get-random-color))) to get an additional pentagon as below

The ellipses from before but now with a pentagon in the center

As you can see, there are visible seams at the boundaries between the triangles it breaks the polygon down into.

classDiagram
  class Shape{
    float rotation
    float~2~ origin
  }
  class Polygon{
    List~Point~ points
  }
  class Ellipse{
    float~2~ radius
  }
  class Rectangle{
    float width
    float height
  }
  Shape *-- Ellipse
  Shape *-- Polygon
  Shape *-- Rectangle
  Polygon *-- ConvexPolygon

I suppose that rectangles are a type of polygon, but unlike proper polygons it doesn't make sense to represent them as a series of points, and representing that relationship properly would require more work than it seems worth.

I've been finding that animating with this library is interesting. So let's make those ellipses vibe. I know, this is a still image library, but that doesn't have to stop us. All we have to do is write two functions, one to reset the image, and another to move the ellipses.

We'll also have to assign the ellipses consistent colors, otherwise they will flash constantly, and I doubt that's considered appealing.

(defun reset-image()
  (loop for i from 0 below (array-total-size *image*)
        do(setf (row-major-aref *image* i) 0)))
(defparameter *circles* nil)
(defparameter circle-colors (make-hash-table :test 'equal))
(defparameter circle-speeds (make-hash-table :test 'equal))
(defun mutate()
  (loop for i in *circles*
        do(incf (aref (gethash i circle-speeds) 0) (- (random 1.0) 0.5))
        do(incf (aref (gethash i circle-speeds) 1) (- (random 1.0) 0.5))
        do(img-genner:rotate-by i (- (random 1.0) 0.5))
        do(img-genner:move-by i (aref (gethash i circle-speeds) 0) (aref (gethash i circle-speeds) 1))))
...
(loop for i in *circles*
      for c = (img-genner:get-random-color)
      do(setf (gethash i circle-colors) c)
      do(setf (gethash i circle-speeds) (img-genner:point 0.0 0.0))
      do(img-genner:fill-shape i *image* (img-genner:static-color-stroker c)))

Then we run an animation loop for 160 frames

(loop for i from 0 below 160
      do(reset-image)
      do(mutate)
      do(loop for circle in *circles*
              do(img-genner:fill-shape circle *image* (img-genner:static-color-stroker (gethash circle circle-colors))))
      do(img-genner:save-image *image* (format nil "~4,'0d.png" i))
      )

And then with the proper ffmpeg invocation, you should end up with something like the following

The ellipses vibe subtly

For the above animation, the following command was used to generate it.

ffmpeg -i %04d.png -vcodec libwebp -filter:v 'fps=fps=20' -pix_fmt yuv420p -q:v 6 -loop 0 output.webp

This is nice and all, but if we ran the animation for longer, eventually all the ellipses might end up offscreen, which makes viewing significantly less interesting. So we need another function to wrap the positions back within the bounds.

(defun bound(pos min-x min-y max-x max-y)
  (setf (aref pos 0) (if (> (aref pos 0) max-x) min-x (if (< (aref pos 0) min-x) max-x (aref pos 0)))
        (aref pos 1) (if (> (aref pos 1) max-y) min-y (if (< (aref pos 1) min-y) max-y (aref pos 1)))))

And then we need to use slot-value with the 'img-genner:origin symbol to obtain the position; after that, all we need is to add the bounding to the mutate function.

(defun mutate()
  (loop for i in *circles*
        do(incf (aref (gethash i circle-speeds) 0) (- (random 1.0) 0.5))
        do(incf (aref (gethash i circle-speeds) 1) (- (random 1.0) 0.5))
        do(img-genner:rotate-by i (- (random 1.0) 0.5))
        do(img-genner:move-by i (aref (gethash i circle-speeds) 0) (aref (gethash i circle-speeds) 1))
        do(bound (slot-value i 'img-genner:origin) 0.0 0.0 800.0 800.0)))

If everything worked properly, then it should be easy to change the number of frames to 250 and generate something like the following.

In the future, however, I think I would like to refactor this so that it isn't necessary to work with the shape objects; they are really overkill for drawing ellipses and regular polygons, and most libraries at this sort of level just use functions that write directly to the bitmap. I think that's much easier to fool around with, since there's less state to manage.
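
As a sketch of what that function-style interface could look like (hypothetical, not part of img-genner): no shape object at all, just a function that writes pixels straight into the array, assuming the ellipse fits entirely inside the image.

; Hypothetical direct-drawing function: fill an axis-aligned ellipse by testing
; each pixel of its bounding box against the ellipse equation.
(defun fill-ellipse-directly (image cx cy rx ry r g b)
  (loop for y from (ceiling (- cy ry)) to (floor (+ cy ry))
        do(loop for x from (ceiling (- cx rx)) to (floor (+ cx rx))
                when (<= (+ (expt (/ (- x cx) rx) 2)
                            (expt (/ (- y cy) ry) 2))
                         1.0)
                  do(setf (aref image y x 0) r
                          (aref image y x 1) g
                          (aref image y x 2) b))))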

Update: See this post to avoid needing to run ffmpeg on the output.

Introducing img-genner, a Common Lisp Image Synthesis Library

I have been working on the img-genner library(repo here) for quite a long time now, over three years if my memory serves me correctly. It contains a number of different modules, and frankly, it is a bit of a Frankenstein. At first it was about polygon graphics, then it became about pixel 'glitch' effects, such as pixel sorting and other stuff, partially because I wanted to write the algorithms myself and gain an understanding of them from implementation.

I'm still not quite sure about the triangulation algorithm (which we won't demonstrate here because I'm not sure it still works), but for the most part that approach has worked.

For the examples here I am assuming that you have a functional roswell/common lisp environment with quicklisp and all the other amenities.

Getting the library

First, clone it to your local projects directory; the exact command will vary from this if you are not using roswell. As of now I have no plans to put this library on quicklisp unless there is some interest.

$> git clone https://github.com/epsilon-phase/img-genner ~/.roswell/local-projects/img-genner

Now you should be able to execute ros run and successfully run (ql:quickload :img-genner) in the REPL. But that's probably not a great experience. We need an environment that is at least somewhat durable and resettable. So let's make and enter a project folder.

$> mkdir post-examples-that-are-totally-cool
$> cd post-examples-that-are-totally-cool

Add two very cool images.

Lunala Avi
lunala.png
fox-profile.png

And a file called something like post.lisp

To start we should probably load the library, but it would also be useful to have some way to clean up the folder if we don't like what we have so far.

(ql:quickload :img-genner)
(defun clean()
  (loop for i in (directory "*.png")
        when(not (member (pathname-name i) '("lunala" "fox-profile") :test #'string=))
          do(print i) and do(delete-file i)))

With that we can clear any new images that we decide to make.

We should also add (ql:quickload :img-genner) to the top of the file to make loading it easier (no need for defpackage or asdf yet).

So, let's see, let's... downscale the lunala picture. So add this to the end of post.lisp

                                        ; Using defparameter means the image is reloaded each time the
                                        ; file is compiled or loaded again, which might be useful
(defparameter *lunala* (img-genner:load-image "lunala.png"))
(img-genner:save-image (img-genner:downscale-x2 (img-genner:downscale-x2 *lunala*)) "lunala-smol.png")
lunala-smol.png

Well, that looks more or less as you would expect.

Let's also load the fox-profile picture and then combine the two images.

(defparameter *fox-profile* (img-genner:load-image "fox-profile.png"))
                                           ; The 30 30 refers to the size of the tiles in pixels. 
                                           ; In general, the smaller the tiles the longer this takes
(img-genner:save-image (img-genner:mosaify *lunala* *fox-profile* 30 30) "mosaic.png")
mosaic.png

So this has combined the two images by replacing blocks with the most similar blocks of the other image. We can change the way that it scores similarity by changing the distance metric used.

If we use the color-brightness distance function, then we care more about the brightness of the tiles than about their color.

(img-genner:save-image (img-genner:mosaify *lunala* *fox-profile* 30 30 
                                           :distance #'img-genner:color-brightness)
                       "mosaic-bright.png")

You can see the result below.

mosaic-bright.png

And before I finish this post, let me show you a pixel-sort from the library. (This will be rather slow)

(img-genner:save-image (img-genner:fuck-it-up-pixel-sort *lunala* 600 600) "sorted.png")
sorted.png (run through pngquant to reduce the file size)

Forth and Optimization

Forth is a strange old language. It's among a relatively small number of concatenative languages that achieved any measure of success. Anyway, this isn't something we're going to go into much here because we're thinking about something nearly tangential to the language itself.

How you can make it fast.

Stacks are somewhat slow compared to registers. Using the cache has a cost, even when your program is using optimal access patterns, especially on load-store architectures (which seem to be coming back into vogue again).

Anyway, on register machines, forth isn't much in terms of hot shit. Sure, there are compilers that manage to do great with it, but that doesn't change the fact that, as an execution model, it leaves a lot to be desired these days. The other issue is that it's not entirely clear how one is to write a compiler that does that kind of transformation on the execution, or it isn't clear to us, though that's probably more a gap in our knowledge than any sort of extreme difficulty.

Okay, so forth is simple enough: it has a stack of words of some size, a return stack, and no particular datatypes handled seamlessly other than integers of the word size. Each function (in forth such things are called 'words' because they, at least in the past, were looked up in a 'dictionary') can consume any number of items from the stack and leave any number of items on the stack. There's some stuff about immediate-mode execution (compile-time stuff), but it uses the same kind of model.

This is hard to map onto registers because a function can return any number of things. But there's a possibility for analysis here that might help. Let's make a simple function that rotates the top three items on the stack twice, so that the topmost item becomes the bottom-most.

: rot2 ( a b c -- c a b )
   rot ( 3 in, 3 out)
   rot ( 3 in, 3 out) ;

The first line, the definition, doesn't really enforce anything here. The ( a b c -- c a b ) is just a comment to help meagre creatures that need reminders of such things (as opposed to the gods of forth, or whoever). But what we can see is that by the end of the definition it remains clearly balanced: it will consume and emit 3 items no matter what.
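
To make the balance idea concrete, here's a tiny hypothetical sketch, in Common Lisp since that's what the rest of this blog uses, of the bookkeeping such a compiler could do: fold each word's known consume/produce counts over a definition to find how many items it needs and its net effect.

; Hypothetical sketch: fold known (inputs . outputs) stack effects over a word
; sequence. ROT consumes 3 and produces 3, so ROT ROT needs 3 items and has a
; net effect of 0, i.e. it is balanced.
(defparameter *stack-effects*
  '((rot . (3 . 3)) (swap . (2 . 2)) (dup . (1 . 2)) (+ . (2 . 1)) (* . (2 . 1))))

(defun net-effect (words)
  (let ((depth 0) (lowest 0))
    (dolist (w words (values (- lowest) depth))
      (destructuring-bind (in . out) (cdr (assoc w *stack-effects*))
        (decf depth in)
        (setf lowest (min lowest depth))
        (incf depth out)))))

; (net-effect '(rot rot)) => 3, 0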

Let's assume we can assemble rot into something sensible; we're going to use some generic bastardized assembly here because it's the best we can do.

ROT:
  move S0, T0
  move S1, S0
  move S2, S1
  move T0, S2
  ret

Of course, in general you'd want to use something like Static Single Assignment as an intermediate form here, but we are even less familiar with that than with assembly.

To call this without any analysis, we think you would need to load the arguments from the stack into the registers and then call into the function; but since we know how many registers we need, among other things, we can compile rot2 into something like this.

rot2:
  pop S2
  pop S1
  pop S0
  call ROT
  call ROT
  push S0
  push S1
  push S2
  ret

But then that's not really realistic for our desired solution. Ideally it would generate native code by transpiling into C and then calling whatever C compiler the user selects.

void rot(word *a, word *b, word *c){
    word tmp=*a;
    *a=*b;
    *b=*c;
    *c=tmp;
}
: quadratic ( a b c x -- n )
 >R swap r@ * +
 swap R> dup * * + ;
: rot2 ( a b c -- c a b )
  rot rot ;
: quadratic ( a b c x -- n )
  >R   ( R+1, S-1 )
  swap ( R,   S   )
  R@   ( R,   S+1 )
  *    ( R,   S-1 )
  +    ( R,   S-1 )
  swap ( R,   S   )
  R>   ( R-1, S+1 )
  dup  ( R,   S+1 )
  *    ( R,   S-1 )
  *    ( R,   S-1 )
  +    ( R,   S-1 ) ;

With some luck, the C compiler has a large attention span and optimization budget. If it doesn't, then it may not be able to optimize the threaded code at all. And more to the point, the Forth compiler still needs to try to be smart about combining operations before emitting C, because some forth idioms may not present obvious optimization opportunities to a C compiler.

This means that at some point there is a need to accept the call: a need to write a compiler that understands the semantics as they apply to the underlying machine that you are targeting.

Of course, when you're Forth and everything is just words to you, maybe C compilers can peer deeply enough to handle most arithmetic optimizations you might want.

But part of what makes Forth attractive is that it's simple to implement, that it's easy to write something that'll give you lots of program space. It is a low level language; it doesn't bother tracking types, it trusts the user, for better or worse, and that's what makes forth so fast. It doesn't pause for garbage collection, just as it doesn't tag its types.

Ultimately this sort of environment is more like DOS than it is our memory-protected machines these days, and that means it is likely less than suitable for handling untrusted input (not that you can't write 'provably' correct code in forth).

SB-SIMD, Early and Promising

Common Lisp is a dream. Not always a great dream (such as when using strings), but it's sufficient, and SBCL is a remarkably interesting compiler, one that tells you how you might let it make the code faster, such as by adding type hints.

More recently, I've had the opportunity to work with sb-simd, a library which adds automatic vectorization for certain forms. (Right now the vectorization needs to happen inside a do-vectorized form, it seems to be missing a few SIMD intrinsics, and if it can't vectorize what you give it, it will fail loudly.) The general recommendation is also to separate this sort of specialized code into another package that is loaded only when sb-simd is available.

It's very rough around the edges, but for very simple things it works very well, as in, billions of subtractions a second more than otherwise.

However, it is a framework intended to provide acceleration for machine-learning sorts of tasks, so it is much more developed with regards to the floating-point intrinsics than the integer ones. That actually appears to be more a limitation of what Intel provides, but maybe that will change (and I'm led to believe some of the missing operations can be substituted for, though I'm not sure which).

In any case it is likely to improve somewhat further before it makes it into SBCL's contributed libraries, and who knows, in the future it might be part of SBCL's built-in optimizer.

For example, below are a standard lisp version of a diff-image function, and an auto-vectorized one.

(defun diff-image(i1 i2 &optional result-image)
  (declare (type (simple-array u8 (* * 3)) i1 i2)
           (type (or (simple-array u8 (* * 3)) null) result-image)
           (optimize speed (safety 0)))
  (let ((r (if result-image
               result-image
               (make-array (list (array-dimension i1 0) (array-dimension i1 1) 3) :element-type 'u8 :initial-element 0))))
    (declare (type (simple-array u8 (* * 3)) r))
    (do-vectorized (x 0 (1- (the u64 (array-total-size i1))))
      (setf (u8-row-major-aref r x)
            (u8- (u8-row-major-aref i1 x)
                  (u8-row-major-aref i2 x))))
    r))


(defun diff-image-slow(i1 i2 &optional result-image)
  (declare (type (simple-array u8 (* * 3)) i1 i2)
           (type (or null (simple-array u8 (* * 3))) result-image)
           (optimize speed (safety 0)))
  (loop with result = (if result-image result-image (make-array (list (array-dimension i1 0) (array-dimension i1 1) 3) :element-type 'u8 :initial-element 0))
        for y from 0 below (array-dimension i1 0)
        do(loop for x from 0 below (array-dimension i1 1)
                do(setf (aref result y x 0)
                        (- (aref i1 y x 0) (aref i2 y x 0))
                        (aref result y x 1)
                        (- (aref i1 y x 1) (aref i2 y x 1))
                        (aref result y x 2)
                        (- (aref i1 y x 2) (aref i2 y x 2))))
        finally(return result)))
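
To get a rough feel for the difference, here's a quick, informal comparison; a sketch that assumes the two definitions above are loaded and that u8 is sb-simd's name for (unsigned-byte 8).

; Time 100 diffs of a 1920x1080 RGB image with each version.
(let ((a (make-array '(1080 1920 3) :element-type '(unsigned-byte 8) :initial-element 200))
      (b (make-array '(1080 1920 3) :element-type '(unsigned-byte 8) :initial-element 55)))
  (time (dotimes (i 100) (diff-image a b)))
  (time (dotimes (i 100) (diff-image-slow a b))))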

Interestingly, according to the author of sb-simd, this is probably the first code ever written to use it with integers, so it took some work on their end before it worked here.

Optimizing Pointless things

So, we’ve been playing rimworld a lot recently. One of the mods we use is called Rim Factory, which is quite nice overall for building automated bases (yes, this breaks the game’s economy, what of it?). One of the mechanics it adds is a buildable object called a sprinkler, which, instead of watering your plants (this isn’t necessary in vanilla rimworld), causes plants within a certain radius (the second tier reaches twelve tiles) to grow by two hours. This is completely unbalanced and, when you get down to it, just another item in the pursuit of excess, but that’s how we like to play the game.

Anyway, we were wondering about figuring out an optimal pattern for their usage, and since the more rigorous tools we have for measuring such things are largely forgotten, we built a little simulation script in python to try to get a feel for various optimization strategies. The most fun part, of course, is how we went about building this kind of tool.

This tool is remarkably sketchy; it does not attempt to be rigorous, or even to follow the mod’s watering pattern beyond cycling through all of the tiles in range.

Since we’ve been building stuff with python lately, this is in python, which never ceases to amaze us with how easy its ecosystem makes this kind of task.

Anyway, to get an idea of where this is going, let’s start with the import block at the top:

from typing import *
import numpy as np
import itertools
import functools
import math
import statistics
import matplotlib
import matplotlib.pyplot as plt
import random
from enum import IntEnum, auto

The building blocks

The core of the script is the iterators that provide the information about where the focus is, and since this is a script without much intention of going particularly far, we can use globals and constants, so let’s start with those!

# The size of the map being kept track of.
#
# In this case, this means that the map consists of the area from (-24,-24) to (24,24)
MAP_SIZE = 24
# The radius of a sprinkler mkii is 12 blocks
RADIUS = 12

Iterators

The heart and soul of this program is in its iterators, so let’s start by defining one that will make the rest of our work easier.

def cartesian(start: int, end: int) -> Iterator[Tuple[int,int]]:
    """
    Turn a single range iterator as in range(start,end) into a cartesian product
    """
    return itertools.product(range(start,end),repeat=2)

That’ll be useful for iterating over the map. The next two are a bit less sensible, but they work, and this isn’t meant to be perfect.

def map_tiles():
    r = RADIUS+1
    z = itertools.product(range(-MAP_SIZE,MAP_SIZE), repeat=2)
    z = list(z)
    random.shuffle(z)
    for x in z:
        yield x

def sprinkler(position:Tuple[int,int]) -> Iterator[Tuple[int,int]]:
    for i, j in cartesian(0, RADIUS+1):
        if i==0 and j == 0:
            continue
        s = i**2+j**2
        if math.sqrt(s) <= RADIUS:
            yield (i+position[0], j+position[1])
            yield (position[0]+i,position[1]-j)
            yield (position[0]-i,j+position[1])
            yield (position[0]-i,position[1]-j)

Fitness

Next comes the fitness function, which, to start off with, will just be a simple mean of how many times each tile has been hit by a sprinkler.

def evaluate_fitness(field: Dict[Tuple[int,int], int]) -> float:
    return sum(field.values()) / (2*MAP_SIZE)**2

Each round of the simulation advances every sprinkler by one tile; simulate runs a given number of rounds, and it is called for each candidate to figure out the best one.

Putting that together

def simulate(positions: Iterable[Tuple[int,int]], rounds: int)\
             -> Dict[Tuple[int,int],int]:
    field = {(x,y):0 for x,y in cartesian(-MAP_SIZE, MAP_SIZE)
             if (x,y) not in positions}
    for i in positions:
        if i in field:
            del field[i]
    sprinklers = list(map(lambda x: itertools.cycle(sprinkler(x)), positions))
    for _ in range(rounds):
        for s in sprinklers:
            pos = next(s)
            if pos in field:
                field[pos] += 1
    return field

So now that we can evaluate each choice we might make, we have to make it possible to put these together for any number of sprinklers, so we write another function called evaluate_alternatives. We decided to limit the breadth of each search step to a given number of candidates, here called to_try.

def evaluate_alternatives(positions: Iterable[Tuple[int,int]],
                          to_try: int,
                          rounds: int) -> List[Tuple[int,int]]:
    # Eliminate already extant positions
    options = set(map_tiles()) - set(positions)
    options = list(options)
    random.shuffle(options)
    to_try = min(to_try, len(options)-1)
    options = options[0:to_try]
    best_fitness = -2**48
    best_found = None
    for i in options:
        f = simulate(positions + [i], rounds)
        fitness = evaluate_fitness(f)
        if fitness > best_fitness:
            best_found = i
            best_fitness = fitness
    print(f"best fitness found this round {best_fitness}")
    return positions + [best_found]

Next we put together the final pieces in a function we called optimize.

def optimize(n: int, candidates:int, rounds: int):
    positions = []
    while len(positions) < n:
        positions = evaluate_alternatives(positions, candidates, rounds)
    fields = simulate(positions, rounds)
    heatmap = np.array([[(fields[x,y] if (x,y) in fields else -5)
                             for x in range(-MAP_SIZE, MAP_SIZE)]
                        for y in range(-MAP_SIZE,MAP_SIZE)])
    fig, ax = plt.subplots()
    im = ax.imshow(heatmap)
    # You can uncomment these if you want to have a label on each cell in the heatmap,
    # for 48x48, it is rather overwhelming
##    for (x,y), v in fields.items():
##        ax.text(x+MAP_SIZE,y+MAP_SIZE, math.floor(fields[x,y]), ha='center', va='center', color='w')
    fig.tight_layout()
    plt.show()

and to top it off, we invoke it at the end of the file:

optimize(10, 80, 1500)

The output should then look something like this

A Heatmap

Okay, that’s fine, but it’s not necessarily what you might want from such a tool. What if you’d rather have as much coverage as possible with a given number of sprinklers?

Well, we can do that.

Alternative fitness metrics

Okay, so what we need to optimize for to get the best coverage possible is to simply count the fields which are touched. In all likelihood, we might end up wanting further metrics to measure by, so we might as well make it a bit… well… Configurable, if you’re going to use a charitable term. So let’s introduce an Enum right before the fitness function.

class FitnessMetric(IntEnum):
    MEAN = auto()
    COVERAGE = auto()

fitness_metric = FitnessMetric.COVERAGE

Then we need to modify the evaluate_fitness function.

def evaluate_fitness(field: Dict[Tuple[int,int],int]) -> float:
    if fitness_metric == FitnessMetric.COVERAGE:
        r = len(list(filter(lambda x:x>0, field.values())))
    elif fitness_metric == FitnessMetric.MEAN:
        r = sum(field.values())/(2*MAP_SIZE)**2
    return r

Let’s run the script again and see what falls out.

Coverage

That certainly seems to have the right effect. The majority of the map is covered by the sprinklers.

Complications

So, if you’re familiar with this mod, you might realize there are some things this isn’t even trying to account for, namely any infrastructure. Well, that should be easy to add. 🤞

Let’s add a set of positions that we don’t want filled. For now, let’s limit it to the center (we’re assuming that maybe you have a harvester station set up there), so let’s add another constant in front of evaluate_alternatives (yeah, this is getting pretty messy).

excluded_positions = frozenset([(0,0)])

Then we just have to modify evaluate_alternatives to avoid these positions (and tweak the heatmap construction in optimize so the excluded squares show up distinctly).

def evaluate_alternatives(positions: Iterable[Tuple[int,int]],
                          to_try: int,
                          rounds: int) -> List[Tuple[int,int]]:
    # Eliminate already extant positions
    options = set(map_tiles()) - set(positions) - excluded_positions
#    --- SNIP ---
    heatmap = np.array([[(fields[x,y] if (x,y) in fields and (x,y) not in excluded_positions else -5)
                         for x in range(-MAP_SIZE, MAP_SIZE)]
                    for y in range(-MAP_SIZE,MAP_SIZE)])
#    --- SNIP ---

Alright, what does that make it do?

Alright, that’s a further improvement, we suppose, if nothing else it permits more flexibility in terms of describing a given scenario you want to optimize for.

Okay, so let’s say that you have a specific area that you want to optimize coverage for. How would we go about adding that?

Well, at this point it’s pretty clear that all this nonsense with enums is going to be an increasingly weak abstraction for dealing with additional fitness functions, but let’s keep it for now, if only because we’re not sure how we want to go about fixing this ugly feeling design as of writing this very paragraph.

Okay, so let’s assume that you want to optimize for an automatic harvester with an area from (-5,-5) up to, but not including, (5,5), since cartesian uses range’s exclusive upper bound.

special_region = set(cartesian(-5,5))

And we can alter evaluate_fitness to this (after adding a SPECIAL_REGION member to the FitnessMetric enum):

def evaluate_fitness(field: Dict[Tuple[int,int], int]) -> float:
    r=0
    if fitness_metric == FitnessMetric.MEAN:
        r = sum(field.values())/(MAP_SIZE*2)**2
    elif fitness_metric == FitnessMetric.COVERAGE:
        r = len(list(filter(lambda x:x>0, field.values())))
    elif fitness_metric == FitnessMetric.SPECIAL_REGION:
        r = sum(map(lambda x:x[1], filter(lambda x: x[0] in special_region,field.items())))
    return r

We end up with output like this:

You might not like this output because, among other things, it doesn’t penalize the sprinklers for showing up inside of our growing zone. So we need to add another line to our much-suffering evaluate_fitness function.

def evaluate_fitness(field: Dict[Tuple[int,int], int]) -> float:
    r=0
    if fitness_metric == FitnessMetric.MEAN:
        r = sum(field.values())/(MAP_SIZE*2)**2
    elif fitness_metric == FitnessMetric.COVERAGE:
        r = len(list(filter(lambda x:x>0, field.values())))
    elif fitness_metric == FitnessMetric.SPECIAL_REGION:
        r = sum(map(lambda x:x[1], filter(lambda x: x[0] in special_region,field.items())))
        r /= max(1, len([(x,y) for x,y in special_region
                         if (x,y) not in field.keys()
                         if (x,y) not in excluded_positions]))
    return r

And this helps a bit:

Penalizing the region

Let’s label a similar run to see if it has excluded the region altogether; if a square ends up labelled ‘Se’, it indicates a sprinkler in an excluded area.

Well, damn. Our code has a mistake in it that permitted the single offending sprinkler. We can fix this by changing the function to this:

def evaluate_fitness(field: Dict[Tuple[int,int], int]) -> float:
    r=0
    if fitness_metric == FitnessMetric.MEAN:
        r = sum(field.values())/(MAP_SIZE*2)**2
    elif fitness_metric == FitnessMetric.COVERAGE:
        r = len(list(filter(lambda x:x>0, field.values())))
    elif fitness_metric == FitnessMetric.SPECIAL_REGION:
        r = sum(map(lambda x:x[1], filter(lambda x: x[0] in special_region,field.items())))
        penalty_squares = len([(x,y) for x,y in special_region
                              if (x,y) not in field.keys()
                              if (x,y) not in excluded_positions])
        r /= penalty_squares + 1
    return r

Alright, and this fixes that issue (and with another fix applied it labels the spots with sprinklers correctly as ‘S’).

Summary

Okay, so what we want you to be able to take away from this post:

  1. When you’re processing data for something pointless, don’t bother being clean if doing it in a dirty way lets you figure out what matters faster
  2. Separate it into concerns that model, however roughly, the components that matter
  3. Build a metric for evaluating how good a result is
  4. Descend along randomly chosen positions based on how optimal they are

Feel free to ask questions or comment. The full, considerably messier code can be found here.

Trying Out Zig

So, we have heard good things about Zig. These boil down to the following things:

  • Good speed
  • Fast compilation
  • Decent type system
  • Simple Syntax

So far, a lot of these things seem to be borne out by the experience we've had, though we have some criticisms that are probably more along the lines of our taste in programming rather than issues we expect to be universal. For fairness' sake, let's start with things we like, once again expressing our preferences rather than universals.

  • Low Level
    Low level languages feel liberating to us, because we get to make the decisions, rather than the compiler or virtual machine.
  • Trusts you, mostly.
    This is mostly in comparison to Rust, whose borrow checker we have wrestled with a great deal and whose Turing complete type system has, in the past, left us mystified when we have run into errors.
    And if you really, really want to throw caution to the wind, you can mostly just tell zig that you don't give a damn about whether or not the pointer is to a single element or an array.
  • Compile Time computation, Generics
    This is how generics are implemented, and for the most part, it provides a good experience, especially since it compares exceedingly favorably with pre-c++17 C++.
  • Not afraid of Bare Pointers
    Bare pointers aren't something to worry about too much, especially if they're typed. Bare pointers are quite a bit faster and lighter than other (tagged) references.

And honestly, we're impressed at how tight the language feels. There are some facilities we're not sure about, and some that we wish it had, but overall, this isn't a bad language. We like it more than Rust(which, admittedly, really isn't saying much for us), though, from familiarity alone we're likely to revert to using C or C++, but we're not nearly as skeptical as we once were about this little language.

The syntax is familiar enough, pretty similar to the C-likes, though with a few elements that we can't really attribute to any language we know (which doesn't mean very much). So, let's see what something simple looks like.

const std = @import("std");
pub fn main() !void {
    const stdout = std.io.getStdOut().writer();
    var i: usize = 1;
    while (i < 10) {
        try stdout.print("{}\n", .{i});
        i += 1;
    }
}

You can see that Zig is careful about errors, requiring that you either mark that main can return an error (designated with !void) or handle the error somehow. Anything that can fail in Zig has a type denoted by !<type>, which requires you to think about what errors can happen, or so communities like Go's and Rust's insist.

Let's see something that differs even more significantly from C.

const std = @import("std");
fn arraylist_user(alloc: *std.mem.Allocator) !std.ArrayList(i32) {
    var al = std.ArrayList(i32).init(alloc);
    var i: i32 = 0;
    while (i < 100) {
        var item = try al.addOne();
        item.* = i;
        i += 1;
    }
    return al;
}
pub fn main() !void {
    var gpalloc = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(!gpalloc.deinit());
    const gpa = &gpalloc.allocator;
    const stdout = std.io.getStdOut().writer();
    var array_list = try arraylist_user(gpa);
    defer array_list.deinit();
    for (array_list.items) |*item| {
        try stdout.print("{}\n", .{item.*});
    }
}

So, in this little snippet we have used several new things, Generic types, Allocator, and the defer keyword(which go users will immediately recognize). Zig does not have a default allocator, and it wants you to pass the allocator you want to use to the function that allocates memory. This is rather different than C, where you would probably just use malloc/realloc/calloc/free to manage memory(Or define a macro that evaluates to that by default if you're building that kind of library). The reason that the documentation has an assert in gpalloc.deinit() is because this particular allocator can track memory leaks, so this causes it to report such occurrences. It also shows one of the syntactic divergences from C, for(array_list.items) |*item|. Pipes don't get much use in C other than as logical operators, but here it tells the loop to unwrap the value(a pointer to a position in the list), and calls it item. In Zig, you dereference this pointer with item.*. Another point of comparison to C++, the addOne method doesn't add the item itself, but returns a pointer to it, so that you can assign it afterwards.

Interestingly, for such a low level language, array_list.items isn't a pointer, it's a slice, which is a pointer plus a length; in C++ terms, it has the most in common with a string_view.

Okay, so what if we want to go further? What if we want to do a reprise of our C N-body simulator written with Raylib? That's not too bad; it's almost even easy, and we can use raylib without too much trouble.

To start off, we need to import raylib. Luckily, as it turns out, Zig's compiler can handle C interop quite nicely; it even has builtins for it. (Though, if you do this, then you need to call the compiler with -lc -lraylib to make sure it has all the C libraries linked in.)

const std = @import("std");
const ray = @cImport({
    @cInclude("raylib.h");
});

Okay, so the first thing we should do here is define a vector type, not the resizable array type of vector, but the directional vector. Since we prize reuse here, we're going to make it generic even though we probably don't really need to.

pub fn vector2(comptime value: type) type {
    return struct {
        const This = @This();
        x: value,
        y: value,
        //Write methods here!
    };
}

Okay, so that's an easy enough definition. But let's add some vector math oriented methods to make using it easier and more pleasant.

    return struct {
        const This = @This();
        x: value,
        y: value,
        pub fn add(this: This, other: This) This {
            return This{ .x = this.x + other.x, .y = this.y + other.y };
        }
        pub fn sub(this: This, other: This) This {
            return this.add(.{ .x = -other.x, .y = -other.y });
        }
        pub fn scale(this: This, v: value) This {
            return .{ .x = this.x * v, .y = this.y * v };
        }
        pub fn distance(this: This, other: This) value {
            const sqrt = std.math.sqrt;
            const pow = std.math.pow;
            return sqrt(pow(value, this.x - other.x, 2) + pow(value, this.y - other.y, 2));
        }
        //Wrap the value around, like when pacman reaches the edge of the map
        pub fn wrap(c: This, a: This, b: This) This {
            var r: This = .{ .x = c.x, .y = c.y };
            if (r.x < a.x) {
                r.x = b.x;
            } else if (r.x > b.x) {
                r.x = a.x;
            }
            if (r.y < a.y) {
                r.y = b.y;
            } else if (r.y > b.y) {
                r.y = a.y;
            }
            return r;
        }
        pub fn magnitude(a: This) value {
            const sqrt = std.math.sqrt;
            return sqrt(a.x * a.x + a.y * a.y);
        }
        pub fn normalize(a: This) This {
            return a.scale(1.0 / a.magnitude());
        }
    };
}

Alright, the main things to note here are how we need to call std.math.pow, namely that we need to call it with a compile time type, in this case value. Later on we'll see it called with f32.

Now we need to define the type we use for particles, and while we're at it, we're going to make a shortcut to the kind of vector we're using here.

const v2 = vector2(f32);
const particle = struct {
    const This = @This();
    position: vector2(f32) = v2{ .x = 0, .y = 0 },
    velocity: vector2(f32) = v2{ .x = 0.0, .y = 0.0 },
    acceleration: vector2(f32) = v2{ .x = 0.0, .y = 0.0 },
    mass: f32,
    //Methods here!
};

We also need a radius property, but since it's derived from the mass, and later on the mass can change as bodies absorb each other, it needs to be a method. We should also write methods to determine whether two particles overlap, to move the position along by the velocity, and to calculate the attraction between two particles.

const particle = struct {
    const This = @This();
    position: vector2(f32) = v2{ .x = 0, .y = 0 },
    velocity: vector2(f32) = v2{ .x = 0.0, .y = 0.0 },
    acceleration: vector2(f32) = v2{ .x = 0.0, .y = 0.0 },
    mass: f32,
    pub fn radius(this: *const This) f32 {
        var result: f32 = std.math.ln(this.mass) / std.math.ln(3.0);
        if (result > 40) {
            return 40 - std.math.ln(this.mass) / std.math.ln(5.0);
        }
        return result;
    }
    //Returns true if the two particles overlap
    pub fn overlaps(this: *const This, other: *const This) bool {
        var r1 = this.radius();
        var r2 = other.radius();
        var dist = this.position.distance(other.position);
        return (r1 + r2) > dist;
    }
    //Handles the base movement
    pub fn motion_step(this: *This, timestep: f32) void {
        this.position = this.position.add(this.velocity.scale(timestep));
        this.velocity = this.velocity.add(this.acceleration.scale(timestep));
    }
    pub fn attraction(this: *const This, other: *const This, g: f32) vector2(f32) {
        var dist = this.position.distance(other.position);
        var vector_to = other.position.sub(this.position).normalize();
        return vector_to.scale(g * (this.mass * other.mass) / std.math.pow(f32, dist, 2));
    }
};

Now we should build a container for many particles, and the properties necessary to simulate them that aren't appropriate to put in the particle structures themselves. We need a method to initialize the particles to random positions, a method to handle the gravity simulation and such, and a method to actually draw the particles.

const ParticleCollection = struct {
    const This = @This();
    particles: [100]particle,
    window_start: vector2(f32) = v2{ .x = 0.0, .y = 0.0 },
    window_end: vector2(f32) = v2{ .x = 100, .y = 100 },
    timestep: f32 = 0.01,
    gravitational_constant: f32 = 1e-3,
    pub fn init_particles(this: *This, rand: *std.rand.Random) void {
        for (this.particles) |*p| {
            p.mass = @intToFloat(f32, rand.intRangeLessThan(i32, 1, 100));
            p.position = .{
                .x = @intToFloat(f32, rand.intRangeLessThan(i32, @floatToInt(i32, this.window_start.x), @floatToInt(i32, this.window_end.x))),
                .y = @intToFloat(f32, rand.intRangeLessThan(i32, @floatToInt(i32, this.window_start.y), @floatToInt(i32, this.window_end.y))),
            };
            p.acceleration = .{ .x = 0.0, .y = 0.0 };
            p.velocity = .{ .x = 0.0, .y = 0.0 };
        }
    }
    pub fn step_world(this: *This) void {
        for (this.particles) |*p| {
            p.motion_step(this.timestep);
            p.position = p.position.wrap(this.window_start, this.window_end);
        }
        for (this.particles) |*a| {
            a.acceleration = .{ .x = 0.0, .y = 0.0 };
            for (this.particles) |*b| {
                //No self attraction please, allowing that would result in division by zero
                if (a == b)
                    continue;
                a.acceleration = a.acceleration.add(a.attraction(b, this.gravitational_constant));
            }
        }
    }
    pub fn drawSystem(this: *const This) void {
        for (this.particles) |p| {
            ray.DrawCircle(@floatToInt(c_int, p.position.x), @floatToInt(c_int, p.position.y), p.radius(), ray.BLACK);
        }
    }
};

Note how we have to pass a random number generator into init_particles; this is in line with how Zig also requires you to pass allocators into functions that need to allocate memory. You also see some of the somewhat jagged interaction between Zig and C, namely that Zig doesn't assume its i32 type is equivalent to C's int type (which on some architectures it might not be), and it requires explicit conversions between floating point numbers and integers.

The main function here is the simplest part yet.

pub fn main() !void {
    const width = 800;
    const height = 450;
    ray.InitWindow(width, height, "Nbody");
    ray.SetTargetFPS(60);
    //This is very much *not* good practice, but it's the easiest way to start this
    var rand = std.rand.Xoroshiro128.init(0);
    //Don't initialize the particles yet.
    var p: ParticleCollection = .{ .particles = undefined };
    p.window_end = .{ .x = width, .y = height };
    p.init_particles(&rand.random);
    while (!ray.WindowShouldClose()) {
        p.step_world();
        ray.BeginDrawing();
        defer ray.EndDrawing();
        ray.ClearBackground(ray.RAYWHITE);
        p.drawSystem();
    }
}

And so we have a working prototype for an nbody simulator, considerably shorter than the C version of the same program.

Interestingly, it appears to be smaller compiled than the original program in C, with just zig build-exe nbody.zig -lc -lraylib, we get an executable of around 784Kb. With zig build-exe nbody.zig -lc -lraylib -OReleaseFast, we can get it down to 92Kb, and with the -OReleaseSmall option, we can get down to 84Kb.

All in all, we'd definitely watch Zig carefully; it's very well thought out, and if they get their package manager right, their ecosystem might become competitive with Rust's quite quickly. The language is already quite nice to use, and it might not be a bad choice for your next project if you were considering C or Rust and are looking for a new language to mess around in.