Measuring Capacity of Constellations

Having just the other day finished off my fourth year project (and in doing so, completed my degree! hooray!), I thought I’d share a little tidbit from it which perhaps might be useful to someone else.


Part of the project involved QAM constellation shaping. That is, moving points around in a constellation — in this case, to increase the capacity.
A typical QAM constellation would look something like this, shamelessly taken from wikipedia (created by user Splash):

Whereas here is an example of a shaped constellation which I investigated (coloured by decision region for hard decision decoding):


Comparing these constellations required that I be able to measure their capacities. Here I’m considering a very simple system:
It’s simply a two-dimensional AWGN channel.

Grid Evaluation

This is the first method I tried. It didn’t work very well, and I’ll explain why.

The capacity of the channel is defined by the mutual information between the input and the output:

I(X;Y) = H(Y) - H(Y|X)

Here, H(Y|X)=H(X+N|X)=H(N), which is simply the entropy of the Gaussian noise and has an analytic form. In this case, where the Gaussian is bivariate and isotropic (that is, the covariance matrix is \Sigma=\sigma^2 I), this becomes:

H(N) = log(2\pi e\sigma^2)

H(Y), on the other hand, is a nastier affair: it’s the entropy of the noisy signal, whose density f_Y(y) is a sum of Gaussians, so H(Y) = -\iint f_Y(y) log(f_Y(y)) dy. This has no closed form, so instead the approach I took was to sample f_Y(y) over a grid and then sum across it, -\sum f_Y(y) log(f_Y(y)), with each sample weighted by the area of its grid cell.

This worked quite nicely for some signals, like this one:
Most of the probability mass is included in the sampled grid (in fact it sums to 0.9922 here), so the summation serves as a good approximation to the integral.

But, at low SNR, a lot of probability mass spills over the sides:
Here the probability mass inside the grid only sums to 0.6353. This gave utterly meaningless results (as I’ll show later). I also tried re-scaling the probabilities so that they’d sum to 1, but ultimately this didn’t help either.
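To make the failure mode concrete, here is a minimal Python sketch of the grid approach. It is not the MATLAB code linked at the end of this post: the function name `grid_capacity`, the QPSK test points, the grid half-width and the resolution are all assumptions chosen for illustration. It returns the capacity estimate in bits together with the probability mass captured by the grid.

```python
import math

# Hypothetical test constellation (unit-spaced QPSK), not one of the shaped ones.
QPSK = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def grid_capacity(points, sigma, half_width=4.0, n=200):
    """Estimate I(X;Y) = H(Y) - H(N) in bits by sampling f_Y on an n x n grid.

    Returns (capacity_estimate, probability_mass_inside_grid).
    """
    m = len(points)
    step = 2 * half_width / n
    cell = step * step                          # area of one grid cell
    norm = m * 2 * math.pi * sigma ** 2         # mixture normalisation
    h_y = 0.0
    mass = 0.0
    for i in range(n):
        yi = -half_width + (i + 0.5) * step     # cell centre, imaginary axis
        for j in range(n):
            yr = -half_width + (j + 0.5) * step # cell centre, real axis
            f = sum(
                math.exp(-((yr - p[0]) ** 2 + (yi - p[1]) ** 2) / (2 * sigma ** 2))
                for p in points
            ) / norm
            mass += f * cell
            if f > 0:
                h_y -= f * math.log2(f) * cell
    h_n = math.log2(2 * math.pi * math.e * sigma ** 2)  # entropy of the noise
    return h_y - h_n, mass

for sigma in (0.5, 3.0):
    cap, mass = grid_capacity(QPSK, sigma)
    print(f"sigma={sigma}: capacity estimate {cap:.3f} bits, mass in grid {mass:.4f}")
```

With the small noise level nearly all of the mass lies inside the grid and the estimate is sensible. With the large one, a substantial fraction of the mass falls outside, the truncated sum underestimates H(Y), and the estimate goes negative.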

Monte Carlo Method

When I mentioned this to my supervisor, he mentioned another possible method, using Monte Carlo instead.

First, flip the mutual information expression around and consider it the other way:

I(X;Y) = H(X) - H(X|Y)

In this case H(X) is easy — X is drawn equiprobably from an alphabet size of M. So H(X)=log(M).

H(X|Y) is the tricky part here, but… we can find it by Monte Carlo.
Pick a load of random points from the joint f_{XY}(x,y):
(This is an example from one of my optimised constellations)

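Then for each point, -log(p_{X|Y}(x|y)) is calculated and the average is taken. For equiprobable symbols the posterior follows from Bayes’ rule: p_{X|Y}(x|y) is the Gaussian likelihood f(y|x) divided by the sum of the likelihoods over the whole alphabet. Since the process is ergodic, the average tends to the expectation, -\mathbb{E}log(p_{X|Y}(x|y))=H(X|Y)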
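The procedure above can be sketched in a few lines of Python. This is an illustrative sketch rather than the MATLAB implementation linked at the end of the post: the name `mc_capacity`, the QPSK test points, the sample count and the fixed seed are assumptions.

```python
import math
import random

# Hypothetical test constellation (QPSK), equiprobable symbols assumed.
QPSK = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def mc_capacity(points, sigma, n_samples=20000, seed=0):
    """Estimate I(X;Y) = log2(M) - H(X|Y) in bits for a 2-D AWGN channel."""
    rng = random.Random(seed)
    m = len(points)
    total = 0.0
    for _ in range(n_samples):
        # Draw (x, y) from the joint distribution: pick a symbol, add noise.
        k = rng.randrange(m)
        x = points[k]
        y = (x[0] + rng.gauss(0, sigma), x[1] + rng.gauss(0, sigma))
        # Log-likelihoods of y under every symbol (common constants cancel).
        exps = [-((y[0] - p[0]) ** 2 + (y[1] - p[1]) ** 2) / (2 * sigma ** 2)
                for p in points]
        # log p(x|y) = log f(y|x) - log sum_x' f(y|x'), via log-sum-exp.
        mx = max(exps)
        log_norm = mx + math.log(sum(math.exp(e - mx) for e in exps))
        total -= exps[k] - log_norm
    h_x_given_y = total / n_samples / math.log(2)   # nats -> bits
    return math.log2(m) - h_x_given_y

for sigma in (0.1, 1.0, 100.0):
    print(f"sigma={sigma}: {mc_capacity(QPSK, sigma):.3f} bits")
```

Because every sample is drawn from the true joint distribution, no probability mass is lost at low SNR; the posterior simply flattens towards 1/M and the estimate drifts towards zero.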


The Monte Carlo method produced much more sensible results:

Where the grid evaluation method gives meaningless negative capacities at low SNR, the Monte Carlo method gives results tending gracefully to zero.

In hindsight, this could probably have also been done using the original capacity expression, but calculating H(Y) by Monte Carlo… I’ll leave that as an exercise for the reader.


Just for a little bit of fun, here’s a few constellations of different orders compared using the method above.

That’s about it for this post. I’ve run out of enthusiasm to write any more. In fact I did about half an hour ago. Maybe it shows. I almost gave up and deleted the post but I figured I’d finish it in case anyone cared. Do let me know in the comments if you found this useful! It’s good to know what posts interest people, and it helps for motivation too…

Here’s my MATLAB code for the two implementations. It’s not particularly tidy and the call to mvnpdf breaks in Octave, but perhaps it will be useful as a reference for someone:
ConstellationInformation.m, ConstellationInformationMC.m


Reading BBC ROMs with an mbed

A quick post this time. I’ve had an mbed (LPC1768) lying around for ages which I’d never gotten around to doing anything more than blinking an LED with, and I also — for reasons too boring to go in to here — have some BBC computer ROMs which I don’t know much about. I was interested to know what was on them though, so I’m just going to say a little bit about how I used the mbed to read them.


The ACORNOS and ACORNBASIC ROMs look legit, but as for the others… they appear to be EPROMs. These are chips which could be electrically written, and erased using UV light. To prevent accidental erasure you had to put a sticker over the window on the chip! I wonder if this could be an early case of pirated software? Or maybe it was common for software to be distributed on EPROMs? Since this is from over a decade before I was even born, I’m not really sure!

The ROMs were all of different part numbers but the first one I picked out was a 27C128 so I put together a reader on breadboard built around this. I figured the other chips must be all pin compatible anyway and a quick google for datasheets confirmed this. The circuit is super-simple: just connecting the chip to the 5v supply rails, pulling WE high (it’s an active low line), and connecting the data, address, CE and OE lines to the mbed’s GPIOs. I used all but one of the GPIOs. Cutting and stripping all of the wires was very tedious! The ROM chips need a 5v supply but have an input logic high level according to the datasheet of 2v. The mbed uses 3.3v logic but its inputs are 5v tolerant so it looks like we’re on to a winner.

I don’t use breadboard much so I only had one reel of solid core wire to hand. I hope you like green!


Next step was writing software. The way to read the chip is summarised by the read waveforms from the datasheet:

Essentially, you set the data address on the address bus, clock OE and CE low (they’re active-low lines), wait at least some minimum delay then read the data off the output bus. Afterwards, you clock OE and CE back high (in fact, I think you can leave one low all the time but I didn’t bother) and wait for the output to go back to tristate mode before repeating for the next word. The datasheet allowed ~250ns delays but just to be safe I used 1µs. Probably a good idea to allow for chips with different minimum timings anyway. Plus it would turn out later that the serial transfer is the bottleneck, rather than reading the chip.
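The read cycle can be modelled in Python to make the sequence explicit. This is only a simulation under stated assumptions: `SimulatedEprom` and `dump_rom` are hypothetical names, the chip is a stand-in object rather than real hardware, and the actual reader ran on the mbed.

```python
import time

class SimulatedEprom:
    """Stand-in for a parallel EPROM; not a real hardware driver."""

    def __init__(self, contents):
        self.contents = bytes(contents)
        self.address = 0
        self.enabled = False      # True while CE and OE are both held low

    def set_address(self, address):
        self.address = address

    def set_ce_oe_low(self, low):
        self.enabled = low

    def read_data_bus(self):
        # The outputs are tristated unless the chip is enabled.
        return self.contents[self.address] if self.enabled else None

def dump_rom(chip, size, delay_s=1e-6):
    """Read `size` bytes using the address / enable / wait / read / disable cycle."""
    out = bytearray()
    for address in range(size):
        chip.set_address(address)      # put the address on the bus
        chip.set_ce_oe_low(True)       # pull CE and OE low (active-low lines)
        time.sleep(delay_s)            # wait out the access time
        out.append(chip.read_data_bus())
        chip.set_ce_oe_low(False)      # release the chip
        time.sleep(delay_s)            # let the outputs return to tristate
    return bytes(out)

rom = SimulatedEprom(bytes(range(256)))
print(dump_rom(rom, 16, delay_s=0).hex())
```

Keeping the delay as a parameter mirrors the choice made above of using a generous fixed wait, so that chips with slower access times still read correctly.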

Aside from that, the only thing to say about the code is that transferring the data to the computer over serial behaved a bit strangely. If I used Serial.putc() or Serial.printf("%c") on the data, I’d end up short of the expected 16KiB at the PC end. But if I used Serial.printf("%d\n"), I got exactly the 16384 lines I expected. So instead of spending ages trying to debug this, I opted to use the decimal ASCII transfer and convert at the PC end.

The chip contents were just grabbed using a pipe:

 $ pv -s 56000 > romcontents-decimal

This gave a fairly useless decimal output (example from the ACORNOS ROM):


…etc etc.

So I threw together a quick bit of python to convert from the decimal ASCII to the raw binary:

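#!/usr/bin/env python3
import fileinput,struct,sys
for line in fileinput.input():
    # The loop body is missing from the listing; this reconstruction is an assumption.
    if line.strip():
        sys.stdout.buffer.write(struct.pack('B', int(line)))  # one decimal line -> one raw byte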

Piping the decimal through this gave the raw ROM contents – exactly 16KiB! This could be viewed with hexdump. Here’s a couple of examples:

alex@Apollo:~/bbcroms$ hd graphics-bin |head
00000000  00 00 00 4c 49 80 82 21  a0 47 52 41 50 48 49 43  |...LI..!.GRAPHIC|
00000010  53 20 45 58 54 45 4e 53  49 4f 4e 20 00 31 2e 30  |S EXTENSION .1.0|
00000020  32 00 28 43 29 31 39 38  33 20 43 4f 4d 50 55 54  |2.(C)1983 COMPUT|
00000030  45 52 20 43 4f 4e 43 45  50 54 53 2d 50 61 75 6c  |ER CONCEPTS-Paul|
00000040  20 48 69 73 63 6f 63 6b  a1 08 2c 80 02 30 13 c9  | Hiscock..,..0..|
00000050  02 d0 11 2c cb 0c 10 08  98 9d f0 0d 8d f0 0c c8  |...,............|
00000060  a9 02 28 60 48 8e df 0c  8a 48 98 48 ba bd 03 01  |..(`H....H.H....|


Most of the ROMs had some plaintext descriptor telling you the software and its version.

This segment from AcornOS is quite interesting – a few familiar names here!

00003bf0  91 4c f4 ff 20 3a 53 45  54 53 44 46 20 4c 44 41  |.L.. :SETSDF LDA|
00003c00  bb c0 28 43 29 20 31 39  38 31 20 41 63 6f 72 6e  |..(C) 1981 Acorn|
00003c10  20 43 6f 6d 70 75 74 65  72 73 20 4c 69 6d 69 74  | Computers Limit|
00003c20  65 64 2e 54 68 61 6e 6b  73 20 61 72 65 20 65 78  |ed.Thanks are ex|
00003c30  74 65 6e 64 65 64 20 74  6f 20 74 68 65 20 66 6f  |tended to the fo|
00003c40  6c 6c 6f 77 69 6e 67 20  70 65 6f 70 6c 65 2c 20  |llowing people, |
00003c50  63 6f 6d 70 61 6e 69 65  73 20 61 6e 64 20 6c 6f  |companies and lo|
00003c60  63 61 74 69 6f 6e 73 2c  20 63 6f 6e 74 72 69 62  |cations, contrib|
00003c70  75 74 6f 72 73 20 28 61  6d 6f 6e 67 20 6f 74 68  |utors (among oth|
00003c80  65 72 73 20 74 6f 6f 20  6e 75 6d 65 72 6f 75 73  |ers too numerous|
00003c90  20 74 6f 20 6d 65 6e 74  69 6f 6e 29 20 74 6f 20  | to mention) to |
00003ca0  74 68 65 20 64 65 76 65  6c 6f 70 6d 65 6e 74 20  |the development |
00003cb0  6f 66 20 74 68 65 20 42  42 43 20 43 6f 6d 70 75  |of the BBC Compu|
00003cc0  74 65 72 3a 44 61 76 69  64 20 41 6c 6c 65 6e 2c  |ter:David Allen,|
00003cd0  42 6f 62 20 41 75 73 74  69 6e 2c 52 61 6d 20 42  |Bob Austin,Ram B|
00003ce0  61 6e 6e 65 72 6a 65 65  2c 50 61 75 6c 20 42 6f  |annerjee,Paul Bo|
00003cf0  6e 64 2c 41 6c 6c 65 6e  20 42 6f 6f 74 68 72 6f  |nd,Allen Boothro|
00003d00  79 64 2c 43 61 6d 62 72  69 64 67 65 2c 43 6c 65  |yd,Cambridge,Cle|
00003d10  61 72 74 6f 6e 65 2c 4a  6f 68 6e 20 43 6f 6c 6c  |artone,John Coll|
00003d20  2c 43 6f 6d 70 75 74 65  72 20 4c 61 62 6f 72 61  |,Computer Labora|
00003d30  74 6f 72 79 2c 43 68 72  69 73 20 43 75 72 72 79  |tory,Chris Curry|
00003d40  2c 44 65 73 69 67 6e 65  72 73 20 6f 66 20 74 68  |,Designers of th|
00003d50  65 20 36 35 30 32 2c 4a  65 72 65 6d 79 20 44 69  |e 6502,Jeremy Di|
00003d60  6f 6e 2c 54 69 6d 20 44  6f 62 73 6f 6e 2c 4a 6f  |on,Tim Dobson,Jo|
00003d70  65 20 44 75 6e 6e 2c 50  61 75 6c 20 46 61 72 72  |e Dunn,Paul Farr|
00003d80  65 6c 6c 2c 46 65 72 72  61 6e 74 69 2c 53 74 65  |ell,Ferranti,Ste|
00003d90  76 65 20 46 75 72 62 65  72 2c 4a 6f 6e 20 47 69  |ve Furber,Jon Gi|
00003da0  62 62 6f 6e 73 2c 4c 61  77 72 65 6e 63 65 20 48  |bbons,Lawrence H|
00003db0  61 72 64 77 69 63 6b 2c  44 79 6c 61 6e 20 48 61  |ardwick,Dylan Ha|
00003dc0  72 72 69 73 2c 48 65 72  6d 61 6e 6e 20 48 61 75  |rris,Hermann Hau|
00003dd0  73 65 72 2c 48 69 74 61  63 68 69 2c 41 6e 64 79  |ser,Hitachi,Andy|
00003de0  20 48 6f 70 70 65 72 2c  49 43 4c 2c 4d 61 72 74  | Hopper,ICL,Mart|
00003df0  69 6e 20 4a 61 63 6b 73  6f 6e 2c 42 72 69 61 6e  |in Jackson,Brian|
00003e00  20 4a 6f 6e 65 73 2c 43  68 72 69 73 20 4a 6f 72  | Jones,Chris Jor|
00003e10  64 61 6e 2c 44 61 76 69  64 20 4b 69 6e 67 2c 44  |dan,David King,D|
00003e20  61 76 69 64 20 4b 69 74  73 6f 6e 2c 50 61 75 6c  |avid Kitson,Paul|
00003e30  20 4b 72 69 77 61 63 7a  65 6b 2c 50 65 74 65 72  | Kriwaczek,Peter|
00003e40  20 4d 69 6c 6c 65 72 2c  41 72 74 68 75 72 20 4e  | Miller,Arthur N|
00003e50  6f 72 6d 61 6e 2c 47 6c  79 6e 20 50 68 69 6c 6c  |orman,Glyn Phill|
00003e60  69 70 73 2c 4d 69 6b 65  20 50 72 65 65 73 2c 4a  |ips,Mike Prees,J|
00003e70  6f 68 6e 20 52 61 64 63  6c 69 66 66 65 2c 50 65  |ohn Radcliffe,Pe|
00003e80  74 65 72 20 52 6f 62 69  6e 73 6f 6e 2c 52 69 63  |ter Robinson,Ric|
00003e90  68 61 72 64 20 52 75 73  73 65 6c 6c 2c 4b 69 6d  |hard Russell,Kim|
00003ea0  20 53 70 65 6e 63 65 2d  4a 6f 6e 65 73 2c 47 72  | Spence-Jones,Gr|
00003eb0  61 68 61 6d 20 54 65 62  62 79 2c 43 68 72 69 73  |aham Tebby,Chris|
00003ec0  20 54 75 72 6e 65 72 2c  41 64 72 69 61 6e 20 57  | Turner,Adrian W|
00003ed0  61 72 6e 65 72 2c 57 69  6c 62 65 72 66 6f 72 63  |arner,Wilberforc|
00003ee0  65 20 52 6f 61 64 2c 52  6f 67 65 72 20 57 69 6c  |e Road,Roger Wil|
00003ef0  73 6f 6e 2c 41 6c 61 6e  20 57 72 69 67 68 74 2e  |son,Alan Wright.|

In case the contents of these ROMs are useful to anyone, here’s a tarball of all of them. It contains:

  • ACORNBASIC, unknown version, 1981
  • ACORNOS 032 (?) 12-Aug-1981
  • MOS+ 1.15, 3-May-1988
  • DFS 2.26, 1985
  • Computer Concepts Graphics Extension 1.02, 1983

Since I have no use for these chips, I’m probably going to donate them to the Centre for Computing History. Check them out if you’re in Cambridge!


An 88 Byte Bare-Metal Mandelbrot Generator

A while back, I was playing around with a bit of 8086 assembly. I’m not much of a programmer, but I’ve done some C{,++}, Python, PHP (sorry) and bits of a few others in the past. All of these are quite high level though, and I felt like going back to basics a bit. I chose to start with MS-DOS, since I thought (perhaps incorrectly?) that it would be easier to program. And I figured it would probably be a nice bit of computing history to learn anyway.
I soon moved on from MS-DOS and decided to set myself a programming challenge of a bare-metal Mandelbrot generator. In this post, I’ll give a bit of background about the Mandelbrot set, and explain my code.


I started off with this MS-DOS assembly tutorial. It seems to be broadly unchanged since it was first crawled by the Wayback Machine in 2002. Nice. I chose to use NASM instead of A86 but otherwise the tutorial’s pretty good. If you’re lazy, here’s a virtualbox image, with MSDOS 6.22 and NASM installed. My code is all in C:\PROGRAMM, in case anyone wants to use it as a reference.

This was fun for a while, but eventually I got bored of DOS programming and decided to try some bare metal stuff – and somehow, perhaps vaguely influenced by demoscene style demos, I ended up setting myself the challenge of writing a bare-metal Mandelbrot generator within a 512-byte bootsector.

The Mandelbrot Set

The Mandelbrot Set is the set of points, c, on the complex plane such that the sequence z_{n+1}=z_n^2+c (where z_0=0) remains bounded. To clarify, here’s an example:

  • Let’s check if the point (-0.5+0.5j) is in the set. Set c=(-0.5+0.5j)
  • Now let’s perform the first iteration. (-0.5+0.5j)²+(-0.5+0.5j)=-0.5
  • And another iteration: (-0.5)²+(-0.5+0.5j)=-0.25+0.5j
  • And another: (-0.25+0.5j)²+(-0.5+0.5j)=-0.6875+0.25j
  • We can keep on iterating again and again like this, and we find that the numbers stay small. Eventually, we might conclude that the sequence is bounded, and the point is therefore in the set.
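The worked example above is easy to reproduce with Python's built-in complex numbers. This is a small sketch; the helper name `mandelbrot_orbit` is made up for illustration.

```python
def mandelbrot_orbit(c, n):
    """Return the first n iterates z_1..z_n of z -> z*z + c, starting from z_0 = 0."""
    z = 0j
    orbit = []
    for _ in range(n):
        z = z * z + c
        orbit.append(z)
    return orbit

# The first iterate is c itself; the following three match the bullets above.
for z in mandelbrot_orbit(-0.5 + 0.5j, 4):
    print(z)

# The magnitude stays small even after many iterations.
print(max(abs(z) for z in mandelbrot_orbit(-0.5 + 0.5j, 200)))
```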

In fact, if we plot each iteration, we can see that it’s going around in a strange, almost spirograph-like shape:

But overall, the general trend seems to be going around in a closed circle, rather than shooting off to infinity, so it would probably be fairly safe to say here that the sequence is bounded. If you’re not convinced, here’s a plot of the magnitude at each iteration:

To take another example, let’s try it with the point (0.5,0.5). The iterations go:

  • 0.5+1j
  • -0.25+1.5j
  • -1.69-0.25j
  • 3.28+1.34j
  • …and four iterations later: 990294279-77569977j

So, this looks like it’s shooting off to infinity… and fast.

    But this raises an interesting question: at what point do we decide whether the sequence is or isn’t bounded? In general we’d have to continue iterating to infinity to know for sure. Perhaps we should set a threshold, beyond which we declare it to be unbounded? But where should it be? And how do we know that sequences which exceed that threshold aren’t coming back?

    Well, as it turns out, it can be proven (see A6d) that if any point leaves a circle of radius 2 (i.e. |z_n|>2), it will never return. So a reasonable approach is to compute a fixed number of iterations, and then check whether the result is less than 2 in absolute value.

    Here are two simple implementations I wrote just to demonstrate the idea. First, this is for the C++-minded people:

    #include <iostream>
    #include <complex>
    using namespace std;
    int main(){
    	cout << "P1 800 800 ";
    	for(int i=0;i<800;i++){
    		for(int j=0;j<800;j++){
    			complex<float> c((j-400)/200.0f,(i-400)/200.0f); //scale the indices to floats in the range -2..2
    			complex<float> z(0,0); // initialise z to 0
    			for (int iter=0;iter<64;iter++) //iterate 64 times
    				z=z*z+c;
    			cout << ((abs(z)<2)?"1 ":"0 "); //output a black pixel if in the set, white otherwise
    		}
    	}
    	return 0;
    }

    This goes through each pixel one-by-one and for each one runs 64 iterations. The output is a PBM file to stdout, so you’ll have to pipe it and open it in a compatible viewer (GIMP works) or convert it to a more common format.

    Or, if you’re more of a MATLAB fan:

    for i=1:50

    The same idea as the C++ example, but using vectorisation rather than iterating through pixels, and using MATLAB’s internal imshow command to display the image.

    The Bare-Metal Version

    With all that setting the scene done, let’s get down to how I wrote an 88-byte 8086 bare-metal mandelbrot generator.

    A note on arithmetic

    The general structure of the C++ example can be carried over, but the arithmetic gets difficult. Firstly, the complex number datatype in C++ is an abstraction. The processor doesn’t understand complex numbers (not an 8086 at least). So we’ll have to treat them as two reals. Addition and subtraction are trivial. For multiplication we just have to remember that:

    (a+bj)(c+dj) = (ac-bd) + (ad+bc)j

    and specifically:

    (a+bj)^2 = (a^2-b^2) + 2abj

    This is conceptually not too difficult, but it does mean that complex operations take up a lot of program space.

    The second thing to consider is that floating point operations don’t exist in 16-bit Intel. We’re going to have to work in integer instructions instead. With a little bit of trickery though, we can create a very simple fixed-point system. Essentially, we just represent every number as a fixed multiple of its true value. Say, this could be 10. Addition (and by extension, subtraction) is consistent:

    10a + 10b = 10(a+b)

    Multiplication works too, with the caveat that we have to divide by the scale factor after:

    (10a)(10b)/10 = 10(ab)

    It’s for this reason that it’s best for that multiple to be a power of two: this reduces the division to a right shift.
    For example, here is how we might multiply two numbers with a scaling of 1/32:

    mov ax,73 ; move the number 2.28 (73/32) into ax
    mov bx,-193 ;move -6.03 (-193/32) into bx (16-bit, since -193 does not fit in a signed byte)
    imul bx ; multiply bx (implicitly, with ax). stores result in dx:ax; the low word in ax is enough here.
    sar ax,5 ; shift right by 5 places to divide by 2^5=32

    Note the use of sar rather than shr to preserve the sign.
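The same fixed-point multiplication can be checked in Python, whose `>>` on negative integers is an arithmetic shift, like `sar`. The helper names below are made up for this sketch.

```python
SHIFT = 5                      # scale factor is 2**5 = 32

def to_fixed(x):
    """Convert a real number to its fixed-point representation."""
    return round(x * (1 << SHIFT))

def fixed_mul(a, b):
    """Multiply two fixed-point numbers, then shift out the extra scale factor."""
    return (a * b) >> SHIFT    # arithmetic shift: the sign is preserved

def to_float(a):
    """Convert a fixed-point value back to a real number."""
    return a / (1 << SHIFT)

a = to_fixed(2.28)             # 73
b = to_fixed(-6.03)            # -193
product = fixed_mul(a, b)      # -441
print(a, b, product, to_float(product))
```

The result, -441/32 = -13.78125, is close to the true product of about -13.75; the difference comes from rounding the inputs to multiples of 1/32 and from the shift rounding towards minus infinity.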

    The code

    So, without further ado, let’s start deconstructing my code, a section at a time:

    [bits 16]
    [org 7c00h]
    mov ax,0fh
    int 10h

    First a bit of initialisation – instructions to the assembler that the code should be 16-bit, and will be loaded into memory at location 0x7c00.
    Then, we call BIOS interrupt 10h with AH=0 (change video mode) and AL=0fh (640×350 mono resolution).

    This video mode will fit a nice big 512×256 canvas on the screen.

    Total assembled size so far: 5 bytes

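    mov dx,-128 ;start on the first line
    yloop: ;target of the "jne yloop" further down
    mov di,-256 ;go to the beginning of the line
    xloop: ;target of the "jne xloop" further down
    ;set z=c initially - start on iteration 1
    mov si,di
    mov bp,dx
    xor ax,ax ; set colour to black (al=0) and iteration counter to zero (ah=0)
    iterate: ;target of the "jno iterate" further down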

    So, here we initialise the two variables for our c value. DX will contain the y value and DI the x value. I’m starting at y=-1 and x=-2, using a scaling factor of 1/128. My canvas is to be y from -1 to 1 and x from -2 to 2. The z variable is stored with its real part in SI and its imaginary part in BP. We skip out the zeroth iteration here by initialising z=c.

    Later, the interrupt to write a pixel to the screen will read the pixel value from AL. This wants to be initialised to zero (black). If the pixel is in the set, this will be changed to one (white) later. Otherwise it will be unchanged.
    At the same time, I want to set an iteration counter to zero. Sneakily, I’ve chosen to use the other byte of AX for this, so that I can set both the pixel value and the iteration counter to zero in a single instruction. This single xor saves two bytes vs two 8-bit literal moves.

    This block: 12 bytes, So far: 17 bytes.

    ;calculate Im(z^2)=2xy
    mov cx,bp
    imul cx,si
    jo overflow
    sar cx,6 ;maybe mov 6 into a register first?
    ;cx contains Im(z^2)

    At each iteration step we need to calculate z_{n+1}=z_n^2+c. To do this we need to calculate the real and imaginary parts of z_n^2, then add these to the real and imaginary parts of c.

    As the comment suggests, here we’re calculating Im(z_n^2), which as I showed previously is 2Re(z_n)Im(z_n). We copy Im(z_n) into CX, since we’ll need it again later and we don’t want to clobber it. Then we multiply this by SI (Re(z_n)) and store the result in CX. The value in this register should now be 128Re(z_n)Im(z_n) (note what I said about fixed-point multiplication earlier). Ignoring the conditional jump for now, we would then want to divide by 128 to get Re(z_n)Im(z_n). Except what we want is actually 2Re(z_n)Im(z_n), so we can save ourselves a multiplication by just dividing by 64 instead (right shift by 6).

    Now, back to the jump. This is how we catch the sequences which go off to infinity – we’ll see where it goes later. In the C++ example, we used abs(z) to check whether we were inside an r=2 circle. But calculating an abs value is expensive. It requires us to square two numbers and sum them. But consider this: any point which leaves the r=2 circle will eventually go off to infinity. If I wanted to I could use an r=3 circle, or an r=1,000,000 circle – it would just potentially take more iterations. Any region which completely contains the r=2 circle will do the job. So how about a rectangle?

    My jump is conditional on 128Re(z_n)Im(z_n)=64Im(z_n^2) being greater than 32768/128 in magnitude (the signed 16-bit max, after scaling). That is, |Im(z_n^2)|>8

    So, provided that |Im(c)|<6 (which it always will be, since we're only working with y in the range of -1 to 1), this will contain the whole of the r=2 circle. Perfect.

    This block: 10 bytes, So far: 27 bytes.

    ;calculate Re(z^2)=x^2-y^2
    mov bx,si ;we will work in the BX register for now
    add bx,bp ;bx contains x+y
    sub si,bp ;si contains x-y
    imul si,bx
    jo overflow
    sar si,7
    ;si contains Re(z^2)

    And now to calculate Re(z_n^2). As noted previously, this is given by Re(z_n)^2-Im(z_n)^2. Earlier versions of my code calculated exactly this – multiplying each part by itself, then subtracting one from the other. Then I realised I was missing a trick here – I could factorise a^2-b^2=(a+b)(a-b), thus gaining an ADD (+2 bytes), but losing an IMUL (-3 bytes). A saving of 1 byte – score!

    After we’ve calculated Re(z_n)+Im(z_n) with the add bx,bp, we no longer need Re(z_n) (SI), so we save ourselves an MOV by clobbering it in the SUB instruction.

    Again, we check for overflow and SAR 7 to take out the factor of 128. Here, the overflow checks if |128Re(z_n^2)|>32768/128, i.e. |Re(z_n^2)|>4. Over the range of |Re(c)|<2 (exactly the space we're working in) this completely contains the r=2 circle. In fact, at the far left and right edges of the canvas, they just touch. An illustration is shown to the right, with the abs(z^2)<2 criterion in grey and the rectangular criterion we're using here in blue. The rectangle moves around relative to the axes depending on the point being calculated.

    This block: 14 bytes, So far: 41 bytes.

    ;calculate z'=z^2+c
    add si,di
    add cx,dx
    mov bp,cx
    ;do another iteration
    inc ah
    jno iterate

    Here are the final steps to calculate z_{n+1}. We then increment the iteration counter, AH, and unless it overflows (INC sets the overflow flag when AH steps from 127 to 128, the signed 8-bit limit), we go round again. Whee! 128 iterations is thoroughly excessive, but it allows us to just check overflow, rather than doing a CMP with an immediate value, which would cost 3 bytes.

    This block: 10 bytes, So far: 51 bytes.

    ;iterations are over
    inc al ; if we've gotten all this way, set the colour to white
    overflow:

    If we get here just by the natural progression of the program, that will be because we went through all 128 iterations without overflowing. Therefore the pixel is in the set, so we set it to white - a value of 1. Since I know its value before is zero, I can use an INC (2 bytes) instead of a MOV immediate (3 bytes). If we get here via one of the jump on overflows, we skip out of this instruction and the pixel remains black.

    This block: 2 bytes, So far: 53 bytes.

    ;now write a pixel
    mov ah,0ch ; write pixel interrupt
    xor bh,bh
    mov cx,di
    add cx,320
    add dx,175
    int 10h
    sub dx,175

    Here we use another INT 10h interrupt to write the pixel to the screen. We set AH=0Ch for the "write a pixel" command, set BH=0 for page zero (using XOR - no size advantage over MOV immediate here, but faster), and stick the coordinates in CX and DX (in fact, I chose to use DX to store the y coordinate anyway, to save a MOV instruction here). Since our coordinate system was centred around zero, we add 320 and 175 to the x and y coordinates respectively, to centre the image on the screen. After the interrupt, we have to take the 175 back off DX, since it serves as our loop counter so we can't clobber it. All of these ADD and SUB immediates are VERY expensive instructions - each one takes 4 bytes. I haven't thought of a way to optimise this though.

    This block: 20 bytes, So far: 73 bytes.

    ;loop around, do the next pixel
    inc di
    cmp di,255
    jne xloop
    ;or if we've gotten to here, draw the next row
    inc dx
    cmp dx,127
    jne yloop

    And some loop logic. We go onto the next pixel in the line (inc DI), and loop around unless the x coordinate has reached 255 (end of the line). This CMP literal is expensive (4 bytes) but not easily avoided - since it's a signed type (and we're only using the lower 9 bits anyway), we can't just test for overflow. Outside of this, we have the same structure to loop through lines.

    This block: 13 bytes, So far: 86 bytes.

    cli
    hlt
    db $-$$ ;tell us the length of the code
    times 510 - ($ - $$) db 0
    dw 0xAA55

    And finally a little bit of boilerplate: disable interrupts and halt. The db $-$$ line is a little hack to make it easier to see the size of the program - it just inserts a byte whose value is (this memory location - memory location of program start). Then I just look at the code in xxd and the last byte before the long run of zeros is the size of the program. It needs to be padded up to 512 bytes (the size of a boot sector), with the magic word AA55h at the end to indicate that it's bootable. And so that's a final 2 bytes of executable code (I'm not counting the padding, of course), bringing us up to 88 bytes!
The complete code is here: mandelbrot-bootsector.asm
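For anyone who wants to experiment with the arithmetic without booting anything, the inner loop can be modelled in Python. This is a sketch, not a translation of the full boot sector: `in_set_fixed` and `fits_int16` are made-up names, the coarse ASCII rendering is just for illustration, and the pass limit of 128 is an assumption based on INC setting the overflow flag when AH steps from 127 to 128.

```python
def fits_int16(v):
    """True if v fits in a signed 16-bit register, i.e. IMUL would not set OF."""
    return -32768 <= v <= 32767

def in_set_fixed(cx, cy, max_passes=128):
    """cx, cy are the coordinates of c scaled by 128 (so -256 means -2.0)."""
    x, y = cx, cy                    # z = c: the zeroth iteration is skipped
    for _ in range(max_passes):
        im = y * x                   # imul cx,si
        if not fits_int16(im):       # jo overflow
            return False
        im >>= 6                     # sar cx,6: divide by 128, then double
        re = (x + y) * (x - y)       # imul si,bx with the a^2-b^2 factorisation
        if not fits_int16(re):       # jo overflow
            return False
        re >>= 7                     # sar si,7
        x, y = re + cx, im + cy      # z' = z^2 + c
    return True

# Coarse ASCII rendering of the same 4 x 2 region of the complex plane.
for cy in range(-128, 128, 16):
    print("".join("#" if in_set_fixed(cx, cy) else "." for cx in range(-256, 256, 8)))
```

Python integers do not wrap, so the overflow test is made explicit; the sums and differences in this loop stay far below the 16-bit limit, so only the two multiplications need checking.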

    The result

    The code is compiled with nasm:

     $ nasm -o mandelbrot mandelbrot-bootsector.asm

    And run in qemu:

     $ qemu-system-i386 -video cirrus -fda mandelbrot -net none

    Here’s a screenshot and a video of it running:

    Possible Optimisations

    I’ve no doubt this code could be made smaller by someone more skilled than I am. I can see a few areas for potential optimisation, but can’t think how to implement them without breaking other things. For example, when writing the pixels to the screen, there are 3 ADD/SUB immediate instructions. These take up 4 bytes each – if I could have the immediate values in a register instead, that would save two bytes per instruction. But alas, I’m already scraping the bottom of the barrel for registers (I tried using SP, but since BIOS interrupts apparently use the stack, that breaks things).

    Similarly, the CMP literals for the loop logic are pretty expensive. Avoiding literals is always desirable.

    Perhaps it’s possible to save operations by foregoing the BIOS interrupt to write pixels and just write to video memory instead? I haven’t looked into this at all.

    Dissolve Animation

    I made another version of my code which makes the set “dissolve” in. Essentially it uses an LFSR to cycle through pixel values, rather than a loop. I think it gives a pretty cool visual effect. I was going to write more on it, but this post is getting long enough already, so I’ll just leave the code and a video here if anyone’s interested. It’s not as heavily optimised as the code above.
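For readers unfamiliar with the trick, here is a minimal Python sketch of how an LFSR can stand in for a loop counter. The register width and tap mask are assumptions (a well-known maximal-length 16-bit Galois LFSR with mask 0xB400), not necessarily what the dissolve code uses, and a 512×256 canvas would need a 17-bit register to cover every pixel.

```python
def lfsr_sequence(seed=1, taps=0xB400):
    """Yield successive states of a 16-bit Galois LFSR until it returns to the seed."""
    state = seed
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps          # apply the feedback taps
        yield state
        if state == seed:
            return

states = list(lfsr_sequence())
# Every nonzero 16-bit value appears exactly once, in a scrambled order.
print(len(states), len(set(states)), states[:8])
```

Treating each state as a pixel index visits every pixel once but in a pseudo-random order, which is what produces the dissolve effect. The all-zero state never appears, so that one pixel has to be handled separately.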


    This has turned out to be a very long blog post (my word count is saying over 2,700 words) but hey, I guess I had a lot to say. I had great fun learning assembly with this programming challenge, and I hope you’ve found my explanation interesting too. If you can optimise it any more, I’d be interested to see your results, so do let me know in the comments!


Sinclair C5 Indicators

A while back, I bought a Sinclair C5.


I love it. It’s an amazing piece of history and something really unique. Sure, it’s not that practical so I don’t drive it every day, but it’s great fun to take out for a spin every now and then.

I’ve got various projects on the go to improve it, and one of them was making indicators for it.

The original indicators looked like this:

The front ones are fairly standard but the rear ones are specially made for the C5 – as you can see they fit flush against the curved edge. The problem is, these original indicators are very rare and expensive nowadays — they can easily fetch over 100GBP for a set.

So I set about making my own. My friend Eddie Tindall, a very talented designer, recommended that Renault Megane indicators would go quite well with the style of the C5. So I bought some of those and prised the plastic diffusers free from the backings. The glue was quite brittle, so they came off ok with a little bit of care.

The reason for doing this was that I wanted to use LEDs rather than the original incandescent lamps. I wanted to avoid retrofits though because I thought they probably wouldn’t be bright enough.

I hacked up a quick PCB in EAGLE CAD. It was really rough and dirty — the polygons were all hand drawn and had funny shaped gaps between them, etc. But in the spirit of my previous post, I didn’t want to bother spending ages getting it perfect. Once I was fairly happy the design was OK, I sent it off to Smart Prototyping in HK for manufacture. 18 days later, the 5 copies of the PCB arrived.


Each board is for two indicators — one left and one right, since the two indicators are mirror images of each other. You’re looking at the front of one indicator PCB and the back of the other here. They needed to be guillotined in half along the silk screen diagonal line. The circles on the top right of that image (also present on the back at the bottom left) are my idea for thermal management: I calculated that if I soldered coins to these pads (1p coin for the circular ones, 2p coin cut in half for the semicircular ones), it would give the indicators enough thermal mass to keep cool in normal use. This is pretty essential, since the LEDs I was using would be dissipating quite a lot of power.


Here’s one of the boards soldered up, from both sides:

I couldn’t find a hacksaw, so I just blobbed a load of solder where the 2p halves should have been for now. It’s not quite as effective, but it might be good enough. It will run for quite a few seconds before the 1p coin becomes noticeably warm, but the solder blobs heat up a bit quicker. Soldering this was fairly easy since I have a ~90W iron, but would probably be impossible with a cheap Maplin iron.

Here’s a picture of one working, hooked up to a 700mA constant current supply.
I’m not entirely sure how to measure the brightness, but they were bright enough to give me spots in my vision. For comparison, the backdrop of that picture is in mid-evening British summer sunlight and the exposure was apparently f/2.8, ISO 50, 1/250 seconds.

I’m reasonably happy with how these turned out. I still need to work out how I’m going to fix the PCB to the diffuser (probably some sort of high temperature epoxy), and then fix them to the C5. But before I can do that, I need to get the rest of the electronics done — and that’s a job for another day.


Perfect is the Enemy of Done

I’ve finally gotten round to re-hosting my blog — I was previously using Gandi‘s Dotclear installation which came free with my domain name but I got fed up of it because the comment spam filtering was useless and I found the markup difficult to use. The old blog is currently still hosted at but may move in the near future. This blog is now hosted by the lovely people at the SRCF (full disclosure: I’m on the committee of the SRCF).

I’ve been intending to re-host it for a while but I spent a long time deciding what platform to use. I ruled out WordPress early on because it was too bulky and slow: PHP and MySQL, really!? For just a blog!? I tried a few static content generators but ruled those out for some reason or another too, and I eventually decided that this was a job for later, when I could find time to do it perfectly.

And that’s the problem. Recently, I realised that a lot of my projects end up unfinished because I can’t do them perfectly. My bench frequency reference, my reprap, my Sinclair C5 indicators (I’ve bought a Sinclair C5 since my last post — more on that another time) and various programming projects — all ended up coming to a stall. Or if for some reason I absolutely have to finish something, I tell myself that there’s no point in doing it just alright — if it’s not going to be perfect anyway, I may as well put the absolute bare minimum effort in and do a horrific bodge of it.

But enough is enough. As they say, perfect is the enemy of done. Over the last week or two I’ve been trying to get myself into that mindset. I’m going to start finishing some projects which have dragged on for too long. I may not be able to do them perfectly but that’s fine – I’ll at least get them done. First up was migrating my blog. WordPress may not be perfect, but it will get the job done. So I’ve used that and now the job is done.

To help me keep this in mind, I made a motivational poster to go on my wall — I took a piece of scrap paper, scrawled “PERFECT IS THE ENEMY OF DONE” on it and taped it up in front of me with parcel tape.


It’s not perfect, but hey — it’s done.