Sharpie, part 7: image processing

February 11, 2026

Now that Sharpie is, like, functional, we can have some fun with it. This post has a lot of images and no code.

64 colors

The LS021 can display one of 64 colors on every one of its 76800 pixels. If you've studied old computer systems, you might be accustomed to hardware that can display a very limited number of colors. The difference between these systems and Sharpie is that those color-limited hardware designs were made to output video signals to a high-color-resolution display, like a CRT, and are color-limited due to their silicon design. For example, a Sega Master System was designed to connect to a CRT, which is an analog device and can display as many distinct colors (within some range) as you want. The actual Master System itself, however, can only render 32 colors on screen at any given time, out of 64 possible colors, so it is the color-limiting factor.

Sharpie isn't like this. The computer connected to the screen---in this case an RP2350 microcontroller---can do whatever it wants, but the actual display itself can only show 64 colors. There is no way to get the LS021 to display any colors besides those 64. (This is fairly distinct from the Master System example, because Sharpie can put any one of those 64 colors on any or all pixels, while the Master System is a tile-based system with video layers. See SMS Power! for more information.)

In practice, these two situations aren't all that different. As long as there's something restricting color depth anywhere in the pipeline between memory and display, you have to work within those constraints. In fact, we'll soon see that a different "feature" of the LS021 is at least as annoying as its low color depth.

64 colors is not very much. It's almost nothing. A normal digital camera image supports 16 million colors (8 bits per color = bpc, 24 bits per pixel = bpp), and many digital formats can store way more than that (10 bpc = 30 bpp and higher). 64 colors stopped being cool in about 1987.

Images on the LS021

Image data is big, so normally we store images in compressed formats. There are a lot of ways to do this and they generally rely on throwing away some amount of information that our eyes aren't sensitive to, like minute differences in color intensity. This is useful, but it's most important with respect to storage of image data, not displaying it. When your computer shows you an image, compressed or not, it builds a grid of raw pixels because that's what an LCD panel knows how to display. For a 64-color display (2 bpc = 6 bpp), we can store one pixel in a byte, in the format 0b00BBGGRR. Each subpixel element (the individual colors) occupies 2 bits, and a raw image is just a stream of these bytes, starting at one corner of the image and running to the end.
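As a rough illustration of that byte layout, here is a minimal sketch that packs 8-bit-per-color RGB into 0b00BBGGRR by simply keeping the top two bits of each channel. The function names are made up for this example, and the input is assumed to be a tightly packed r, g, b, r, g, b, ... buffer; no dithering is applied.

```rust
/// Pack one 8-bit-per-channel RGB pixel into the 0b00BBGGRR layout
/// by keeping only the top two bits of each channel (no dithering).
fn pack_rgb222(r: u8, g: u8, b: u8) -> u8 {
    let r2 = r >> 6; // 0..=3
    let g2 = g >> 6; // 0..=3
    let b2 = b >> 6; // 0..=3
    (b2 << 4) | (g2 << 2) | r2
}

/// Convert a tightly packed RGB8 buffer (r, g, b, r, g, b, ...)
/// into one 0b00BBGGRR byte per pixel.
/// ASSUMPTION: the input length is a multiple of 3; leftovers are ignored.
fn pack_image(rgb: &[u8]) -> Vec<u8> {
    rgb.chunks_exact(3)
        .map(|px| pack_rgb222(px[0], px[1], px[2]))
        .collect()
}
```

A pure red pixel (255, 0, 0) lands in the two lowest bits, and a pure blue one in bits 4 and 5.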

In a perfect world, we could convert an image of the proper size and rotation (the LS021's long dimension is the vertical axis) to this pixel format, with a little help from libraries like image and a lot of bitshifts, save it to a file, convert to a C array, program Sharpie to display the image, and call it a day. This is what happens if you actually do that:

That's definitely not right, even if you can see some elements of the original image. The issue here is that the LS021 is not expecting a simple and easy linear stream of pixels. Look at the datasheet (page 30):

This display expects its pixel data stream to be split into sections of most significant bits and least significant bits. A line of one image is 240 pixels = 240 bytes, and the LS021 is expecting the first 120 of those 240 bytes to be just the most significant bits of the color values for all 240 pixels in the line. It then expects the remaining 120 bytes to be the least significant bits of those color values.

This might make more sense with a diagram:

For any two adjacent pixels, P0 and P1, within the same row in the original image, starting with the first two pixels on the left-hand side of a row, we can generate two bytes composed of the most significant bits and least significant bits, respectively, of the subpixel elements of the original image. I call these "MSb bytes" and "LSb bytes", where the lowercase "b" stands for "bits". Think of it like a "byte of MSbs" or a "byte of LSbs".

These two bytes are fairly easy to generate using bitwise logic. Once they've been generated, they need to go into a buffer. sharpie-formatter uses a pre-allocated array because it's already iterating over an image. The MSb byte goes in the location of the original P0, and the LSb byte goes 120 bytes down the row from the original P0 position.

I call this process formatting, for lack of a better name. To format an image is to start with a 240x320 image, reduce its color depth (which we'll look at later), apply this bit-motion process to it, and make a new 240x320 = 76800 byte block of data that can be written directly to the screen with a full frame refresh. (Note that because pixels ahead of the current read position are overwritten during formatting, you can't format directly in place, on top of an existing buffer. You would have to maintain a few lines of spare data, or just allocate an entirely separate buffer, as sharpie-formatter does.)
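The formatting step can be sketched roughly as follows. This is not the sharpie-formatter code. The names gather, split_pair, and format_frame are hypothetical, and the order of bits inside each MSb/LSb byte (P0 in bits 0 to 2, P1 in bits 3 to 5, each as B, G, R from high to low) is an assumption made for illustration; the real order is whatever the datasheet's data diagram specifies. The sketch also assumes that pixel pair k of a row lands at offset k (MSb byte) and offset k + 120 (LSb byte) within that row.

```rust
const WIDTH: usize = 240;
const HEIGHT: usize = 320;
const HALF: usize = WIDTH / 2;

/// Gather one bit (bit = 1 for the MSb, 0 for the LSb) of R, G and B
/// from a 0b00BBGGRR pixel into three adjacent bits: 0b00000BGR.
fn gather(pixel: u8, bit: u8) -> u8 {
    let r = (pixel >> bit) & 1;
    let g = (pixel >> (2 + bit)) & 1;
    let b = (pixel >> (4 + bit)) & 1;
    (b << 2) | (g << 1) | r
}

/// Turn two adjacent pixels into (MSb byte, LSb byte).
/// ASSUMPTION: P0 occupies bits 0..=2 and P1 bits 3..=5 of each byte.
fn split_pair(p0: u8, p1: u8) -> (u8, u8) {
    let msb = gather(p0, 1) | (gather(p1, 1) << 3);
    let lsb = gather(p0, 0) | (gather(p1, 0) << 3);
    (msb, lsb)
}

/// Format a whole 240x320 frame into a separate output buffer.
fn format_frame(src: &[u8]) -> Vec<u8> {
    assert_eq!(src.len(), WIDTH * HEIGHT);
    let mut out = vec![0u8; WIDTH * HEIGHT];
    for (row_in, row_out) in src.chunks_exact(WIDTH).zip(out.chunks_exact_mut(WIDTH)) {
        for (k, pair) in row_in.chunks_exact(2).enumerate() {
            let (msb, lsb) = split_pair(pair[0], pair[1]);
            row_out[k] = msb; // first 120 bytes of the row: MSb bytes
            row_out[k + HALF] = lsb; // last 120 bytes of the row: LSb bytes
        }
    }
    out
}
```

Writing into a second buffer sidesteps the in-place problem: the write at k + 120 would otherwise clobber pixels that have not been read yet.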

So, then, what happens if we format the image before sending it to the display?

That's more like it. (The screen doesn't look this dark in reality, nor do the pixels make weird chunks. That's my phone camera's fault.)

Subpixel layout

This is a weird data format, so you might reasonably ask "why does the LS021 want its data in this layout?" I can't concretely answer that, but I have a theory. Remember back in part 1 when I promised that we'd look at the subpixel layout? Here's how the datasheet says individual pixels are assembled within the screen (datasheet page 33):

Each color within each pixel uses two subpixel elements for its most significant bit and one for its least significant bit---but unlike a more normal color LCD, these elements can only be set to on or off. In a 2-bit number, the most significant bit contributes twice as much to the total value of the number as the least significant bit, so the display can use three on-or-off binary subpixel elements per color and map them as shown above to simulate a complete 2-bit color range. This definitely only works on a dense screen like the LS021 where the pixels are very small, so the human eye can't detect the fact that the colors are kind of fake. This kind of color fakery happens on larger displays as well (usually with high-frequency switching between two levels to simulate a third), and at sufficiently high resolution, distance, and/or frequency, you can't tell.
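The mapping is small enough to write down. A minimal sketch, assuming exactly the arrangement described above (two elements on the MSb, one on the LSb); the function name is made up for this example:

```rust
/// Number of lit subpixel elements (out of 3) for a 2-bit channel value,
/// assuming the MSb drives two elements and the LSb drives one.
fn lit_elements(value: u8) -> u8 {
    let msb = (value >> 1) & 1;
    let lsb = value & 1;
    2 * msb + lsb
}
```

The count of lit elements equals the channel value itself, so the four values light 0, 1, 2, or 3 of the three elements.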

I suspect the reason that the display wants a data stream formatted as it does is to make the pixel driver's logic simple. Under this scheme, all the driver has to do is write 240 MSbs, then 240 LSbs, in that order, with no bit manipulation involved. I don't know why Sharp decided to have the driver accept two pixels' MSbs or LSbs at one time. Maybe for speed? Maybe they already had a 6-bit data bus ready to use?

This also does suggest that perhaps the memory LCD technology is limited to binary subpixel elements, which is why there are still (as far as I know) no higher-color memory LCDs available.

Making the most of 64 colors

You can probably tell that even though the image is correctly formatted, it still doesn't look that good. Ryuuko (the black-haired one) has weird blue parts all over, the white background is painfully washed out, and every shadow looks very out of place.

This is what that image looks like when it's reduced to 64 colors and displayed on not-Sharpie:

This looks bad. The only reason it looks a little less bad on the LS021 is because the LS021 is tiny and has fair-to-middlin' contrast.

One solution to this problem is to dither the image. Dithering is a process where you convert an image, pixel by pixel, from one color depth to another, calculate the error between the initial and resulting image (called the quantization error), then distribute that error over nearby pixels to make it less noticeable.

There are several dithering algorithms, most (if not all) of which are simply differences in how that error is distributed. Here's 9 of them applied to the image from before:

Algorithms from left to right, top to bottom: Atkinson, Burkes, Floyd-Steinberg, Jarvis, Judice, Ninke, Sierra, Sierra3, and Stucki. Images made with dither and stitched with ImageMagick.

Look at how much detail gets preserved. The sky isn't a block of colors, the background has actual nuance and gradient, and the colors look more accurate. Of course, there aren't actually any more pixels, or more colors; rather, dithering smooths the color conversion across the whole image, and at small size, can trick your eye into seeing colors that aren't really there.

From the selection of algorithms above, I decided that Floyd-Steinberg looks the best, so I chose it for all of Sharpie's dithering. Floyd-Steinberg dithering is also generally the go-to dithering algorithm, so it's well-documented. I won't be putting a code walkthrough in this post, but you can see my implementation in sharpie-formatter. The trickiest parts were a) figuring out how to tell image to convert pixels from sRGB color to linear, and b) discovering that the quantization error had to be computed with a saturated add.
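For a rough idea of the shape of the algorithm without opening the repository, here is a minimal single-channel sketch. It is not the sharpie-formatter code: it skips the sRGB-to-linear conversion entirely, handles one 8-bit channel at a time, and uses a wider integer plus a clamp where the real implementation reportedly needs a saturated add. The function name and signature are hypothetical; the 7/16, 3/16, 5/16, 1/16 weights are the standard Floyd-Steinberg ones.

```rust
/// Floyd-Steinberg dither of one 8-bit channel down to 2 bits (values 0..=3).
/// ASSUMPTION: `input` is a row-major plane of `width * height` samples.
fn dither_channel(input: &[u8], width: usize, height: usize) -> Vec<u8> {
    assert_eq!(input.len(), width * height);
    // Work in i32 so accumulated error can temporarily leave 0..=255.
    let mut work: Vec<i32> = input.iter().map(|&v| v as i32).collect();
    let mut out = vec![0u8; width * height];

    for y in 0..height {
        for x in 0..width {
            let i = y * width + x;
            let old = work[i].clamp(0, 255);
            // Nearest of the four representable levels 0, 85, 170, 255.
            let level = (old + 42) / 85; // 0..=3
            let new = level * 85;
            out[i] = level as u8;
            let err = old - new; // quantization error for this pixel

            // Push the error onto neighbours that have not been visited yet.
            if x + 1 < width {
                work[i + 1] += err * 7 / 16; // right
            }
            if y + 1 < height {
                if x > 0 {
                    work[i + width - 1] += err * 3 / 16; // below left
                }
                work[i + width] += err * 5 / 16; // below
                if x + 1 < width {
                    work[i + width + 1] += err / 16; // below right
                }
            }
        }
    }
    out
}
```

Fed a flat mid-gray plane (128 everywhere), this produces a mix of levels 1 and 2 whose average stays close to 128, which is the whole trick: no single pixel is right, but the neighbourhood is.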

Here's the image, converted to 2-bit color with Floyd-Steinberg dithering:

This is an image that has been converted from an 8bpc image to a 2bpc image with dithering, then converted back to an 8bpc image for display on the computer. Every subpixel component of every pixel is only one of four values, so this is perhaps best called a simulated 64-color dithered image.

Aside
Converting from 2-bit-per-color pixels to 8-bit-per-color is not quite as simple as simply shifting the value left 6 times. If you do this, the highest 2-bit value, 0b11, becomes only 0b11000000, which is far from the maximum 8-bit value, and the resulting 8bpc image doesn't map correctly to the full range of a real 8bpc image. The correct method is to use a matching table, like so:
0b00 => 0u8,
0b01 => 85u8,
0b10 => 170u8,
0b11 => 255u8
You can go from an 8bpc image generated this way back to 2bpc just by shifting right 6 times, as you would with a default 8bpc image. Check the binary if you don't believe me.
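The same table can be written as a multiplication by 85, which makes the round trip easy to check. A minimal sketch with made-up function names:

```rust
/// Expand a 2-bit channel value (0..=3) to the full 8-bit range.
/// Equivalent to the table above: 0, 85, 170, 255.
fn expand_2_to_8(v: u8) -> u8 {
    (v & 0b11) * 85
}

/// Reduce an 8-bit channel value to 2 bits by keeping the top two bits.
fn reduce_8_to_2(v: u8) -> u8 {
    v >> 6
}
```

Each of 0, 85, 170, and 255 keeps its original 2-bit value in its top two bits, so reduce_8_to_2(expand_2_to_8(v)) returns v for every 2-bit v, whereas the naive 0b11 << 6 tops out at 192.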

And here's the dithered image on Sharpie:

To summarize: if we have a generic 240x320 image on the computer, we can dither it, format it, and convert it to a C array, then tell the RP2350 to display it. The RP2350 is entirely capable of the same data processing (except maybe for the generic image decoding), but there's no particularly good reason to do it on there instead of on a host computer.

For what it's worth, I like the demo image, but I don't think it does the LS021 justice. Here's the colored pencils image from the last post in its original state, without dithering, and with dithering:

Video!

The code for this section is in the sharpie-usb-display directory in the main repository.

The image I used above was actually the first image I ever considered putting on Sharpie, well before I even had functional hardware. For the uninitiated, it's a frame from an early episode of the anime Kill la Kill. Once I had it working, I went looking for something more colorful, and settled on the picture of colored pencils you saw in the last post.

As cool as static image display is, we can do better. The LS021 can update its entire screen 18 times per second, because of how long the display signal takes to transmit. 18 frames per second is fast enough to show a video that actually looks like a continuous video. The RP2350 can't decode a video format like H.264 or AV1 on its own, but it can totally work with one LS021 screen's worth of raw data at many fps. Could it receive raw frames from another device that's decoding a normal video file?

The RP2350's USB link is theoretically capped at 12 Mb/s. One uncompressed LS021 screen is 76800 bytes, and (12e6/8)/76800 = 19.53. In theory, we can stream video to the RP2350 and display it in real time on the LS021 at a decent framerate of 19 fps.

Of course, reality doesn't work like that. The RP2350 cannot sustain that data rate, but it can do about 950000 bytes/second (by my testing), using the most basic TinyUSB vendor bulk endpoint. If we round this to 900000 bytes/second, we can only reach 11.7 fps. That's not very much.
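The arithmetic behind both numbers is just link throughput divided by frame size. A minimal sketch (the constant and function names are made up for this example):

```rust
/// Size of one uncompressed LS021 frame: 240 x 320 pixels, one byte each.
const FRAME_BYTES: f64 = 76_800.0;

/// Upper bound on frames per second for a link that moves
/// `bytes_per_second` of frame data, ignoring all protocol overhead.
fn max_fps(bytes_per_second: f64) -> f64 {
    bytes_per_second / FRAME_BYTES
}
```

With the theoretical 12 Mb/s link, max_fps(12e6 / 8.0) gives about 19.5, and with the rounded-down measured rate, max_fps(900_000.0) gives about 11.7.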

But what if each frame were compressed to half the size? Then we could hit ~23 fps (twice the 11.7 fps above), which is a high enough frame rate for most video to look at least pretty good, if not totally fine. The RP2350 has two cores, so we can offload the decompression onto the other core with some simple buffering. Our modern world is also bursting with extremely high-speed compression algorithms. I decided that the odds were in my favor and took a chance.

We need a host computer to decode the video and convert it to something that Sharpie can use. I like programming in Rust, so I did a little research and found that GStreamer has well-supported Rust bindings and supports essentially every format and piece of hardware in existence. I saw all this and thought, "heh, well, it shouldn't be more than a couple hours' work to get some simple video decoding working."

Oh how wrong I was

GStreamer is a pillar of the modern video stack, and the center of lots of effort from many different entities. It is also, to the best of my knowledge, a demon sent straight from hell to punish innocent hobbyists like myself who dare to dabble in the dark domain of audio-video processing. Holy mother of god was it hard to get GStreamer working. All Sharpie needs is a simple pipeline that decodes a video, flips it, scales it, adjusts its frame rate, and converts it to raw frames, then sends it to an application sink. This is, according to the GStreamer primary documentation and the Rust API documentation, a task supposedly so easy as to be trite.

Unfortunately, GStreamer's documentation is written exclusively for people who already understand GStreamer in its entirety. Are you a beginner? Too bad! You'll have better luck relying on your IDE's autocomplete tools than studying examples or using your best Google-fu. It took me four whole days to get that pipeline running.

Note
To be clear, I don't mean any disrespect toward GStreamer or the people behind it. It's an important part of the modern audio-video stack, and while it is hard for me personally to use, that might be a skill issue on my part.

Okay, so after a lot of pain, we have video frames. Now what? The frames that GStreamer gives us are 8-bit-per-color 240x320 RGB images, because that's what we asked GStreamer for, so it's our job to dither, format, and compress them. We already know how to dither and format (though I did end up adapting the sharpie-formatter code to make it easier to work with when there's no convenient image types around), but how should we compress?

I started with lz4, because I know it's pretty good and really, really fast. The RP2350 can, by my benchmarks, decompress LZ4 at 12 MB/s, which is well more than fast enough. I added lz4_flex to the host Rust project and changed the host code to compress the formatted frames before sending them over USB. I then put together a simple C client for Sharpie that reads compressed frames and sends them to the second core to decompress and display. With this in place I had functional video playback! (No, you don't get to see.)

Unfortunately, I could see that this method was dropping frames. I added a quick bit of instrumentation to the main USB data loop and found that the frame rate consistently fell below my desired minimum of 18 fps.

I messed with overclocking the RP2350 and increasing the compression level on the host, but neither raised the worst-case frame rate. At this point I realized that since the culprit was the amount of data being sent over USB, there might be a better compression algorithm available. A little research turned up Zstandard (zstd), which compresses a lot better than LZ4, even at its base level. Compare LZ4:

with zstd:

I tested both compressors at compression levels 1-10 on a dithered and formatted Sharpie frame from a different, mostly unsuccessful video test.

I then checked zstd's decompression speed on the RP2350 and found it to be sufficiently fast (I didn't save the exact number), so I implemented it on both sides, and it works! Now it doesn't drop frames, and I can show it off:

I think this is rather amazing.

There was one more bug triggered by very small frames (usually frames that are mostly one color, which compress to very small amounts of data) where that frame's data transfer would finish and the second core would decompress it before the PIO had finished sending the current frame. This would cause the second core to reset the PIO in the middle of a frame and break video playback for a few seconds. The solution was simple: just make sure that the PIO is done sending a frame before resetting it.

If you look at the code for the USB display, you might notice that I'm not using partial update. This is mostly out of laziness. If I used partial update, I would have to build something resembling an actual data protocol instead of "data length + data" to differentiate between partial frames and full frames, make the host code capable of generating partial frames, and work out reasonable logic for deciding when to use a partial frame and when to use a full one. The primary factor that might make partial updates desirable is that unless every single line on the screen has been updated, a partial update is faster than a full refresh. This is held back by the fact that video playback doesn't usually lend itself well to variable frame rates.
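A "data length + data" framing like the one mentioned above might look something like this on the host side. This is a sketch under assumptions, not the sharpie-usb-display wire format: the 4-byte little-endian length prefix and both function names are invented for illustration.

```rust
/// Prefix a compressed frame with its length.
/// ASSUMPTION: the length is a u32 in little-endian byte order.
fn encode_packet(payload: &[u8]) -> Vec<u8> {
    let mut packet = Vec::with_capacity(4 + payload.len());
    packet.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    packet.extend_from_slice(payload);
    packet
}

/// Try to pull one complete packet off the front of a receive buffer.
/// Returns the payload and the number of bytes consumed, or None if
/// more data still has to arrive.
fn decode_packet(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < 4 {
        return None; // header not complete yet
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let end = 4usize.checked_add(len)?;
    let payload = buf.get(4..end)?; // None if the payload is still partial
    Some((payload, end))
}
```

Supporting partial updates would mean adding at least a frame-type field and a line range to that header, which is the extra protocol work the paragraph above is avoiding.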

Final thoughts

First off: Sharpie is a resounding success. I never imagined, when I laid out the original full frame PIO code back in January 2025, that I would be playing fully functional video on Sharpie. I never even intended video to be the end goal; it just seems like a good (maybe temporary) stopping place.

Second: did you notice above that I said I overclocked the RP2350? If you remember earlier posts you'll notice that I was quite specific about clocks, and carefully chose every clock divider and input frequency to meet the exact demands of the datasheet. Well, it turns out you can bump the RP2350 system clock---which also feeds the PIOs---by 33% with no consequences on the display. This shouldn't be surprising, but I didn't even notice I was out of spec until I did some tests comparing behavior between 150 MHz and 200 MHz. 200 MHz helps with screen tearing-like effects that show up sometimes, and it also means we can bump the frame rate to about 22 fps reliably. Any higher and the USB link isn't fast enough to keep up.

Finally: where am I going from here? I don't really know. I will be putting Sharpie aside for a while, but probably not forever. I really like this project and I really like this display, so I'd like to do something with both of them. I also think it would be fun to source a larger LSxxx color display from a supplier that isn't Digikey and play with that.

I am going to use the Sharpie board to get some measurements on the RP2350's low power states. Pi Foundation added these in response to the RP2040's very lame low-power states, but they didn't bother to provide much info about the RP2350's power consumption in low power in its datasheet. Apparently no one else on the entire internet has wondered about this (edit: it's not quite that bad, and the datasheet does have some info), so I'm going to find it myself. Stay tuned for that.

Oh, one more thing: