Sharpie, part 6: partial updates

December 24, 2025

Revised January 4, 2026 with bugfixes and more explanation.

Welcome back. I am happy to share that Sharpie, such as it is, is now available to the public. The code and designs are available at its GitHub repository. Enjoy!

We're coming into the homestretch on this project. Last time, we were able to send full frames of data to the display, but this isn't all that the LS021 is capable of. Specifically, the LS021 supports a waveform where you send short GCK pulses on lines where no data is being sent---in other words, you can update just part of the screen.

This post assumes you've read the rest of the Sharpie series, and are familiar with how the display works from a signals perspective.

Overview

Let's start with the datasheet. This chart is from page 31.

This might look complex, but if you've been following along so far, it's actually about as simple as what we've done before. All we have to do is send short GCK pulses, like the diagram shows, when we don't want to update the image. The problem is that figuring out how to do this on PIO took a lot of planning and testing. Let's look at how my solution works (with the standard disclaimer that there are probably easier ways, small issues with my code, etc.).

Even though partial refresh isn't necessarily that complicated, a functional implementation is. My implementation uses two data streams, chained IRQs, and so many instructions that we have to split the program across two PIOs. Because state machines in different PIO blocks have to communicate with each other, we have to move to the RP2350 and restrict these programs to PIO version 1. The RP2040's PIOs don't have any facilities for signaling between SMs in different PIO blocks.

This beautiful Inkscape diagram shows the data streams and IRQs, except for one counter. Each inner box is titled with the name of its state machine. We're starting at PIO 1 because PIO 0 has already been configured for full screen updates.

Let's run this backward, starting with the way the code works and then moving to the code.

1. Once all the SMs and DMA streams are set up, the CPU sets IRQ 0. The INTB/GSP SM is waiting for this and starts by raising INTB and GSP appropriately, then sets IRQ 1.

2. GCK and the "GCK end SM" start with IRQ 1. GCK h/ls can now be different widths, and the easiest way I found to make sure that GCK h/ls at the end of the frame are sent correctly is to have a separate SM send them. The GCK end SM waits for IRQ 1, then counts down a pre-calculated number of cycles (which was placed on its FIFO before we started) before sending a few full-size GCK h/ls.

3. The GCK SM loops and sets IRQ 2 for the horiz/data SM when data needs to be sent. The horiz/data SM (horizontal pulses and data) sends a set amount of pixel data from a DMA stream each time it is cued by the GCK SM.

4. After the GCK end SM times out, it sets IRQ 3 to tell INTB to count down, then lower INTB. INTB has to fall at a specific point at the end of the frame. The least painful way I've found to do this is to have INTB load a register with a counter at the start (when it was otherwise going to be running nop instructions), then have it wait for GCK end after GSP falls, then count down that counter before setting INTB low.

Put simply, although the INTB/GSP SM sends the actual start signal, the GCK SM is what really controls the signal generation. It is fed with a DMA stream of line counts, for skipped lines and changed lines alternately (this data is one of the yellow lines in the diagram above), and when it starts sending full-size GCK pulses for a region of the image that has changed, it sets IRQ 2 so that data is also sent.

If that kind of makes sense, then this Pulseview screenshot of a partial frame should get you the rest of the way there:

This waveform was made by telling the setup to skip 200 lines, then send 19 lines of data, then skip the rest of the lines in the frame. You can see why having partial update might be desirable for any number of applications: it takes significantly longer to send 19 lines of data than it does to skip a total of 301 lines. This transfer took 5.4 ms, which is not nothing, but is much faster than a 53.56 ms full frame.

INTB rises, then GSP, then GCK has two full-size h/ls before going into skip mode. The GCK SM sends lots of GCK skip h/ls, then starts sending full-size h/ls for the changed part of the frame. At the same time this starts, it tells the horiz/data SM to send data and horizontal pulses, and the main GCK loop also makes GEN pulse in the middle of GCK pulses. Then the GCK SM goes back to skip pulses before the GCK end SM takes over and makes 6 full-size GCK h/ls at the end of the frame.
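As a rough cross-check of the 5.4 ms figure, the waveform above can be tallied in state machine cycles. This is only a sketch under assumptions: the names (`line_run_t`, `estimate_frame_cycles`, `cycles_to_us`) are made up for illustration, it only covers frames that begin with a skip, and it assumes 2 full-size h/ls at the start, 6 at the end, one cycle per skip h/l, and one extra full-size h/l closing each changed section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CYCLES_PER_FULL_HL 32u /* SM cycles in one full-size GCK h/l */
#define START_FULL_HLS      2u /* full-size h/ls before the first skip (assumed) */
#define END_FULL_HLS        6u /* full-size h/ls sent by the GCK end SM */

/* Hypothetical helper type: a run of lines that are all skipped or all changed. */
typedef struct {
    uint32_t lines;
    int changed; /* nonzero: lines carry data; zero: lines are skipped */
} line_run_t;

/* Rough SM-cycle count for a partial frame that begins with a skip run. */
static uint32_t estimate_frame_cycles(const line_run_t *runs, size_t n) {
    assert(runs != NULL || n == 0);
    uint32_t cycles = (START_FULL_HLS + END_FULL_HLS) * CYCLES_PER_FULL_HL;
    for (size_t i = 0; i < n; i++) {
        if (runs[i].changed) {
            /* two full-size h/ls per line, plus one extra h/l closing the run */
            cycles += (runs[i].lines * 2u + 1u) * CYCLES_PER_FULL_HL;
        } else {
            /* two one-cycle skip h/ls per line */
            cycles += runs[i].lines * 2u;
        }
    }
    return cycles;
}

/* Convert SM cycles to microseconds for a given system clock and PIO divider. */
static double cycles_to_us(uint32_t cycles, double sys_hz, double clkdiv) {
    return (double)cycles * clkdiv / sys_hz * 1e6;
}
```

For the frame in the screenshot (skip 200, change 19, skip 101), this comes to 2106 cycles, or roughly 5.44 ms with a 150 MHz system clock and a divider of 387.5, which lines up with the measured transfer.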

Let's note a few oddities with the signal:

Code

INTB/GSP SM

First, the INTB/GSP PIO code. I'm leaving out the C code and the assembler directives in the PIO files, because now you can view it yourself!

; side-set: 0b[GSP][INTB]
; clock: 1/32 of a full (not skipped) GCK h/l
; expects to be able to pull one 32-bit int, a big counter value
; so that INTB falls correctly

    
.wrap_target
wait 1 irq 0     side 0b00     ; wait for CPU to tell us to start
nop              side 0b01 [7] ; rise INTB
set x, 21        side 0b11 [7] ; rise GSP, set x for loop later
out y, 32        side 0b11     ; wait more GSP, to meet thsGSP timing, and get loop counter
irq set 1        side 0b11     ; wait rest of GSP and tell GCK + GCK end to start

; wait out GSP with a loop, the counter comes from the CPU
gsploop:
jmp y--, gsploop side 0b11


wait 1 irq next 3 side 0b01   ; wait for GCK end to set irq 3

; this loop starts after GCK end begins running the large GCK pulses at the end
; of a partial frame, and makes sure INTB falls at the (approximately) halfway
; point of GCK646. the counter in x was determined through trial
; and error.
loop:
jmp x--, loop    side 0b01 [4]    ; just hang out, man

; literally just wrap, the `wait 1 irq 0` will stall and hold INTB low
.wrap

This code expects a clock period equal to 1/32 of a full-size GCK h/l. With the system clock at 150 MHz, we have to use a divider of 387.5. While this is fractional, and thus adds some jitter, it doesn't seem to cause inaccuracies in the final waveform. I theorize that this is because a fractional divider works by alternating between 387 and 388 system clock cycles per PIO cycle, so no edge lands more than one system clock period (under 7 ns) away from where it should be.
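To put numbers on that, here is a small sketch that turns the divider into durations. The function names are invented for this example, and the jitter bound rests on my understanding that the fractional divider alternates between two adjacent integer periods; treat it as an estimate rather than a datasheet figure.

```c
#include <assert.h>

#define CYCLES_PER_FULL_HL 32.0 /* SM cycles in one full-size GCK h/l */

/* Duration of one state machine cycle, in microseconds. */
static double sm_cycle_us(double sys_hz, double clkdiv) {
    assert(sys_hz > 0.0 && clkdiv >= 1.0);
    return clkdiv / sys_hz * 1e6;
}

/* Duration of one full-size GCK h/l, in microseconds. */
static double full_hl_us(double sys_hz, double clkdiv) {
    return CYCLES_PER_FULL_HL * sm_cycle_us(sys_hz, clkdiv);
}

/* Assumed upper bound on edge jitter, in nanoseconds: zero for an integer
 * divider, otherwise one system clock period. */
static double max_jitter_ns(double sys_hz, double clkdiv) {
    double whole = (double)(unsigned long)clkdiv;
    return (clkdiv == whole) ? 0.0 : 1e9 / sys_hz;
}
```

With 150 MHz and 387.5, one SM cycle is about 2.58 µs and a full-size h/l is about 82.7 µs, while the assumed jitter bound is under 7 ns.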

GCK SM

This one's big, because it does most of the work.

; one SM that controls just GCK
; side-set: 1 pin, GCK
; clock: 1/32 of a (full-length) GCK h/l
; autopull: enabled, threshold 32 bits

; this expects a DMA stream of 32-bit ints, starting with a number of
; skipped image lines (not GCK h/ls) to skip, followed by a number of
; image lines to change. both the changed and the skipped lines counts have to be
; (number of lines - 1).

; INTB/GSP SM sets irq 1 on the cycle after GSP rises. the wait
; instruction takes two cycles, so we have to add some fine-tuned delay
; over here.

wait 1 irq prev 1       side 0 [6]  ; wait for INTB/GSP SM in previous PIO
out x, 32               side 1 [15] ; get first skipped lines counter, rise GCK1,
                                    ; wait almost half of it
nop                     side 1 [14] ; wait rest of GCK1
jmp !x, initialchange   side 1      ; if we're starting immediately, we have to do GCK2 separately
nop                     side 0 [15] ; fall GCK2 and wait for its first half
nop                     side 0 [15] ; wait the rest of GCK2
; if we start with changed lines, we have to send data starting on GCK2

.wrap_target
nop                     side 1

skiploop:
nop                      side 0
jmp x--, skiploop        side 1

irq set 2                side 0
; now flow seamlessly into changed lines loop

; note that we have a [14] because that flow is in fact seamless.

changedstart:
nop                     side 0 [14]
out x, 32               side 0 [15] ; get counter and wait rest of this long GCK

changedloop:
nop                     side 1 [7]  
set pins, 1             side 1 [15] ; rise GEN and wait rest of this GCK half
set pins, 0             side 1 [7]  ; fall GEN

nop                     side 0 [2]  ; fall GCK
changedloop_from_inject:
nop                     side 0 [4]
set pins, 1             side 0 [15] ; rise GEN, wait
set pins, 0             side 0 [6]  ; fall GEN, wait     
jmp x--, changedloop    side 0      ; repeat until x == 0

exit:
out x, 32               side 1      ; get next skip counter (which will never be 0,
                                    ; that's only for the start)


; if the number provided is greater than 1 (2 skips), the first high skip pulse will
; be 2 short cycles instead of one. this doesn't affect the image, and it is not out
; of spec, but it is worth noting.

; and go back to skips
.wrap


; if we start without any skips, we need a special code path.
; this has to do GCK2 here (or, in this case, start it) AND get
; the first changed lines counter. this code block must also complete
; all of GCK2, because GCK2 cannot have a GEN pulse.
initialchange:
irq set 2                    side 0 [15]
out x, 32                    side 0 
jmp changedloop              side 0 [14] ; next instruction will rise GCK
; this goes into the main loop, because the main loop can do the rest of the GCKs

This state machine handles all of GCK except for the final pulses. The CPU or DMA must send it 32-bit ints containing the number of lines to skip, then change, in that order (and repeating if necessary). If the frame starts with changed lines, that first number must be zero, and the state machine will detect this and go right to sending changed lines. Each value representing the number of lines to skip must be the number of lines minus one (value = 19 for 20 lines), because of the way the loops work.
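The encoding rules above are easy to get wrong by hand, so here is a minimal sketch of a helper that builds the control words from a list of runs. Everything named here (`line_run_t`, `encode_gck_control`) is hypothetical and not part of the Sharpie code. It also makes one conservative assumption: a one-line skip would encode as 0, which is reserved for "start with changed lines", so the helper rejects it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: a run of lines that are all skipped or all changed. */
typedef struct {
    uint32_t lines;
    int changed; /* nonzero: lines carry data; zero: lines are skipped */
} line_run_t;

/* Fill `out` with the 32-bit words for the GCK SM. Runs must alternate
 * between skipped and changed. Returns the number of words written, or 0
 * if the input is invalid or `out` is too small. */
static size_t encode_gck_control(const line_run_t *runs, size_t n,
                                 uint32_t *out, size_t out_cap) {
    assert(runs != NULL && out != NULL);
    size_t w = 0;
    if (n == 0) return 0;
    if (runs[0].changed) {
        if (out_cap == 0) return 0;
        out[w++] = 0; /* leading 0: the frame starts with changed lines */
    }
    for (size_t i = 0; i < n; i++) {
        if (runs[i].lines == 0) return 0;
        /* two runs of the same kind in a row cannot be expressed */
        if (i > 0 && !runs[i].changed == !runs[i - 1].changed) return 0;
        /* assumption: skip counters are never 0, so reject one-line skips */
        if (!runs[i].changed && runs[i].lines < 2) return 0;
        if (w >= out_cap) return 0;
        out[w++] = runs[i].lines - 1; /* every counter is (lines - 1) */
    }
    return w;
}
```

Feeding it changed 19, skipped 200, changed 5, skipped 96 produces five words: 0, 18, 199, 4, 95.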

This state machine also expects a clock period of 1/32 of a full-size GCK h/l. This does mean that this approach to partial update has one shortcoming: the length of a partial GCK h/l is about 2.5 µs. The screen itself will work with an h/l that is 1 µs or longer, so we are definitely wasting time with this approach. The only way to get the h/l time shorter is to change the SM clock, which would require even more instructions to work around and get the right full-size GCK h/l.

This, I feel, is evidence that Sharpie pushes the limits of what PIO can do. Any more complexity (possibly including the change just mentioned) would probably make the case for moving this whole thing to an FPGA, if it were to be used in a real device.

GCK end SM

This is the least elegant state machine in this whole setup.

; side-set: 1 pin, GCK, with the `opt` flag.
; clock: 1/32 of a GCK h/l

wait 1 irq prev 1 [6] ; wait for IRQ to start GCK from previous PIO
waitloop:
jmp x--, waitloop

.wrap_target

; set irq 3 for INTB fall
irq set 3            side 0b1
set y, 29            side 0b1
highloop:
jmp y--, highloop    side 0b1


set y, 29            side 0b0
lowloop:
jmp y--, lowloop     side 0b0


; continue wrapping until FIFO is empty to get the appropriate number
; of loops
; then when the FIFO is empty, this SM will stall here and hold
; GCK low
out x, 32            side 0b0
.wrap

The GCK end SM works exactly as we described, but because it runs at a high speed compared to a full-size GCK h/l, it has to waste a lot of time for each GCK h/l. Because we also want to have six h/ls total, we use the final out instruction to control the number of times the code runs. The FIFO must have 2 values on it (the exact value is unimportant, they just need to be on there) so that the code in the wrapped section runs 3 times total.

This SM also relies on one of the more hidden features in PIO: if multiple state machines write to the same pin on the same cycle, the pin configuration (direction and level) comes from the highest-numbered SM that wrote it. This means that the GCK end SM must be on a higher-numbered state machine within the same PIO block as the GCK SM. Then, the GCK SM can do all its clocking while the GCK end SM waits (with optional side-set, it isn't setting any pin values), and when the GCK end SM is ready, it'll activate its side-set and override any value the GCK SM is putting on the GCK pin.

Horiz/data SM

This code is the least changed from the original full-frame code. In fact, it's so similar that I'm not going to offer much explanation, because I already did that in the very first post. There are a few differences, though: this state machine is capable of loading its charged register values out of the FIFO without any forced instructions, and it will load a 32-bit changed lines counter (minus 1, because all the loops here need minus 1) before reading any data.

; side-set: BCK and BSP
;           bit 1   bit 0

; clock: same as in full frame PIO, 6 MHz instruction clock

out isr, 32       side 0b00 ; get the inner loop counter and make ISR backup
mov x, isr        side 0b00 ; copy it to x
out y, 32         side 0b00 ; put first changed lines (times 2) counter in y

.wrap_target

; wait for GCK rise from PIO 1

wait 1 irq next 2 side 0b00

; this has to stall, because GCK will set irq 2 even when it
; stalls (when GCK end is activated), and we don't want to try
; to send data when we shouldn't.


; we can't tamper with any of the delays beneath the `restart:` label, because
; then the loop is broken (past the first time, at best, and always, at worst)

restart:
mov x, isr        side 0b01 [1] ; BSP rises 333 ns after GCK1 rises and charge X for this loop
pull              side 0b11 [1] ; BCK1 rises 333 ns after BSP rises, and get the outer loop counter
out pins, 8       side 0b11 [1] ; hold BCK1, BSP still high, set data out
nop               side 0b01 [1] ; fall BCK1, BSP still high

loop:
out pins, 8       side 0b00     ; fall BSP, next data out, middle of BCK2
jmp !x, exit      side 0b00     ; exit the loop if it's the last iteration (data goes to 0 on BCK121)
nop               side 0b10 [1] ; rise BCK
out pins, 8       side 0b10 [1] ; hold BCK, data out
jmp x--, loop     side 0b00 [1] ; fall BCK, jump

exit:
nop               side 0b10 [1] ; rise BCK121
mov pins, null    side 0b10 [1] ; set data pins to zero
nop               side 0b00 [3] ; fall BCK122 and hold for all of 122
nop               side 0b10 [3] ; rise BCK123
jmp y--, restart  side 0b00 [1] ; fall BCK124, reach middle, restart

out y, 32         side 0b00     ; load next outer counter into y
.wrap

This still relies on having an inner counter with a backup in ISR, but it has to get the number of lines it's going to change from the FIFO. The inner loop counter is still pushed by the CPU before the frame starts, while the changed lines counters travel in the same DMA stream that writes pixel data, which simplifies the code on the CPU side. This is easier said than done because the horiz/data SM must grab each counter without messing with the signal timing.
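Here is one way the stream for a single changed region could be laid out, with its counter sitting directly in front of its pixel data. This is a sketch, not the code from the repository: `pack_region` is a made-up name, and the counter value of `lines * 2` is my reading of the PIO program (the outer loop runs once per 120-byte half line, including the trailing half line of zeros, and counts from N - 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BYTES_PER_LINE      240u /* one full line of pixel data */
#define BYTES_PER_HALF_LINE 120u /* trailing half line of zeros per region */

/* Write one changed region into `dst`: a 32-bit counter, the pixel data,
 * then half a line of zeros. Returns the bytes written, or 0 on error. */
static size_t pack_region(uint8_t *dst, size_t dst_cap,
                          const uint8_t *pixels, uint32_t lines) {
    assert(dst != NULL && pixels != NULL);
    size_t pixel_bytes = (size_t)lines * BYTES_PER_LINE;
    size_t need = 4u + pixel_bytes + BYTES_PER_HALF_LINE;
    if (lines == 0 || need > dst_cap) return 0;

    /* assumption: (2 * lines + 1) half-line passes, minus 1 for the loop */
    uint32_t counter = lines * 2u;
    memcpy(dst, &counter, 4u); /* native byte order, as the DMA will read it */
    memcpy(dst + 4u, pixels, pixel_bytes);
    memset(dst + 4u + pixel_bytes, 0, BYTES_PER_HALF_LINE);
    return need;
}
```

Calling it once per changed region, each time advancing `dst` by the returned size, gives a buffer that is always a whole number of 32-bit words.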

CPU code

You can view all the code that makes this tick, so I'm not going to step through the PIO initialization code. Instead, let's look at the code that sets up for a frame. This is a section of the full code, which you can view in sharpie-sw in the repository.

With this code, I'm demonstrating how to do a partial update including a changed section of 19 lines filled with green pixels at the very top of the screen, then another, smaller changed section of 5 lines of blue pixels 200 lines below the first changed region. There are other partial update sequences in the full code, using various combinations of the same numbers, if you're curious.

PIO intb_gsp_horiz_pio = pio1;
PIO gck_gck_end_pio = pio2;

uint partial_intb_gsp_sm = 0;
uint partial_horiz_data_sm = 1;

uint partial_gck_sm = 0;
uint partial_gck_end_sm = 1;

// Initialize partial update stuff for one specific, predefined partial update.
#define SKIPS (200)
#define CHANGES (19)
#define SKIPS2 (20)
#define CHANGES2 (5)


// ...


// 0 at the start of a GCK control data block tells the SM to start
// sending changed lines beginning with the first line on the display,
// instead of skipping some quantity.
uint32_t gck_control_data3[] = {0,
				CHANGES - 1,
				SKIPS - 1,
				CHANGES2 - 1,
				(320 - SKIPS - CHANGES - CHANGES2) - 1};

// When the changes start at the top of the screen, they start on
// GCK2, so we have to open this with 1*32 instead of 2*32.
uint32_t gck_end_timeout3 = 1*32 + 
			    (CHANGES*2 + 1) * 32 +
			    SKIPS * 2 +
			    (CHANGES2*2 + 1) * 32 +
			    // we still use 319 because 1*32 includes the
			    // first line.
			    (319 - SKIPS - CHANGES - CHANGES2)*2 + 1; 

// ...


// this value never changes
const uint32_t gsp_high_timeout = 53;

// note the +120 for final 1/2 line of zeros, which is used on the
// extra GCK h/l during a changed section that we mentioned in the GCK
// end timeouts above.
// the +4s are for the 32-bit changed lines counters
uint8_t partial_frame_pixels[4 + CHANGES*240+120 + 4 + CHANGES2*240+120];

// ...within a function...

  // push a value for how long the INTB/GSP SM should leave GSP
  // high. this value is a constant, but it's bigger than 5 bits so we
  // can't use a `set` instruction in the state machine.
  pio_sm_put(intb_gsp_horiz_pio, partial_intb_gsp_sm, gsp_high_timeout);
  
  // prepare GCK end SM by giving it the proper counter
  // important! make sure this is correctly calculated, otherwise the
  // signals at the end of the frame will be deformed
  pio_sm_put(gck_gck_end_pio, partial_gck_end_sm, gck_end_timeout3);
  
  // put two zeros---the value doesn't matter but the number of
  // numbers does---on the GCK end FIFO so that the wrap repeats 3
  // times total (this facilitates a code-saving measure)
  pio_sm_put(gck_gck_end_pio, partial_gck_end_sm, 0);
  pio_sm_put(gck_gck_end_pio, partial_gck_end_sm, 0);
  
  // place the GCK end timeout counter in GCK end SM's x register
  pio_sm_exec(gck_gck_end_pio, partial_gck_end_sm, pio_encode_out(pio_x, 32));

  // charge horiz/data SM's inner loop counter
  // get the counter, put it in ISR (backup), then copy to x
  // this will be used repeatedly throughout a frame (and
  // this is also exactly how the full-frame PIO works)
  pio_sm_put(intb_gsp_horiz_pio, partial_horiz_data_sm, 59);
  // we don't need to force any instructions into the PIO here like we
  // do with the full-frame program, because there's enough
  // instruction space left in the INTB/GSP/horiz/data PIO for the
  // `out`s and `mov` that charge registers appropriately. that also
  // means that the first changed lines counter for horiz/data can go
  // in the DMA stream.

Then we have to actually send the partial frame:

// ...later on in the code...
  
  // this is the GCK control DMA stream
  dma_channel_config partial_gck_c = dma_channel_get_default_config(gck_dma_channel);
  channel_config_set_read_increment(&partial_gck_c, true);
  channel_config_set_write_increment(&partial_gck_c, false);
  channel_config_set_transfer_data_size(&partial_gck_c, DMA_SIZE_32); // we use the WHOLE width of the FIFO entry
  channel_config_set_dreq(&partial_gck_c, pio_get_dreq(gck_gck_end_pio, partial_gck_sm, true)); // true for sending data to the SM
  dma_channel_configure(gck_dma_channel, &partial_gck_c,
			&gck_gck_end_pio->txf[partial_gck_sm],
			// getting the number of transfers right would
			// be much better done in a higher-level
			// language with easy array size tracking
			gck_control_data3,
			5,
			true);

  // and this is the pixel data and changed lines counter for horiz/data DMA stream
  dma_channel_config partial_data_c = dma_channel_get_default_config(pixel_data_dma_channel);
  channel_config_set_read_increment(&partial_data_c, true);
  channel_config_set_write_increment(&partial_data_c, false);
  // we found originally that transfers have to be 32 bits, for some
  // reason, and that still holds true for slightly modified horiz/data.
  channel_config_set_transfer_data_size(&partial_data_c, DMA_SIZE_32);
  channel_config_set_dreq(&partial_data_c, pio_get_dreq(intb_gsp_horiz_pio, partial_horiz_data_sm, true));
  dma_channel_configure(pixel_data_dma_channel, &partial_data_c,
			&intb_gsp_horiz_pio->txf[partial_horiz_data_sm],
			partial_frame_pixels,
			1 + (CHANGES*60 + 30) + (CHANGES2*60 + 30) + 1,
			true);
			
  // set IRQ 0 for the INTB/GSP SM to start the transfer
  intb_gsp_horiz_pio->irq_force = 0b1;

GCK counts are 32 bits because the horiz/data DMA has to be 32 bits wide to work, and because 32-bit values will always be aligned with the pixel data.
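The two transfer counts passed to `dma_channel_configure` are the easiest numbers here to get wrong, so they could be derived instead of typed in. The sketch below uses hypothetical names (`ARRAY_LEN`, `pixel_stream_words`) and assumes each changed region contributes one counter word, 60 words per line, and 30 words for the trailing half line, matching the buffer size above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element count of a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

#define WORDS_PER_LINE      60u /* 240 bytes of pixel data per line */
#define WORDS_PER_HALF_LINE 30u /* 120 bytes of trailing zeros per region */

/* Number of 32-bit transfers in the horiz/data stream, given the line
 * count of each changed region. */
static uint32_t pixel_stream_words(const uint32_t *region_lines, size_t n) {
    assert(region_lines != NULL || n == 0);
    uint32_t words = 0;
    for (size_t i = 0; i < n; i++) {
        words += 1u                                /* changed lines counter */
               + region_lines[i] * WORDS_PER_LINE  /* pixel data */
               + WORDS_PER_HALF_LINE;              /* half line of zeros */
    }
    return words;
}
```

For regions of 19 and 5 lines this gives 1502 words, which is exactly the 6008-byte `partial_frame_pixels` array divided by 4, and `ARRAY_LEN(gck_control_data3)` would stand in for the literal 5 in the GCK control channel setup.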

Result


Although partial refresh is working very nicely, it's not technically complete. A complete implementation would, on the CPU, take some sort of data structure and convert it to gck_end_timeout and the rest of the data required for a partial refresh. It would also (if possible) leave the SMs in a configuration where they don't need to be reset to do another refresh. I'd like to test video playback on this screen, so I might take it to that complete implementation at some point.

Conclusion

I'm publishing this at the end of 2025. I'm very happy with how far I've taken Sharpie this year, even factoring in the many-month delay I had to endure before the screens came back in stock. The next post will probably be the last Sharpie post (famous last words), all about the image processing I've done and maybe even with a nice image gallery, if I can figure out how to photograph this thing well.