JCAP Log #9: Video Part 4

Connor Spangler edited this page Apr 30, 2018 · 35 revisions

NES CPU-PPU Setup

Video System Implementation

We finally have all the information we need to fully implement a VGA arcade graphics system, minus one final critical consideration. The main and cog RAM sizes, and the constraints they place on the graphics representation, have already been addressed. However, the Propeller 1 core clock, as it relates to the pixel clock, represents one final technical hurdle to overcome, and it will ultimately define the high-level architecture of the system.

Decisions

Thanks to the robust community behind the Propeller 1 microcontroller, there is a vast amount of communal knowledge to draw on for our own design decisions. Nowhere is this better showcased than in video display. Dozens of developers have created hundreds of different solutions to display a wide variety of video types, resolutions, refresh rates, and other variations. What can be learned from these implementations is that video display of high complexity and quality simply cannot be accomplished in a single cog. This is largely a constraint imposed by the generation of pixel data itself. Some solutions solve this by splitting the scanlines into groups which are assigned to different cogs, while others interlace individual scanlines generated by individual cogs.

In our case, with two layers of indirection and sprite effects to implement, we'll be forced to use a different paradigm altogether: a scanline driver. With this method, one cog is the "display" cog. Its sole job is to take pixel data from main RAM and display it via the video generator circuit. N cogs are then spooled up as "render" cogs. Their job is to generate interleaved scanlines of pixels which are then requested sequentially by the display cog. The choice of this methodology is a direct result of simply doing the math...
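To make the division of labor concrete, here is a minimal Python sketch of how scanlines might be interleaved across render cogs. It assumes a simple round-robin assignment and a hypothetical count of 4 render cogs. Neither is stated above, and the function names are purely illustrative.

```python
# Sketch of interleaved scanline assignment (assumption: round-robin,
# i.e. render cog k produces lines k, k+N, k+2N, ...).

N_RENDER_COGS = 4   # hypothetical value, stands in for the "N" above
N_LINES = 480       # visible scanlines at 640x480


def render_cog_for_line(line, n_render_cogs):
    """Return the render cog that produces the given scanline."""
    return line % n_render_cogs


def lines_for_cog(cog, n_render_cogs, n_lines):
    """Return every scanline a given render cog is responsible for."""
    return list(range(cog, n_lines, n_render_cogs))


# Each render cog gets an equal share of the frame.
schedule = {
    cog: lines_for_cog(cog, N_RENDER_COGS, N_LINES)
    for cog in range(N_RENDER_COGS)
}

# The display cog walks the lines in order, so it ends up consuming
# the render cogs' output in a repeating sequence.
display_order = [render_cog_for_line(line, N_RENDER_COGS) for line in range(8)]
print(display_order)
```

Under this assumption each render cog has N scanline periods to finish one line, which is the whole point of adding more render cogs.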

Colors

A critical constraint posed by the "indirect" method of using waitvid discussed in Video Part 2 is that each series of 16 pixels can only have 4 colors: 2 bits per pixel, each addressing one of the four color bytes. We need 16 colors per 8x8 pixel tile, and even if we only push out 8 pixels per waitvid, we're still restricted to a 4-color palette. The solution to this problem is novel: simply swap the roles of the color palette and the pixel data. By populating the color palette with the colors of the next four pixels, we can display them directly by having a fixed pixel pattern select each color in sequence, i.e. waitvid pixels, #%%3210. This new paradigm works perfectly at giving us "full color", but requires more waitvids per screen, an issue that will need to be addressed.
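The swap is easier to see with a tiny model of what the video generator does with its two operands in 4-color mode. The Python below is only an illustrative simulation, not Propeller code: `waitvid_4color` is a made-up helper, and it assumes the usual behavior where the pixel operand is consumed 2 bits at a time, least significant first, and each 2-bit value picks one byte of the color operand (byte 0 being the lowest).

```python
def waitvid_4color(colors, pixels, count=4):
    """Model a 4-color waitvid: return the color byte shown for each pixel.

    colors: 32-bit value holding four color bytes (byte 0 = lowest byte).
    pixels: 2 bits per pixel, shifted out least significant first.
    """
    shown = []
    for _ in range(count):
        index = pixels & 0b11                      # next 2-bit selector
        shown.append((colors >> (8 * index)) & 0xFF)
        pixels >>= 2
    return shown


# Conventional ("indirect") use: a fixed palette, pixel data selects entries.
palette = 0xDDCCBBAA
conventional = waitvid_4color(palette, 0b00_00_11_01)

# Swapped ("direct") use: the color operand carries the next four pixels,
# and the pixel operand is the constant %%3210, which selects bytes 0, 1, 2, 3.
FIXED_PATTERN = 0b11_10_01_00   # same value as the quaternary literal %%3210
next_four_pixels = 0x44332211
direct = waitvid_4color(next_four_pixels, FIXED_PATTERN)

print([hex(c) for c in conventional])
print([hex(c) for c in direct])
```

In the swapped case every pixel can be any color byte, at the cost of only four pixels per waitvid.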

Nanoseconds

It is in no way shape or form an exaggeration to say that the timing of this video system on the Propeller 1 comes down to single nanoseconds. Let's look at the numbers to find out why...

Our 640x480 @ 60 Hz VGA pixel clock is 25.175 MHz, which means we're displaying a pixel roughly every 40 nanoseconds, or a group of 4 every 160 nanoseconds. Using our "direct" method of pixel output discussed above, displaying 4 pixels at a time, we'll need to execute a waitvid every 40*4=160 nanoseconds. Between each waitvid, we will also need to perform a rdlong to retrieve the next 4 pixels from main RAM. We're also assuming that we're not using a djnz to loop the instructions, and are instead creating explicit scancode for all of the pixels on the line. Since we're upscaling... For a full 640-pixel line, this will take up (640/4) * 2 = 320 instructions. A worst-case scenario waitvid takes 7 clock cycles from execution to pixels being pushed out of the video generator. A worst-case scenario rdlong takes 23 cycles; however, because the intermediate waitvids are only 7 cycles, we're always hitting the best case of 8 cycles.

Assuming an 80 MHz core clock, where each clock cycle is 12.5 ns, our reading and printing routine takes (8+7) x 12.5 = 187.5 nanoseconds. That means we're blowing our 160 nanosecond deadline by almost 28 nanoseconds!
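For anyone who wants to check the arithmetic or try other clock speeds, here is a short Python sketch of the timing budget. It only uses the figures quoted above (25.175 MHz pixel clock, 80 MHz core clock, 8-cycle rdlong, 7-cycle waitvid). The variable names are made up for illustration.

```python
PIXEL_CLOCK_HZ = 25_175_000   # 640x480 @ 60 Hz VGA pixel clock
CORE_CLOCK_HZ = 80_000_000    # assumed Propeller 1 core clock
PIXELS_PER_WAITVID = 4        # "direct" method: 4 pixels per waitvid
RDLONG_CYCLES = 8             # best-case hub read, as quoted above
WAITVID_CYCLES = 7            # worst-case waitvid, as quoted above

# Duration of one core clock cycle.
clock_ns = 1e9 / CORE_CLOCK_HZ

# Time the video generator takes to shift out one group of pixels;
# the next waitvid has to land within this window.
budget_ns = PIXELS_PER_WAITVID * 1e9 / PIXEL_CLOCK_HZ
budget_cycles = budget_ns / clock_ns

# Time one rdlong + waitvid pair actually takes.
cost_cycles = RDLONG_CYCLES + WAITVID_CYCLES
cost_ns = cost_cycles * clock_ns

overrun_ns = cost_ns - budget_ns

print(f"budget:  {budget_ns:.1f} ns ({budget_cycles:.1f} cycles)")
print(f"cost:    {cost_ns:.1f} ns ({cost_cycles} cycles)")
print(f"overrun: {overrun_ns:.1f} ns")
```

Seen in cycles rather than nanoseconds, the budget is fewer than 13 clocks per group of 4 pixels, while the rdlong and waitvid pair needs 15.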


Propeller Video Output
