Sunday, August 26, 2018

Control and Sync Generation in the AY-3-8500


In my last post I showed off a virtual tangle of spaghetti: a circuit map of the AY-3-8500. I've been looking closely at the circuits to understand them, and I'm sharing anything I find on this blog. Let's start with the control signals and sync generation circuitry.

To recap, the GI AY-3-8500 was a PONG clone chip produced from the late '70s to early '80s. When connected to controls, support circuitry, and a television, it could play several variants of PONG plus two shooting games in black and white. That's all. Millions of them were produced and used in first-generation video game consoles. More on the history in my first post.

Sean Riddle decapped and photographed a specimen from his own Hanimex 777 last year. I decided to undertake reverse-engineering its internal circuitry, and build an online virtual simulation of it in the process.
The simulation allows the user to step through individual clock cycles and view the transistors as they turn on and off. You can find the chip simulation here, and some information about it in my last post. You can follow along with it by clicking the link above, or the links to specific areas throughout the article.

MOSFET mechanics

An explanation of how the internal electronics work is necessary. The chip contains 2353 transistors (see note 1), which act as tiny switches, all in an area smaller than a fingernail. All of the transistors in this chip are NMOS transistors, short for n-channel MOSFET (Metal Oxide Semiconductor Field Effect Transistor). MOS technology allows the creation of smaller transistors than other semiconductor processes, resulting in chips containing tens of thousands of transistors by 1980, and billions today. The 'N' stands for n-channel. Transistors made this way conduct when their gate voltage is high; these are also known as enhancement-mode transistors. NMOS technology allowed for the fastest compact MOS transistors, before more advanced CMOS technology became widespread in the 1980s.


Above is a side view of an NMOS transistor, along with a top view of some transistors in the simulation. Each transistor has three connections: the gate, the source, and the drain. A layer of oxide (green) crosses the source and drain (blue lines). When the wire (grey) connected to its gate is powered, the source and drain become electrically connected. From only transistors and wires, logic gates, adders, and entire microprocessors can be built.

The chip has four important layers (see note 2): the transistors (green), the metal traces (grey), the diffusion areas (blue), and vias (light blue). Diffusion can only form the source/drain of transistors, and metal can only form gates. For diffusion and metal to connect, they need to travel over a via, which functions as a square hole in the insulating layer.

The clock circuitry

According to the datasheet, in a normal configuration, the chip uses a standalone 2.012 MHz oscillator. Sometimes the chip may be wired to the AY-3-8515 color adding chip, which outputs a 2 MHz clock signal from its own oscillator (the majority of systems made with the chip stuck to black and white). This signal is wired to pin 17 (clock in). In a real chip, thin gold bond wires attach the pin to a pad on the die surface. In the simulation, the clock pad is the third from the bottom on the left-hand side. When you start it, the support code should quickly switch the clock pad on and off, simulating an external clock.

The clock pad
The pad is connected to an L-shaped transistor through a layer of diffusion. When the clock pin is powered, this transistor grounds a node named "internal_clock". A node is the collection of all wires connected to one another. There are a few things that need explaining. First, below the L-shaped transistor is a rectangular transistor powered by ... the ground node? This is a gate-grounded NMOS, a form of ESD (ElectroStatic Discharge) protection. All input pins on the chip utilize them to help protect the delicate internal circuits. Second, above the L-shaped transistor is an unseen pull-up transistor. These will "pull up" the node into a high voltage state, unless any part of the node is grounded. The chip has a lot of them (860 to be exact), all lined up next to the Vcc (power) conduits. In a pull-up transistor, the connected node both controls the gate of the pull-up and connects to the drain of the transistor, reducing leaked current. One last note: the internal clock's signal will be inverted from the one coming in. This doesn't matter in the case of the clock input; just remember that most of the input pins are also inverted.
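As a rough mental model of the pull-up behaviour described above, here is a small Python sketch. It is not taken from the simulator; the helper names `node_level` and `internal_clock` are made up for illustration.

```python
def node_level(grounding_gates):
    """Level of a node that has a pull-up transistor (True = high).

    grounding_gates holds the gate levels of every transistor that can
    connect the node to ground. One conducting transistor is enough to
    pull the node low; otherwise the pull-up holds it high.
    """
    return not any(grounding_gates)


def internal_clock(clock_pad):
    """Hypothetical model of the clock input: the pad drives the gate of
    the L-shaped transistor, which grounds the internal_clock node."""
    return node_level([clock_pad])


for pad in (False, True):
    print(pad, "->", internal_clock(pad))
```

The internal node always comes out as the opposite of the pad, which is the inversion mentioned above.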

A total of five pullup-transistors (white) near the clock pin (Courtesy of Sean Riddle)

The internal clock line goes two ways. Following it a little way upward leads to circuitry that generates two alternating clock signals. The circuitry has four nodes running vertically, labeled here as 1, 2, 3, and 4. There are two other important nodes; let's call them linger and disable. The metal in linger functions as a (very weak) capacitor. Clicking on it will display the words "NO_PULL" in the upper-right. Because it has no pull-up transistor, its state will "linger" from what it was last. Nodes without pull-up transistors will eventually leak, which is why DRAM needs to be refreshed every few milliseconds. On this chip, however, these little capacitors never go more than a few microseconds holding charge, so leakage is not a problem. The disable node serves to freeze this circuitry whenever an invalid game combination is detected.

At the beginning of its cycle, 1 and 4 are the only high nodes. Advancing the chip shows that as the (internal) clock line goes high, 1 goes low as linger goes high. As the clock goes low again, node 2 is now not grounded by either node 1 or the clock line. 2 is pulled high, and grounds 4, which allows 3 to be pulled high.
Another advance will make the clock active once again, grounding node 2, and connecting linger to the grounded node 4. Once the clock falls again, node 1 is not grounded by linger and will ground 2 and 3, allowing 4 to become high once again. This only works because the linger node does not have a pull-up transistor, and thus will not return to a high state once no longer grounded. The end result of this circuitry is two alternating clock signals coming out of 3 & 4, with half the frequency of the original.
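The net effect can be sketched behaviourally in Python. This is only a model of the outputs, not a node-for-node copy of the circuit: it assumes, from the walkthrough above, that nodes 3 and 4 swap each time the internal clock falls, and the function name `divide_by_two` is invented for the example.

```python
def divide_by_two(clock_samples):
    """Yield (node3, node4) for each sample of the internal clock.

    The two outputs start as described in the text (4 high, 3 low) and
    swap on every falling edge, giving two alternating signals at half
    the input frequency.
    """
    node3, node4 = False, True
    previous = False
    for clock in clock_samples:
        if previous and not clock:      # falling edge of internal_clock
            node3, node4 = node4, node3
        previous = bool(clock)
        yield node3, node4


# Four full input clock cycles (high, low, high, low, ...)
for step, (n3, n4) in enumerate(divide_by_two([1, 0] * 4)):
    print(step, int(n3), int(n4))
```

Node 3 repeats every four samples while the input repeats every two, so the output runs at half the input rate.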

Because it changes at only half the clock's frequency, the clock signals coming out of this circuitry only allow half the horizontal resolution of the 2 MHz signal. To display thinner areas, such as the paddles, the original internal_clock signal is ANDed with various control circuits.

The horizontal counter


The shift register (top) and part of the decoder (bottom)
To the right of the clock circuitry is a counter which tracks the horizontal position of the electron gun. The chip designers didn't use a normal binary counter; instead they used a linear feedback shift register (LFSR). An LFSR is a simple shift register whose input is based on the state of its current bits. It uses less die space and fewer transistors than an ordinary counter, at the expense of not counting in normal binary order. The shift register is made of seven chained counting units, each with a flip-flop circuit. Each bit has a wire pair traveling downward: one matches the bit's state and the other is the opposite of the current state. The shift register's units are actually identical to the previously described clock circuitry, except that the linger node is set to the state of the previous unit, rather than to its own state.

The XNOR gate

The counter advances every time the "hrz_shifter" node goes low. The input line (normally pulled high) is grounded if either bit 0's signal and bit 6's inverted signal are both high, or bit 0's inverted signal and bit 6's signal are both high. In simpler terms, the input is an inverted exclusive-or (XNOR) of the end bits. The shift register will begin with 0000000 and, as the end bits match, it will become 1000000, followed by 0100000 (0, as the end bits don't match), 1010000, 0101000, etc.
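The first few states are easy to reproduce in Python. The sketch below is an illustration rather than the program mentioned later in the post; it assumes bit 0 is the leftmost character and bit 6 the rightmost, and `lfsr_step` and `lfsr_states` are names made up for the example.

```python
def lfsr_step(bits):
    """Shift the register one place to the right.

    The new leftmost bit is the XNOR of the two end bits: 1 when they
    match, 0 when they differ.
    """
    feedback = 1 if bits[0] == bits[-1] else 0
    return [feedback] + bits[:-1]


def lfsr_states(width=7, steps=5):
    """Return the first `steps` states as strings, starting from all zeros."""
    state = [0] * width
    states = []
    for _ in range(steps):
        states.append("".join(str(b) for b in state))
        state = lfsr_step(state)
    return states


print(lfsr_states())
# ['0000000', '1000000', '0100000', '1010000', '0101000']
```

One property of XNOR feedback shows up in this model: an all-ones register feeds itself another 1 and never leaves that state, so a counter built this way has to stay out of it.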

The horizontal counter's value goes into a textbook binary decoder (see note 3). Fourteen control signals pass through it from left to right. Each control signal will always be grounded by at least one transistor, unless the number in the counter matches the pattern encoded by (the opposite of) the transistors in its row. When it matches, the control line is allowed to be pulled high.
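One decoder row can be modelled as a wide NOR gate. The Python sketch below is a simplified illustration (it ignores the half_clock gating from note 3), and the function name and example pattern are hypothetical rather than read off the die.

```python
def decoder_row_high(counter_bits, target_bits):
    """Return True when a decoder row's control line is allowed to go high.

    Each counter bit supplies a true line and an inverted line. For every
    bit, the row has a pull-down transistor on whichever of the two lines
    is high when that bit does NOT match the target pattern.
    """
    for bit, target in zip(counter_bits, target_bits):
        true_line, inverted_line = bit, 1 - bit
        gate = inverted_line if target else true_line
        if gate:
            return False   # one conducting transistor grounds the line
    return True            # nothing conducts, so the pull-up wins


pattern = [1, 0, 1, 0, 0, 0, 0]   # example pattern only
print(decoder_row_high([1, 0, 1, 0, 0, 0, 0], pattern))  # True
print(decoder_row_high([0, 1, 0, 1, 0, 0, 0], pattern))  # False
```

Because every mismatching bit grounds the line, exactly one of the 128 possible counter values lets a given row go high.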

I wrote a quick Python program to find out when/where the control signals are triggered based on the decoder's structure. Remember that the counter advances at the divided clock rate (half of the 2 MHz input), or once every microsecond.

Signal 12: 11μs (left wall)
Signal 10: 14μs (player 1 goalkeeper)
Signal 11: 22μs (player 2 forward)
Signal 8: 24μs (player 1 score start)
Signal 7: 29μs (player 1 score end)
Signal 6: 31μs (center line/wall)
Signal 5: 33μs (player 2 score start)
Signal 4: 38μs (player 2 score end)
Signal 3: 39μs (player 2 squash paddle)
Signal 2: 40μs (player 1 squash/forward paddle)
Signal 1: 48μs (player 2 goalkeeper)
Signal 9: 49μs (right wall)
Signal 13: 60μs (sync start)
Signal 14: 64μs (sync end)

Analog TV and sync signals


A little background on the working of analog TV is necessary before we cover the next part. (NTSC) analog TV is generated from a continuous signal that varies between a black voltage (~0.3V) and a white voltage (1V), the exact voltage determining the strength of the electron gun's beam. The electron gun scans across the screen like an (English) book, left to right. In an old-fashioned television camera, the same thing would be simultaneously happening with some sort of scanning tube.

To synchronize a camera and a receiving television set, sync signals are mixed in with the sensor's signal. A sync signal is indicated by 0V, "blacker than black." The camera will create a short sync pulse of a few microseconds at the end of every scanline; the receiving set filters this out and triggers a horizontal sync, sending the beam back to the left and a line lower than before. Once the beam gets to the bottom of the screen, a much longer sync pulse triggers the vertical sync, sending the beam back to the top to draw another field (see note 4). This will repeat about 60 times a second for NTSC television. PAL television, used in most of Europe and the world, is incompatible with NTSC (mostly in North America) and SECAM. Because of this, there are at least two variations of the AY-3-8500 chip: the original (PAL) and the 8500-1 version (NTSC). I don't know if a SECAM version was ever made.

The different standards' signals are quite similar when you don't take color or sound into account (see note 5). One important difference is the framerate. PAL draws fields at 50 Hz (except in Brazil) because of 50 Hz power standards. As the chip updates ball movement every field (I'll get to how this works eventually), this means the NTSC version (probably) plays faster.

The AY-3-8500 uses a separate pin to generate the sync signals. In a normal console setup, the sync line merges with the other video output pins onto the output line. The output line travels into an RF modulator inside the console, which puts the game signal onto channel 3 or 4, then into the television (see note 6). The sync signal travels through a 510Ω resistor, and a 220Ω resistor pulls the output line low. Because of this, when the sync pin is pulled low by the chip, the output line's 0.3V (black) signal turns into a 0V signal (sync). Let's look at what that sync pin is hooked up to internally.

The horizontal sync

The sync logic is actually very simple: it's just horizontal control lines #13 and #14 connected to a simple SR latch, on the chip here. Two nodes, which I creatively named hsyncON and hsyncNOT_ON, each invert the other, making a stable latch as shown on the right. Control line 13 grounds hsyncNOT_ON, allowing hsyncON to go high. Control line 14 does the opposite. When powered, hsyncON grounds the node labeled to_sync, outputting a sync signal instead of black (remember low=sync). Control line #13 is triggered after 60 increments of the counter, line #14 after 64, resulting in a 4μs sync pulse at the end of every 64μs scanline.
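The timing of that latch can be sketched in a few lines of Python. This is a behavioural illustration only: it uses a plain integer in place of the LFSR, treats each step as one microsecond, and the name `run_hsync_latch` is invented for the example.

```python
def run_hsync_latch(steps, set_at=60, reset_at=64):
    """Return the level of hsyncON for each 1 microsecond step.

    set_at   -- counter value where control line 13 sets the latch
    reset_at -- counter value where control line 14 clears the latch
                (and, in this model, resets the counter to zero)
    """
    hsync_on = False
    counter = 0
    trace = []
    for _ in range(steps):
        if counter == reset_at:     # control line 14
            hsync_on = False
            counter = 0
        if counter == set_at:       # control line 13
            hsync_on = True
        trace.append(hsync_on)
        counter += 1
    return trace


trace = run_hsync_latch(128)        # two scanlines
print(sum(trace[:64]), "us of sync in the first 64 us")
```

In this model the latch is high for steps 60 through 63 of each line, which is the 4μs pulse described above.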

Another important node is hsyncNOT_ON2. This one branches into six different circuits across the chip, presumably updating logic that needs to be updated every scanline.

One other thing of importance: when the final control signal (#14) triggers, it will activate the hrz_reset line here, resetting the horizontal counter to state 0. This tripped me up for a few minutes, as control signal 14 never seemed to go high. In reality, it does; it just forces itself off before the clock fully rises.

The vertical circuitry


The hsyncNOT_ON2 wire leads to a small circuit near the clock pin. This is a signal divider (by 2), identical to the clock halver. Instead of advancing the horizontal counter, this one's output advances the vertical counter. The vertical counter is almost identical to the horizontal counter; it's just another LFSR. This one is 8 bits long instead of 7, and the feedback results from the XNOR of bits 4 and 7 (4th from the right and rightmost). You can see the vertical counter advance by clicking the "one line" button in the upper right (2nd from right).

The matching decoder circuitry (above the counter) has smaller transistors and outputs only 10 control signals. The smaller transistors are due to the vertical counter being advanced at a much lower frequency than the horizontal counter. The horizontal counter and other high-frequency components use larger/higher current transistors to change state faster.

Once again, this counter doesn't count in normal binary order, so I modified my last program to figure out when the control signals are activated. Note that the vertical counter only advances every 4th line because the counter advance signal's frequency is halved, and every other line is skipped due to interlacing. Once again, this resolution limitation is overcome by connecting the slow control circuitry and faster signals with an AND gate.

Signal 10: Line 80 ?
Signal 9: Line 84 (sidelines)
Signal 8: Line 88 (score start)
Signal 7: Line 136 (score end)
Signal 6: Line 164 (goal wall end)
Signal 5: Line 384 (goal wall start)
Signal 4: Line 464 (sidelines)
Signal 3: Line 468 ?
Signal 1: Line 516 (vsync start)
Signal 2: Line 524 (vsync end)

Just like before, the vertical sync latch (to the right of the horizontal latch) is an SR flip-flop controlled by the top two vertical control signals. Control line 1 will flip the latch of vsyncON and vsyncNOT_ON, allowing vsyncON to ground the to_sync wire. Once control line 2 is triggered, both the flip-flop and vertical counter are reset. As each horizontal line takes 64μs to scan, the vertical sync lasts for the last 256μs of a 16768μs-long field.
All the control signals overlaid on the game screen
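The field timing quoted above can be checked with a little arithmetic. The Python sketch below assumes the line numbers in the list count both interlaced fields, so a single field scans half as many lines; the variable names are chosen for the example.

```python
LINE_US = 64          # duration of one scanline in microseconds
VSYNC_START = 516     # vertical control signal 1 (line number from the list)
VSYNC_END = 524       # vertical control signal 2 (line number from the list)

# Assumption: the listed line numbers span both interlaced fields,
# so one field contains half as many scanlines.
lines_per_field = VSYNC_END // 2
field_us = lines_per_field * LINE_US
vsync_us = (VSYNC_END - VSYNC_START) // 2 * LINE_US
field_rate_hz = 1_000_000 / field_us

print(lines_per_field, "lines per field")
print(field_us, "us per field")
print(vsync_us, "us of vertical sync")
print(round(field_rate_hz, 2), "fields per second")
```

That works out to 262 lines, 16768μs per field and 256μs of vertical sync, matching the figures above, with a field rate a little under 60 per second.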

Wrap up

With nothing but an alternating clock waveform, the AY-3-8500 generates a complete television picture, along with a set of control pulses which trigger display of most of the game elements. In my next post, I'll cover the circuitry behind either the field, scores, or paddles. In the meantime, feel free to comment and ask any questions below.

Notes

1. I got that number from processing the image files. Along with the 2353 switching transistors, there are 860 pullup transistors, 973 electrically unique nodes, for a total of 10327 polygons. 

2. If you look at the original die photo (with the top layer removed) you can see a purple protection layer around the pads, possibly p-type diffusion. This doesn't affect the chip's logic, so I left it out.

3. This decoder has two minor differences from a regular decoder. First, a node named "half_clock" forces all lines low while the shift register updates (so the control signals activate for only one clock cycle). Second, decoders require one inverter for each input. As the shift register's design contains a pair of inverted nodes, an extra inverter is not needed.

4. A field is one-half of a full frame. Back in the 1930s, television designers found that (then) high-definition systems would take so long to draw that major flicker would be present. The solution was interlacing, drawing half resolution pictures 60 times per second (known as fields, as the electron beam creates rows like a field of crops). Every other field was drawn in-between the lines of the previous one, resulting in a full image 30 times a second. Analog TVs definitely did flicker, but at a manageable amount. Digital TV is not limited to refreshing a single point on the screen at once, which allows it to display much larger screens with no flicker. For example: 8K UHD, the highest resolution standard available, has about 97 times as many pixels as NTSC television (if sampled at a normal 4/3 aspect ratio).

5. PAL has 625 total lines of resolution vs. NTSC's 525. There are some other variations in sync times and voltage levels, but these changes are easy to account for by changing the values in the control signals.

6. Many 1st and 2nd generation consoles use an RF modulator to put the signal onto channel 3 or 4, as well as an antenna/game switch-box to choose between the game or the antenna. This was necessary before composite-compatible TVs became widespread. Composite connections don't need to modulate then demodulate the signal, resulting in better video quality, and no need to add modulators to the console.

Friday, August 10, 2018

The Simulation is Finally Complete!*


It's been a few months since I made a post on here. In my last one, I laid out my plan to create a virtual version of the AY-3-8500 chip using the visual6502 team's simulator. It's taken a while to mark up the die photos and turn them into giant JavaScript arrays using my image processor. Thankfully that's all over with. After I got the chip into the simulation, it took some bug-hunting to find problems preventing it from working correctly. Finally, once the chip started running, I added features such as a simulated television screen, buttons which toggle the input pins on and off, and the ability to set the internal score registers with one button press. Along the way I found a few flaws and made improvements to the original code.

There are still more things planned. The simulation does not yet support paddle positioning or the ball, only the playfield and scores. I plan on making a few posts on how those circuits work, while improving the simulator. So it's not fully complete (hence the asterisk in the title).

The source is available here. It's written in JavaScript, so it should run on any (modern) web browser. You can download the source and run it from your computer, or as the purpose of HTML is to make downloading unnecessary, you can use the version I hosted HERE.


Getting it running

To run the chip, hit the "step forward" button in the upper right to advance the chip by half a clock pulse, or hit the "run" button to make it step forward automatically. Each step will toggle the state of the clock pin (it will flash very fast) and update the circuitry. One of the circuits connected (almost) directly to the clock input is the horizontal position shift register. If you run the simulation you should be able to see the bits inside shift from left to right.
The shift register
Below the shift register is a binary decoder. The horizontal control lines going through it will light up when the shift register reaches the right combination.

The signal output

As said before, not everything is implemented yet. The simulated television will only show the outputs from the play-field and sync pins in this version. As the chip runs, the simulated electron gun moves across the screen and the value of the output pins determines the color drawn. Sync pulses appear as red; these trigger new lines (HSYNC) or new fields (VSYNC). Usually, the chip runs at a ~2 MHz clock, but ours will run much slower, allowing you to see the image being created.

You probably don't want to wait around for it to slowly generate the screen. There's a box called "Animate during simulation" below the chip image, if you uncheck this, it will run much faster. To get it to go at even higher speeds, hit the "one line" or "one field" buttons at the upper right. It may freeze the window for a few seconds while it processes though.

Below the buttons is a speed indicator. Using Firefox 61, I get ~13Hz, 100Hz without updating the animation, and about 3000Hz at full speed using the "generate field" button. I'm curious as to how fast this performs on other browsers/computers.

Interacting with the Chip

You can turn the game select and reset pins (bottom left-bottom right) on and off with the checkboxes above the TV screen. Tennis (default) displays an I-shaped court, while soccer adds walls to the back. Practice and Squash create a one-wall court, and the rifle games don't create any court. You can see what happens when you try out any combination. Soccer and Tennis will trigger a soccer game, while Soccer and Squash are invalid and will lock the clock circuitry.


The score circuitry is on the right middle. The "Update Scores" button sets the internal counters to the values in the boxes. It's quite fun to mess with the inputs and states of a virtual chip. If you break it, you can just hit the reset button!

Under the hood

I made some changes to the original chip simulator code. I had to add functions to "force" nodes into other states (e.g. when changing the scores). The original simulations never needed to edit internal registers directly, and thus only had functions to change the pull-up/pull-down of nodes. The simulation also saves the chip's state into a string every step, which is read when stepping back. This can grow very quickly (multiple megabytes per second); I had barely any RAM left after generating a few fields. I modified it to save only a limited number of steps (400).
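The simulator itself is JavaScript, but the idea of a capped step-back history is easy to show in Python. This is only an illustration of the technique, not the simulator's code; the `StateHistory` class and its method names are invented for the example.

```python
from collections import deque


class StateHistory:
    """Keep only the most recent saved states so memory use stays bounded."""

    def __init__(self, limit=400):
        # A deque with maxlen silently drops the oldest entry when full.
        self._states = deque(maxlen=limit)

    def save(self, state):
        """Record the chip state (e.g. a serialized string) after a step."""
        self._states.append(state)

    def step_back(self):
        """Return the most recent saved state, or None if history is empty."""
        return self._states.pop() if self._states else None

    def __len__(self):
        return len(self._states)


history = StateHistory(limit=3)
for step in range(5):
    history.save(f"state-{step}")
print(len(history), history.step_back())   # 3 state-4
```

The trade-off is that you can only step back as far as the limit, in exchange for memory use that no longer grows with every step.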

A very helpful addition I made was to highlight all the transistors connected to the selected node. Yellow ones are transistors controlled by the node (gate-connected), while pink ones affect the node if powered. Now it only takes a click to find out what can ground a node, as well as what that node can ground.
It's definitely easier to analyse the chip with a simulation instead of a plain die photo.

It took me a little while to understand the chip, and I left a lot of comments for anyone else who wants to understand how it works. Again, the source is here.
Original
Simulation


To be continued...

I promised a post on the workings of the control and sync circuitry last January. That's coming, just a few months later than I originally expected. I'll look into the gamefield and score circuitry after that, then improve the simulation so that it can make use of all functions of the chip. There is also another digital logic reverse-engineering project I might work on, one that's less than a decade old. Also, sometime I'll put the finishing touches on the image processor I wrote and put that online.

Until next time!