Tuesday, 31 December 2013

Rewriting sidecar2 - much better JTAG speed now

One of the key features of my FPGA board is the coprocessor - an Atmel ATmega32u2 running code I'd codenamed "sidecar". This is responsible for the USB interface to the PC and disk emulation.

Up until this point, my USB code was quite hacky - based on the LUFA USBtoSerial example, it had a USB-serial bridge and also allowed JTAG operations using USB control messages. Unfortunately, the chip I'd chosen only supports a maximum of 4 endpoints and a virtual serial port requires 3 of them (the FTDI chip only uses 2, but consequently needs a driver as it's not a standard USB CDC device), so there was little choice in the matter. Control messages seemed to work for the most part, but I'd often get random disconnects whilst programming the flash chip (originally about 1 failure every 5 flash cycles, so I wasn't too worried). As the code running on the Atmel got more and more complicated (it also handles FDC emulation and the SD card), the failure rate increased to the point where the majority of reflash cycles were failing.

So, this was annoying and I was also getting fairly poor performance from my JTAG interface - it'd take about 20 seconds to flash the PROM, for instance, which for a 200KB file is pretty slow! In the meantime, I'd decided to research booting the FPGA over JTAG, as then at least I wouldn't be pummelling the PROM with all these erase-write cycles.

As you know from the previous post, I've had a lot of difficulty with this. Ultimately it turns out that, contrary to the Xilinx docs, JTAG mode isn't always available: as soon as you start configuring the chip, it resamples M0, M1 and M2 and reads data from whatever source they select (although unreliably, as it seems to use the JTAG TCK as its sample clock while still feeding the PROM CCLK at a much faster rate). So, I hooked up a spare data pin from the Atmel to the M0 and M2 pads (which, fortunately, I'd had the presence of mind to bring out to a jumper, as I'd always planned to support JTAG boot). This didn't work either.

See the jumper wire in red

By chance I re-read the documentation and realised these pins were actually 2.5V level, so feeding them 3.3V from the Atmel, whilst (fortunately) not damaging the chip, was causing erratic behaviour. I discovered that if I left the pins floating (I'd previously been pulling them to GND for PROM boot) then JTAG boot worked, as the Xilinx chip actually contains pull-ups for these configuration pins. So far so good. However, I actually want the default option to be PROM boot, and the Atmel takes a little time on boot before it can pull the pins low, so now the FPGA remains stubbornly uninitialised on power on. I tried having a pull-down resistor on these pins, but it fought with the pull-ups and put the voltage into "not quite high, not quite low" territory, and whilst I could probably have won the battle by using a really low resistance pull-down, I didn't want the wasted current when the Atmel was driving the pins high...
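To put rough numbers on that pull-up versus pull-down fight, here is a small sketch of the resistor divider arithmetic. The resistor values used with it are purely hypothetical (I haven't quoted the real internal pull-up value here, so check the datasheet), and the function names are made up for illustration.

```c
#include <assert.h>

/* Voltage seen at a pin that has a pull-up to vcc and a pull-down to GND.
 * This is the plain resistor divider formula; the resistances passed in
 * are assumptions, not measured values from the board. */
static double divider_voltage(double vcc, double r_up_ohms, double r_down_ohms)
{
    assert(r_up_ohms > 0.0 && r_down_ohms > 0.0);
    return vcc * r_down_ohms / (r_up_ohms + r_down_ohms);
}

/* Current wasted through the pull-down whenever something actively
 * drives the pin up to v_drive volts (Ohm's law). */
static double pulldown_current(double v_drive, double r_down_ohms)
{
    assert(r_down_ohms > 0.0);
    return v_drive / r_down_ohms;
}
```

With an assumed 10k pull-up on a 2.5V rail, a 10k pull-down leaves the pin at 1.25V, which is neither high nor low. Dropping the pull-down to 1k gets the pin down to about 0.23V, but then costs 2.5mA for every pin that is driven high to 2.5V, which is the trade-off described above.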

Despite all this, even though I could now boot via JTAG, the process was still unreliable - the USB stack was still crashing randomly, and so the whole thing was really frustrating. I got so annoyed that I more or less ignored the project until this week of holiday over Christmas. I'd wanted to do the USB stuff by interrupt, but the LUFA docs specifically mention how support for this was removed and how polling was the only option. However, my curiosity was piqued when I found Jim Paris' source for lufa-ftdi, which emulates the functionality of an FTDI 2-endpoint serial port and does so solely through interrupts. I took a look, learned a lot more about the Atmel USB stack and started rewriting my sidecar code in a similar way.

So now, I have a rock solid JTAG implementation using 2 endpoints, and because the whole process is optimised to use ping-pong buffers, I can do a 128-bit JTAG exchange per USB packet. Because these are bulk endpoints too, there can be more than one packet in a 1ms period, and because I'm using ping-pong buffers, I can be reading from a receive buffer (OUT), bit-banging the JTAG exchange and writing to the transmit buffer (IN), whilst the PC is still reading the previous result (IN) buffer. Pretty sweet. Now, I can write the PROM in 8 seconds and, what's more, the JTAG boot works reliably and boots in under 5 seconds.
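For anyone curious what a 128-bit exchange looks like, here is a minimal sketch of the bit-banging loop. It is not the actual sidecar code: the packet layout (16 bytes of TMS, 16 bytes of TDI, 16 bytes of TDO back, LSB first), the `jtag_pins_t` callbacks and the simulated target are all assumptions so that it can run on a PC rather than on the Atmel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JTAG_BITS  128
#define JTAG_BYTES (JTAG_BITS / 8)

/* Hypothetical pin abstraction: on the real chip these would poke port
 * registers; here they are callbacks so the loop can be tested. */
typedef struct {
    void (*set_pins)(void *ctx, int tms, int tdi); /* drive TMS and TDI */
    int  (*clock)(void *ctx);   /* pulse TCK once, return sampled TDO */
    void *ctx;
} jtag_pins_t;

/* Shift nbits out on TMS/TDI and collect TDO, least significant bit first. */
static void jtag_exchange(const jtag_pins_t *p, const uint8_t *tms,
                          const uint8_t *tdi, uint8_t *tdo, size_t nbits)
{
    assert(nbits <= JTAG_BITS);
    memset(tdo, 0, (nbits + 7) / 8);
    for (size_t i = 0; i < nbits; i++) {
        int m = (tms[i / 8] >> (i % 8)) & 1;
        int d = (tdi[i / 8] >> (i % 8)) & 1;
        p->set_pins(p->ctx, m, d);
        if (p->clock(p->ctx))
            tdo[i / 8] |= (uint8_t)(1u << (i % 8));
    }
}

/* Simulated target: behaves like a 1-bit register between TDI and TDO,
 * so TDO is simply TDI delayed by one clock. */
typedef struct { int tdi; int reg; } sim_target_t;

static void sim_set_pins(void *ctx, int tms, int tdi)
{
    (void)tms;                       /* the simulation ignores TMS */
    ((sim_target_t *)ctx)->tdi = tdi;
}

static int sim_clock(void *ctx)
{
    sim_target_t *t = ctx;
    int out = t->reg;                /* TDO shows the previous bit */
    t->reg = t->tdi;                 /* capture TDI on this clock */
    return out;
}
```

In the real firmware the TMS/TDI bytes would come straight from one bank of the OUT endpoint and the TDO bytes would go into one bank of the IN endpoint, so the loop can run while the PC is still collecting the previous result from the other bank.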

I'm probably going to drop the serial port functionality, just because serial ports weren't really all that common on the CPC anyway. I'm also exploring the possibility of reconfiguring the USART into SPI-master mode and doing 8 bits of JTAG exchange at a time that way instead, which would further increase the speed.

Another casualty of this rewrite is that I've lost all my old FDC emulation and FAT code. The FDC code needed rewriting as its state machine was pretty buggy, but this is the next thing on the horizon now... :)