TO DO

   xil_printf("Fifo Occupancy: 0x%6x   Status: 0x%8x   MIR info: 0x%8x \r\n",
         WTL_DFMUX_INTERFACE_mReadFIFOOccupancy(baseaddr), 
         WTL_DFMUX_INTERFACE_mReadFIFOStatus(baseaddr), 
         WTL_DFMUX_INTERFACE_mReadMIR(baseaddr));
--- dfmux/hard/Winterland/pcores/wtl_dfmux_interface_v1_01_a/hdl/vhdl/dmfd_mycic_to_fir.vhd 2007-07-11 10:16:06.000000000 -0400
+++ svn/hard/Winterland/pcores/wtl_dfmux_interface_v1_01_a/hdl/vhdl/dmfd_mycic_to_fir.vhd 2007-07-09 10:22:41.000000000 -0400
@@ -183,6 +184,7 @@
     data_out <= v_data3;
     ready_data_out <= '1';
   elsif v_channel_count="00011" and v_ready_data4='1' then
+    ch_for_data_out <= v_channel_count;
     v_ready_data4 := '0';
     v_channel_count := v_channel_count + 1;
     data_out <= v_data4;
--- svn/xps/data/gen_version.awk 2007-06-01 08:51:17.000000000 -0400
+++ dfmux/xps/data/gen_version.awk 2007-07-17 08:52:56.000000000 -0400
@@ -1,5 +1,6 @@
 #
 BEGIN {num_lib=0;num_driv=0;}
+gsub( /\r$/, "" ) {}
 /^BEGIN LIBRARY/ {num_lib++}
 /^[[:blank:]]*PARAMETER OS_NAME/ {os_name = $4}
 /^[[:blank:]]*PARAMETER OS_VER/ {os_version = $4}

Week of May 28th - June 1st 2007

Friday, June 1st

Met with Matt & Eric to sort out workflow. We have two problems:

...and two workflows: The question now is which of these two flows will best resolve the two problems above. Next week I'll come up with a compelling story.

Week of June 4th - June 8th 2007

Monday, June 4th

This week's laundry list of questions:

Strung together a full signal flow (using Xilinx blocks for FIR and CIC); components are correct, but parameters (e.g. CIC stages) and interconnections (truncation) aren't. First impression: this is going to be slow, since simulation in Matlab is glacial and compiling for cosimulation (where part of the Simulink model runs in hardware within the FPGA) takes a long time.

To do:

Tuesday, June 5th

Backed off a little: testing dmfd_square_mixer + XILINX CIC (2048x decimation); bus types and decimal point locations now sane.

BUT... cosimulation library is only useful if we play by Xilinx's rules for clocks and clock enables. The generated block is shown below.

Any inputs to Xilinx blocks (e.g. anything that passes through a gateway in) show up as inputs to the cosimulation block. That means any clock signals, too, unless we follow the rules. To figure out: how painful is this?

I think the crux is this: in VHDL, we have to deal with synchronization, especially where sampling rates change (e.g. where decimation occurs.) This is where System Generator is intended to remove headaches, but only if we let it: trying to use the existing VHDL within system generator looks pretty warty from here. On the flip side, the generated VHDL is not readable either.

Hm. Ditto for control logic: the "gateway in"-type inputs are for data_in, reset and write_enable. The latter two are sequenced in simulation using a "signal builder" block (not from Xilinx, so it doesn't get swallowed by the model). This works well for software simulations, but it's important not to forget about control logic. Especially since it's meaningful in the end product, but mostly shrugged off during signal path simulations.

The best plan of attack might look something like this:

Wednesday, June 6th

Today is ModelSim day. If Simulink is going to be a pain, let's see if we can find an advantageous mix of Simulink (for signal synthesis and analysis) and a VHDL simulation tool (to do the hard work.) This requires a VHDL testbench and signal I/O via files (which is probably slow and at any rate doesn't stream), but it doesn't involve shoehorning SysGen-phobic VHDL into Simulink.

First job: get Xilinx libraries built for ModelSim, and import them into a ModelSim project. Short story: run "compxlib" (a Xilinx tool, but it's on the system path for me). This builds ModelSim libraries for Xilinx IP blocks. Then, in ModelSim, do file->import library, and select the output path used by compxlib (here, c:\xilinx\xilinx91i\vhdl\mti_se). Each of the three libraries should be imported, which extracts them to yet another path. I used "z:\xlib" (where z: is a Samba mount.)

(...more to come. It's taking me a while to get used to ModelSim.)

Thursday, June 7th

OK, I'm catching on slowly. There's nothing wrong with my testbench, except that it takes a very long time for data to propagate through it. Of course.

Data goes in at 25 MHz. It comes out of the CIC at 1/2048th of that, or 12.207 kHz. Each FIR halves the data rate, and we end up with 6 stages, so the final output is 190.7 Hz. Using back-of-the-napkin reckoning, we'll have to wait for 1/190.7 Hz = 5.24 ms (Simulation result: 5,248,725 ns) for the first sample to emerge. That takes nearly an hour using ModelSim as a simulator. (There are some gains to be made via optimization, since I currently simulate with full signal visibility. That constrains ModelSim's ability to simulate efficiently. Loose extrapolation suggests about a 2x speed-up.)
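The rate arithmetic above is easy to re-run for other decimation settings. A minimal sketch, using only the numbers quoted in this entry (the variable and function names are mine, not from the design):

```python
# Data-rate bookkeeping for the receive chain described above.
F_IN_HZ = 25e6          # input sample rate
CIC_DECIMATION = 2048   # CIC decimation factor
FIR_STAGES = 6          # each FIR stage halves the data rate


def output_rate_hz(f_in=F_IN_HZ, cic=CIC_DECIMATION, stages=FIR_STAGES):
    """Sample rate after the CIC and all halving FIR stages."""
    return f_in / cic / (2 ** stages)


cic_rate_hz = F_IN_HZ / CIC_DECIMATION
out_rate_hz = output_rate_hz()
first_sample_ms = 1e3 / out_rate_hz  # one output period, in milliseconds

print(f"CIC output rate : {cic_rate_hz / 1e3:.3f} kHz")
print(f"Final rate      : {out_rate_hz:.1f} Hz")
print(f"First sample at : {first_sample_ms:.2f} ms of simulated time")
```

This only counts one output period; it ignores filter fill time, which the following paragraphs deal with.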

It gets worse. Each FIR is 100 taps, so meaningful data has to traverse 50 taps of each filter before we see anything but zeros (or garbage, if nonzero.) Add a large start-up delay to the picture.

And worse again. Since we're interested in frequency-domain characteristics of the output, it's an absolute necessity that we collect gobs of data and analyze it spectrally. That adds another few orders of magnitude to the data collection requirement.

Now, that's using a single, monolithic ModelSim instance to take care of all of our simulation. It uses a file for input, and another for output. It's probably a good deal faster than Simulink (since it's tuned for speed, and built around RTL simulation.) It may be simulating all 8 channels, but a factor of 8 (definitely an optimistic factor) isn't going to make full software simulation a viable option.

Obviously, we're going to have to be more clever.

Cleverness might be starting to arrive. I've done a few trivial test sims:

Current thinking: cut the DSP flow into three chunks.

My suspicion is that we can only cosimulate a single block at once (since I strongly doubt bitstream-merging is possible.) That would suggest the FIR chain gets cosimulated, and everything else gets the software treatment.

This solves the simulation-of-signal-path problem. It's still not clear whether we'll export to pcore or port back to VHDL --- this is dependent on what the actual implementation looks like (and whether or not the existing VHDL can still fit within the Xilinx clocks-and-enables semantics, particularly given the limitations I'm imposing above.)

Week of June 11th - June 15th 2007

Monday, June 11th

Calculating the delays for the DSP signal path. We're interested in two scenarios:

In both cases, the delays are dominated by the FIR blocks, so the following is a bit sloppy in the first few stages and may also mix a few registering/downsampling delays.

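First the end-to-end delay. Each FIR has $N=100$ taps, so stage $k$ ($k=1 \ldots 6$) delays a sample by roughly $N/2$ taps at that stage's input rate. Summed over the six stages: $D\approx 32N \frac{2048}{25 \cdot 10^6} \approx 262~\mathrm{ms}$.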

The flushing delay (assuming the number of filter taps doesn't change; otherwise, the number changes depending on where the FIR block restarts) is double this number, or about 524 ms.
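As a cross-check on the 524 ms flushing figure (twice the end-to-end delay), here is a rough sketch of the stage-by-stage sum. It rests on assumptions that are mine rather than the notebook's: every FIR has 100 taps, each contributes a delay of N/2 samples at its own input rate, and stage k runs at the CIC output rate divided by 2^(k-1). CIC and register delays are ignored.

```python
# Rough FIR-chain delay estimate; all modelling choices are assumptions.
F_IN_HZ = 25e6
CIC_DECIMATION = 2048
N_TAPS = 100
N_STAGES = 6


def stage_delay_s(k, n_taps=N_TAPS):
    """Delay of FIR stage k (1-based): n_taps/2 samples at its input rate."""
    input_period_s = CIC_DECIMATION * 2 ** (k - 1) / F_IN_HZ
    return (n_taps / 2) * input_period_s


end_to_end_s = sum(stage_delay_s(k) for k in range(1, N_STAGES + 1))
rounded_s = 32 * N_TAPS * CIC_DECIMATION / F_IN_HZ  # 63/2 rounded up to 32
flush_s = 2 * rounded_s

print(f"stage-by-stage sum : {end_to_end_s * 1e3:.1f} ms")
print(f"rounded estimate   : {rounded_s * 1e3:.1f} ms")
print(f"flushing delay     : {flush_s * 1e3:.1f} ms")
```

The exact sum has a factor of 63/2 where the rounded estimate uses 32, which is well inside the sloppiness already admitted above.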

A few sources of inaccuracy:

...but depending on what we need it for, this should be a good estimate. It also demonstrates how useless a slow simulation topology will be.

Tuesday, June 12th

Putting together a blackbox FPGA co-simulation (noun string) for the entire DMFD. Will evaluate whether it's sensible to generate a full 8-channel system, and thus avoid the work of hatcheting a single-channel system.

Wednesday, June 13th

Looks like it's going to be easier (faster) to do some hatchet work on wtl_dmfd. Two tasks (man, do I seem to love bullet lists):

As an aside, the ML402 evaluation board (which I'm using for cosimulation) uses a fixed 100 MHz clock. That means even though we only need to meet timing at 25 MHz on the DFMux board, it's useful to meet 100 MHz as well. We don't -- the trouble appears to be deep layers of logic at the square mixer.

For now, since co-simulation will run in single-step mode, I'm hoping this isn't a problem. This part of the system might take up less room if pipelined.

Thursday, June 14th

First impression: a few pipeline registers stuck in the square mixer component make a minor improvement, but I don't have a good enough idea where the slow paths are to do this properly. Plus, I'm starting to wander away from the "known good" design.

I'm working on two strategies, either of which will suffice:

If neither of these works, another option is to tweak the routing parameters so that place-and-route tries harder before giving up on meeting timing. I'm not sure where this is configured right now, but it's probably in the toolbox tree (just like the path in the bullet list above.) I'm not sure if this will balloon the synthesis time into something unwieldy, but it's not something I expect to do often. Plus, I'm spending a lot of time getting it to work anyway.

Hrm. The paths that don't meet timing in Xilinx ISE are different from those using System Generator. That might complicate things a bit... looking into the new-target option.

Urgh! Drivers for the Platform Cable USB (JTAG pod) don't exist for Windows Vista. Can't use desktop.

Solved! It suffices to run the system clock at 100 MHz and underclock the DMFD block (via the AddClkCEPair entry in the black-box setup m-file.) I've got the first running black-box simulation of the DMFD system, but it needs verifying (and streamlining, since JTAG cosimulation doesn't work on Vista and Ethernet cosimulation -- the only alternative I've got right now -- is probably a lot slower.)

I'm now trying to find out how well-behaved this block is. There are plenty of possible hiccoughs...

Friday, June 15th

Still too slow.

Co-simulation is fast enough that I can verify samples are going in and coming out again (working on it now), but too slow for any meaningful data collection. I suspect data I/O is the killer, and I'm wondering if I need to either reduce the number of redundant control I/Os (registers used only for setup), or if I need to move to asynchronous (non-clock-locked) cosimulation.

...and samples aren't going in or out either, apparently. Looking into it...

Monday, June 18th

I've removed the 100 MHz clock, but also tried to remove the dependence on falling clock edges (according to some -- not that I have a particularly strong feeling about it -- it's Bad Mojo to depend on both clock edges in a synchronous system.) Moreover, with single-edged clocking, timing is met without any of the equally hairy tricks I tried above.

This monkeying around with existing code is a bit frustrating, since I keep departing from the known-good code in SVN in order to simulate something that (touch wood) behaves the same way. However, I've added all 8 channels back into the synthesized design, so there's a bit of 'give' alongside the 'take.' Since the complexity of the design is largely the FIRs, the other 7 channels come with very little synthesis overhead.

Tuesday, June 19th

(see below)

Wednesday, June 20th

Working on 1-channel-izing some more. Now 8 samples get through (one per channel, where 7 channels are hatcheted) but no more. Q: does the multi-channel FIR expect strict sequencing (i.e. if it doesn't get data for the other 7 channels, does it lock up?)

Thursday, June 21st

Yesterday, I managed to get "interesting" (non-zero) data out of the board for the first time. However, only under free-running cosimulation -- in which case, it's not guaranteed that input data becomes output data.

When cycle-locked, output data never shows. That suggests there's a timing gotcha or that something needs registering. Looking into it now.

...fixed. There still seems to be internal glitching, since the output is extremely chaotic (zero input produces zero output, but everything else is madness.) Will look into it next time.

Tuesday, June 26th

Looking into square vs. IQ mixer noise performance characteristics. Since the theory is much simpler using sinusoidal mixing signals, it's useful to construct or decompose square signals into familiar sinusoids. Both the square and quarter-wave mixer inputs can be generated by sampling sinusoids, so it suffices (assuming no other signals collide in harmonic regions) to consider the usual product mixer case with sin/cos inputs.
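To make the "decompose square signals into familiar sinusoids" step concrete, here is a small sketch of the standard Fourier series of a unit-amplitude square wave (odd harmonics only, amplitude 4/(pi n)). It is a textbook identity rather than anything specific to the DfMUX mixers, and the function names are hypothetical.

```python
import math


def harmonic_amplitude(n):
    """Amplitude of the n-th sine harmonic of a +/-1 square wave."""
    return 4 / (math.pi * n) if n % 2 == 1 else 0.0


def square_from_harmonics(x, n_terms):
    """Partial Fourier sum using the first n_terms odd harmonics."""
    return sum(
        harmonic_amplitude(2 * m + 1) * math.sin((2 * m + 1) * x)
        for m in range(n_terms)
    )


# At a quarter period the square wave equals +1; the partial sum converges to it.
approx_peak = square_from_harmonics(math.pi / 2, 2000)
fundamental_gain = harmonic_amplitude(1)  # 4/pi relative to a unit sinusoid

print(f"partial sum at pi/2 : {approx_peak:.4f}")
print(f"fundamental gain    : {fundamental_gain:.4f}")
```

The fundamental alone reproduces the sinusoidal-mixer case scaled by 4/pi; the higher odd harmonics are where the "other signals collide in harmonic regions" caveat comes from.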

Wednesday, June 27th

A to-do laundry list:

Today: working on mixer transfer-function document.

Thursday, June 28th

Friday, June 29th

Tuesday, July 3rd

I haven't disappeared, I've just been working on the transfer-function document. There's not much to track -- I'll post a version here as it gets closer.

The three mixers (sinusoidal, square-wave and quarter-band square-wave) are modeled with signal and white noise components. I'm working on narrowband noise models now.

Here's today's draft: document.pdf

Wednesday, July 4th

Simulations. The transfer functions I'm seeing agree with my calculations, but don't jibe with what Matt was expecting -- we suspect it's a matter of conventions. More tomorrow.

Thursday, July 5th

Figured it out. The difference pops up when variance (equivalently, RMS noise voltage) is converted to power-spectral density. I need to add a line or two describing the conversion between RMS gain and PSD gain, and clarify the bandwidth measures for noise (single-sided vs. double-sided).

Friday, July 6th

Here's the story. For narrowband noise, the output variance is related to the input variance (expressed here as RMS values, $\sigma$) via the following ratio:

$\sigma_o/\sigma_i=\frac{2}{\pi}$

...as derived in the last draft. We make use of the conversion formula between power-spectral density (PSD) and variance:

$\mathrm{PSD}=\frac{\sigma}{\sqrt{\mathrm{BW}}}$

where BW is the bandwidth of the signal under consideration (this is simply Parseval's formula for power signals, where the frequency content is constant over a limited range and zero elsewhere.) Substituting this into the above expression gives the ratio of PSD out to PSD in:

$\frac{\mathrm{PSD}_o}{\mathrm{PSD}_i} = 2\sqrt{2}/\pi$

This is for PSD measurements in volts RMS per root hertz. For W/Hz (e.g. assuming a 1-ohm load), this ratio must be squared.
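A quick numeric check of the conversion, using the page's own equations sigma_o/sigma_i = 2/pi and PSD = sigma/sqrt(BW). The factor of 2 between input and output bandwidth is my inference from the sqrt(2) in the final ratio; the notebook does not state it explicitly.

```python
import math

sigma_ratio = 2 / math.pi        # sigma_o / sigma_i for narrowband noise
bw_in_over_bw_out = 2.0          # assumption: output bandwidth is half the input

# PSD = sigma / sqrt(BW)  =>  PSD_o/PSD_i = (sigma_o/sigma_i) * sqrt(BW_i/BW_o)
psd_amplitude_ratio = sigma_ratio * math.sqrt(bw_in_over_bw_out)  # V/sqrt(Hz)
psd_power_ratio = psd_amplitude_ratio ** 2                        # W/Hz, 1-ohm load

print(f"PSD ratio in V/sqrt(Hz) : {psd_amplitude_ratio:.4f}")
print(f"PSD ratio in W/Hz       : {psd_power_ratio:.4f}")
```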

This confuses me, though -- there's an equivalence between narrowband and broadband noise that needs to be fleshed out a little further. (Question: why isn't the amplitude-PSD correction required for wideband noise? Answer: because we didn't use Rice's representation -- but that's not a very convincing answer.)

Here's a better one, although it only applies for truly white noise. When we gate white noise (one way to consider the effect of the quarter-wave mixer), we halve the noise power. No problem. Alternatively, we multiply the time-domain noise by a 50% duty-cycle pulse train, with fundamental frequency equal to that of the mixer. In the frequency domain, this is equivalent to a convolution with the Fourier series expansion of the pulse train (with $c_n=1/(\pi n)\sin\left(\pi n/2\right)$.) The expansion itself makes absolutely no difference in the white-noise case, since the frequency-domain convolution simply returns more white noise with a different amplitude.
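The two views of gating can be reconciled numerically. For white noise, convolving with the pulse train's spectral lines scales the noise power by the sum of |c_n|^2, and that sum should come out at one half. A sketch, taking c_n = sin(pi n/2)/(pi n) with the limiting value c_0 = 1/2 (the limit is my addition; the formula as written is undefined at n = 0):

```python
import math


def c(n):
    """Fourier coefficient of a 50% duty-cycle, unit-height pulse train."""
    if n == 0:
        return 0.5  # limit of sin(pi*n/2)/(pi*n) as n -> 0
    return math.sin(math.pi * n / 2) / (math.pi * n)


def white_noise_power_gain(n_max):
    """Sum of |c_n|^2 over harmonics -n_max..n_max (truncated Parseval sum)."""
    return sum(c(n) ** 2 for n in range(-n_max, n_max + 1))


gain = white_noise_power_gain(10_000)
print(f"power gain from gating: {gain:.5f}")
```

The truncated sum sits just below 0.5, in agreement with the time-domain argument that gating half the samples removes half the power.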

I'm also looking at the FIR taps again. Consider the number of 25 MHz clocks per input sample (assuming 1 channel for now):

Stage   Clocks per input sample (25 MHz)   Taps (8 channels, neglecting overhead)   Channels (32 taps each)
1       2048                               256                                      64
2       4096                               512
3       8192                               1024
4       16384                              2048
5       32768                              4096
6       65536                              8192
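The table can be regenerated with a few lines, which makes it easy to try other channel counts or tap counts. This sketch assumes one multiply-accumulate per 25 MHz clock and no overhead, as the table header says. The notebook only fills in the channels column for stage 1, so the values for later stages are my extrapolation.

```python
CIC_DECIMATION = 2048   # 25 MHz clocks per sample entering stage 1
N_CHANNELS = 8
TAPS_PER_CHANNEL = 32


def stage_budget(stage):
    """Return (clocks per input sample, taps with 8 channels, channels at 32 taps)."""
    clocks = CIC_DECIMATION * 2 ** (stage - 1)  # each FIR halves the rate
    return clocks, clocks // N_CHANNELS, clocks // TAPS_PER_CHANNEL


for stage in range(1, 7):
    clocks, taps, channels = stage_budget(stage)
    print(f"{stage:>5} {clocks:>8} {taps:>6} {channels:>6}")
```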

That suggests we already have enough headroom to handle all 8 channels at a 25 MHz FIR clock (not 100 MHz as is currently used.) Erm, why don't we? (Is the overhead really that big? What's the deal?)

I could do some simulations, but I bet there's an easy answer.

Monday, July 9th

Nope -- I did some simulations at 25 MHz, and it works just fine. There's enough overhead for more taps as well.

wave.png

There are two jobs involved in updating the DigitalFMux receive path:

Tuesday, July 10th

(see below)

Wednesday, July 11th

Commissioning SQUID boards. Rigged workstation with power supplies; DB25 power connector. Verified that bitstream on Wiki matches bitstream on "known good" board (Serial #002). Commissioned a single board.

Thursday, July 12th

Francois has been fixing the U19 insulator mats (hooray!) Showed him how to program a board, walked through the process for two. Programmed three more for a total of 6 boards.

Also working on the demodulator firmware. The FIR truncation for the first two filters has been chosen arbitrarily (this needs to be tweaked later.) Compiling now; hopefully I'll approach a testable build sometime today or tomorrow. (This is a long-shot build, and will probably need to be debugged.)

Friday, July 13th

Monday, July 16th

Tuesday, July 17th

Programming more SQUID boards; built an untested 25-MHz-clocked, filter-optimized DfMUX build. Working on FIR documentation (see wiki pages for DMFD signal path.)

Wednesday, July 18th

Thursday, July 19th

Awright, a busy couple of days. Here are the results of the first build of the new FIRs running at a 25 MHz clock. The output should be a 10 Hz sine wave (obtained by mixing a 1 MHz carrier with a 1.00001 MHz carrier.) It is also (I'll bet) what happens when you feed a FIFO clocked at 100 MHz using control signals at 25 MHz.

first_blood.png

Friday, July 20th

This is looking much better:

better.png

Don't get too excited, though. The changes I made to properly match the DMFD with the output FIFO don't always lock onto the correct sample in the plot from July 19th, meaning I get plots that are flat more frequently than I get this one.

In addition, there's some messiness at the maxima of this plot, suggesting an early-stage filter is saturating (and thus clipping, which is cleaned up a little by subsequent stages.)

More to come.

Monday, July 30th

Putting together 10 more SQUIDs. Also, considering the "9th channel" task -- seems a few dedicated channels are the best way to go, in order to avoid a large crossbar-type switch. It's also reasonable to end up with a "strange" number of channels (i.e. 8+1, or 16+1, or 16+2 etc) in order to make the number of "normal" channels into a power of 2.

Tuesday, July 31st

First task: fixing up the FIFO latching at the DfMUX output.

Wednesday, August 1st

OK, I need to find a more effective way of doing this than tweaking something and waiting hours for a full build.

The "right" way appears to be using the IBM BFM (Bus Functional Model) toolkit -- we isolate the DfMUX from the rest of the system at the bus level, and issue transactions on a preprogrammed basis. This will allow me to make sure everything behind the OPB (including FIFOs) is correctly sequenced.

There are also two important bonuses: it'll allow me quick feedback on modifications to the register interface (needed for the "9th-channel" problem, along with a number of wish-list items Eric and I have) and will allow rapid synthesis of the DfMUX to check out resource usage.

It's a free download, but I need to wait for Xilinx to approve my license. (24 hours, they say.)

Thursday, August 2nd

Man, what a pain. The BFM stuff is pretty heavy, and I'm not confident enough with the command-line ModelSim interface (or other internals) to feel like it's worth investigating yet.

In the meantime, I'm starting to understand where the latching problems are occurring. I need a better way to track my own changes (which aren't checked in to SVN yet, and aren't ready either.) Working on it.

Ultimately, the OPB FIFO interface expects to run synchronously. We're using too many clock domains (200 MHz, 100 MHz, and 25 MHz, operating on both rising and falling edges) for things to be as clean as they ought. Harrumph.

Friday, August 3rd

Dodged a bullet. With a single-clock strobe on the data_ready_to_fifo output from the DMFD, everything lines up nicely (but both the old and new code ignore the "got it" response from the FIFO to the write request, which bothers me.)
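Why the strobe matters can be shown with a toy model. A ready flag held for one 25 MHz cycle looks like four consecutive write requests to a FIFO clocked at 100 MHz, whereas a single-cycle strobe produces exactly one. This is a behavioural sketch only: the rising-edge detector is a hypothetical stand-in, not the logic that actually went into the DMFD.

```python
CLOCK_RATIO = 4  # 100 MHz FIFO clock / 25 MHz DMFD clock


def sample_at_fast_clock(slow_levels, ratio=CLOCK_RATIO):
    """Each slow-clock level is seen on `ratio` consecutive fast-clock edges."""
    return [level for level in slow_levels for _ in range(ratio)]


def to_single_cycle_strobe(levels):
    """Rising-edge detector: high only on the first fast cycle of each pulse."""
    strobe, previous = [], 0
    for level in levels:
        strobe.append(1 if level and not previous else 0)
        previous = level
    return strobe


data_ready_25mhz = [0, 1, 0, 0, 1, 0]           # two samples become ready
seen_by_fifo = sample_at_fast_clock(data_ready_25mhz)

writes_with_level = sum(seen_by_fifo)            # FIFO writes on every high cycle
writes_with_strobe = sum(to_single_cycle_strobe(seen_by_fifo))

print(f"writes with level-type ready : {writes_with_level}")
print(f"writes with one-clock strobe : {writes_with_strobe}")
```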

I'm now giving the FIRs a last look over, and finding out that Matt's specs weren't met with either the old or new designs. I'll have to have a chat about that. Once I've finalized FIR designs, I'll pick the right truncation points and start verifying.

Looking into SquareMixerTroubleWithHighFrequencies (placeholder) for Jeff MacMahon at Chicago.

Monday, August 6th

Met with Jeff MacMahon again. We're converging, although I'm still unnerved by running the square mixer at high frequencies. Finalizing filter designs (wasn't meeting specs by a wide margin; tweaked via noise shaping during truncation. See signal path documentation.)

Tuesday, August 7th

Got filter designs finalized and compiled; now testing results. The interface is a little frustrating at times.

A quick check: synthesize sinusoid at fc, demodulate at fc-200 Hz. For a first trial, fc is synthesized on-board (not using a function generator.) Comparing the situation at fc=500 kHz with fc=10.66 MHz suggests a higher white noise floor (but equivalent 1/f roll-off noise!). More investigation needed, esp. if I suspect a lot of high-energy carrier spurs in the high-frequency case.

Wednesday, August 8th

Worked on UDP data draining, testing via function generators.

Thursday, August 9th

Short-term to-do list:

Friday, August 10th

Monday, August 13th

Tuesday, August 14th

Wednesday, August 15th

Thursday, August 16th

Working on a few things:

Long haul, sorry for the lack of notes.

Friday, August 17th

To-do for work study wrap-up: (Obviously, this stuff will not all happen today.)

Monday, August 27th

Tuesday, August 28th

Working on memory-mapped interface. It's starting to come together; good, since most of the remaining wrap-up is contingent on it (if I get to do things the way I'd like.)

Here's the latest to-do list, in handy-dandy table formatting. Sorry, the links are cross-wiki and thus don't work.

Stuff I Did

Stuff I'm Working On Now

Stuff I'd Like To Do But Probably Can't

Wednesday, August 29th

Memory interface now behaving, but FIFO reads 0xdeadbeef. I messed something up elsewhere...

Thursday, August 30th

Nope. The interface between user logic and the OPB bus is multiplexed via wired-OR. Thus, when I'm not supposed to assert data bits, I really shouldn't (otherwise I flip bits on the real data.)
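A toy model of the wired-OR read path shows the failure mode: any peripheral that drives nonzero bits while it is not selected gets OR-ed into whatever the selected peripheral returns. The values and names below are made up for illustration.

```python
def opb_read_bus(slave_outputs):
    """Wired-OR of every slave's data output, as seen by the bus master."""
    bus = 0
    for word in slave_outputs:
        bus |= word
    return bus


fifo_data = 0x00001234          # word the selected slave is returning

# Unselected slave correctly drives all zeros: the data passes through intact.
clean_read = opb_read_bus([fifo_data, 0x00000000])

# Unselected slave wrongly leaves a stale value on its outputs: bits get flipped.
corrupted_read = opb_read_bus([fifo_data, 0x0000FF00])

print(f"clean read     : 0x{clean_read:08x}")
print(f"corrupted read : 0x{corrupted_read:08x}")
```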

Looks good. Transfer between the MMIO interface and the DFMUX doesn't seem to work, though...

Friday, August 31st

Tuesday, September 4th

Still working on transfer between MMIO interface and DfMUX. The problem is synchronization back and forth between the OPB clock (100 MHz) and the DMFD clock (25 MHz). I haven't had an opportunity to double-check that transfers to the DMFD work properly, although I'm pretty sure the clock domains are friendlier in that direction.

I've also stuck in an interface to synchronize DMFD channels. It's possible (modulo debugging) to directly synchronize phase accumulators between different mixers, which allows them to be completely synchronized. The phase shift register already present allows mixers to be set in quadrature.
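The idea can be sketched with a pair of software phase accumulators. Copying the phase from one to the other synchronizes them, and a quarter-turn phase shift then puts the two mixers in quadrature. The 32-bit width, the tuning word and the class itself are assumptions for illustration; the real register layout may differ.

```python
ACC_BITS = 32                       # assumed accumulator width
MASK = (1 << ACC_BITS) - 1


class PhaseAccumulator:
    """Minimal NCO phase accumulator with a separate phase-shift register."""

    def __init__(self, tuning_word, phase=0):
        self.tuning_word = tuning_word & MASK
        self.phase = phase & MASK
        self.phase_shift = 0

    def step(self):
        """Advance one clock; the phase wraps modulo 2**ACC_BITS."""
        self.phase = (self.phase + self.tuning_word) & MASK

    def sync_to(self, other):
        """Copy the other accumulator's phase (the synchronization step)."""
        self.phase = other.phase

    def output_phase(self):
        return (self.phase + self.phase_shift) & MASK


i_mixer = PhaseAccumulator(0x01000000, phase=0x12345678)
q_mixer = PhaseAccumulator(0x01000000, phase=0x0BADF00D)  # arbitrary start phase

q_mixer.sync_to(i_mixer)
q_mixer.phase_shift = 1 << (ACC_BITS - 2)   # quarter of a full turn

for _ in range(1000):
    i_mixer.step()
    q_mixer.step()

offset = (q_mixer.output_phase() - i_mixer.output_phase()) & MASK
print(f"phase offset after 1000 clocks: {offset / 2 ** ACC_BITS:.2f} turns")
```

Because both accumulators use the same tuning word, the quarter-turn offset holds indefinitely once they have been synchronized.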

Wednesday, September 5th

Clock domain crossings still. (Frustrating business -- I only get two builds a day, so they have to count.) I've also re-done the filter truncations; I'll update the wiki once I've verified them.

Thursday, September 6th

I'm doing a couple of things at once (a bad idea, unless I'm careful?)

Here are the timing images (I've wanted these a couple of times, may as well keep them)

Latest and greatest (wtl_dfmux_interface_v2_00a) checked in to SVN repository. Looking good, but I need to spend a few more hours testing it. Yay!

Monday, September 17th

Haven't been updating this page for a while. Most of my MMIO difficulties are resolved, and we're just getting ready for Jeff to arrive from Chicago.

Monday, September 24th

Ditto. Last week was entirely devoted to the holography application. Tracking down two bugs -- one with mixer synchronization and the other with memory access -- led me to this little VHDLism that I'd forgotten about (or never knew):

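  -- Buggy: the unconditional assignment at the end always wins.
  my_process: PROCESS(clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF(load_some_signal = '1') THEN
        some_signal <= some_signal_in;      -- scheduled, then overridden
      END IF;
      some_signal <= some_signal + offset;  -- still reads the old some_signal
    END IF;
  END PROCESS;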

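The VHDL compiler will happily optimize away all references to some_signal_in. The reason is that a signal assignment inside a process is only scheduled: reading some_signal later in the same process still returns the old value, and the last assignment to a signal is the one that takes effect. The first load (even though I meant the second assignment to build on it) is simply ignored.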

Ugh. Of course, it's my mistake, and the optimizer is doing exactly what I told it. For the record, the correct code is:

  my_process: PROCESS(clk)
  BEGIN
    IF rising_edge(clk) THEN
      -- Exactly one assignment to some_signal on each clock edge.
      IF(load_some_signal = '1') THEN
        some_signal <= some_signal_in + offset;
      ELSE
        some_signal <= some_signal + offset;
      END IF;
    END IF;
  END PROCESS;

...it's just out-of-idiom for someone who's more used to C.
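For C-minded readers, the difference can be mimicked in software by modelling what one pass through the process does: every read returns the value the signal had before the process ran, and only the last scheduled assignment survives. This is a loose behavioural model (function names are mine), not a VHDL simulator.

```python
def buggy_process(some_signal, some_signal_in, offset, load):
    """One clock of the first version: returns the next value of some_signal."""
    scheduled = some_signal              # value kept if nothing is assigned
    if load:
        scheduled = some_signal_in       # first assignment: scheduled only
    scheduled = some_signal + offset     # reads the OLD value and overrides it
    return scheduled


def fixed_process(some_signal, some_signal_in, offset, load):
    """One clock of the corrected version: a single assignment per branch."""
    if load:
        return some_signal_in + offset
    return some_signal + offset


buggy_on_load = buggy_process(10, 100, 1, load=True)
fixed_on_load = fixed_process(10, 100, 1, load=True)
print(f"buggy next value on load: {buggy_on_load}")   # some_signal_in is never used
print(f"fixed next value on load: {fixed_on_load}")
```

In plain C the first version would work, because a variable assignment takes effect immediately. VHDL variables (`:=`) behave that way; signals (`<=`) do not.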

Equations that failed to render (latex2img: can't find dvipng at /usr/bin/dvipng):

$N=100$
$k=1 \ldots 6$
$N/2 \cdot k \cdot \frac{2048}{25000000}$
$D\approx 32N \frac{2048}{25 \cdot 10^6} \approx 262~\mathrm{ms}$
$\sigma_o/\sigma_i=\frac{2}{\pi}$
$\mathrm{PSD}=\frac{\sigma}{\sqrt{\mathrm{BW}}}$
$\frac{\mathrm{PSD}_o}{\mathrm{PSD}_i} = 2\sqrt{2}/\pi$
$c_n=1/(\pi n)\sin\left(\pi n/2\right)$