GraemeSmecherWorkbook < Main

---+ TO DO

~~I and Q paths for synchronization testing~~
~~Spin holo build into separate patch and document it~~ -- checked in to v2.00, grep for GMS HOLO
~~More channels~~
Wikify MATLAB scripts and procedures for draining UDP data
~~Soften landing for MMIO~~
- ~~Pass configuration structure pointers in firmware rather than baseaddr~~
- ~~Move FIFO bitmasks (e.g. for channel) into header files~~
~~Make sure new MMIO interface synchronizes registers and internal data, e.g. defaults (via VHDL generics) and across resets~~
~~Fix/document bug in wtl_fMuxControl.c: clearly, 6x should be e.g. %x or %i~~

   xil_printf("Fifo Occupancy:6x   Status: 0x%8x   MIR info: 0x%8x \r\n",
         WTL_DFMUX_INTERFACE_mReadFIFOOccupancy(baseaddr), 
         WTL_DFMUX_INTERFACE_mReadFIFOStatus(baseaddr), 
         WTL_DFMUX_INTERFACE_mReadMIR(baseaddr));

~~Fix output sample multiplexing!~~
~~SQUID board commissioning~~
~~Check in (or hound Eric to check in) fix for dmfd_mycic_to_fir:~~

--- dfmux/hard/Winterland/pcores/wtl_dfmux_interface_v1_01_a/hdl/vhdl/dmfd_mycic_to_fir.vhd 2007-07-11 10:16:06.000000000 -0400
+++ svn/hard/Winterland/pcores/wtl_dfmux_interface_v1_01_a/hdl/vhdl/dmfd_mycic_to_fir.vhd 2007-07-09 10:22:41.000000000 -0400
@@ -183,6 +184,7 @@
     data_out <= v_data3;
     ready_data_out <= '1';
   elsif v_channel_count="00011" and v_ready_data4='1' then
+    ch_for_data_out <= v_channel_count;
     v_ready_data4 := '0';
     v_channel_count := v_channel_count + 1;
     data_out <= v_data4;

~~Check in (or hound Eric to check in) fix for gen_version.awk:~~

--- svn/xps/data/gen_version.awk 2007-06-01 08:51:17.000000000 -0400
+++ dfmux/xps/data/gen_version.awk 2007-07-17 08:52:56.000000000 -0400
@@ -1,5 +1,6 @@
 #
 BEGIN {num_lib=0;num_driv=0;}
+gsub( /\r$/, "" ) {}
 /^BEGIN LIBRARY/ {num_lib++}
 /^[[:blank:]]*PARAMETER OS_NAME/ {os_name = $4}
 /^[[:blank:]]*PARAMETER OS_VER/ {os_version = $4}

Week of May 28th - June 1st 2007

Wiki, SMB, desktop accounts set up.

Friday, June 1st

Met with Matt & Eric to sort out workflow. We have two problems:

Meeting timing, and
Profiling, verifying, and optimizing DSP performance

...and two workflows:

Matlab + Sysgen -> optimization -> direct PCore synthesis using System Generator
Matlab + Sysgen -> optimization -> hand-tooling of VHDL -> PCore synthesis using existing flow

The question now is which of these two flows will best resolve the two problems above. Next week I'll come up with a compelling story.

Week of June 4th - June 8th 2007

Monday, June 4th

This week's laundry list of questions:

if we remove all of the existing constraints from the signal flow (i.e. ditch our VHDL blocks and adopt system generator blocks wholesale), what difference does it make in the synthesized VHDL?
- is it more efficient?
- is it readable?
- is it something we couldn't do by hand?

Strung together a full signal flow (using Xilinx blocks for FIR and CIC); components are correct, but parameters (e.g. CIC stages) and interconnections (truncation) aren't. First impression: this is going to be slow, since simulation in Matlab is glacial and compiling for cosimulation (where part of the Simulink model runs in hardware within the FPGA) takes a long time.

To do:

run both ways (compile for cosimulation, and simulate in software) on the desktop, to see how much of the speed problem is swap thrashing on the laptop
refine the model so the signal flow (as well as the topology) is correct, try to get some performance data

Tuesday, June 5th

Backed off a little: testing dmfd_square_mixer + XILINX CIC (2048x decimation); bus types and decimal point locations now sane.

Can't synthesize low-frequency input signal (frequencies below 1 kHz seem to confound Simulink)
Output from 1 kHz AM synthesizer looks good.
HDL generation for cosimulation: about 5 minutes (much better); suggests FIR compilation is the killer.

BUT... cosimulation library is only useful if we play by Xilinx's rules for clocks and clock enables. The generated block is shown below.

Any inputs to Xilinx blocks (e.g. anything that passes through a gateway in) show up as inputs to the cosimulation block. That means any clock signals, too, unless we follow the rules. To figure out: how painful is this?

I think the crux is this: in VHDL, we have to deal with synchronization, especially where sampling rates change (e.g. where decimation occurs.) This is where System Generator is intended to remove headaches, but only if we let it: trying to use the existing VHDL within system generator looks pretty warty from here. On the flip side, the generated VHDL is not readable either.

Hm. Ditto for control logic: the "gateway in"-type inputs are for data_in, reset and write_enable. The latter two are sequenced in simulation using a "signal builder" block (not from Xilinx, so it doesn't get swallowed by the model). This works well for software simulations, but it's important not to forget about control logic. Especially since it's meaningful in the end product, but mostly shrugged off during signal path simulations.

The best plan of attack might look something like this:

Simulate existing VHDL using top-level DMFD structure. Either use a black box in simulink, or provide a signal synthesis/analysis interface using file I/O and an existing testbench. Don't worry too much about breaking the top-level entity down into blocks, since it's painful, slow, and difficult to Xilinx-ize.
Re-create the existing workflow using Xilinx blocks (avoid VHDL wherever possible). See if it looks similar (synchronize, subtract and examine error signal?)
Tweak DSP using Xilinx-happy model (and probably cosimulation)
Resynthesize pcore, either by using System Generator or by hand-tooling VHDL. The "which" decision depends on a few things:
- can we interleave and overclock FIR blocks to reduce MAC usage? Is this a red herring? (Answer: Yes, it's trivial to do. There's an m-channel FIR block.) (Wait, there's more. Clocking these blocks is going to be tricky -- will write about it tomorrow.)
- is control logic easy to create? easy to follow? worth the effort?
- is the generated VHDL readable enough? (what is "readable enough"?)

Wednesday, June 6th

Today is ModelSim day. If Simulink is going to be a pain, let's see if we can find an advantageous mix of Simulink (for signal synthesis and analysis) and a VHDL simulation tool (to do the hard work.) This requires a VHDL testbench and signal I/O via files (which is probably slow and at any rate doesn't stream), but it doesn't involve shoehorning SysGen-phobic VHDL into Simulink.

First job: get Xilinx libraries built for ModelSim, and import them into a ModelSim project. Short story: run "compxlib" (a Xilinx tool, but it's on the system path for me). This builds ModelSim libraries for Xilinx IP blocks. Then, in ModelSim, do file->import library, and select the output path used by compxlib (here, c:\xilinx\xilinx91i\vhdl\mti_se). Each of the three libraries should be imported, which extracts them to yet another path. I used "z:\xlib" (where z: is a Samba mount.)

(...more to come. It's taking me a while to get used to ModelSim.)

Thursday, June 7th

OK, I'm catching on slowly. There's nothing wrong with my testbench, except that it takes a very long time for data to propagate through it. Of course.

Data goes in at 25 MHz. It comes out of the CIC at 1/2048th of that, or 12.207 kHz. Each FIR halves the data rate, and we end up with 6 stages, so the final output is 190.7 Hz. Using back-of-the-napkin reckoning, we'll have to wait for 1/190.7 Hz = 5.24 ms (Simulation result: 5,248,725 ns) for the first sample to emerge. That takes nearly an hour using ModelSim as a simulator. (There are some gains to be made via optimization, since I currently simulate with full signal visibility. That constrains ModelSim's ability to simulate efficiently. Loose extrapolation suggests about a 2x speed-up.)
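The rate arithmetic above is easy to re-check. A minimal Python sketch, assuming only the numbers quoted in this entry (25 MHz input, 2048x CIC decimation, six halving FIR stages); the function and variable names are made up for illustration:

```python
ADC_RATE_HZ = 25e6        # input sample rate quoted above
CIC_DECIMATION = 2048     # CIC decimation factor quoted above
FIR_STAGES = 6            # each FIR stage halves the rate


def stage_rates(input_rate_hz, cic_decimation, fir_stages):
    """Return the sample rate (Hz) after the CIC and after each FIR stage."""
    rates = [input_rate_hz / cic_decimation]
    for _ in range(fir_stages):
        rates.append(rates[-1] / 2)
    return rates


rates = stage_rates(ADC_RATE_HZ, CIC_DECIMATION, FIR_STAGES)
first_sample_wait_s = 1.0 / rates[-1]

print(f"CIC output rate:   {rates[0] / 1e3:.3f} kHz")
print(f"final output rate: {rates[-1]:.1f} Hz")
print(f"first sample after about {first_sample_wait_s * 1e3:.2f} ms")
```

This only reproduces the back-of-the-napkin figures (12.207 kHz, 190.7 Hz, 5.24 ms); it says nothing about how long ModelSim takes to get there.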

It gets worse. Each FIR is 100 taps, so meaningful data has to traverse 50 taps of each filter before we see anything but zeros (or garbage, if nonzero.) Add a large start-up delay to the picture.

And worse again. Since we're interested in frequency-domain characteristics of the output, it's an absolute necessity that we collect gobs of data and analyze it spectrally. That adds another few orders of magnitude to the data collection requirement.

Now, that's using a single, monolithic ModelSim instance to take care of all of our simulation. It uses a file for input, and another for output. It's probably a good deal faster than Simulink (since it's tuned for speed, and built around RTL simulation.) It may be simulating all 8 channels, but a factor of 8 (definitely an optimistic factor) isn't going to make full software simulation a viable option.

Obviously, we're going to have to be more clever.

Cleverness might be starting to arrive. I've done a few trivial test sims:

25 MHz signal synthesis in Simulink doesn't have to be a problem. Test signal flow: soft synth -> ( pluggable downsample-by-131072 ) -> scope
- decimation by Xilinx block with JTAG cosimulation: simulation rate is 23 ms / wall minute (output ~ 5 samples/minute. Marginal.)
- decimation by Matlab block in software: simulation rate is 1.3 s / wall minute (output ~ 250 samples/minute. Could be good.)
- suggests 25 MSPS transfer rate dominates. This simulation is run clock-locked with the FPGA, so we'd possibly benefit from FIFOing data and running asynchronously (although presumably the software rate is a decent estimate of the limit.)
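To put those rates in perspective, here is a rough Python sketch that converts "simulated time per wall-clock minute" into output samples, and into the wall time needed to collect a record for spectral analysis. The 4096-sample record length is an arbitrary assumption for illustration, not a number from the measurements above.

```python
OUTPUT_RATE_HZ = 25e6 / 131072   # output rate after the downsample-by-131072 block


def samples_per_wall_minute(sim_seconds_per_wall_minute):
    """Output samples produced per minute of wall-clock time."""
    return sim_seconds_per_wall_minute * OUTPUT_RATE_HZ


def wall_minutes_for(n_samples, sim_seconds_per_wall_minute):
    """Wall-clock minutes needed to collect n_samples output samples."""
    return n_samples / samples_per_wall_minute(sim_seconds_per_wall_minute)


RECORD_LENGTH = 4096  # hypothetical record length for a spectrum

for label, sim_rate in [("JTAG cosimulation", 23e-3), ("software decimation", 1.3)]:
    per_minute = samples_per_wall_minute(sim_rate)
    hours = wall_minutes_for(RECORD_LENGTH, sim_rate) / 60
    print(f"{label}: {per_minute:.1f} samples/min, "
          f"{hours:.1f} h for {RECORD_LENGTH} samples")
```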

Current thinking: cut the DSP flow into three chunks.

Test signal synthesis. Probably simulated in software, though we could do good, fast bandlimited Gaussian signal synthesis in hardware...
Fast signal path --- square mixer + CIC. This may run the fastest, but it's the simplest to simulate; plus, if signal is generated in software, transfer rate is probably worse than computational burden.
Slow signal path --- FIR chain. This is almost certainly the Murderously Big Problem, and we'd probably benefit the most from moving this to hardware cosimulation.

My suspicion is that we can only cosimulate a single block at once (since I strongly doubt bitstream-merging is possible.) That would suggest the FIR chain gets cosimulated, and everything else gets the software treatment.

This solves the simulation-of-signal-path problem. It's still not clear whether we'll export to pcore or port back to VHDL --- this is dependent on what the actual implementation looks like (and whether or not the existing VHDL can still fit within the Xilinx clocks-and-enables semantics, particularly given the limitations I'm imposing above.)

Week of June 11th - June 15th 2007

Monday, June 11th

Calculating the delays for the DSP signal path. We're interested in two scenarios:

End-to-end delay for a single sample
Flushing delay (change FIR -> output data is good again)

In both cases, the delays are dominated by the FIR blocks, so the following is a bit sloppy in the first few stages and may also mix a few registering/downsampling delays.

First the end-to-end delay.

Mixer: assuming instantaneous (perhaps registered?)
CIC: 1 sample (The MATLAB DSP blockset contains a CIC block, with a 1-sample group delay over the passband. Taking this for now.)
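FIR: N taps implies a group delay of N/2 samples at the stage's input rate. With $T$ the CIC output sample period, the kth stage has a delay of $d_k = \frac{N}{2} \, 2^{k-1} T = 2^{k-2} N T$.
Combining approximately, we have $N = 100$ and $T = 2048 / (25\ \mathrm{MHz}) = 81.92\ \mu\mathrm{s}$, so $d \approx \sum_{k=1}^{6} d_k \approx 32 N T \approx 262$ ms.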

The flushing delay (assuming the number of filter taps doesn't change; otherwise, the number changes depending on where the FIR block restarts) is double this number, or about 524 ms.

A few sources of inaccuracy:

the assumption that 16+8+4+2+1+1/2=32
decimation: do we take the first or second sample?
FIR length is actually 99 plus one prepended zero, suggesting noninteger delay (assumed 100/2=50). This might be exactly right or off by 1/2 sample; doublecheck it.
missing register delays in inter-block shims

...but depending on what we need it for, this should be a good estimate. It also demonstrates how useless a slow simulation topology will be.
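The estimate can be compared against the un-approximated sum. A small Python sketch, assuming 100-tap FIRs (group delay of 50 input samples per stage), a 1-sample CIC delay taken at the CIC output rate (my assumption; the entry doesn't say which rate), and no register delays. Names are illustrative only.

```python
CIC_OUTPUT_PERIOD_S = 2048 / 25e6   # 81.92 us between CIC output samples
FIR_TAPS = 100
FIR_STAGES = 6


def fir_chain_delay_s(taps, stages, first_stage_period_s):
    """Sum of N/2-sample group delays, each taken at that stage's input rate."""
    total = 0.0
    for k in range(stages):                      # k = 0 is the first FIR stage
        input_period = first_stage_period_s * 2 ** k
        total += (taps / 2) * input_period
    return total


cic_delay_s = 1 * CIC_OUTPUT_PERIOD_S            # assumed: 1 sample at the CIC output rate
exact_s = cic_delay_s + fir_chain_delay_s(FIR_TAPS, FIR_STAGES, CIC_OUTPUT_PERIOD_S)
approx_s = 32 * FIR_TAPS * CIC_OUTPUT_PERIOD_S   # uses 16+8+4+2+1+1/2 ~= 32

print(f"end-to-end, summed:       {exact_s * 1e3:.1f} ms")
print(f"end-to-end, approximated: {approx_s * 1e3:.1f} ms")
print(f"flushing, approximated:   {2 * approx_s * 1e3:.1f} ms")
```

The summed and approximated figures differ by a few milliseconds, which is well inside the other sources of inaccuracy listed above.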

Tuesday, June 12th

Putting together a blackbox FPGA co-simulation (noun string) for the entire DMFD. Will evaluate whether it's sensible to generate a full 8-channel system, and thus avoid the work of hatcheting a single-channel system.

Wednesday, June 13th

Looks like it's going to be easier (faster) to do some hatchet work on wtl_dmfd. Two tasks (man, do I seem to love bullet lists):

remove 100 MHz clock (will use this later anyhow). As a side effect, this will remove the difficulty with clocking, i.e. will make Xilinx System Generator tools play nicely with black-box cosimulation.
remove other channels (trivial, and I need to revisit how multiplexing works in mycic_to_fir anyways. If all CICs clamor for attention from the FIR at the same time, how is access to the FIR sequenced?)
- one at a time, as needed. Lower clock rates (as is sensible) will be just fine as long as there's enough time for convolution

As an aside, the ML402 evaluation board (which I'm using for cosimulation) uses a fixed 100 MHz clock. That means even though we only need to meet timing at 25 MHz on the DFMux board, it's useful to meet 100 MHz as well. We don't -- the trouble appears to be deep layers of logic at the square mixer.

For now, since co-simulation will run in single-step mode, I'm hoping this isn't a problem. This part of the system might take up less room if pipelined.

Thursday, June 14th

First impression: a few pipeline registers stuck in the square mixer component make a minor improvement, but I don't have a good enough idea where the slow paths are to do this properly. Plus, I'm starting to wander away from the "known good" design.

I'm working on two strategies, either of which will suffice:

adding a new JTAG co-simulation target for System Generator with a 25 MHz clock rate (these templates live in "MATLABPATH/plugins/xilinx/sysgen/plugins/compilation/Hardware Co-Simulation/ML402/JTAG/..."). I'm hoping that even though the hardware clock is 100 MHz, the JTAG single-stepping mode is self-clocking and will be much slower; thus tight constraints are not really necessary. This may not fly depending on exactly how JTAG clocks things, and if not, there isn't much I'll be able to do about it.
getting a closer look at the slow paths using Xilinx ISE, so I can tweak things properly. Learning the tools, too (arguably useful in its own right.)

If neither of these works, another option is to tweak the routing parameters so that place-and-routing tries harder before giving up on meeting timing. I'm not sure where this is configured right now, but it's probably in the toolbox tree (just like the path in the bullet list above.) I'm not sure if this will balloon the synthesis time into something unwieldy, but it's not something I expect to do often. Plus, I'm spending a lot of time getting it to work anyways.

Hrm. The paths that don't meet timing in Xilinx ISE are different from those using System Generator. That might complicate things a bit... looking into the new-target option.

Urgh! Drivers for the Platform Cable USB (JTAG pod) don't exist for Windows Vista. Can't use desktop.

Solved! It suffices to run the system clock at 100 MHz and underclock the DMFD block (via the AddClkCEPair entry in the black-box setup m-file.) I've got the first running black-box simulation of the DMFD system, but it needs verifying (and streamlining, since JTAG cosimulation doesn't work on Vista and Ethernet cosimulation -- the only alternative I've got right now -- is probably a lot slower.)

I'm now trying to find out how well-behaved this block is. There are plenty of possible hiccoughs...

Friday, June 15th

Still too slow.

Co-simulation is fast enough that I can verify samples are going in and coming out again (working on it now), but too slow for any meaningful data collection. I suspect data I/O is the killer, and I'm wondering if I need to either reduce the number of redundant control I/Os (registers used only for setup), or if I need to move to asynchronous (non-clock-locked) cosimulation.

...and samples aren't going in or out either, apparently. Looking into it...

Monday, June 18th

I've removed the 100 MHz clock, but also tried to remove the dependence on falling clock edges (according to some -- not that I have a particularly strong feeling about it -- it's Bad Mojo to depend on both clock edges in a synchronous system.) Moreover, with single-edged clocking, timing is met without any of the equally hairy tricks I tried above.

This monkeying around with existing code is a bit frustrating, since I keep departing from the known-good code in SVN in order to simulate something that (touch wood) behaves the same way. However, I've added all 8 channels back into the synthesized design, so there's a bit of 'give' alongside the 'take.' Since the complexity of the design is largely the FIRs, the other 7 channels come with very little synthesis overhead.

Tuesday, June 19th

(see below)

Wednesday, June 20th

Working on 1-channel-izing some more. Now 8 samples get through (one per channel, where 7 channels are hatcheted) but no more. Q: does the multi-channel FIR expect strict sequencing (i.e. if it doesn't get data for the other 7 channels, does it lock up?)

yup. SEL_I and SEL_O (channel selects for FIR input and output) are both generated by each FIR block. An 8-channel FIR is an 8-channel FIR is an 8-channel FIR is not a 1-channel FIR.

Thursday, June 21st

Yesterday, I managed to get "interesting" (non-zero) data out of the board for the first time. However, only under free-running cosimulation -- in which case, it's not guaranteed that input data becomes output data.

When cycle-locked, output data never shows. That suggests there's a timing gotcha or that something needs registering. Looking into it now.

...fixed. There still seems to be internal glitching, since the output is extremely chaotic (zero input produces zero output, but everything else is madness.) Will look into it next time.

Tuesday, June 26th

Looking into square vs. IQ mixer noise performance characteristics. Since the theory is much simpler using sinusoidal mixing signals, it's useful to construct or decompose square signals into familiar sinusoids. Both the square and quarter-wave mixer inputs can be generated by sampling sinusoids, so it suffices (assuming no other signals collide in harmonic regions) to consider the usual product mixer case with sin/cos inputs.
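As a sanity check on the decomposition idea, the Fourier sine coefficients of a +/-1 square wave can be computed numerically. This is a standalone Python sketch of the textbook result (4/(n*pi) for odd n, zero for even n), not the analysis from the transfer-function document; the sample count is arbitrary.

```python
import math


def square_wave(t):
    """Unit-period square wave: +1 on the first half period, -1 on the second."""
    return 1.0 if (t % 1.0) < 0.5 else -1.0


def sine_coefficient(n, samples=4096):
    """Estimate b_n = 2 * integral over one period of s(t) sin(2 pi n t) dt (midpoint rule)."""
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) / samples
        total += square_wave(t) * math.sin(2 * math.pi * n * t)
    return 2.0 * total / samples


for n in range(1, 6):
    textbook = 4 / (n * math.pi) if n % 2 else 0.0
    print(f"harmonic {n}: numeric {sine_coefficient(n):+.4f}, textbook {textbook:+.4f}")
```

Only odd harmonics survive, which is why the "no other signals collide in harmonic regions" caveat matters for the square mixer.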

Wednesday, June 27th

A to-do laundry list:

Start on document for mixer signal/noise transfer functions
Document signal path gotchas
- Falling edges in sensitivity lists
Re-compute FIR coefficients; compile new IP cores
Simulation process documentation
- Why isn't a full-loop simulation in System Generator feasible?

Today: working on mixer transfer-function document.

Thursday, June 28th

Friday, June 29th

Tuesday, July 3rd

I haven't disappeared, I've just been working on the transfer-function document. There's not much to track -- I'll post a version here as it gets closer.

The three mixers (sinusoidal, square-wave and quarter-band square-wave) are modeled with signal and white noise components. I'm working on narrowband noise models now.

Here's today's draft: document.pdf

Wednesday, July 4th

Simulations. The transfer functions I'm seeing agree with my calculations, but don't jibe with what Matt was expecting -- we suspect it's a matter of conventions. More tomorrow.

Thursday, July 5th

Figured it out. The difference pops up when variance (equivalently, RMS noise voltage) is converted to power-spectral density. I need to add a line or two describing the conversion between RMS gain and PSD gain, and clarify the bandwidth measures for noise (single-sided vs. double-sided).

Friday, July 6th

Here's the story. For narrowband noise, the output variance is related to the input variance via the following ratio:

...as derived in the last draft. We make use of the conversion formula between power-spectral density (PSD) and variance:

where BW is the bandwidth of the signal under consideration (this is simply Parseval's formula for power signals, where the frequency content is constant over a limited range and zero elsewhere.) Substituting this into the above expression gives the ratio of PSD out to PSD in:

This is for PSD measurements in volts RMS per root hertz. For watts/Hz (e.g. assuming a 1-ohm load), this ratio must be squared.

This confuses me, though -- there's an equivalence between narrowband and broadband noise that needs to be fleshed out a little further. (Question: why isn't the amplitude-PSD correction required for wideband noise? Answer: because we didn't use Rice's representation -- but that's not a very convincing answer.)

Here's a better one, although it only applies for truly white noise. When we gate white noise (one way to consider the effect of the quarter-wave mixer), we halve the noise power. No problem. Alternatively, we multiply the time-domain noise by a 50% duty-cycle pulse train, with fundamental frequency equal to that of the mixer. In the frequency domain, this is equivalent to a convolution by the Fourier series expansion of the pulse train (with .) The expansion itself makes absolutely no difference in the white-noise case, since the frequency-domain convolution simply returns more white noise with a different amplitude.
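The "gating halves the noise power" claim is easy to check numerically. A minimal Python sketch, assuming unit-variance Gaussian samples as a stand-in for white noise and an arbitrary 8-sample gate period; none of this comes from the actual mixer simulations.

```python
import random


def mean_power(samples):
    """Average of the squared samples."""
    return sum(x * x for x in samples) / len(samples)


def gate(samples, period):
    """Zero the second half of every `period`-sample cycle (50% duty-cycle gate)."""
    half = period // 2
    return [x if (i % period) < half else 0.0 for i, x in enumerate(samples)]


rng = random.Random(2007)                      # fixed seed so the run is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
gated = gate(noise, period=8)

ratio = mean_power(gated) / mean_power(noise)
print(f"gated / ungated noise power: {ratio:.3f}")
```

The ratio comes out close to one half regardless of the gate period, which is the white-noise argument in miniature.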

I'm also looking at the FIR taps again. Consider the number of 25 MHz clocks per input sample (assuming 1 channel for now):

| *Stage* | *Clocks per input sample (25 MHz)* | *Taps (8 channels, neglecting overhead)* | *Channels (32 taps each)* |
| 1 | 2048 | 256 | 64 |
| 2 | 4096 | 512 |  |
| 3 | 8192 | 1024 |  |
| 4 | 16384 | 2048 |  |
| 5 | 32768 | 4096 |  |
| 6 | 65536 | 8192 |  |

That suggests we already have enough headroom to handle all 8 channels at a 25 MHz FIR clock (not 100 MHz as is currently used.) Erm, why don't we? (Is the overhead really that big? What's the deal?)
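The headroom argument can be written out as a tiny budget calculation. This Python sketch assumes one multiply-accumulate per 25 MHz clock per FIR stage and no overhead (the same simplification as the table), and uses the 100-tap FIR length quoted in the June 7th entry; the function names are hypothetical.

```python
CIC_DECIMATION = 2048   # 25 MHz clocks per sample at the first FIR's input
CHANNELS = 8
FIR_TAPS = 100          # FIR length quoted in the June 7th entry


def clocks_per_input_sample(stage):
    """25 MHz clocks between input samples at FIR stage `stage` (1-based)."""
    return CIC_DECIMATION * 2 ** (stage - 1)


def max_taps_per_channel(stage, channels=CHANNELS):
    """Taps affordable per channel with one multiply-accumulate per clock."""
    return clocks_per_input_sample(stage) // channels


for stage in range(1, 7):
    budget = max_taps_per_channel(stage)
    verdict = "fits" if FIR_TAPS <= budget else "does not fit"
    print(f"stage {stage}: {clocks_per_input_sample(stage):6d} clocks, "
          f"up to {budget:5d} taps/channel -> {FIR_TAPS}-tap FIR {verdict}")
```

Even the tightest stage (stage 1) has room for 256 taps per channel under these assumptions, so real overhead would have to be large to force a 100 MHz FIR clock.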

I could do some simulations, but I bet there's an easy answer.

Monday, July 9th

Nope -- I did some simulations at 25 MHz, and it works just fine. There's enough overhead for more taps as well.

There are two jobs involved in updating the DigitalFMux receive path:

Move the double-clock scheme to a single, 25 MHz clock
Tweak the FIR taps. It makes the most sense to me to change the first two filters and leave the rest alone.

Tuesday, July 10th

(see below)

Wednesday, July 11th

Commissioning SQUID boards. Rigged workstation with power supplies; DB25 power connector. Verified that bitstream on Wiki matches bitstream on "known good" board (Serial #002). Commissioned a single board.

Thursday, July 12th

Francois has been fixing the U19 insulator mats (hooray!) Showed him how to program a board, walked through the process for two. Programmed three more for a total of 6 boards.

Also working on the demodulator firmware. The FIR truncation for the first two filters has been chosen arbitrarily (this needs to be tweaked later.) Compiling now; hopefully I'll approach a testable build sometime today or tomorrow. (This is a long-shot build, and will probably need to be debugged.)

Friday, July 13th

Monday, July 16th

Tuesday, July 17th

Programming more SQUID boards; built an untested 25-MHz-clocked, filter-optimized DfMUX build. Working on FIR documentation (see wiki pages for DMFD signal path.)

Wednesday, July 18th

Thursday, July 19th

Awright, a busy couple of days. Here are the results of the first build of the new FIRs running at a 25 MHz clock (see attachment first_blood.png). The output should be a 10 Hz sine wave (obtained by mixing a 1 MHz carrier with a 1.00001 MHz carrier.) The plot (I'll bet) also shows what happens when you feed a FIFO clocked at 100 MHz using control signals at 25 MHz.

Friday, July 20th

This is looking much better (see attachment better.png):

Don't get too excited, though. The changes I made to properly match the DMFD with the output FIFO don't always lock onto the correct sample in the plot from July 19th, meaning I get plots that are flat more frequently than I get this one.

In addition, there's some messiness at the maxima of this plot, suggesting an early-stage filter is saturating (and thus clipping, which is cleaned up a little by subsequent stages.)

More to come.

Monday, July 30th

Putting together 10 more SQUIDs. Also, considering the "9th channel" task -- seems a few dedicated channels are the best way to go, in order to avoid a large crossbar-type switch. It's also reasonable to end up with a "strange" number of channels (i.e. 8+1, or 16+1, or 16+2 etc) in order to make the number of "normal" channels into a power of 2.

Tuesday, July 31st

First task: fixing up the FIFO latching at the DfMUX output.

Wednesday, August 1st

OK, I need to find a more effective way of doing this than tweaking something and waiting hours for a full build.

The "right" way appears to be using the IBM BFM (Bus Functional Model) toolkit -- we isolate the DfMUX from the rest of the system at the bus level, and issue transactions on a preprogrammed basis. This will allow me to make sure everything behind the OPB (including FIFOs) is correctly sequenced.

There are also two important bonuses: it'll allow me quick feedback on modifications to the register interface (needed for the "9th-channel" problem, along with a number of wish-list items Eric and I have) and will allow rapid synthesis of the DfMUX to check out resource usage.

It's a free download, but I need to wait for Xilinx to approve my license. (24 hours, they say.)

Thursday, August 2nd

Man, what a pain. The BFM stuff is pretty heavy, and I'm not confident enough with the command-line ModelSim interface (or other internals) to feel like it's worth investigating yet.

In the meantime, I'm starting to understand where the latching problems are occurring. I need a better way to track my own changes (which aren't checked in to SVN yet, and aren't ready either.) Working on it.

Ultimately, the OPB FIFO interface expects to run synchronously. We're using too many clock domains (200 MHz, 100 MHz, and 25 MHz, operating on both rising and falling edges) for things to be as clean as they ought. Harrumph.

Friday, August 3rd

Dodged a bullet. With a single-clock strobe on the data_ready_to_fifo output from the DMFD, everything lines up nicely (but both the old and new code ignore the "got it" response to the FIFO write request, which bothers me.)

I'm now giving the FIRs a last look over, and finding out that Matt's specs weren't met with either the old or new designs. I'll have to have a chat about that. Once I've finalized FIR designs, I'll pick the right truncation points and start verifying.

Looking into SquareMixerTroubleWithHighFrequencies (placeholder) for Jeff MacMahon at Chicago.

Monday, August 6th

Met with Jeff MacMahon again. We're converging, although I'm still unnerved by running the square mixer at high frequencies. Finalizing filter designs (wasn't meeting specs by a wide margin; tweaked via noise shaping during truncation. See signal path documentation.)

Tuesday, August 7th

Got filter designs finalized and compiled; now testing results. The interface is a little frustrating at times.

A quick check: synthesize sinusoid at fc, demodulate at fc-200 Hz. For a first trial, fc is synthesized on-board (not using a function generator.) Comparing the situation at fc=500 kHz with fc=10.66 MHz suggests a higher white noise floor (but equivalent 1/f roll-off noise!). More investigation needed, esp. if I suspect a lot of high-energy carrier spurs in the high-frequency case.

Wednesday, August 8th

Worked on UDP data draining, testing via function generators.

Thursday, August 9th

Short-term to-do list:

Try carrier and demod at 500 kHz (and +3 Hz offset) and see if 1/f wings go away
Why does amplitude drop when we go from 10 MHz to 10.6 MHz? (Check analog filter attenuation)
Do full double/triple-function-generator setup and see what happens at high carrier rates
Amplitude modulate white noise and extend bandwidth (checking for contributions from OOB spurs, at both predictable and random telegraph signals)

Friday, August 10th

Monday, August 13th

Tuesday, August 14th

Wednesday, August 15th

Thursday, August 16th

Working on a few things:

"9th-channel" problem and memory-mapped interface
Data for Jeff

Long haul, sorry for the lack of notes.

Friday, August 17th

To-do for work study wrap-up: (Obviously, this stuff will not all happen today.)

Document FIR gains; finalize truncations
Check in new RTL
Send I/Q data to Jeff

Monday, August 27th

Tuesday, August 28th

Working on memory-mapped interface. It's starting to come together; good, since most of the remaining wrap-up is contingent on it (if I get to do things the way I'd like.)

Here's the latest to-do list, in handy-dandy table formatting. Sorry, the links are cross-wiki and thus don't work.

Stuff I Did

XILINX/MATLAB SysGen investigation (result: not the right tool for the job.)
Signal path moved to 25 MHz (Result: checked in as DfMUX v1.02; more updates pending the next list)
Mixer transfer function document (Result: see DfMUXSignalPath_DMFD_SquareMixer)
New FIRs: Complete, checked in as DfMUX v1.02. See DfMUXSignalPath.
Signal path documentation: Sufficiently complete to document my changes, see DfMUXSignalPath. Also see the wish list item below.
SQUID controller commissioning: complete; see SquidControllerCommissioning
Holography sample data and testing at high carrier frequencies: complete. (Result: data sent to Jeff; convinced myself that high carriers do indeed work.)

Stuff I'm Working On Now

FIR truncations. Gains are currently higher than previous builds; I'm not quite sure where the saturation limits should be.
FIR finalization. One tweak (power-of-two gains) remains for FIR stages 3-6; need to borrow Eric's laptop briefly to use MATLAB. (Toolkits required...)
"9th-channel" interface. This is pending the landing of the MMIF code.
MMIF code. A memory-mapped interface to the DfMUX that greatly cleans up the firmware-RTL interface and makes new requirements convenient to add.
Holography tweaks. Eric is working on injecting timestamps; there's a thing or two that needs to be merged before the build goes out.

Stuff I'd Like To Do But Probably Can't

Amalgamate and improve DfMUXSignalPath documentation. A brief cruise through the wiki the other day showed redundancies and outdated wiki contents elsewhere -- I wish I'd integrated existing contents a little better, but it's a big job.
Remove double-edging from DfMUX code. We currently clock on both rising and falling edges, and probably shouldn't.

Wednesday, August 29th

Memory interface now behaving, but FIFO reads 0xdeadbeef. I messed something up elsewhere...

Thursday, August 30th

Nope. The interface between user logic and the OPB bus is multiplexed via wired-OR. Thus, when I'm not supposed to assert data bits, I really shouldn't (otherwise I flip bits on the real data.)
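The wired-OR behaviour described above is simple to model. A Python sketch with made-up data values (nothing here is taken from the OPB signal names or the actual register map): the master sees the bitwise OR of every slave's output, so a slave that drives anything while unselected corrupts the read.

```python
from functools import reduce
from operator import or_


def wired_or(*driver_outputs):
    """Bus value seen by the master: bitwise OR of every slave's output."""
    return reduce(or_, driver_outputs, 0)


def slave_output(selected, data):
    """A well-behaved slave drives zeros unless it is the one being read."""
    return data if selected else 0


real_data = 0x12345678     # hypothetical value from the slave being read
stale_data = 0x0000FF00    # hypothetical leftover in an unselected slave

good = wired_or(slave_output(True, real_data), slave_output(False, stale_data))
bad = wired_or(slave_output(True, real_data), stale_data)  # unselected slave misbehaves

print(f"well-behaved: 0x{good:08x}")
print(f"misbehaving:  0x{bad:08x}")
```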

Looks good. Transfer between the MMIO interface and the DFMUX doesn't seem to work, though...

Friday, August 31st

Tuesday, September 4th

Still working on transfer between MMIO interface and DfMUX. The problem is synchronization back and forth between the OPB clock (100 MHz) and the DMFD clock (25 MHz). I haven't had an opportunity to double-check that transfers to the DMFD work properly, although I'm pretty sure the clock domains are friendlier in that direction.

I've also stuck in an interface to synchronize DMFD channels. It's possible (modulo debugging) to directly synchronize phase accumulators between different mixers, which allows them to be completely synchronized. The phase shift register already present allows mixers to be set in quadrature.
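A toy model of the idea, in Python. The 32-bit accumulator width, the frequency word, and all class and function names are assumptions for illustration; the real RTL may differ. Copying one mixer's phase accumulator into another locks them together, and a quarter-cycle phase shift then puts the pair in quadrature.

```python
ACC_BITS = 32                       # assumed accumulator width
MODULUS = 1 << ACC_BITS


class Mixer:
    """Toy square-wave mixer driven by a phase accumulator."""

    def __init__(self, freq_word, start_phase=0, phase_shift=0):
        self.freq_word = freq_word
        self.acc = start_phase % MODULUS
        self.phase_shift = phase_shift % MODULUS

    def step(self):
        """Advance the accumulator by one clock."""
        self.acc = (self.acc + self.freq_word) % MODULUS

    def phase(self):
        """Accumulator plus the static phase-shift register."""
        return (self.acc + self.phase_shift) % MODULUS

    def output(self):
        """+1 for the first half of the cycle, -1 for the second."""
        return 1 if self.phase() < MODULUS // 2 else -1


def synchronize(follower, leader):
    """Copy the leader's accumulator into the follower (the sync operation)."""
    follower.acc = leader.acc


FREQ_WORD = MODULUS // 25           # hypothetical: 1 MHz carrier at a 25 MHz clock
mixer_i = Mixer(FREQ_WORD, start_phase=12345)
mixer_q = Mixer(FREQ_WORD, start_phase=987654321, phase_shift=MODULUS // 4)

synchronize(mixer_q, mixer_i)
for _ in range(1000):
    mixer_i.step()
    mixer_q.step()

phase_gap = (mixer_q.phase() - mixer_i.phase()) % MODULUS
print(f"phase gap: {phase_gap / MODULUS:.2f} cycles")
```

Without the `synchronize` call the two accumulators keep whatever offset they started with, so the quarter-cycle shift alone would not guarantee quadrature.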

Wednesday, September 5th

Clock domain crossings still. (Frustrating business -- I only get two builds a day, so they have to count.) I've also re-done the filter truncations; I'll update the wiki once I've verified them.

Thursday, September 6th

I'm doing a couple of things at once (a bad idea, unless I'm careful?)

doing some timing simulations for Eric (to help the GPS sample-tagging)
making the number of channels easily selectable via generics
repairing the memory-mapped interface (still!)

Here are the timing images (I've wanted these a couple of times, may as well keep them)

eric_stage0_overall.bmp: CIC output routed to FIFO; large-scale view
eric_stage0_group.bmp: CIC output routed to FIFO; one group of 8 samples (one for each channel)
eric_stage1_overall.bmp: FIR stage 1 routed to FIFO; large-scale view
eric_stage1_group.bmp: FIR stage 1 routed to FIFO; one group of 8 samples (one for each channel)
eric_stage1_one.bmp: FIR stage 1 routed to FIFO; blow-up of a single sample

Latest and greatest (wtl_dfmux_interface_v2_00a) checked in to SVN repository. Looking good, but I need to spend a few more hours testing it. Yay!

Monday, September 17th

Haven't been updating this page for a while. Most of my MMIO difficulties are resolved, and we're just getting ready for Jeff to arrive from Chicago.

Monday, September 24th

Ditto. Last week was entirely devoted to the holography application. Tracking down two bugs -- one with mixer synchronization and the other with memory access -- led me to this little VHDLism that I'd forgotten about (or never knew):

  my_process: PROCESS(clk)
  BEGIN
    IF(load_some_signal = '1') THEN
      some_signal <= some_signal_in;     -- scheduled, then overridden below
    END IF;
    some_signal <= some_signal + offset; -- reads the old some_signal; last assignment wins
  END PROCESS;

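The VHDL compiler will happily optimize away all references to some_signal_in. The reason is that signal assignments inside a process are only scheduled, and take effect when the process suspends. When some_signal is assigned more than once, only the final assignment takes effect, and the some_signal read on the right-hand side of that final assignment is still the old value. The first assignment (even though it is supposed to be referenced in the second one) is simply ignored.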

Ugh. Of course, it's my mistake, and the optimizer is doing exactly what I told it. For the record, the correct code is:

  my_process: PROCESS(clk)
  BEGIN
    IF(load_some_signal = '1') THEN
      some_signal <= some_signal_in + offset;  -- load and offset in one assignment
    ELSE
      some_signal <= some_signal + offset;     -- normal accumulate
    END IF;
  END PROCESS;

...it's just out-of-idiom for someone who's more used to C.
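For the C-minded, the difference can be mimicked in Python. This is only a model of the scheduling rule (reads see the old signal value; the last scheduled assignment wins), not a VHDL simulator, and the function names are invented for the comparison.

```python
def broken_process(some_signal, some_signal_in, offset, load):
    """Model one run of the first process; returns the value some_signal takes next."""
    pending = some_signal                 # nothing scheduled yet: the signal would hold
    if load:
        pending = some_signal_in          # scheduled, but not yet visible to reads
    pending = some_signal + offset        # reads the OLD value and replaces the schedule
    return pending


def fixed_process(some_signal, some_signal_in, offset, load):
    """Model one run of the corrected process."""
    if load:
        return some_signal_in + offset
    return some_signal + offset


def c_style(some_signal, some_signal_in, offset, load):
    """What a C programmer expects: each assignment is visible immediately."""
    if load:
        some_signal = some_signal_in
    return some_signal + offset


args = dict(some_signal=10, some_signal_in=100, offset=1, load=True)
print("broken :", broken_process(**args))
print("fixed  :", fixed_process(**args))
print("C-style:", c_style(**args))
```

With the load asserted, the broken version never sees some_signal_in, while the fixed version matches the C-style expectation.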

Attachments

| *Type* | *Attachment* | *Size* | *Date* | *Who* |
| png | better.png | 122.5 K | 2007-07-20 - 13:13 | GraemeSmecher |
| pdf | document.pdf | 138.5 K | 2007-07-03 - 16:08 | GraemeSmecher |
| bmp | eric_stage0_group.bmp | 167.5 K | 2007-09-06 - 15:28 | GraemeSmecher |
| bmp | eric_stage0_overall.bmp | 167.5 K | 2007-09-06 - 15:28 | GraemeSmecher |
| bmp | eric_stage1_group.bmp | 167.5 K | 2007-09-06 - 15:28 | GraemeSmecher |
| bmp | eric_stage1_one.bmp | 167.5 K | 2007-09-06 - 15:29 | GraemeSmecher |
| bmp | eric_stage1_overall.bmp | 167.5 K | 2007-09-06 - 15:28 | GraemeSmecher |
| png | first_blood.png | 147.0 K | 2007-07-19 - 17:03 | GraemeSmecher |
| png | wave.png | 2.6 K | 2007-07-09 - 14:56 | GraemeSmecher |

This topic: Main > TWikiUsers > GraemeSmecher > GraemeSmecherWorkbook Topic revision: r56 - 2011-05-13 - DfmuxCollab