Porting Cyclops to Teensy [32b MCU], about jonnew/cyclops

Comments (5)

oyeb commented on August 24, 2024

Benefits to Cyclops

Higher Clock Speed and 32b word size give a great performance boost.
There's a Native USB port and 3 other (TTL) Serial ports, allowing us to flash code while staying connected to other devices on (other) Serial ports.
Many special-purpose timers are available. These timers can be chained (though not in arbitrary configurations) to perform sets of tasks without CPU intervention.
- FlexTimers are general purpose timers which can be triggered from other modules like ADC, etc. The FlexTimer interrupts can be used to trigger user code in an ISR or trigger other modules.
- Periodic Interrupt Timers (PIT) can be set up to trigger modules like DMA (4 channels of DMA can be periodically triggered), etc.
- Programmable Delay Blocks (PDB) can be set up to control ADC, DAC, comparator.
More digital I/O pins, and all of them can be used as external interrupt pins on a Teensy.

Digital I/O pins on Teensy 3.1 and 3.2 are 5V tolerant (Teensy 2.0 and Teensy++ 2.0 run at 5V natively), but pins on Teensy 3.0 and Teensy LC are 3.3V only.
Flashing the Teensy typically requires pressing a reset push-button. There is a guide (http://www.pjrc.com/teensy/jump_to_bootloader.html) that explains how to flash via a software jump to the bootloader, which emulates pressing the push-button.

Hence, the GUI can be used to program the Teensy, and there would be no need to run TeensyLoader or invoke a makefile before an experiment.
DMA can be used to write to almost any register on the CPU or peripherals without CPU intervention. DMA transfer_done interrupts can trigger other DMA transfers too!

Setting up DMA for SPI

With a Teensy, we might be able to exploit the DMA to regularly write to the Cyclops DAC via SPI. This would free up the CPU to monitor the Serial interface and do waveform updates. The pipeline could be:

PIT -> DMA (2B transfer) to SPI `TXFIFO` -> `DMA_done` interrupt

Since the CPU is free to perform updates, it would just wait for DMA-done, after which it could update the contents of the DMA source for the upcoming iteration. The next DMA-SPI event would be triggered by the next PIT interrupt, maintaining a precise time interval. The PIT period can be updated by the CPU at any time.
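
As a rough illustration of the timing idea, here is a host-side model in Python (not Teensy code). A fixed-period tick copies one 16-bit word to a stand-in TX FIFO, and a DMA-done step then loads the following sample. The name `run_pipeline`, the 50 µs period and the sample values are assumptions made for the example.

```python
# Host-side model of the proposed PIT -> DMA -> SPI pipeline.
# Nothing here touches real hardware; all names are illustrative.

def run_pipeline(waveform, pit_period_us, n_ticks):
    """Simulate n_ticks PIT events.

    On each tick the "DMA" copies the 16-bit word held in dma_source to the
    SPI TX FIFO (modelled as a list). The DMA-done step then lets the "CPU"
    load the next sample into dma_source.
    """
    tx_fifo = []                      # (timestamp_us, word) pairs sent over SPI
    index = 0
    dma_source = waveform[index]      # word the DMA will send on the next tick

    for tick in range(n_ticks):
        now_us = tick * pit_period_us                  # PIT fires at a fixed interval
        tx_fifo.append((now_us, dma_source & 0xFFFF))  # DMA: one 2-byte transfer

        # DMA-done: the CPU prepares the word for the following tick.
        index = (index + 1) % len(waveform)
        dma_source = waveform[index]

    return tx_fifo


sent = run_pipeline(waveform=[0, 1024, 2048, 4095], pit_period_us=50, n_ticks=6)
timestamps = [t for t, _ in sent]
words = [w for _, w in sent]
```

In this model the send times depend only on the tick period, not on how long the CPU takes to refill the source, as long as the refill finishes before the next tick.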

A Teensyduino library for using DMA with SPI, DmaSpi (https://github.com/crteensy/DmaSpi), is available. It also offers a chip-select option, but the selection operation requires CPU intervention.

IMO it might not be possible to perform a DAC transfer to the Cyclops Driver without any CPU intervention in the way described. A complete transfer requires not only pushing the DAC value into the SPI TXFIFO, but also

selection of the DAC slave chip,
the actual SPI transfer, and
a call to load_dac().

These coordinated operations would require help from the CPU. A complicated pipeline might work, but just like a CPU, even the DMA controller can get overloaded.
Moreover, since the Teensy provides a faster clock, many hardware timers, and more memory, the best way to deliver precise signals would be to make the CPU handle all the waveform timing and SPI writes.

In theory this would increase latency (delay in waveform updates), but a Teensy runs at least 4 times faster than an Arduino anyway.
DMA can be utilised for other tasks (see below).

Set up other USART ports as `DATASTREAM`

Other USART ports can be used for streaming data into the Teensy or streaming out to PC.
It might be possible to utilise the DMA triggers to do this operation without CPU intervention.

Measuring and recording signals

The Cyclops Driver provides numerous measurements of circuit operation that can be recorded during an experiment, such as LED current and stimulus reference voltages.

A PDB -> ADC -> DMA pipeline could be set up to record these analog signals and store them on the Teensy or send them to the PC in chunks. The PDB would ensure precise sampling intervals.
The chunk-send could also be performed by the DMA over USART.


jonnew commented on August 24, 2024

Just a quick note that I received this and I'm processing it. Looks good,
but I need to think about it a bit before replying.

On Thu, Apr 7, 2016 at 2:36 PM, Ananya Bahadur wrote:

Jonathan Newman
Postdoctoral Fellow, MIT
http://www.mit.edu/~jpnewman/


jonnew commented on August 24, 2024

OK, I was able to go through this in detail. First of all, thanks for such a detailed feature request; it's good that you have brought up so many potential ways to do this for our discussion. These suggestions seem to boil down to 4 major actions:

The PCB layout would need to change to accommodate the Teensy instead of the Arduino (easy).
Remove Arduino dependencies from the existing MCU library for the Cyclops (easy). That said, we need to think about users who might want to hack the Cyclops using the Arduino IDE. Of course it's bloated and the libraries are slow, etc. But it's easy, and that is almost the most important thing.
Port the current SPI implementation (which is a CPU-intensive Arduino library, if I recall correctly) to a DMA-based solution. I like the idea of using or adapting the DmaSpi library you mentioned, at least as a starting point.
- While it's always good to think about performance from the beginning, it's more important to think about architecture and maintainability. It is OK to write low-level, even nasty, code for things like driving SPI via DMA, but it must be encapsulated away from the user-facing portions of the library. It needs to be wrapped up in a header with a nice interface that can be called transparently from the main functions in cyclops.h that the user might want to use.
- One question: How will waveform updates in memory be synchronized with SPI writes to prevent waveform 'glitching'?
Use the remaining processing power, and potentially DMA, to implement the RPC handling engine.
- I like the idea of having both input and output. That would be a pretty cool feature, but most of the useful signals on the Cyclops exist in the analog domain. How would these be digitized to be sent back to the OE interface?
- The most important part is definitely a low-bandwidth, low-latency "trigger" stream from the host PC to the Cyclops, telling it to deliver pre-defined waveforms.
- A high-bandwidth data stream, with less emphasis on latency, for uploading waveforms to flash memory so that they can be triggered later.

IMO: Best thing to do at this point is

Create a diagram of each component and how they will be linked (DMA, FIFOs, etc)
Assign classes and functions to each portion of the diagram
Get a POC going using simplified implementations of these components.

This will make problems in our thinking extremely obvious without wasting too much effort.


oyeb commented on August 24, 2024

Supporting Arduino IDE

We are (almost) completely dependent on the Teensy (aka Teensyduino) libraries; all device-specific definitions are in there. The Cyclops library would hence be fully compatible with the Teensy environment. The user/developer simply chooses where they edit code and which tool they use to compile and flash it.
Developers can hack Cyclops code in the Open Ephys GUI, in the IDE (using the Teensyduino libraries in both cases), or even on the command line. Cyclops development in the Arduino IDE is supported by default, out of the box, with no special effort needed.

Not using DMA for SPI

I strongly discourage using DmaSpi, simply because it doesn't fit our needs. DMA-handled SPI makes sense only when a large amount of data (each byte of which requires no assistance from the CPU) needs to be moved into or out of memory via SPI, typically on a single channel.
Large writes or reads hog the CPU for no reason, but we don't have large transfers; we have frequent, small transfers.
Each of these small transfers requires special assistance:

Chip Selection
Latch Activation on the Cyclops DACs

Moreover, the DMA controller can only offload work from the CPU; without CPU intervention, it cannot guarantee that waveforms don't "glitch".
And involving the CPU in scheduling DMA-backed transfers would defeat the purpose of DMA, since the performance gain would be negligible (small transfers).
The DMA controller cannot schedule transfers.
If the CPU is scheduling transfers anyway, it could also run the few extra instructions for Chip Select, Latching and Initiate-SPI-write at little extra cost.

A convoluted DAC pipeline

The DMA SPI pipeline can be triggered by either a software or a hardware interrupt. This would invoke 6 DMA pipelines (on a chain of sorts).

Select the correct slave (not simple) by writing to a PORT register, then trigger the SPI stage.
Fetch the next voltage, perform the 2B SPI transfer, then trigger the Latch Activation pipeline.
Activate the Latch Line by writing to the correct PORT register, then trigger the timer-update pipeline.
Fetch the new timer period and write it to the Timer Register, then trigger the Latch Deactivation pipeline.
Deselect the slave by writing to the correct PORT register, then trigger the next stage.
Deactivate the Latch Line by writing to the same PORT register (the above 2 steps ensure the Latch Line is active for more than 2 CPU instructions). Interrupt the CPU to notify it of completion.

Each of these small operations would consume a programmable DMA channel, so each Waveform Channel would require 6 DMA channels. At least 24 DMA channels would be hardcoded for this, which is more than the 16 that the Teensy 3.1/3.2 MCU (MK20DX256) provides!
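
The channel count follows from simple multiplication. This is a small sketch, assuming four waveform channels (which is what the figure of 24 implies). The number of DMA channels available on the chosen MCU is a parameter to be checked against its datasheet. The names are illustrative only.

```python
STAGES_PER_WAVEFORM_CHANNEL = 6   # the six chained stages listed above
WAVEFORM_CHANNELS = 4             # assumption: four Cyclops waveform channels


def dma_channels_needed(stages, waveform_channels):
    """One DMA channel per stage, per waveform channel."""
    return stages * waveform_channels


def fits(needed, available):
    """True if the MCU has enough DMA channels for the whole scheme."""
    return needed <= available


needed = dma_channels_needed(STAGES_PER_WAVEFORM_CHANNEL, WAVEFORM_CHANNELS)
```

Calling `fits(needed, available)` with the channel count from the datasheet shows immediately whether the scheme is feasible at all, before any other peripheral claims a DMA channel.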

Concerns

What would happen if the same DMA pipeline were triggered on another (DMA) channel? It would surely interfere with the ongoing pipeline operation. It could be made to wait, but waiting would 'glitch' the waveform. Perhaps a schedule would need to be computed by the CPU? But that is not acceptable.
No other external peripherals can be used on the PORTs that carry the CS and Latch Line pins, because the DMA controller would be writing to those PORTs. The DMA controller cannot read a PORT, OR it with the desired bitmask, and then write it back; only the CPU can.
It merely offloads the SPI transfer task to the DMA controller, which
- cannot be programmed easily
- cannot be debugged easily
- cannot run "any code" before/during the pipeline.
- does not guarantee precise waveforms and timely updates of the Cyclops DACs.
The only benefit seems to be that the CPU never waits for or schedules an SPI transfer (see next section), but it would nonetheless have to spend time saving and loading contexts for a DMA pipeline and syncing its registers for correct SPI transfers.
Why not schedule SPI transfers intelligently for better accuracy, maintainability and flexibility?
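
The PORT concern above can be illustrated with plain bit operations. This is a host-side sketch, assuming the DMA stores a precomputed constant into a plain data-output register. The 8-bit port width and the pin positions are hypothetical. MCUs that expose separate set/clear registers may not have this limitation, and that case is outside this sketch.

```python
CS_PIN = 0b0000_0001            # hypothetical chip-select bit
LATCH_PIN = 0b0000_0010         # hypothetical latch-line bit
OTHER_DEVICE_PIN = 0b1000_0000  # pin owned by some unrelated peripheral


def dma_style_write(port, value):
    """A DMA transfer stores a fixed value; the previous port state is lost."""
    return value & 0xFF


def cpu_style_set(port, mask):
    """CPU read-modify-write: only the masked bits change."""
    return (port | mask) & 0xFF


port = OTHER_DEVICE_PIN                    # the other peripheral drives its pin high
after_dma = dma_style_write(port, CS_PIN)  # the whole port is overwritten
after_cpu = cpu_style_set(port, CS_PIN)    # the other pin is preserved
```

After the DMA-style write the unrelated pin has been forced low, while the CPU-style update leaves it untouched.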

SPI Optimisation

I read the current SPI implementation, and yes, SPI::transfer() is blocking, but not CPU intensive. In fact, the CPU is made to busy-wait! We can optimise that easily by manipulating the SPI Control Registers ourselves.
SPI writes are not very taxing: a 16b write can take no less than 640ns (16b / 25MHz). That is equivalent to ~40 clock cycles (roughly 40 instructions) on a 64MHz device.
With typical SPI speeds and a 64MHz CPU clock, we get more than 40 idle cycles per 16b SPI transfer, and these 40+ cycles can be put to better use than idling.
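
The 640ns and ~40 cycle figures above can be reproduced with a few lines of arithmetic. The function names are made up for this example; the clock values are the ones quoted in the text.

```python
def spi_transfer_time_ns(bits, spi_clock_hz):
    """Minimum time on the wire for `bits` bits at the given SPI clock."""
    return bits * 1e9 / spi_clock_hz


def cpu_cycles_during(duration_ns, cpu_clock_hz):
    """CPU clock cycles that elapse during `duration_ns`."""
    return duration_ns * cpu_clock_hz / 1e9


t_ns = spi_transfer_time_ns(bits=16, spi_clock_hz=25_000_000)   # 640.0 ns
cycles = cpu_cycles_during(t_ns, cpu_clock_hz=64_000_000)       # about 40.96 cycles
```

A slower SPI clock or a faster CPU clock widens this idle window proportionally.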

Also, there is an SPI::transfer16() available. That would send 2B in one transfer, slightly faster than the two 1B transfers currently used.

The big question still remains though: how do we ensure precisely timed SPI transactions? #6 proposes a solution.

Syncing Waveform updates with DMA backed SPI transfers

This is a tricky operation. The CPU would have to hold the updates in some buffer until the "major" DMA loop is completed.
After each major loop completes, the pipeline interrupts the CPU to run an ISR. In that ISR, the CPU can update the waveform from the buffer.
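
One way to picture this is a two-slot buffer. The sketch below is a host-side Python model rather than ISR code, and the class and method names are hypothetical. Updates are parked in `pending` and only become visible when the modelled major-loop-done handler runs.

```python
class WaveformBuffer:
    """Two-slot buffer: the pipeline reads `active`, the CPU writes `pending`."""

    def __init__(self, samples):
        self.active = list(samples)   # what the DMA pipeline is currently sending
        self.pending = None           # update held back until the major loop ends

    def request_update(self, samples):
        # Called from normal code at any time; never touches `active`.
        self.pending = list(samples)

    def on_major_loop_done(self):
        # Stands in for the DMA-completion ISR: swap in the held update, if any.
        if self.pending is not None:
            self.active = self.pending
            self.pending = None


buf = WaveformBuffer([0, 100, 200])
buf.request_update([5, 6, 7])
mid_loop = list(buf.active)      # still the old waveform while the loop runs
buf.on_major_loop_done()
after_loop = list(buf.active)    # new waveform visible only after the ISR
```

Because `active` changes only inside the completion handler, the pipeline never sees a half-updated waveform in this model.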

Analog Signal Capture using DMA

A PDB (Programmable Delay Block, chap. 35) channel would be set up to perform sampling on (up to 24? see chap. 31) ADC channels. The CPU would program a PDB channel to sample some of these channels.
The CPU could trigger this PDB operation either on its own or via a timer interrupt.
When a PDB starts, it samples each channel one by one. After each conversion completes, the DMA is triggered to transfer the Result Register contents to a memory location in RAM (pre-allocated by the CPU).
When all ADC channels have been sampled, the CPU can be notified.
The CPU can then initiate a Serial block transfer of the results (which the DMA accumulated, perhaps in an array) to the OE GUI.
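
The data flow of these steps can be modelled on a host machine as follows. This is only a sketch: `read_adc` is a stand-in for a real conversion, and the function names, chunk size and fake signal are assumptions made for the example.

```python
def capture_block(read_adc, channels, n_sweeps):
    """Model a PDB-triggered capture.

    read_adc(channel, sweep) stands in for one ADC conversion. Each result is
    appended to `results`, which plays the role of the RAM array filled by DMA.
    """
    results = []
    for sweep in range(n_sweeps):                 # one PDB trigger per sweep
        for ch in channels:                       # channels are sampled one by one
            results.append(read_adc(ch, sweep))   # "DMA" copies the result register
    return results                                # CPU is notified; block is ready


def chunks(data, size):
    """Split the result buffer into fixed-size chunks for the serial link."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fake_adc(ch, sweep):
    # Deterministic stand-in signal so the output is easy to follow.
    return 100 * ch + sweep


block = capture_block(fake_adc, channels=[0, 1, 2], n_sweeps=2)
packets = chunks(block, size=4)
```

Here `block` holds the samples in conversion order and `packets` is what would be handed to the Serial block transfer.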

PDB documentation is sketchy in the Datasheet.


jonnew commented on August 24, 2024

The migration to Teensy 3.2 occurred in revision 3.6 of the device.


Porting Cyclops to Teensy [32b MCU] about cyclops HOT 5 CLOSED
