New firmware for printing directly from the Raspberry Pi's GPIO pins

MachineHum

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
January 20, 2015 11:35AM

Registered: 9 years ago
Posts: 14

Thanks for your reply, that's very interesting... do you have plans in the future to build an adapter from the RPI headers to the RAMPS headers?

wallacoloo

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
January 22, 2015 02:54AM

Registered: 9 years ago
Posts: 18

No, I don't have any plans to build such an adapter - the jumper cables work well enough for me.

AndrewBCN

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
January 27, 2015 09:05PM

Registered: 9 years ago
Posts: 977

Hey Colin,
Just chiming in to encourage you to continue on with your development work, this is a very interesting piece of software that you have designed.
Yes, there is a lot of misinformation and "so and so says it can't be done" in all forums around the world, this forum here is no exception, don't let that discourage you.
That said, if you are using Linux I would strongly suggest you take a look at a real-time kernel when you have some time, for various reasons.
Good luck and may the force be with you, etc.

frankvdh

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
February 11, 2015 07:35PM

Registered: 9 years ago
Posts: 978

Quote
wallacoloo

As far as processor usage goes, it's not quite that straightforward. In the devel implementation, a single read using the component values found in the ramps-fd thermistor circuit (10 uF capacitor and 4.7k parallel resistance) will take 1 ms - 10 ms, based on the temperature, and is measured using a 1 MHz hardware timer. I think I have the sample rate set to 1 Hz right now from the legacy RC circuit, which could be naively translated to between 0.1% and 1% of CPU usage. The routine is non-blocking though, so it's still doing other stuff during this time and only checking the RC circuit every 2-3 µs or so.

Hi Wallace,
I'm new here, having just discovered your project, but I have a history of embedded systems development, mostly on stuff smaller than an RPi, and with no OS. I have practically no experience with programming the RPi though.

One technique for measuring time (and therefore the resistance in your RC circuit) is to set up the discharge of the capacitor to generate an interrupt. This way there is near zero CPU usage needed to monitor its state. The interrupt service routine would read the hardware counter, save it to a variable, reset the counter, and restart the process. The rest of the system would just use the latest value saved in the variable.
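A minimal sketch of that interrupt-driven idea, written as plain C++ that runs on any host. Everything here is an assumption for illustration: the names RcCapture, onThresholdInterrupt and resistanceOhms are hypothetical and not taken from Printipi, the interrupt is simulated by an ordinary function call, and the resistance conversion uses the ideal discharge relation t = R * C * ln(V0 / Vthreshold) rather than any measured calibration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical state shared between the interrupt handler and the rest of
// the firmware. None of these names come from Printipi.
struct RcCapture {
    uint32_t startTicks = 0;   // timer value when the current measurement began
    uint32_t latestTicks = 0;  // most recent measured duration, in timer ticks
};

// What the interrupt service routine would do when the capacitor voltage
// crosses the input threshold: save the elapsed time, then restart.
// Unsigned subtraction keeps the result correct across one timer wrap.
inline void onThresholdInterrupt(RcCapture &cap, uint32_t timerNow) {
    cap.latestTicks = timerNow - cap.startTicks;
    cap.startTicks = timerNow;  // stands in for "reset the counter"
}

// Convert a measured duration into a resistance, assuming an ideal RC
// discharge from v0 down to vThreshold: t = R * C * ln(v0 / vThreshold).
inline double resistanceOhms(uint32_t ticks, double timerHz, double capFarads,
                             double v0, double vThreshold) {
    assert(vThreshold > 0.0 && v0 > vThreshold);
    const double seconds = ticks / timerHz;
    return seconds / (capFarads * std::log(v0 / vThreshold));
}
```

The rest of the system would only ever read latestTicks, so the cost per sample is one short handler call instead of a polling loop.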

Another approach I've seen for measuring resistance is to use the thermistor to control the frequency of an oscillator... you can then just measure the length of one cycle, or count the number of pulses over a known period of time.

I wouldn't get hung up on the time taken to read 20C... typical hot-end temperatures while printing are between 180C and 220C, I guess, although more exotic plastics may use some other temperature value. OTOH, in the future, when there's software to support it, it might be nice to know the ambient temperature, so that print speeds could be varied on the fly, to allow for the difference in cooling time between a hot day and a cold one.

[Edit] The bed temperature is something that should be measured, in which case ambient would be pretty much redundant... for those with a heated bed, the bed temperature could be anywhere between ambient and 70-80C, I think.

Frank

Edited 1 time(s). Last edit at 02/12/2015 09:19PM by frankvdh.

Dejay

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
May 20, 2015 05:03PM

Registered: 9 years ago
Posts: 210

This is really awesome! I've been looking for a Raspberry PI solution for 3D printing. Just some rambling thoughts:

What I find compelling about the Raspberry PI is the cheap high speed camera module. Mostly of course for monitoring the build remotely. And theoretically - well this is some pie in the sky stuff but anyways - with computer vision using the GPU you could do some tracking of the print head and really "close the loop" in terms of position feedback. Besides calibration you could theoretically even adjust for slop or backlash or inaccuracies in the system. With a really intelligent tracking / adjustment you could even have bent out of shape rails and simply train the system to compensate for that. And that in turn could enable the use of cheaper / simpler mechanical systems like a robot arm with slop.

And thanks for sharing your code. I have a BBB with machinekit but it's such a behemoth to play around with. I got frustrated with machinekit because I wanted to integrate an intelligent calibration routine (not my math) and you can do that of course. But you'd need to trigger gcode that writes to files, then read the files and run them through an executable. And make build files for that. So not easy to add changes. So I certainly understand the urge to sometimes just start from scratch. Even if it's just to understand where the complexities in a system are. Then often you'll find yourself going back to exactly the same architecture of the implementation that looked overly complicated before :D

I think ideally a firmware should have all the high level control stuff written in javascript and nodejs so you can reuse all code for remote UI / configuration. For example you could write a slicer in js and have it run either on the raspberry or on the browser of your PC. Afaik Nicholas Seward is currently working on a 5 dof slicer in javascript.

BTW is the wiring diagram / parts list available? I've been learning about arduino a bit the last days and just realized that using the stepper motor drivers isn't actually black magic. They simply have a step and dir pin! How convenient lol. But I'm an electronics noob. For example it looks like you have rather big capacitors next to the stepper drivers. What do they do? I can probably read that up somewhere too. But a "how to build your own 3D printer board" would really be an ultimate electronics learning tutorial! :)

Of course it might still be easier to just slave an arduino to the Raspberry PI to do all of this. Or the Teensy 3.1 for $20. I think it has 34 GPIO pins (21 shared with analog) and is 3.3V but 5V tolerant. Isn't the beaglebone doing basically the same by adding two micro controllers for realtime control?

wallacoloo

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
May 28, 2015 04:39PM

Registered: 9 years ago
Posts: 18

Hi Dejay,

I'm glad you came across this firmware. As far as a wiring diagram, since there's no "official" Printipi or standard Raspberry Pi 3d printing board, it really comes down to choice in how you wire it. After posting my videos, I switched over to using a pre-built board designed for 3.3v arduino and connected it via jumper cables. You could do the same, just make sure that whatever interface board you're using is 3.3v tolerant (a lot of RAMPS boards use transistors that require > 3.3v to activate, so they don't work for us).

The interface board I switched to is RAMPS-fd (RAMPS for [Arduino] Due). I don't quite have a wiring diagram for it, but I did document which connections I made as I went. These can be found in the kosselrampsfd file, as well as links to schematics for the actual RAMPS-fd board. The top bit is cluttered with different thermistor circuits I considered - you can skip over that and skip to line 479 where it describes the actual connections to be made when it comes to wiring a thermistor. I should probably get around to clearing out some of the clutter at the top of that file sometime.

vlorijer

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
May 29, 2015 01:36PM

Registered: 8 years ago
Posts: 12

I took a look at the source code but I cannot understand the principal function.
Can you please explain?
I do understand how DMA works and what a control block is, but I could not work out what your SW is doing.
I wrote a simple gcode decoder using Lazarus for the Raspberry Pi and it runs OK, but I'd like to make it faster. Timing is done by simply checking the system clock, and printing is slowed down when I move the mouse. Drawing also slows down printing.
I'd prefer not to use the RT kernel because I think programming is then focussed on C/C++ and there would be a lot for me to learn.
My thought was to calculate the step/dir bits (GPIOs) for one layer (or any other useful setting), store them in an array located in memory, start printing using DMA and at the same time calculate the next layer.
The step/dir bits would be stored in memory for DMA access. Source (memory) stride is off and destination (GPIO set register) stride is the distance between the GPIO_SET and the GPIO_RESET register. That way only 2x4 bytes in memory are needed to set and reset all of the 32 GPIO. (Is your SW doing something similar?)
The DMA clock source can be a constant PWM. The minimal pulse width on the GPIO would be half the PWM period. Only one CB is needed.

As you seem to have a good understanding of DMA do you think this is possible? (I hope my explanation is clear enough)

wallacoloo

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
June 01, 2015 02:07AM

Registered: 9 years ago
Posts: 18

Hello vlorijer,

I wrote the DMA stuff back in September and it's remained mostly static, but from what I remember, it isn't possible to directly specify a clock rate for a specific DMA channel. The general approach is that you pace DMA by having it write to a peripheral that only accepts writes every N cycles, essentially forcing a stall for the rest (it's not a bad "stall" though, as the bus arbiter just gives control to another DMA channel during this time).

This means that you essentially have 1 control block that does the actual data transfer followed by another control block that just copies arbitrary data into a paced peripheral (most people use the PWM or PCM peripherals since they have highly configurable clocks), and repeat.
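To make the "data block followed by pacing block, and repeat" layout concrete, here is a rough sketch that only builds the two control blocks in ordinary memory. The struct follows the eight-word control block layout as I read it in the BCM2835 ARM Peripherals datasheet. The function name buildPacedLoop and all four address parameters are hypothetical placeholders, and the TI word is left at zero on purpose: real code would have to set the DREQ and peripheral-mapping bits there, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One DMA control block: eight 32-bit words, 32-byte aligned.
struct alignas(32) DmaControlBlock {
    uint32_t TI;           // transfer information (DREQ source, increments, ...)
    uint32_t SOURCE_AD;    // bus address to read from
    uint32_t DEST_AD;      // bus address to write to
    uint32_t TXFR_LEN;     // number of bytes to copy
    uint32_t STRIDE;       // 2D-mode strides
    uint32_t NEXTCONBK;    // bus address of the next control block
    uint32_t reserved[2];
};
static_assert(sizeof(DmaControlBlock) == 32, "control block must be 32 bytes");

// Fill cbs[0] and cbs[1] so that they form an endless loop:
//   cbs[0] copies one word of output data to a GPIO register,
//   cbs[1] copies one arbitrary word into a paced peripheral's FIFO,
//   then control returns to cbs[0].
// cbsBusAddr is the bus address at which cbs[0] is visible to the DMA engine.
inline void buildPacedLoop(DmaControlBlock *cbs, uint32_t cbsBusAddr,
                           uint32_t dataSrc, uint32_t gpioDst,
                           uint32_t dummySrc, uint32_t pacedFifoDst) {
    assert(cbsBusAddr % 32 == 0);  // control blocks must be 32-byte aligned
    const uint32_t cbSize = sizeof(DmaControlBlock);

    cbs[0] = DmaControlBlock{};
    cbs[0].SOURCE_AD = dataSrc;
    cbs[0].DEST_AD = gpioDst;
    cbs[0].TXFR_LEN = 4;
    cbs[0].NEXTCONBK = cbsBusAddr + cbSize;

    cbs[1] = DmaControlBlock{};
    cbs[1].SOURCE_AD = dummySrc;      // contents do not matter
    cbs[1].DEST_AD = pacedFifoDst;    // write stalls until the FIFO has room
    cbs[1].TXFR_LEN = 4;
    cbs[1].NEXTCONBK = cbsBusAddr;    // loop back to the data block
}
```

With a longer buffer you would repeat the pair once per frame and point only the last pacing block back at the start.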

In the way you describe, you would actually want both DESTINATION and SOURCE stride on, otherwise you would just end up copying the same 4 bytes to both the GPIO_SET and GPIO_CLR registers. Printipi does use the stride function in this manner.

Printipi's use of DMA can be seen in src/platforms/rpi/hardwarescheduler.cpp with some additional explanation in the corresponding header file. The layout of control blocks can be seen in lines 446-493. It actually uses 3 control blocks per data transfer - one to copy a buffered value to the GPIO_SET/CLR registers, one to reset whatever frame of the buffer was just copied and another to pace the transfer through PWM. The 2nd control block is necessary because the buffer is circular, so otherwise whatever output you had scheduled would be repeated every half-second (or whatever the buffer length is). This is sometimes desirable, but not generally so. It could be eliminated by resetting the buffer in code (by the CPU), but this would require similar amounts of bus traffic (slightly lower since you avoid having to load one control block header) while increasing CPU usage and requiring a higher real-time processing guarantee, so I didn't go that route.
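The reason for the clearing block is easiest to see in a software model. The sketch below is not the Printipi code: it replaces the DMA engine with a plain loop, and the names GpioFrame and playFrames are made up for illustration. Each loop iteration stands in for the three control blocks of one frame.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One frame of the circular output buffer: pins to set and pins to clear.
struct GpioFrame {
    uint32_t set;
    uint32_t clr;
};

// Walk the ring for `steps` frames and return what would have been written
// to the GPIO registers. clearAfterCopy models the second control block.
template <std::size_t N>
std::vector<GpioFrame> playFrames(std::array<GpioFrame, N> &ring,
                                  std::size_t steps, bool clearAfterCopy) {
    std::vector<GpioFrame> output;
    for (std::size_t i = 0; i < steps; ++i) {
        GpioFrame &frame = ring[i % N];
        output.push_back(frame);      // block 1: copy the frame to GPIO_SET/CLR
        if (clearAfterCopy) {
            frame = GpioFrame{0, 0};  // block 2: reset the frame just copied
        }
        // block 3 would stall here on the paced peripheral (not modelled)
    }
    return output;
}
```

Scheduling a single step pulse in a 4-frame ring and playing 8 frames shows the difference: with clearing the pulse appears once, without it the same pulse comes around again on the second lap.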

The code that Printipi's DMA engine evolved from can be found here, which might be easier to study since it's self-contained.

There is one line that might be a bit unintuitive (I should go back and comment it): lines 474 and 489 of hardwarescheduler.cpp, in which

cbArr.STRIDE = i/3;

2 of the 3 control blocks didn't use the stride feature, so I was able to use that part of the block to store arbitrary data (looking back, one could cram the desired values of GPIO_SET/CLR into those spare bytes if they wanted to). This gave me an easier way to do a reverse-lookup of the current control-block index by just querying DMACH(n)->STRIDE, which makes it trivial to diagnose clock drift or jitter.
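Roughly, the reverse lookup works like the sketch below. This is a host-side model under stated assumptions, not the code from hardwarescheduler.cpp: CbStub, tagFrameIndices and framesBehind are hypothetical names, and the value that DMACH(n)->STRIDE would return on real hardware is simply passed in as a parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kBlocksPerFrame = 3;  // copy, clear, pace

// Stand-in for a control block: only the fields this sketch needs.
struct CbStub {
    uint32_t STRIDE = 0;
    bool usesStride = false;  // true for the block that really needs 2D strides
};

// Store the frame index in the STRIDE word of every block that does not
// otherwise use it.
inline void tagFrameIndices(std::vector<CbStub> &cbs) {
    for (std::size_t i = 0; i < cbs.size(); ++i) {
        if (!cbs[i].usesStride) {
            cbs[i].STRIDE = static_cast<uint32_t>(i / kBlocksPerFrame);
        }
    }
}

// Given the frame index read back from the active block and the time since
// the ring started, report how many frames the DMA engine is behind the
// ideal schedule (modulo the ring length).
inline uint64_t framesBehind(uint32_t activeFrame, uint64_t elapsedUs,
                             uint64_t frameRateHz, uint64_t numFrames) {
    assert(numFrames > 0 && activeFrame < numFrames);
    const uint64_t expected = (elapsedUs * frameRateHz / 1000000u) % numFrames;
    return (expected + numFrames - activeFrame) % numFrames;
}
```

A result that stays near zero means the pacing is keeping up; a value that grows over time points at clock drift.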

I hope that helps. Sorry it took me a few days to get back with you!

Edited 1 time(s). Last edit at 06/01/2015 02:07AM by wallacoloo.

vlorijer

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
June 01, 2015 08:36AM

Registered: 8 years ago
Posts: 12

hi wallacoloo,
thanks for answering.
I will have to read it more than once to completely understand it and might have some more questions/answers then.
So, give me one or two days to do so.

vlorijer

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
June 04, 2015 07:21PM

Registered: 8 years ago
Posts: 12

hi wallacoloo,
I think I understand how it works now and I have some more questions.
You say "from what I remember, it isn't possible to directly specify a clock rate for a specific DMA channel."
Is pacing DMA by PWM not like specifying a clock rate? This is how it is being done in PWM via DMA.

Can data transfer not be done like this?
S_ADDR = addr(buffer)
D_ADDR = addr(GPIO_SET )
SRC_INC = 1 // = Source address increments after each read. The address will increment by 4
DEST_INC = 0 //Destination address does not change.
TDMODE = 1
YLENGTH = size(buffer)
XLENGTH = 4
D_STRIDE = addr(GPIO_RESET ) - addr(GPIO_SET )
S_STRIDE = 0
page 53 of BCM2835-ARM-Peripherals says: In 2D mode it is interpreted as an X and a Y length, and the DMA will perform Y transfers, each of length X bytes and add the strides onto the addresses after each X leg of the transfer.
(after reading this I don't think this method will work)
or
use 2 channels. one copying to GPIO_SET and one to GPIO_RESET. Do they stay synchronized if started at the same time?

Thanks for explaining!

wallacoloo

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
June 06, 2015 04:09AM

Registered: 9 years ago
Posts: 18

When I said "from what I remember, it isn't possible to directly specify a clock rate for a specific DMA channel," I meant that the DMA controller will always run off the RPi's core clock (default 500 MHz for everything but the B2's and future versions, though this may be scaled down by the DMA controller). So it will always try to pump data through the system's memory bus at this rate. If there is more than one active DMA channel, they have to be arbitrated - they cannot both use the memory bus at the same time, so they alternate control of it rapidly. Furthermore, the memory bus has to be shared with the CPU too. I know there are special buses for allowing different CPU peripherals to communicate with each other directly, avoiding interference from the rest of the system (if you looked into this, you might be able to find a way such that the GPIOs are actually toggled at precisely the times you calculated).

So the result is that you cannot explicitly clock the data transfers. They will essentially happen as fast as possible, and will vary based upon what the rest of the system is doing. But what you can do is to transfer to a clocked peripheral like PWM or PCM. Since these can only accept data until they fill their (relatively short) buffer, the DMA engine will momentarily pause until the peripheral frees a spot in its buffer (you must set the relevant DREQ (Data Request) signal in the DMA header to achieve this behavior), which occurs at a precise frequency (e.g. something like 48 kHz for audio). So if you interleave your GPIO data transfers with transfers to a buffered peripheral, you can rate limit the GPIO transfer as well. But you still aren't precisely clocking it - although you can configure the peripheral such that you know no more than N transfers/sec will occur, and you can set N such that under 99.99% of circumstances, the DMA engine will be able to meet that demand, there's still some variability between when the peripheral accepts its data and when the DMA controller delivers the following bytes to the GPIO pins, since everyone is still competing for bus access.

So pacing DMA transfers with clocked peripherals isn't the same as clocking the actual DMA transfer itself. Both let you specify the rate at which to transfer data, but actual clocking has very little variability from cycle-to-cycle whereas pacing through a peripheral does have measurable variability. I don't know of anyone who has measured this variability physically, but it's easy to place crude upper-bounds on it by having the CPU read the active DMA address and the system clock in a while loop. Pacing DMA transfers at 500 kHz (2 µs period), something like 95% of the transfers occur within 10 µs of their specified time. Again, very crude, as much of this variability is introduced by the actual measurements themselves since they're done on a CPU, so there's plenty of padding in that upper-bound.
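The bookkeeping for that crude upper bound could look something like this. It is only a sketch of the analysis step, assuming the samples have already been collected by a polling loop: Sample and fractionWithin are hypothetical names, and each sample pairs a system clock reading with the index of the transfer the DMA engine had reached at that moment.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// One observation taken by the CPU in a polling loop.
struct Sample {
    uint64_t clockUs;        // system clock when the sample was taken
    uint64_t transferIndex;  // which transfer the DMA engine had reached
};

// Fraction of samples observed within toleranceUs of their scheduled time.
// The schedule is anchored at the first sample: transfer k is expected
// (k - k0) * periodUs after the first observation.
inline double fractionWithin(const std::vector<Sample> &samples,
                             double periodUs, double toleranceUs) {
    assert(!samples.empty());
    const Sample &first = samples.front();
    std::size_t onTime = 0;
    for (const Sample &s : samples) {
        const double observed = static_cast<double>(s.clockUs - first.clockUs);
        const double scheduled =
            static_cast<double>(s.transferIndex - first.transferIndex) * periodUs;
        if (std::fabs(observed - scheduled) <= toleranceUs) {
            ++onTime;
        }
    }
    return static_cast<double>(onTime) / static_cast<double>(samples.size());
}
```

For the figures quoted above you would call it with periodUs = 2.0 and toleranceUs = 10.0. Because the polling loop itself adds delay, the result can only make the DMA timing look worse than it is, which is why it works as an upper bound.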

The main difference I was getting at in regards to not being able to explicitly clock DMA transfers is that it means that between each write to the GPIO bank, you must interleave a write to the paced peripheral. Generally, this means that you need a separate control block ( CB ), so your first example won't work (since you're never writing to the paced peripheral, your transfer will just happen as fast as the DMA controller can manage).

If you get clever, you can still achieve this with 1 CB per GPIO buffer frame. GPIO_SET and GPIO_CLEAR are laid out something like so (I may have mixed this up - writing from memory): [GPIOSET0, GPIOSET1, 4-byte padding, GPIOCLR0, GPIOCLR1] where each register there is 4 bytes. The pins controlled by GPIOSET1 and GPIOCLR1 aren't routed to the header on any of the RPis (except perhaps the new B2s - not sure). Therefore if you write 16 sequential bytes to GPIOSET0, incrementing the destination address, you can write to all the needed GPIOs without using the STRIDE feature.
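As a sketch of what that 16-byte source frame could look like: the register offsets below are the ones I recall from the GPIO register table in the BCM2835 ARM Peripherals datasheet, so treat them as an assumption to check against your copy. GpioBurst and makeBurst are hypothetical names. The frame is written starting at GPSET0 and covers GPSET0, GPSET1, the reserved word and GPCLR0, with zeros in every word that should have no effect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offsets from the GPIO peripheral base (assumed, from the datasheet).
constexpr uint32_t kGpSet0 = 0x1C;
constexpr uint32_t kGpSet1 = 0x20;
constexpr uint32_t kReservedGap = 0x24;
constexpr uint32_t kGpClr0 = 0x28;

// Source data for one 16-byte burst that starts at GPSET0.
struct GpioBurst {
    uint32_t set0;      // lands on GPSET0: pins to drive high
    uint32_t set1;      // lands on GPSET1: left at zero
    uint32_t reserved;  // lands on the reserved word: left at zero
    uint32_t clr0;      // lands on GPCLR0: pins to drive low
};
static_assert(sizeof(GpioBurst) == 16, "burst must be exactly 16 bytes");
static_assert(offsetof(GpioBurst, set1) == kGpSet1 - kGpSet0, "GPSET1 slot");
static_assert(offsetof(GpioBurst, reserved) == kReservedGap - kGpSet0, "gap slot");
static_assert(offsetof(GpioBurst, clr0) == kGpClr0 - kGpSet0, "GPCLR0 slot");

// Build one burst from a mask of pins to set and a mask of pins to clear.
inline GpioBurst makeBurst(uint32_t pinsHigh, uint32_t pinsLow) {
    assert((pinsHigh & pinsLow) == 0);  // a pin cannot go both ways at once
    return GpioBurst{pinsHigh, 0, 0, pinsLow};
}
```

For example, makeBurst(1u << 4, 1u << 17) raises GPIO 4 and lowers GPIO 17 in a single sequential write, with no STRIDE involved.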

Now the STRIDE feature can be used to advance the write position to the input buffer of some paced peripheral, thus transferring the data and pacing the transfer with only 1 CB. Example:

S_ADDR = addr(buffer)
D_ADDR = addr(GPIO_SET0 )
SRC_INC = 1 // = Source address increments after each read. The address will increment by 4
DEST_INC = 1 //Destination address also increments
TDMODE = 1
YLENGTH = 1 //YLENGTH indicates the number of *strides* to make, so YLENGTH=1 means 1 stride means copying 2 sets of XLENGTH. The documentation is slightly off here.
XLENGTH = 16
D_STRIDE = addr(PWM_BUFFER) - addr(GPIO_SET0)
S_STRIDE = 0

(Because you're writing 4 words to PWM_BUFFER, you'll need to make sure that the 3 words after the buffer are actually OK to write to. IIRC, the address difference between the PWM_BUFFER and GPIO_SET0 is actually too large to fit into the D_STRIDE field. But I believe other paceable peripherals, like PCM, are within the needed range.)
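That range question can be checked with a little arithmetic. The sketch below assumes two things from my reading of the BCM2835 ARM Peripherals datasheet: that D_STRIDE is a signed 16-bit byte offset, and that the bus addresses used here for GPSET0, the PWM FIFO and the PCM FIFO are right. The exact stride you would program also depends on where the destination pointer sits after the first row, so this only tests whether the distance is in the right ballpark. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bus addresses (assumed, from the datasheet's peripheral register tables).
constexpr uint32_t kGpSet0Bus = 0x7E20001C;   // GPIO base + 0x1C
constexpr uint32_t kPwmFifoBus = 0x7E20C018;  // PWM base + 0x18
constexpr uint32_t kPcmFifoBus = 0x7E203004;  // PCM base + 0x04

// Signed distance in bytes from one bus address to another.
constexpr int64_t busDelta(uint32_t from, uint32_t to) {
    return static_cast<int64_t>(to) - static_cast<int64_t>(from);
}

// True if the offset can be represented in a signed 16-bit stride field.
constexpr bool fitsInStrideField(int64_t deltaBytes) {
    return deltaBytes >= INT16_MIN && deltaBytes <= INT16_MAX;
}

// GPSET0 -> PWM FIFO is 0xBFFC bytes: too far for a 16-bit signed stride.
static_assert(!fitsInStrideField(busDelta(kGpSet0Bus, kPwmFifoBus)),
              "PWM FIFO is out of stride range");
// GPSET0 -> PCM FIFO is 0x2FE8 bytes: within range.
static_assert(fitsInStrideField(busDelta(kGpSet0Bus, kPcmFifoBus)),
              "PCM FIFO is within stride range");
```

Under those assumptions the numbers agree with the recollection above: PWM is out of reach of a single stride from the GPIO registers, PCM is not.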

Unfortunately, after that, you would need to load a new CB. So you need 1 CB per each GPIO write. You can't just have NEXT_CB = this, because then your S_ADDR will again point to the first element of the buffer. You could pull some really crazy shit and try placing the control block header in the bytes proceeding PWM_BUFFER, or perhaps exactly 1 D_STRIDE after the PWM_BUFFER and dynamically set the S_ADDR field, and then have NEXT_CB=this. This would indeed allow everything to be done with just 1 CB. But the gains haven't justified me investing more time into these possibilities, as the jitter and frequency achievable with 500k transfers and 3 CBs / transfer seem acceptable to me.

Quote

use 2 channels. one copying to GPIO_SET and one to GPIO_RESET. Do they stay synchronized if started at the same time?

Based on the previous description of how DMA channels are all competing with each other and the rest of the system for bus access, no, they will not stay synchronized. You could try pacing each channel through a different peripheral, where each one is clocked at the same frequency, but I don't know enough about how the clocks are divided (i.e. how the system scales down the core clock to achieve the specified clock frequency for each peripheral) to know if that would keep them synchronized. I wouldn't count on it.

There may be other ways to use multiple channels in DMA though. For example, it may be possible to have one channel copy data into the paced PWM buffer and have another channel rapidly copy the PWM's current output to the GPIOs over and over (unpaced). This doesn't offer any clear benefits to me, and just makes things more error prone (if the unpaced channel is unserviced for a substantial amount of time, then one sample is lost altogether, resulting in the software thinking the motors are at a different location than they truly are). I mainly bring this up to encourage thoughts into different ways DMA can be used for this task.

Perhaps the best thing to achieve would be to find a peripheral which can provide both incoming and outgoing DREQ (Data Request) signals and use two separate DMA channels for interfacing with this peripheral. One DMA channel would copy the peripheral's output to the GPIOs upon receiving a DREQ. I believe this would be done on the peripheral <-> peripheral buses, thus avoiding most bus contention and achieving greater timing precision than the current implementation. The other DMA channel would stream the next desired GPIO state into the peripheral upon its DREQ. The first DMA channel would probably be given higher priority. This method requires that a peripheral have one DREQ signal that is cleared upon write and a separate one that is cleared upon read. If you tried to use just the write-based DREQ signal from the PWM peripheral, the second DMA channel could fill the empty buffer slot and clear the DREQ signal before the other channel got the chance to copy the other end of the buffer to the GPIOs. Or, the first channel could read multiple words during the DREQ, which would be a problem if you're trying to use DEST_INC.

I believe the SPI peripheral can provide the necessary DREQ signals for the above idea, but I'm not sure that the data written by the second DMA channel can be exposed to the reader DMA channel without directing the SPI's output to a GPIO pin and physically connecting that to the SPI's input pin.

If you're really interested in this DMA stuff, you might want to dig up the code used in MachineKit. They may be using a few tricks in their code that I haven't thought of.

Dejay

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
June 06, 2015 04:42AM

Registered: 9 years ago
Posts: 210

*Wooosh*

vlorijer

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
June 11, 2015 12:02AM

Registered: 8 years ago
Posts: 12

hi wallacoloo,
thanks again for your answer!

Quote

GPIO_SET and GPIO_CLEAR are laid out something like so (I may have mixed this up - writing from memory): [GPIOSET0, GPIOSET1, 4-byte padding, GPIOCLR0, GPIOCLR1] where each register there is 4 bytes. The pins controlled by GPIOSET1 and GPIOCLR1 aren't routed to the header on any of the RPis (except perhaps the new B2s - not sure). Therefore if you write 16 sequential bytes to GPIOSET0, incrementing the destination address, you can write to all the needed GPIOs

I thought about this but the datasheet says "reserved" for the bytes between SET and CLEAR. So can we know nothing happens if we write to these addresses?
I made a printout of your answer and will take it with me on holiday. I'll be back in a few weeks and hope to get some ideas in that time.

wallacoloo

Re: New firmware for printing directly from the Raspberry Pi's GPIO pins
June 16, 2015 05:08PM

Registered: 9 years ago
Posts: 18

Quote
vlorijer
I thought about this but the datasheet says "reserved" for the bytes between SET and CLEAR. So can we know nothing happens if we write to these addresses?
I made a printout of your answer and will take it with me on holiday. I'll be back in a few weeks and hope to get some ideas in that time.

That's a good point. I interpreted "reserved" as meaning "reserved for future revisions", i.e. "this address isn't currently used, but it might be used for something in the future". In that interpretation, writing to those addresses won't do anything for current Raspberry Pis, but it might incur undefined behavior in future versions.

In any case, my guess for why those bytes are reserved in the first place is in case future chips support more I/Os. In that scenario, it's safe to expect that the reserved bytes will follow the pattern GPSET0, GPSET1, GPSET2, ... GPCLR0, GPCLR1, GPCLR2, .... Since writing zeros to GPSETx or GPCLRx is a no-op, I would feel comfortable with writing zeros to the reserved regions too since it will likely still be a no-op if the GPIOs are ever expanded.

But yes, I don't think you can technically guarantee any behavior for what happens when you write to a reserved address. So it's a do-at-your-own-risk type of thing - if you try it and it works (I can say that it did work on the original RPi model B back when I evaluated that method around September) and you're comfortable with the possibility of it breaking on future revisions, then you're probably O.K.
