goroutines' memory management

Disclaimer: This is an unrefined post; see Consistency over Perfectionism.

Goroutines are application-level threads that mimic the behaviour of OS threads, except that they have no priorities because they don’t deal with OS events. The Go scheduler is in charge of periodically making goroutines yield so that other goroutines can run. To do that, it maintains a local run queue (LRQ) for each logical processor, holding the goroutines that are ready to run on it.

The number of goroutines that can run in parallel on any machine is given by

number_of_cores * number_of_hardware_threads_per_core

Each hardware thread is associated with an operating system thread, and goroutines are scheduled onto those OS threads.
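As a rough check of the figure above, the runtime package can report how many logical CPUs the process sees and how many OS threads may execute Go code at once. This is only a sketch: the helper name parallelism is made up for illustration, and the two numbers usually match by default but can differ (for example if the GOMAXPROCS environment variable is set).

```go
package main

import (
	"fmt"
	"runtime"
)

// parallelism is a hypothetical helper: it reports the number of logical
// CPUs visible to the process and the current GOMAXPROCS setting.
// Passing 0 to runtime.GOMAXPROCS queries the value without changing it.
func parallelism() (logicalCPUs, maxProcs int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	cpus, procs := parallelism()
	fmt.Println("logical CPUs:", cpus)
	fmt.Println("GOMAXPROCS:  ", procs)
}
```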

Every goroutine and OS thread gets a contiguous block of memory called a stack. For OS threads, the size of the stack is decided by the operating system (Windows can allocate up to 1 MB; Linux initially allocates the equivalent of one page and increases it as needed).

Go sets the initial size of each goroutine’s stack to 2 KB, so that tens of thousands of goroutines can run at the same time if required. A goroutine’s stack may grow over time and, since it must remain a contiguous block of memory, growing it means allocating a larger block and copying all the values into it. This is another reason why values that are shared across goroutines are not placed on the stack: they would move every time the stack is copied, and any pointer to them held by another goroutine would have to be updated.
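Both properties can be observed from ordinary code. The sketch below starts 50,000 goroutines, then lets a single goroutine recurse far deeper than a 2 KB stack could hold, which only works because the runtime grows the stack on demand. The function names and the numbers are arbitrary choices for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// depth recurses n times. Each call adds a frame, so a large n forces the
// runtime to grow the goroutine's stack well beyond its initial size.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

// spawn starts n goroutines that each do a little recursive work and
// returns how many of them finished.
func spawn(n int) int64 {
	var wg sync.WaitGroup
	var done int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if depth(100) == 100 {
				atomic.AddInt64(&done, 1)
			}
		}()
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println("goroutines finished:", spawn(50000))

	// One goroutine recursing deeply: its stack starts small and is
	// grown (and copied) by the runtime as the recursion proceeds.
	result := make(chan int)
	go func() { result <- depth(100000) }()
	fmt.Println("recursion depth reached:", <-result)
}
```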

Every time a goroutine calls a function, it takes a section of stack memory called a frame, which acts as a sandbox: the function can directly read and write only the memory inside its own frame, so every function execution is essentially a data transformation.
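A small example of the sandbox idea, with made-up names: the argument is copied into the callee's frame, so whatever the callee does to its copy stays there.

```go
package main

import "fmt"

// double receives its own copy of n inside its frame; changing the copy
// has no effect on the caller's variable.
func double(n int) int {
	n *= 2
	return n
}

func main() {
	count := 21
	result := double(count)
	fmt.Println(count, result) // prints "21 42"
}
```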

A pointer allows us to modify a variable that may belong to another frame without making a copy. It’s a performance tradeoff that comes at the cost of side effects, which may compromise data integrity.
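For contrast, here is a sketch of the pointer case (the function name is hypothetical). Only the address is copied into the callee's frame, and the write goes straight through to the caller's variable, which is exactly the side effect mentioned above.

```go
package main

import "fmt"

// doubleInPlace receives the address of the caller's variable and writes
// through it, so the change is visible outside this function's frame.
func doubleInPlace(n *int) {
	*n *= 2
}

func main() {
	count := 21
	doubleInPlace(&count)
	fmt.Println(count) // prints "42"
}
```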

The Go compiler performs escape analysis to determine where a value should be constructed in memory based on if, how, and where it is shared. If a variable is shared with another function down the call stack, there is no problem, because the declaring function’s frame is still alive when the downstream function runs. If it is shared up the stack instead, it would become invalid once the function declaring it returns and its frame is reused. To avoid this, the compiler allocates it on the heap (which means that the value now lives outside the frame). The same applies to a value shared with another goroutine, since goroutines do not share stacks.
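The two directions of sharing might look like this. The type and function names are invented for the example, and the compiler's actual decision can vary with inlining and compiler version, so treat the comments as the typical outcome rather than a guarantee.

```go
package main

import "fmt"

type user struct {
	name string
}

// rename shares the value down the call stack: the caller's frame is still
// alive while rename runs, so the value can typically stay on the stack.
func rename(u *user, name string) {
	u.name = name
}

// newUser shares the value up the call stack: the pointer outlives this
// frame, so the value is normally moved to the heap (unless inlining lets
// the compiler prove that it does not need to be).
func newUser(name string) *user {
	u := user{name: name}
	return &u
}

func main() {
	local := user{name: "ann"}
	rename(&local, "bob")

	escaped := newUser("cat")
	fmt.Println(local.name, escaped.name) // prints "bob cat"
}
```

Building with go build -gcflags=-m makes the compiler print its escape analysis decisions, which is a handy way to check what really happens for a given piece of code.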