This article was first published 2007 March 22, and is converted from a doctools document. Some translation errors may remain. I’ve tested viewing this page under both Linux and Macintosh environments, with Chrome and Firefox. (I do not have access to Internet Explorer, sorry.)

# 1 Introduction

When I first started coding in Haskell, not one month ago, the concept of
*monads* just thoroughly confused the heck out of me. Sure, like any clueless
newbie, I could code in Haskell without knowledge of monads. But who wants to
use a gun without knowing how to clear the chamber first? Who wants to cook
without knowing how the stove works? Who wants to *learn Haskell* without
knowing about monads? After all -- it's only natural!

After much frustration, I finally came to know them well enough to write this article.

# 2 Monads as Constructors

To grossly over-simplify, monadic expressions are *constructors.* You read
that right -- *constructors*. As in object constructors. To understand why
this is the case, we need to look at the simplest possible monad: the *State
Monad.*

OK, scenario one: you are trying to LR-parse data from some input stream, and
you need to maintain some kind of state while iterating over the input. Let's
conveniently ignore `unfoldr` for the time being; you're inexperienced, and
you are trying to apply your knowledge of how to code something in C towards
the Haskell solution. At least, that is, for this example.

Scenario two: you're writing a window manager for the X Window System, and you
need to maintain the list of windows currently visible on the screen. How do
you do this in a purely functional environment, especially when you *also* need
to process events from a multitude of different sources, parse (*ahem*)
configuration files, and so on?

Because the problem of maintaining state is so generic, I won't get into specific data structures for such state. Instead, I will concentrate on the plumbing required to make the magic of maintaining state happen in a functional language.

# 3 Observation on Sequential Evaluation

In any functional language, you *cannot* guarantee the order of execution of
functions. At least, you're not *supposed* to. Just think: according to all
the math texts out there, the results of a function magically appear only once
all its inputs are satisfied. Functions are supposed to "just work." There's
no real way to impose an order of evaluation on them. Or is there?

A function cannot be evaluated meaningfully without providing all the parameter
values it needs to produce a result. Thus, it follows that you can
artificially impose a sequential evaluation order by *threading* the output of
one function into the input of another, like this:

> let hello w = w ++ "!"
> in "hello" : (hello "world") : []

The above code looks like a normal list construction, but remember that you
can't build the list until you have the nodes to stuff into it in the first
place. Hence, it *must* evaluate the `hello` function before the list can be
fully evaluated.

# 4 Observation on Threading State

Now that we know how to thread the execution of functions properly, we turn our attention to the obvious problem of maintaining state from one function to the next. Well, this is actually pretty simple -- simply pass, and return, that state as parameters and/or tuples. For example:

> data Stack a = Stack a (Stack a)
>
> push :: a -> Stack a -> Stack a
> push a stk = Stack a stk
>
> pop :: Stack a -> (a, Stack a)
> pop (Stack a stk) = (a, stk)

Let's ignore errors for the time being. What is important to observe in the
above code is that (a) we're passing explicit state (though it's not strictly
true, you can think of `stk` as an object, and its use as analogous to
`self` or `this` in more traditional object oriented languages) at all
times, and (b) the stack "methods" are always *returning* an updated state as a
result; this is *implicit* in object oriented languages, where state is updated
in-place. This isn't necessarily true in functional languages, though
optimizations usually produce equivalent code.

Well, as you can see, having to maintain all that state, manually threading the output of one function to the input of another, can get substantially tedious! This can lead to code that is harder to read, even harder to debug, and all but impossible to modify at some future date.
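To see that tedium concretely, here is a minimal, self-contained sketch reusing the `Stack` type from above. One assumption is mine: I've added an `Empty` constructor as a base case, which the original declaration omits, so the example can start from an empty stack.

```haskell
-- Self-contained sketch: manually threading state through a few
-- stack operations. An Empty constructor is added here as an
-- assumption, so we have a starting state.
data Stack a = Empty | Stack a (Stack a)

push :: a -> Stack a -> Stack a
push a stk = Stack a stk

pop :: Stack a -> (a, Stack a)
pop (Stack a stk) = (a, stk)

-- Every intermediate state (s0 through s4) must be named and
-- passed along by hand; forget one, or reuse a stale one, and
-- the code silently goes wrong.
sumTwo :: (Int, Stack Int)
sumTwo =
  let s0      = Empty
      s1      = push 1 s0
      s2      = push 2 s1
      (x, s3) = pop s2
      (y, s4) = pop s3
  in  (x + y, s4)
```

The explicit `s0`…`s4` chain is exactly the hand-threading the rest of this article sets out to factor away.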

There has to be some way of factoring this hand-threaded code out.

It turns out that there is. But first, we need one more observation about the nature of software before we can put all the pieces together.

# 5 Observation on Threading of Flow Control

In any given imperative programming language, like Python for instance, it's generally accepted that when you write something like:

>>> a = 5
>>> b = 4
>>> c = a+b
>>> print c

it is pretty clear that `a` is assigned a value, *then* `b` is assigned a
value, *then* `c` is assigned their sum, *then* that sum is printed to the
screen. In other words, order of execution is determined *exclusively* by
order of listing.
Control flow follows, strictly, from top down. Unless told otherwise, of
course, usually by a for loop, or some such. But those are minor details.

It is pretty clear, in the above code, that we cannot evaluate the value of
`c` meaningfully without *first* having evaluated both `a` and `b`.
Sound familiar? Yes -- that's right -- nested functions. In this case, if
you'll allow me some literary license, we can rewrite the above code using
lambda expressions in a continuation passing notation, like so:

>>> (lambda a: (lambda b: (lambda c: c)(a+b))(4))(5)

*Yikes!* Maybe Python isn't such a good language to express lambda
substitutions with bound variables in. Indeed, there has been talk of removing
lambdas from Python entirely, so we might as well look at the equivalent code in a
language that directly supports the notion:

> show $ (\a -> (\b -> (\c -> c) (a+b)) 4) 5

What, this still isn't clear? OK, let me show you again, only this time we'll take it statement by statement. Or, rather, their equivalents.

> show $ (\a -> ...) 5

If this looks like normal function calling syntax, that's because it is. What
we're doing is using a lambda expression to *assign* the value 5 to the formal
variable `a`. Pretty slick, eh? Now, let's look at the next "statement":

> show $ (\a -> (\b -> ...) 4) 5

Notice a pattern yet? Note how "the rest of the program" is treated as a
single function, expressed in terms of the preceding variable assignments?
Note how the only way to successfully evaluate the functions is to evaluate
them in the proper order? Yes; that's right -- a *program,* mathematically
speaking, is just a *nested set of functions.* Evaluate them in the proper
order, and you will get the same results as an imperative language program.
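The nested-lambda form above is valid Haskell exactly as written. As a quick sanity check, here it is bound to a name (the name `imperative` is mine, for illustration only):

```haskell
-- The imperative "program" expressed as nested lambdas: each
-- lambda binds one "variable", and the rest of the program is a
-- function of that variable. The result is show-ed, like the
-- Python print.
imperative :: String
imperative = show $ (\a -> (\b -> (\c -> c) (a+b)) 4) 5
```

Evaluating `imperative` produces `"9"`, the same answer the Python session prints.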

# 6 Observation: Thinking Algebraically

By now, you're likely to have had at least one epiphany on where we're going.
If not, I'll now make things more explicit by making one more observation: at
any point in an imperative program, you can always split it *between*
statements, such that evaluation of the top part is a necessary precondition to
the execution of the bottom part. It seems pretty obvious at first glance, but
it's a *critical* observation to make explicit. Because, after all, if we can
manipulate functions like anything else in higher-order languages, and we can,
then it should be possible to build some *function* which *returns* a properly
composed *sequence* of *other* functions, that does precisely that. I mean,
execute A before B, that is. For the sake of argument, let's call this `>>`.
We can therefore rewrite our Python example like so:

(a = 5) >> (b = 4) >> (c = a+b) >> (print c)

The associativity of `>>` really doesn't matter; it can be shown that:

(a = 5) >> ((b = 4) >> ((c = a+b) >> (print c)))

is equal to:

(((a = 5) >> (b = 4)) >> (c = a+b)) >> (print c)

Hence, we don't usually bother drawing parentheses around such constructs. Operators in Haskell are left-associative by default, and Haskell does indeed declare `>>` left-associative. But, strictly speaking, it doesn't have to be. Anyway, drawn yet another way:

a = 5
   >>      -- Note how these operators are "between" statements,
b = 4      -- conceptually.
   >>
c = a+b
   >>
print c

By the definition of `>>`, whatever is on the left-hand (top) side of the
operator *must* evaluate before the right-hand (bottom) side.

The `>>` operator properly arranges for sequences of code execution, but it
certainly doesn't address that sticky issue of transferring state from one
statement to the next. How does, after all, one *bind* variables to values?
Well, remember that dirty trick of nested lambda expressions? `>>` creates
those, but as shown above, doesn't thread state from one portion of a program
to the next. Fortunately, we have `>>=` to do that for us:

> 5 >>= \a -> 4 >>= \b -> a+b >>= \c -> show c

Remember the "we don't care about associativity because it just works out" rule
for `>>`? It's the same for `>>=` too, since it, too, sequences bits of
the program correctly. In fact:

> a >> b = a >>= \_ -> b

In other words, `>>` is just a special case of `>>=`, where we don't
particularly care about the result of `a` at all.
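As a sketch, the same chaining works in any monad you already have on hand. Here it is in `Maybe` (just the nearest standard instance, not the state monad we build below), including `>>` as the "don't care" special case:

```haskell
-- Each lambda binds the result of the step before it, so the
-- steps can only be evaluated in order.
chained :: Maybe Int
chained = Just 5 >>= \a ->
          Just 4 >>= \b ->
          Just (a + b)

-- >> sequences two steps while discarding the first result,
-- exactly as the a >> b = a >>= \_ -> b definition says.
sequenced :: Maybe Int
sequenced = Just 0 >> Just 9
```

Both expressions evaluate to `Just 9`.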

# 7 Putting it Together: The Not So Obvious

So, we now have *all* the pieces we need to finally explain just what the heck
monads are. So, let's get to work building our very own *state* monad!

## 7.1 Data Types

We need *something* to stuff into our state, but we don't know precisely what.
Hence, we're going to use parametric types to describe it. But one thing *is*
for sure, we know that properly sequenced functions depend on nested lambda
expressions. Therefore:

> newtype ExState s a = ExS (s -> (s,a))

That's right; our data type basically *is* itself a function. It takes some
state as input, and returns (hopefully) some value of interest, along with a
modified state. Remember the observation about how we threaded state as input
and got another state as output? The plumbing is right there, but is
abstracted in a `newtype`. Note that a `data` type could work here just as
well.
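Before we define `>>=`, it may help to poke at the bare newtype. A minimal sketch; the helper names `tick` and `unwrapExS` are mine, not part of the article's API:

```haskell
newtype ExState s a = ExS (s -> (s, a))

-- A one-step "program": bumps the state by one, and returns the
-- old state as its value.
tick :: ExState Int Int
tick = ExS (\s -> (s + 1, s))

-- Pull the function back out of the newtype and apply it to an
-- initial state, yielding the (newState, value) pair.
unwrapExS :: ExState s a -> s -> (s, a)
unwrapExS (ExS f) = f
```

`unwrapExS tick 41` yields `(42, 41)`: the state advanced to 42, and the value returned is the state we started with.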

Next, we need some function to *compose* pieces of our program together -- we
need to define the `>>=` operator. It's pretty clear that a program takes
raw input, but provides a transformed version of those inputs in some capacity
(otherwise, what's the point of writing the program?) Even if the program
provides no useful value for `a`, the fact that it updated some kind of state
*somewhere* is of great value. Hence, *the result of a program* **must** *be
of some type of ExState*. Otherwise, what's the point?

> (>>=) :: ExState s a -> (a -> ExState s b) -> ExState s b
> top >>= btm = ExS (\initialState ->
>     let (sTop, vTop) = perform top initialState
>         (sBtm, vBtm) = perform (btm vTop) sTop
>         perform (ExS f) = f
>     in (sBtm, vBtm))

Look at what it's doing. The result of `>>=` is a kind of *function*, just
like we said it should be before. It takes an `initialState`, and returns
the pair `(sBtm, vBtm)`, where `vBtm` is the return value from the complete
computation, and `sBtm` is the new state. And, as both names and intuition
would suggest, these values correspond to the results you'd get by finishing
the program at the *bottom* of its listing; just like in any other programming
language. We see that `perform (btm vTop) sTop` is used to compute these
values. Note that `btm vTop` must, by definition, be some kind of *ExState*;
we use the helper function `perform` to yank the function out of it. That
allows us to invoke the bottom part of the program's functionality.

Remember in our earlier discussion how `>>=` bound only a single variable?
Well, it turns out that by doing so, and requiring that function to return a
monadic value itself, we get the following algebraic substitutions:

> let (sBtm, vBtm) = perform (btm vTop) sTop
> let (sBtm, vBtm) = (\someState -> (anotherState, anotherValue)) sTop
> let (sBtm, vBtm) = (finalState, finalValue)

Pretty slick, eh? Note the middle line, where substitution produces a function. But where does `sTop` come from? It comes from the first let-binding:

> let (sTop, vTop) = perform top initialState

So, in order to properly evaluate the bottom, we *must* first evaluate the top
portion of the program. The result of the binding is itself a function which,
upon evaluation, evaluates these "sub-programs" in the proper order. And, as
you might expect, sub-programs can consist of sub-sub-programs, and
sub-sub-sub-programs, ad infinitum. The associativity rules of `>>=` allow
us to just keep on threading and threading and threading.

And since we're passing around all these crazy data structures with functions
containing functions containing functions, and states being carefully threaded
from function to function, now you see why I said, earlier in this article,
that monadic expressions are constructors. What we're building, literally, is
a *direct-threaded* representation of a program. The name *direct-threaded*
isn't accidental; invented in the 1960s and first successfully used by the
Forth programming language, its purpose was to thread together a bunch of
functions, passing state from function to function. Just like what we're doing
here!

But, there is one last issue involved with all this: once we are working inside a monad, how do we actually make use of the state we're maintaining?

## 7.2 Accessors

There are generally two kinds of accessors: *getters* and *setters*. Haskell
makes this patently clear when working with monads, especially State monads,
because the only way to access the "hidden" state is through such accessors.

> getState >>= \st -> doSomethingWith st >>= ...

It's clear that `getState` must be a function that returns `ExState` in
this case, since executing it must produce the pair `(sTop, vTop)` inside the
`>>=` function.

> getState = ExS (\s -> (s,s))

Yes, it really *is* as simple as that. What we're doing is we're taking our
state, and returning it as the next return value; we're also leaving it
unchanged in the next state as well.

Changing state is accomplished with a similar, but complementary, function:

> putState x = ExS (\s -> (x,x))

Note that we "return" x as well; it's not strictly necessary, since 99.999% of
the time, `putState` would be used with `>>` rather than `>>=`, thus
discarding the result anyway. But, no matter what, it's patently clear that
the *state* half of the pair returned is, in fact, being set to `x`;
precisely what we want.

One final "accessor" that really isn't one is `return`. This is a kind of
hybridized `putState`; its purpose is simple: return a value, leaving the
state otherwise unaltered:

> return x = ExS (\s -> (s, x))
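Unwrapped by hand, the three accessors behave exactly as described. A self-contained sketch; note that the pairs are (state, value) throughout, matching the `s -> (s,a)` newtype, and that `ret` stands in for `return` (which clashes with the Prelude at the top level) while `unwrapExS` is a hypothetical helper of mine:

```haskell
newtype ExState s a = ExS (s -> (s, a))

-- getter: the current state becomes the value, and is passed on
-- unchanged as the next state
getState :: ExState s s
getState = ExS (\s -> (s, s))

-- setter: the old state is discarded; x becomes both the new
-- state and (as a convenience) the value
putState :: s -> ExState s s
putState x = ExS (\_ -> (x, x))

-- return: inject a value, leaving the state untouched
ret :: a -> ExState s a
ret x = ExS (\s -> (s, x))

unwrapExS :: ExState s a -> s -> (s, a)
unwrapExS (ExS f) = f
```

Applied to an initial state of 7: `unwrapExS getState 7` gives `(7, 7)`, `unwrapExS (putState 9) 7` gives `(9, 9)`, and `unwrapExS (ret 3) 7` gives `(7, 3)`.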

## 7.3 Running "Programs" in the State Monad

So, let's recap. We have a set of operators that construct structures in memory representing what has to be performed (at least conceptually; compilers designed to work with monads often optimize this step away), at what time, and with what state. Speaking of state, we have operators that grant us access to it, for both alteration and query purposes. We could construct the cure for world hunger with these basic primitives, if only we could actually invoke the programs we create and get the results!

Once we have the facilities for all that plumbing in place, and we can
clearly define the concept of "program execution" in terms of simple algebraic
functions, we can turn our attention to actually *doing* something useful
with it. Like incrementing a counter, for example:

> inc = getState >>= \s -> (putState (s+1))

Looks pretty straight-forward; admittedly, it's a far cry from curing world hunger, but it is a start! In fact, we can thread several "invocations" of this function along too:

> counterIncrementByThree = inc >> inc >> inc

Before you know it, we'll be curing *AIDS.* But, this syntax, while useful at
times, is pretty ugly and hard to manage overall, so Haskell allows us to use
conventional *imperative-style* programming constructs, called *do-notation*,
to write clearer code:

> inc = do
>     s <- getState
>     putState (s+1)

See what's happening? The `<-` operator maps to the `>>=` operator,
complete with lambda variable binding. Quite convenient indeed! But, we
needn't always use variables:

> counterIncrementByThree = do
>     inc
>     inc
>     inc

In this case, because we're not assigning variables, the Haskell compiler knows
to use the `>>` operator. But, remember how `>>` was defined in terms of
`>>=` earlier? That's why we didn't need to *explicitly* define our own
`>>` operator.
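This desugaring is the same in every monad, not just ours. A small sketch in `Maybe` (names are mine, for illustration), showing a do-block and its hand-desugared equivalent side by side:

```haskell
-- do-notation: <- becomes >>= with a lambda binding
addTwo :: Maybe Int
addTwo = do
  a <- Just 1
  b <- Just 2
  return (a + b)

-- the same program, desugared by hand
addTwo' :: Maybe Int
addTwo' = Just 1 >>= \a ->
          Just 2 >>= \b ->
          return (a + b)
```

Both definitions evaluate to `Just 3`; the compiler mechanically rewrites the first into the second.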

This is all fine, but, the observant reader will point out that we still have
yet to explain how to actually reap the rewards of our programming. We need to
extract the results of the computation. This is usually done with a *runner*. In fact, we have already seen such a runner:

> perform (ExS f) = f

That's the one. Previously, we defined it to be local to the `>>=` operator, and to not bother with the details of passing initial state. A more complete runner, however, does exactly that -- deal with the state, I mean:

> runExS (ExS prg) initialState = snd $ prg initialState

Since the result of running a monadically constructed function is a pair
containing (state', result), we use the `snd` function to extract only the
result. In some cases, you'll be more interested in the state; in this case,
you can define another function if you like:

> stateExS (ExS prg) initialState = fst $ prg initialState

If both are of interest to you, then instead of evaluating `prg` twice, which
can be a gratuitous waste of time if you're computing the digits of *pi* to a
billion places, you can forego the pair selection functions and just
return the whole pair itself:

> valueAndStateExS (ExS prg) initialState = prg initialState

Typically, you'd use it something like this in a real-world program:

> main = do
>     let (someState, someValue) = valueAndStateExS myCounter 0
>     putStrLn $ "Resulting value: " ++ (show someValue)
>     putStrLn $ "Resulting state: " ++ (show someState)

Doesn't look like much, does it? But note that `0` hanging off to the right
on the call to `valueAndStateExS`? That is the program's *initial state* --
in C or C++, this is equivalent to the global state, usually established by
numeric constants in main(), or through global (static, hopefully!) variables
in some module.

# 8 Conclusion

Now that you know how the state monad works, you can apply this concept to
any other monad, including the IO monad. There are other kinds of monads that
are *not* stateful or state-like though. These include the list monad (`[]`)
and specialized data types like `Maybe`. But, for now, I'll let these
specializations of the concept rest. You've already been through a lot, and
I'm itching to get this paper online. And, besides, having gone through all
this work, you can now *finally* appreciate the `Control.Monad.State` and
related libraries.

Well, there you have it; I must bring this essay to a close now, knowing that you have read yet another tutorial on monads and what they actually are. My essay differs from most of the others in that it doesn't invoke the concept of containers, which apparently has confused a lot of people. It also doesn't invoke any complex mathematics (like category theory, where monads come from). Instead, I resort to normal, day-to-day programming experience, possessed by any programmer of any imperative programming language. I hope that this explanation has proven as useful to you, as it has to me.

# 9 Appendix

The code in the previous sections consists of valid Haskell fragments.
However, they don't tell the *whole* story. Some details I've had to
leave out for brevity's sake. But fear not -- contained herein is the complete
program that allowed me to write this essay. It's pretty short and dense, but
by following along with the earlier parts of this essay, hopefully you'll be
able to see how all the data flows fit together with the monadic plumbing. I
should also warn you, this code is far from optimal -- it's written so that it
works, and is clear. If you look at, e.g., the `Control.Monad.State` library,
you'll find a substantially terser definition, that does the same basic things.

> newtype ExState s a = ExS (s -> (s,a))
>
> instance Monad (ExState s) where
>     top >>= btm = ExS (\initialState ->
>         let (sTop, vTop) = perform top initialState
>             (sBtm, vBtm) = perform (btm vTop) sTop
>             perform (ExS f) = f
>         in (sBtm, vBtm))
>
>     return x = ExS (\initialState -> (initialState, x))
>
> getState = ExS (\initialState -> (initialState, initialState))
> putState x = ExS (\initialState -> (x, x))
>
> stateExS (ExS prg) initialState = fst $ prg initialState
> runExS (ExS prg) initialState = snd $ prg initialState
> valueAndStateExS (ExS prg) initialState = prg initialState

# 10 Acknowledgements

I would like to take this time to thank the folks in the #Haskell IRC channel for opening my eyes to this topic. I could not have come to this understanding without their input. I especially would like to single out Cale, Dolio, Kowey, and SamB, whose patience with me has been the only thing keeping my interest in understanding monads alive.

Special thanks to Cale and Dolio for reviewing this document. Their contributions were invaluable at helping to clarify various points.

And, oddly and finally, thanks to James Burke, for providing a literary style
that I continuously endeavor to match. It's highly conversational, and very
engaging; it's not the dreary doldrum you'd expect from a paper on, of all
things, stuff called "Monads." I mean, c'mon, who ever heard of *monads?* They
sound like medical conditions! Anyway, his creative use of *suspense* and his
masterful *touch* of humor in educational literature (as distinct from
*academic* literature) lures the reader into *wanting* to learn more, which is
precisely the effect I'm looking for. I hope I've succeeded.