Doing Haskell: Part Three: The REPL main loop

Ok, following on from last time where we briefly hinted at something called referential transparency and see-through referees. We've only really started to explore all the interesting things that are going on, so to recap, here's the code for the main loop again:

main :: IO ()
main = do
  hPutStr stdout prompt
  hFlush stdout
  cmd <- hGetLine stdin
  let parts = words cmd
  case (length parts) of
    0 -> main
    _ -> execCmd parts >> main

Crikey, I can feel a book coming on, first let's rip through the above with a "potted" explanation and then dig a bit deeper...

  1. main :: IO ()
    this is the function type for "main", it says that main is a function that has no arguments (how do we know that?) and returns something called "IO unit". The "IO" is in fact the much-feared and revered "IO Monad". No doubt you've heard horror stories about monads, but if you did your homework like I suggested by repeatedly watching Brian Beckman -- Don't fear the Monads until you choked on it, then by now you should at least have an appreciation of what they are. If you didn't watch it, go and watch it now and then come back if you are still awake! LMAO. :)
  2. main = do
    this is essentially the function definition, it starts the main function with the "do" keyword which is a very pleasant way of hiding some real ugly looking stuff when you first see it. For now just accept that "inside the IO monad", the word "do" is how you tell Haskell you want to run code sequentially kind of like you would in stock imperative programs. Again, more on this in a while.
  3. hPutStr stdout prompt
    this line is where we print the value of the prompt to the console, "stdout", this is exactly like any C program would do or anything else that writes to the terminal. Yes, Haskell allows you to write "unix filters" if you want, any language that reads from stdin and writes to stdout falls into that category so nothing too special in that.
  4. hFlush stdout
    Haskell is the only thing I know that is lazier than I am. The concept of "lazy I/O" also takes some getting used to, for now just accept that the flush call forces the prompt string to the console so the user can see it.
  5. cmd <- hGetLine stdin
    INCOMING! This is where we actually get the typed input from the console, "stdin". As we are operating in the "IO monad" we can call other I/O functions like hGetLine stdin can give us things from the outside world. The return type of hGetLine is "IO String", and in order to "extract" the pure String value from the "impure" IO monad we use the "<-" operator, think of it as lifting the payload out of the monad wrapper, resulting in "cmd" having the type "String" and not "IO String". If we had called a "pure" function then we would have assigned the value into "cmd" with the "let" keyword instead, so there's a rule: binding return values from pure functions uses "let", from impure functions use "<-".
  6. let parts = words cmd
    Oh! Look, we used "let" to call the "words" function and assign the result into the variable binding called "parts". It's signature is String->[String], it takes a string and returns a list of strings, no I/O in sight. How do we know? Because it would have the signature String -> IO [String] if it did. Simples.
  7. case (length parts) of
    Having now turned the input string into a list of strings, we can assume that the first word is the command and the remainder of them are the arguments to that command. Remember this is quick-and-dirty; we haven't bothered to trim whitespace or anything. This line then calls the length to see if the user entered anything at all.
  8. 0 -> main
    If the length is zero then the user just hit RETURN without entering anything, so all we do is call the main function again, and thanks to something called tail-call optimisation we don't run out of stack space.
  9. _ -> execCmd parts >> main
    If the user *did* enter some input then we call "execCmd" with the list of words and then jump around to the main function again. Forever and ever, until the user quits. How we detected quit is done in the "execCmd" function but that's for another posting, we've a lot of raised points to address now. Starting with the "_" in this line of code!

Pattern Matching

The underscore "_" used the catch anything that isn't a zero in the case statement is something that you will see a lot of; even Erlang has the same concept, both pattern matching and the underscore as the "I don't want to know what it was" consumer. To sum up the case statement, if the user enters nothing we go around the lighthouse again, if they entered any number of words then we call the execCmd function to deal with them. The underscore is not restricted to numbers, it takes on the type of whatever the return value was from the function called to pivot the case around.

Syntactic sugar, "do" and that weird ">>" thing in line 9...

I mentioned that the "do" keyword was a way to instruct the Haskell compiler that we want to enter "sequential" mode somewhat akin to standard imperative programming languages where the order of code is the order of execution. Huh? You mean Haskell doesn't run my code in the order I typed it in? How the hell does it work then ? Welcome to the first real mind-f* of Haskell and lazy evaluation!

A PHP Comparison

Let's digress a bit and see how a PHP function and a Haskell function might look the same but in fact the Haskell program could do anything with the execution order! If you grasp at least the concept of lazy evaluation and why it can leave you with holes in your head sometimes through hair-pulling, then you will be a little more informed about how Haskell thinks! It's trying to help, really it is!

Take this PHP function as an example, it's really simple but it demonstrates one of the downsides to strictly imperative approaches, and this affects C, C++, Objective-C, Perl, you name it, if it is "imperative" then this applies equally.

function brain_drain($foo) {
  $results = largeDatabaseQuery();
  doSomething();
  if (42 == $foo) {
    return count($results);
  }
  return 0;
}

Yes, totally stupid but lets read it through: we have a parameter called $foo passed in, and then the first thing we do is a potentially huge resource-hungry server sapping database query that might return a million rows of data. Next we call a function "doSomething()" that does something.
Finally, if $foo has the value '42' we return the number of rows we read, assuming $results is an Array (Haskell has types, I like types!). If $foo is anything other than 42 it returns zero.

What a waste of time and resource! Now I know what you are saying, you could re-code it like this and the issue goes away: (forget the fact that doSomething() might have assumed that the query actually did run; that's a whole other world of pain!)

function brainDrain($foo) {
  doSomething();
  if (42 == $foo) {
    $results = large_database_query();
    return count($results);
  }
  return 0;
}

Yes, but *you* had to recode it. The cyberspace is littered with stuff like that, you can't recode it all. PHP isn't smart enough to figure out at run-time that the execution path may or may not use the results of the query and so it does it anyway because you told it to. Haskell is cool. Haskell creates something called a "thunk" which is like a "delayed call" or a "closure" thing that contains the call but it hasn't yet done it. Why? Because nobody asked for the results yet! If during the course of the execution it realises that something is now trying to use the results then, and only then, will it execute the query.

brainDrain :: Int -> Int
brainDrain foo =
   let results = largeDatbaseQuery
   doSomething
   case foo of
     42 -> length results
     _ -> 0

I've deliberately avoided anything to do with monads in the above code, it is purely to show you a parallel to the PHP code. Lazy evaluation means that code only executes when it absolutely needs to. This means that code is not guaranteed to execute in the order you wrote it. Sounds odd, unreal and highly infeasible but it actually does work like that! Look hard: first we "pretend" to do the query, then we always call "doSomething", then we decide what value "foo" has and act accordingly. Here comes the cruncher / Eureka thing: if foo is 42, then the database query will execute AFTER the call to "doSomething" despite being physically before it in the program code. If foo is not 42 then "doSomething" is the only function that executes. Savour. Enjoy Haskell-s cleverness.

Function Signatures

Time to mention these now as we've used one again in the previous code snippet, Int -> Int. What does that mean ?

Without going into too much head-hurting detail, after all, books like Real World Haskell and Learn You a aHaskell do this stuff much better than I ever could, a function signature in Haskell is "read" like this:

Int -> Int -> Int  -- a function that takes two integers and returns an integer
String -> [String] -- a function that takes a string and returns a list of strings
String -> Int -> () -- a function that takes a string, and integer and returns "unit"

"UNIT" written as "()" in Haskell can be thought of as "NULL" in PHP or "void" in C/C++.
So, what's with all those arrow thingies? Do you like curry? I hope so... in the next article we are going to dig right into one of the core techniques that allows Haskell to do really clever stuff with functions as first-class objects.
My head is hurting now so I need to recover from this article and figure out how to explain stuff like "currying", "partially applied functions" and other things, which once understood will make reading Haskell code a lot easier.

Thanks for reading this far.
:)

Content Tags: 

Add new comment

Filtered HTML

  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <pre> <a> <em> <strong> <cite> <blockquote> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Syntax highlight code surrounded by the {syntaxhighlighter SPEC}...{/syntaxhighlighter} tags, where SPEC is a Syntaxhighlighter options string or class="OPTIONS" [title="the title"].
  • Lines and paragraphs break automatically.

Full HTML

  • Web page addresses and e-mail addresses turn into links automatically.
  • Syntax highlight code surrounded by the {syntaxhighlighter SPEC}...{/syntaxhighlighter} tags, where SPEC is a Syntaxhighlighter options string or class="OPTIONS" [title="the title"].
  • Lines and paragraphs break automatically.

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
Image CAPTCHA
Enter the characters shown in the image.