Wednesday, August 11, 2010

The Iron Beach I/O Language, Part I

This is a Global Script-based concurrent procedural stream I/O language. The IBIO-specific portion of its standard library will be detailed in this post and a following one; subsequent posts will explain the concurrency and exception libraries (which are not IBIO-specific), and there will probably be a final follow-up post.

Rather than define a specific type which programs must have, IBIO provides a set of overloads and allows programs to be polymorphic over them. This allows IBIO programs to work more easily with monad transformers, and also allows other Global Script-based languages to support running IBIO programs and sub-programs (and, hence, libraries).

Basic Functions

overload module ibio.c m

The most important overload is module ibio.c m. An instance of this overload declares that subprograms of the type m in out α are capable of performing stream input of symbols of type in and stream output of symbols of type out; it is a sub-overload of ∀ 'in 'out. module monad.c (m in out).

ibio.print :: ∀ 'm 'in ('out :: •). {module ibio.c m} →
    [out] → m in out 〈〉

This is IBIO's only exposed output subprogram. Operationally, the intention is that executing print xn schedules output of xn to the current output channel; outputs scheduled will be performed, in the order they were scheduled, in parallel with on-going execution. The list xn is operationally a stream, not a buffer, so some values may not be available immediately; output (but not execution) will be halted until the next symbol needed is available. In general, symbols will be output as soon as they are available; however, if the current output channel is attached to an external file, output may be halted for a bounded length of time to accumulate more symbols before sending them all off together.

ibio.parse :: ∀ 'm ('in :: •) 'out α. {module ibio.c m} →
    (∀ 'p. {module ibio.parser.c p} → p in α) → m in out α

This is IBIO's only exposed input subprogram. Operationally, the intent is that executing parse p will specialize p in an online parser type (see also the next section). The program will then schedule input of the maximal prefix of the contents of the current input channel which is recognized by the parser p; it is an error for there to be no such prefix. Input, like output, is performed in the order it is scheduled, and in parallel with on-going execution. parse p returns the parse tree resulting from matching the input scheduled against the parser p. Branching on a portion of the result may block until enough input has been performed to decide which constructor to use for that portion.

IBIO performs input lazily; in other words, only a bounded amount of input will be performed beyond the furthest symbol which has so far been actually required for execution. The exception to this is in the case where input is scheduled, and the result is discarded; in this case the input will be performed eagerly, since doing so does not consume excess memory to hang onto the input.

Parsers

mdoule ibio.parser.c p is a sub-overload of ∀ 's. module monad.c (p s) and ∀ 's. mdoule alternative.c (p s); it also defines the following methods:

ibio.parser.symbol :: ∀ 'p 's. {module ibio.parser.c p} → p s s

This permits input of a single symbol, provided the rest of the parser succeeds.

ibio.parser.match :: ∀ 'p 's. {module ibio.parser.c p} →
    regex.t s → p s [s]

This permits input of a maximal string matching a supplied regular expression; the standard Global Script library may be assumed to supply a reasonable regular expression combinator library for this purpose. (char regular expressions may in Global Script be written qr/re/, where re indicates normal regular expression syntax; IBIO permits match qr/re/ to be abbreviated m/re/ as well.) symbol and match are both considered to match a single ‘token’; in the event of ambiguity, the branch whose initial token is of maximal length will be preferred. In the event this rule fails to eliminate the ambiguity, the length of the next token will be considered, and so on until the ambiguity is eliminated or every token has been considered. It is an error for ambiguity to remain after all accepted tokens in all branches have been accepted.

ibio.parser.eof :: ∀ 'p 's. {module ibio.parser.c p} → p s 〈〉
IBIO makes the assumption that the operating system it is running on lacks an explicit EOF condition; instead, reading from a regular file with the file pointer at EOF successfully reads in an empty buffer and does not move the file pointer. Unix and Plan 9, at least, satisfy this condition. Therefore, when an IBIO input channel is attached to an external file, IBIO keeps track of when a read returns an empty buffer, and refuses to permit matching symbol or match re to read past that point. Instead, the special parser eof is supplied. This matches precisely when symbol would fail to match because input is at a point where an empty buffer was read in; it accepts the empty buffer and, in the event further reads would return data, permits input to proceed past it.

Channels and Re-Direction

type ibio.in.p, ibio.out.p, ibio.socket, ibio.channel :: * → *

These types store input channels, output channels, sockets, and in-process channels, respectively. A ‘socket’ for IBIO's purposes is just a pair of an input channel and an output channel with the same symbol type; it may represent something like one end of a network connection, or it may represent the read and write ends of two in-process channels.

ibio.c.in.local :: ∀ 'm 'in0 'in1 'out 'α. {module ibio.c m} →
    ibio.in.p in0 → m in0 out α → m in1 out α

When an IBIO program begins executing, its input and output channels are attached to the initial process's standard input and standard output. ibio.c.in.local permits a sub-program to have its input attached to another input channel; ibio.c.in.local p a executes a, but with its input attached to p rather than to ibio.c.in.local's input channel.

ibio.c.out.local :: ∀ 'm 'in 'out0 'out1 'α. {module ibio.c m} →
    ibio.out.p out0 → m in out0 α → m in out1 α

ibio.c.out.local permits a sub-program to have its output attached to an input channel other than the program's standard output; ibio.c.out.local p a executes a, but with its output attached to p rather than to ibio.c.out.local's output channel.

overload open a m s :: a → m s

open is, obviously, an incredibly general operation; it is intended to allow input and output channels to be obtained from other types, such as filenames and first-class channels. Major client of open are the high-level re-direction operators:

(ibio.<<) :: ∀ 'in0 'in1 'out 'm 'a 'α.
    {module ibio.c m} →
    {ibio.open a (m in1 out) (ibio.in.p out0)} →
    m in0 out α →
    a →
    m in1 out α
;
(ibio.>>) :: ∀ 'in0 'in1 'out 'm 'a 'α.
    {module ibio.c m} →
    {ibio.open a (m in out1) (ibio.out.p out0)} →
    m in out0 α →
    a →
    m in out1 α
;
(ibio.<>) :: ∀ 'in 'out ('s :: •) 'm 'a 'α.
    {module ibio.c m} →
    {ibio.open a (m ibio.in out) (ibio.socket s)} →
    m s s α →
    a →
    m in out α
;

These permit input, output, or both to be re-directed from/to anything that can be opened at the appropriate type. In particular, to permit input and output to be re-directed from/to channels and sockets, module ibio.c m is a sub-overload of the following overloads:


∀ 'in 'out ('s :: •). ibio.open (ibio.channel s) (m in out) (ibio.in.p s),
∀ 'in 'out ('s :: •). ibio.open (ibio.channel s) (m in out) (ibio.out.p s),
∀ 'in 'out ('s :: •). ibio.open (ibio.socket s) (m in out) (ibio.in.p s),
∀ 'in 'out ('s :: •). ibio.open (ibio.socket s) (m in out) (ibio.out.p s),

ibio.channel :: ∀ 'm 'in 'out ('s :: •). {module ibio.c m} →
    m in out (channel s);
ibio.socket.pair :: ∀ 'm 'in 'out ('s :: •). {module ibio.c m} →
    m in out 〈 0 :: socket s; 1 :: socket s; 〉;

These two functions permit fresh in-process channels and socket-pairs, respectively, to be allocated, allowing for communication between different subprograms and particularly between subprograms running in different threads.

Friday, July 2, 2010

Quick note on imports


    import m.f; -- f must have only a single component
    import module m.n; -- n must have only a single component
    import type m.t; -- t must have only a single component
    import m.(..); -- import everything
    import m.(names); -- import certain names with a single declaration
    import m.(..\names); -- import all but certain names
To do this we basically want a way to say 'gimme a name provided it is not followed by a period' which we can probably do since it needs to be followed by a semicolon instead. We can parse the other cases pretty simply.

Thursday, July 1, 2010

Haskell Package Review: regex-posix

The 15th most popular Haskell library is regex-posix. This library is an official component of the Haskell Platform, and is part of the "batteries included" concept. The following commentary is based primarily on its Haddock documentation; links are provided. regex-posix exports six modules:
  • Text.Regex.Posix — directly exports only one symbol, a constant. Also re-exports instances from the remainder fo the package.
  • Text.Regex.Posix.ByteString — Exports four data types (including newtypes), three type synonyms, nine constants, and three functions, all in the IO monad; acocrding to the documentation, it also provides orphan instances for two type classes, RegexLike and RegexMaker, provided by the regex-base package.
  • Text.Regex.Posix.ByteString.Lazy — Exports exactly the same API as the previous package, but with different orphan instances.
  • Text.Regex.Posix.Sequence — Exports exactly the same API as the previous package, but with different orphan instances.
  • Text.Regex.Posix.String — Exports exactly the same API as the previous package, but with different orphan instances.
  • Text.Regex.Posix.Wrap — Exports four data types (including newtypes), three type synonyms, 21 constants, five functions in the IO monad, one constrained polymorphic function returning a simple value, and one constrained polymorphic function returning a value in an arbitrary monad.
So, in total, the library provides (including duplicates in multiple modules' export lists) 20 data types, 15 type synonyms, 58 constants, 17 functions in the IO monad, one constrained polymorphic function returning a simple value, and one constrained polymorphic function returning a value in an arbitrary monad. A necessary condition for a function to be a combinator is that it have at least one argument of the same type as its result. Of the functions directly exported by this library, only four come close to satisfying this criterion: the four modules with nearly identical signatures all export a function

    regexec :: Regex -> string -> IO (Either WrapError (Maybe (string, string, string, [string])))
for different types string. However, in fact, these functions are not combinators, for the following reasons:
  • The argument is of type string; the result is not. The result strings are in fact nested in a mult-result return wrapped inside three monads.
  • All three monads involved are exception monads, meaning that attempting to un-pack the result of a regexec call may fail to produce any string values at all.
  • The string values returned, if any, are shorter and simpler than the input.
All of these characteristics are decidedly un-combinatorial in nature. The possibility remains that the instances exported by this package include one or more combinators as methods. Since the classes these instances implement are defined by the regex-base package, which these reviews will be reaching in a few days, we defer consideration to the review of that package. In the meantime, we can conclude that there is no evidence that this library is written in a combinatorial, rather than imperative, style.

Wednesday, June 30, 2010

Haskell Libraries: Procedural or Combinatorial

Functional programming is a style of programming that emphasizes the use of functions, especially pure functions. That's boring. It's also not terribly obvious that functional programming, in that sense, is an improvement; going from this:

    int accum = 0, number;
    while (getnumber(&number) != EOF)
        accum += number;
    printf("%d\n", accum);
to this:

    int
    get_accumulated_number(int accum)
    {
        int number;
        if (getnumber(&number) == EOF)
            return accum;
        return get_accumulated_number(accum + number);
    }

    int accum = get_accumulated_number(0);
    printf("%d\n", accum);
is a pretty terrible regression, style-wise. I can't find who to cite this to, but it's been said that tail recursion is the goto of functional programming. So let's forget about using functions instead of loops and try to find a good reason to abandon conventional imperative languages. Here's a better distinction. Imperative programming is a style of programming that emphasizes constructing programs out of sequences of actions for the machine to take. Imperative libraries tend to be collections of re-usable routines that encode commonly-performed sequences of actions; if they take parameters, those parameters tend to be flags controlling conditionals, or pieces of data to be manipulated. If they return results, those results tend to be data calculated produced the machine as it carries out the sub-routines actions. Combinatorial programming is a style of programming that emphasizes constructing programs by composing sub-programs, without limiting how the sub-programs are composed. Combinatorial libraries tend to be collections of re-usable patterns of program composition. The results of their members tend to be sub-programs that can be combined with each other in a variety of ways; their parameters tend to likewise be sub-programs that the function will combine in interesting ways. Combinatorial programming is the style of programming encouraged by the program calculus Global Script. As an example, in the Iron Beach I/O language, which is based on Global Script, the column-sum example might be written as follows:

    for (
        'nums ← parse $ many $ number.parse;
        'total = foldl (+) 0 $ nums;
    ) print $ number.toString total ++ qq{\n}
Note that, unlike both the imperative and functional examples above, this program uses neither recursion nor general looping constructs; instead, combinators such as many and foldl provide special-purpose loops. Haskell advertises itself as a functional programming language. But, as seen above, that's boring. Which style of programming does Haskell encourage, imperative or combinatorial? Do Haskell libraries, and the functions they offer, mostly allow you to re-use canned packages of commands for the machine to carry out? Or do they allow you to compose your own programs in elegant and expressive ways? This is not an idle question: for an academic language, Haskell is a veritable hot-bed of library writing. The standard Haskell package distribution channel, known as Hackage, now has:
  • 2,182 packages, written by
  • 575 separate programmers, uploading
  • 12 new packages per day, with a total of
  • 130,000 downloads per month.
It's time for a reckoning: is all that programming done in a combinatorial style, or is Haskell — in practice — just another conventional language with funny syntax? Over the next week, I will examine the top 15 libraries, by total recent downloads, and attempt to classify the APIs they offer as combinatorial or imperative. My thesis is that Haskell development has, in the last 5-10 years, gone from a primarily combinatorial style to a primarily imperative one; as such, I will attempt to classify each library into one of two categories:
  1. Legacy Haskell: more than 5-10 years old.
  2. Imperative Haskell.
For the record, here are the most popular 15 libraries on Hackage, according to Haskell god Don Stewart: I will examine each of these in turn, primarily looking for evidence against my thesis that Haskell programming has become primarily imperative in nature.

Sunday, May 23, 2010

Re-thinking imports

Global Script has a short import syntax that looks like this:

  import m.f; -- > 'f = m.f;
  import type m.t -- > type 't = m.t;
  import module m.n -- > module 'n = module m.n;
I quite like this syntax; it has the nice bonus that, if you do need a string of imports, by formatting them this way:

   import xml.parser.attribute;
   import xml.parser.cdata;
   import xml.parser.element;
you can alphabetize the import list by piping the entire code segment by M-2 on |sort. As an added bonus, this respects module names as well. This is fairly straight-forward to parse: take a name after the import keyword, check that it has multiple components, split it into a module name and a field name on the final component division, check that the field name is a valid function name, and store the module name and function name separately. This syntax really only doesn't work for two cases:
  • Importing everything from a module, and
  • Importing a qualified name from a module.
For the second case, we can just say the programmer can suck it up and write:

  'n.f = m.n.f;
For the first case, we can use the fact that spaces are never legal in identifiers to permit a syntax like:

  import m ..;