Wednesday, August 11, 2010

The Iron Beach I/O Language, Part I

The Iron Beach I/O Language (IBIO) is a Global Script-based concurrent procedural stream I/O language. The IBIO-specific portion of its standard library will be detailed in this post and a following one; subsequent posts will explain the concurrency and exception libraries (which are not IBIO-specific), and there will probably be a final follow-up post.

Rather than define a specific type which programs must have, IBIO provides a set of overloads and allows programs to be polymorphic over them. This allows IBIO programs to work more easily with monad transformers, and also allows other Global Script-based languages to support running IBIO programs and sub-programs (and, hence, libraries).

Basic Functions

overload module ibio.c m

The most important overload is module ibio.c m. An instance of this overload declares that subprograms of the type m in out α are capable of performing stream input of symbols of type in and stream output of symbols of type out; it is a sub-overload of ∀ 'in 'out. module monad.c (m in out).

ibio.print :: ∀ 'm 'in ('out :: •). {module ibio.c m} →
    [out] → m in out 〈〉

This is IBIO's only exposed output subprogram. Operationally, the intention is that executing print xn schedules output of xn to the current output channel; outputs scheduled will be performed, in the order they were scheduled, in parallel with on-going execution. The list xn is operationally a stream, not a buffer, so some values may not be available immediately; output (but not execution) will be halted until the next symbol needed is available. In general, symbols will be output as soon as they are available; however, if the current output channel is attached to an external file, output may be halted for a bounded length of time to accumulate more symbols before sending them all off together.
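The scheduling behaviour described above can be modelled outside Global Script. The following is a minimal Python sketch, not IBIO's implementation: the class OutputChannel, its written list, and the None sentinel are all hypothetical names chosen for illustration. It shows print returning immediately while a background thread drains the scheduled streams in order, blocking only the output side when a symbol is not yet available.

```python
import queue
import threading
import time


class OutputChannel:
    """Toy model: scheduled outputs are drained in order by a background thread."""

    def __init__(self):
        self.written = []               # stands in for the external sink
        self._pending = queue.Queue()   # streams scheduled so far, in FIFO order
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def print(self, symbols):
        # Scheduling returns immediately; `symbols` may be a lazy iterable
        # whose elements are not all available yet.
        self._pending.put(symbols)

    def _drain(self):
        while True:
            stream = self._pending.get()
            if stream is None:          # sentinel: nothing more will be scheduled
                return
            for sym in stream:          # waiting here stalls output, not the caller
                self.written.append(sym)

    def close(self):
        self._pending.put(None)
        self._worker.join()


def slow_stream():
    # A stream whose symbols become available over time.
    for ch in "abc":
        time.sleep(0.01)
        yield ch


out = OutputChannel()
out.print(slow_stream())   # returns before any symbol has been written
out.print("de")            # queued behind the first stream
out.close()
print("".join(out.written))
```

The sketch omits the bounded buffering that IBIO may apply when the channel is attached to an external file.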

ibio.parse :: ∀ 'm ('in :: •) 'out α. {module ibio.c m} →
    (∀ 'p. {module ibio.parser.c p} → p in α) → m in out α

This is IBIO's only exposed input subprogram. Operationally, the intention is that executing parse p will specialize p to an online parser type (see also the next section). The program will then schedule input of the maximal prefix of the contents of the current input channel which is recognized by the parser p; it is an error if no such prefix exists. Input, like output, is performed in the order it is scheduled, and in parallel with on-going execution. parse p returns the parse tree resulting from matching the input scheduled against the parser p. Branching on a portion of the result may block until enough input has been performed to decide which constructor to use for that portion.

IBIO performs input lazily; in other words, only a bounded amount of input will be performed beyond the furthest symbol which has so far been actually required for execution. The exception to this is in the case where input is scheduled, and the result is discarded; in this case the input will be performed eagerly, since doing so does not consume excess memory to hang onto the input.
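One way to picture the bounded read-ahead is a reader thread feeding a fixed-size buffer. The Python sketch below is only an illustration under assumptions: READ_AHEAD, LazyInput, and the performed counter are hypothetical, and the post does not say what bound IBIO actually uses.

```python
import itertools
import queue
import threading

READ_AHEAD = 4  # hypothetical bound; the post does not specify a value


class LazyInput:
    """Toy model: input runs ahead of the consumer by a bounded amount only."""

    def __init__(self, source):
        self.performed = 0                            # symbols actually read so far
        self._buf = queue.Queue(maxsize=READ_AHEAD)   # bounded, so the reader stalls
        threading.Thread(target=self._read, args=(source,), daemon=True).start()

    def _read(self, source):
        for sym in source:
            self.performed += 1
            self._buf.put(sym)    # blocks once READ_AHEAD symbols are waiting

    def next_symbol(self):
        return self._buf.get()    # blocks until the reader has produced a symbol


inp = LazyInput(itertools.count())            # an endless input stream
first = [inp.next_symbol() for _ in range(3)]
print(first, inp.performed)
```

However long the program runs, performed never exceeds the number of symbols consumed plus READ_AHEAD plus one (the symbol the reader is holding while blocked). The eager case for discarded results is not modelled here.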

Parsers

module ibio.parser.c p is a sub-overload of ∀ 's. module monad.c (p s) and ∀ 's. module alternative.c (p s); it also defines the following methods:

ibio.parser.symbol :: ∀ 'p 's. {module ibio.parser.c p} → p s s

This permits input of a single symbol, provided the rest of the parser succeeds.

ibio.parser.match :: ∀ 'p 's. {module ibio.parser.c p} →
    regex.t s → p s [s]

This permits input of a maximal string matching a supplied regular expression; the standard Global Script library may be assumed to supply a reasonable regular expression combinator library for this purpose. (In Global Script, char regular expressions may be written qr/re/, where re indicates normal regular expression syntax; IBIO also permits match qr/re/ to be abbreviated m/re/.) symbol and match are both considered to match a single ‘token’; in the event of ambiguity, the branch whose initial token is of maximal length will be preferred. In the event this rule fails to eliminate the ambiguity, the length of the next token will be considered, and so on until the ambiguity is eliminated or every token has been considered. It is an error for ambiguity to remain after all tokens accepted by all branches have been considered.
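The token-length rule amounts to comparing branches lexicographically by the lengths of their tokens. The Python sketch below is a simplified model, assuming each branch is just a fixed sequence of regular expressions; token_lengths and choose_branch are hypothetical helpers, and Python's re module stands in for the Global Script regular expression library. One further assumption: a branch that runs out of tokens while still tied loses to a longer branch, a case the rule above does not spell out.

```python
import re


def token_lengths(branch, text):
    """Return the lengths of the tokens a branch accepts, or None if it fails."""
    pos, lengths = 0, []
    for pattern in branch:
        m = re.compile(pattern).match(text, pos)   # greedy match starting at pos
        if m is None:
            return None
        lengths.append(m.end() - pos)
        pos = m.end()
    return lengths


def choose_branch(branches, text):
    """Pick the branch preferred by the maximal-token-length rule."""
    candidates = {}
    for name, branch in branches.items():
        lengths = token_lengths(branch, text)
        if lengths is not None:
            candidates[name] = lengths
    if not candidates:
        raise ValueError("no branch matches")
    # List comparison is lexicographic: first token length, then the next, ...
    best = max(candidates.values())
    winners = [name for name, lengths in candidates.items() if lengths == best]
    if len(winners) > 1:
        raise ValueError("ambiguity remains after every token was considered")
    return winners[0]


print(choose_branch({"keyword": [r"if"], "identifier": [r"[a-z]+"]}, "iffy"))
print(choose_branch({"one_b": [r"a+", r"b"], "many_b": [r"a+", r"b+"]}, "aabb"))
```

In the first call the identifier branch wins because its initial token is longer; in the second the initial tokens tie, so the second token decides.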

ibio.parser.eof :: ∀ 'p 's. {module ibio.parser.c p} → p s 〈〉

IBIO makes the assumption that the operating system it is running on lacks an explicit EOF condition; instead, reading from a regular file with the file pointer at EOF successfully reads in an empty buffer and does not move the file pointer. Unix and Plan 9, at least, satisfy this condition. Therefore, when an IBIO input channel is attached to an external file, IBIO keeps track of when a read returns an empty buffer, and refuses to permit matching symbol or match re to read past that point. Instead, the special parser eof is supplied. This matches precisely when symbol would fail to match because input is at a point where an empty buffer was read in; it accepts the empty buffer and, in the event further reads would return data, permits input to proceed past it.
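The empty-read bookkeeping can be tried directly against a Unix-style file descriptor. The following Python sketch is a toy model rather than IBIO's mechanism: FileInput, its at_eof flag, and the 4096-byte read size are all assumptions made for the example, and symbol signals failure by returning None instead of failing a parse.

```python
import os
import tempfile


class FileInput:
    """Toy input channel over a file descriptor that remembers empty reads."""

    def __init__(self, fd):
        self.fd = fd
        self.buf = b""
        self.at_eof = False   # True once a read has returned an empty buffer

    def _fill(self):
        # Read only when no data is buffered and no empty read is pending.
        if not self.buf and not self.at_eof:
            self.buf = os.read(self.fd, 4096)
            self.at_eof = (self.buf == b"")

    def symbol(self):
        """Return the next byte, or None where reading past EOF is refused."""
        self._fill()
        if self.at_eof:
            return None
        sym, self.buf = self.buf[:1], self.buf[1:]
        return sym

    def eof(self):
        """Succeed exactly where symbol() fails; then let later reads proceed."""
        self._fill()
        if not self.at_eof:
            return False
        self.at_eof = False   # accept the empty buffer
        return True


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "log")
    with open(path, "wb") as f:
        f.write(b"ab")
    fd = os.open(path, os.O_RDONLY)
    inp = FileInput(fd)
    results = [inp.symbol(), inp.symbol(), inp.symbol(), inp.eof()]
    with open(path, "ab") as f:   # another writer appends after EOF was seen
        f.write(b"c")
    results.append(inp.symbol())
    os.close(fd)

print(results)
```

The third symbol call is refused because a read came back empty; once eof has accepted that empty buffer, the appended byte can be read normally.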

Channels and Re-Direction

type ibio.in.p, ibio.out.p, ibio.socket, ibio.channel :: * → *

These types store input channels, output channels, sockets, and in-process channels, respectively. A ‘socket’ for IBIO's purposes is just a pair of an input channel and an output channel with the same symbol type; it may represent something like one end of a network connection, or it may represent the read and write ends of two in-process channels.

ibio.c.in.local :: ∀ 'm 'in0 'in1 'out 'α. {module ibio.c m} →
    ibio.in.p in0 → m in0 out α → m in1 out α

When an IBIO program begins executing, its input and output channels are attached to the initial process's standard input and standard output. ibio.c.in.local permits a sub-program to have its input attached to another input channel; ibio.c.in.local p a executes a, but with its input attached to p rather than to ibio.c.in.local's input channel.

ibio.c.out.local :: ∀ 'm 'in 'out0 'out1 'α. {module ibio.c m} →
    ibio.out.p out0 → m in out0 α → m in out1 α

ibio.c.out.local permits a sub-program to have its output attached to an output channel other than the program's standard output; ibio.c.out.local p a executes a, but with its output attached to p rather than to ibio.c.out.local's output channel.
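A rough way to see what the two local functions do is to model a program as a function of its current input and output channels. The Python sketch below does this under heavy simplification: emit, read_one, and seq are hypothetical stand-ins for print, parse symbol, and monadic sequencing; channels are a plain iterator and a plain list; and nothing runs in parallel.

```python
def emit(symbols):
    """Stand-in for print: append symbols to the current output channel."""
    def run(inp, out):
        out.extend(symbols)
    return run


def read_one(inp, out):
    """Stand-in for parse symbol: take one symbol from the current input."""
    return next(inp)


def seq(*programs):
    """Run programs in order on the same channels; return the last result."""
    def run(inp, out):
        result = None
        for prog in programs:
            result = prog(inp, out)
        return result
    return run


def in_local(p, a):
    """Run `a` with its input attached to `p`; output is inherited."""
    return lambda inp, out: a(p, out)


def out_local(p, a):
    """Run `a` with its output attached to `p`; input is inherited."""
    return lambda inp, out: a(inp, p)


stdout, log = [], []
main = seq(emit("a"), out_local(log, emit("b")), emit("c"))
main(iter(""), stdout)

redirected = in_local(iter("xyz"), read_one)
got = redirected(iter("q"), stdout)
print(stdout, log, got)
```

Only the sub-program wrapped in out_local writes to log; the programs before and after it still write to the enclosing output channel, and the in_local sub-program never sees the enclosing input.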

overload ibio.open a m s :: a → m s

open is, obviously, an incredibly general operation; it is intended to allow input and output channels to be obtained from other types, such as filenames and first-class channels. The major clients of open are the high-level re-direction operators:

(ibio.<<) :: ∀ 'in0 'in1 'out 'm 'a 'α.
    {module ibio.c m} →
    {ibio.open a (m in1 out) (ibio.in.p in0)} →
    m in0 out α →
    a →
    m in1 out α
;
(ibio.>>) :: ∀ 'in 'out0 'out1 'm 'a 'α.
    {module ibio.c m} →
    {ibio.open a (m in out1) (ibio.out.p out0)} →
    m in out0 α →
    a →
    m in out1 α
;
(ibio.<>) :: ∀ 'in 'out ('s :: •) 'm 'a 'α.
    {module ibio.c m} →
    {ibio.open a (m in out) (ibio.socket s)} →
    m s s α →
    a →
    m in out α
;

These permit input, output, or both to be re-directed from/to anything that can be opened at the appropriate type. In particular, to permit input and output to be re-directed from/to channels and sockets, module ibio.c m is a sub-overload of the following overloads:


∀ 'in 'out ('s :: •). ibio.open (ibio.channel s) (m in out) (ibio.in.p s),
∀ 'in 'out ('s :: •). ibio.open (ibio.channel s) (m in out) (ibio.out.p s),
∀ 'in 'out ('s :: •). ibio.open (ibio.socket s) (m in out) (ibio.in.p s),
∀ 'in 'out ('s :: •). ibio.open (ibio.socket s) (m in out) (ibio.out.p s),

ibio.channel :: ∀ 'm 'in 'out ('s :: •). {module ibio.c m} →
    m in out (channel s);
ibio.socket.pair :: ∀ 'm 'in 'out ('s :: •). {module ibio.c m} →
    m in out 〈 0 :: socket s; 1 :: socket s; 〉;

These two functions permit fresh in-process channels and socket-pairs, respectively, to be allocated, allowing for communication between different subprograms and particularly between subprograms running in different threads.
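As an illustration of a socket pair built from the read and write ends of two in-process channels, here is a small Python sketch. It is an analogy only: Socket, socket_pair, and echo_upper are hypothetical names, queue.Queue stands in for an in-process channel, and None is used as an end-of-stream sentinel that IBIO itself does not define.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Socket:
    inp: queue.Queue   # the channel this side reads from
    out: queue.Queue   # the channel this side writes to


def socket_pair():
    """Two sockets wired back to back over two fresh in-process channels."""
    a_to_b, b_to_a = queue.Queue(), queue.Queue()
    return Socket(inp=b_to_a, out=a_to_b), Socket(inp=a_to_b, out=b_to_a)


def echo_upper(sock):
    # Runs in its own thread: read until the sentinel, reply in upper case.
    while (sym := sock.inp.get()) is not None:
        sock.out.put(sym.upper())
    sock.out.put(None)


left, right = socket_pair()
worker = threading.Thread(target=echo_upper, args=(right,))
worker.start()

for sym in "abc":
    left.out.put(sym)
left.out.put(None)

replies = []
while (sym := left.inp.get()) is not None:
    replies.append(sym)
worker.join()
print(replies)
```

Each side's output channel is the other side's input channel, which is what lets subprograms in different threads converse through a single value.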