Archive

Archive for the ‘Scheme’ Category

My third Lisp books

December 29, 2007 Leave a comment

In time passed since my last post I was able to read two books: “Scheme programming language” by K. Dubvig and “Structure and interpretation of computer programms” by Abelson and Sussman, and whould like to share my opinion on them. Both books are freely available on Web, so anyone can read them.

“Scheme programming language” is not a very good book. While it describes a lot of features of scheme language, most of explanations too short, not deep and do not help to understand. So, for beginners this book is too complex, for advanced people this book is not interesting.

“SICP” is completelly different story. This book is brilliant! However, reader should be ready that book is about computer science, and not dedicated to Scheme as computer language. This means that book is concentrated on problems which arise in programming independently of language used. But it is important that authors show deep logical issues in usage of each element of Scheme for particular problem. So, reading this book will not make reader a specialist in tricky areas of Scheme (this book, for example, totally misses macroses), but instead it will make reader a specialist in logic, algorithms, program design and will show how in best way use basics of Scheme. Scheme is a simple language, and even it’s basics are enough in lots and lots of cases, so this book is very practical. Recommended to everybody!

Advertisements
Categories: Lisp, Scheme

List abbreviations

November 23, 2007 1 comment

Since List is a basic and very usefull data structure of Lisp, so there are several ways to create lists. I whould like to discuss some subtle details of the subject.

Book “How to design programs” at Section 13 Intermezzo 2 says that
‘(1 2 3)
abbreviates to
(list 1 2 3)
but later it says that
‘(a b c)
abbreviates to
(list ‘a ‘b ‘c)
See the difference ? symbols are quoted, and numbers are not! Why that difference ?

It turns out that a single approach could be taken: to quote everything. It works, because quoting a number doesn’t produce a symbol, but a number itself, so (+ ‘1 ‘2) evaluates to 3.

More interesting issue is that ‘(a b c) is not perfectly the same as (list ‘a ‘b ‘c). In my interpreter, I’ve made it completelly different. For parser, first expression is a sequence of quote and a list of three elements, and second is a list of four elements.

Evaluation is also different. For first expression, quote allows evaluator to treat the following item as data, so a list of three symbols created by parser is simply re-used. Second expression is a procedure appliance, so first arguments are evaluated, then they are passed to list procedure, which returns a list of three elements. Then evaluator uses first expression in form returned by parser, it contains raw parser data: symbols and lists, and each time it evaluates symbol it first checks if symbol should be converted into number. This is a good test:
(symbol? (car ‘(1)))
evaluates to #f

My implementation handles this expression in following way: parser creates list for expression text, which consists of symbols and other lists. Then evaluator starts evaluating expressions. quote returns list generated by parser, which contains symbol “1”, then car extracts that symbol, and then procedure symbol? checks if it is a number.

Second approach, which is probably better, whould be to let parser recognize numbers. It should be more performant, then to convert symbol to number every time it is accessed. Third approach is to perform conversion once in evaluator, then replace symbol with number in that list. However, this last approach doesn’t eliminate checking if symbol is a number every time real symbol is accessed

So, I see three implementation strategies, which provide similar results:

  • Parser builds representation for expression from lists, symbols and numbers. Quote simply returns that list
  • Parser builds representation for expression from lists and symbols. Quote simply returns that list. Symbols are converted into numbers then they are accessed.
  • Parser builds representation for expression from lists and symbols. Quote treats lists in special way (recursively)
Categories: Lisp, Scheme

Unifying define-struct

November 16, 2007 Leave a comment

My initial impression of Lisp in whole and Scheme in particular is that lists are used as general-purpose data structure. But sometimes it could be more convenient to access structure member by name, not just by “cddar”. Writing accessor functions manually is long and boring way. While reading HtDP I discovered “structures” – the thing I was looking for. Declaring structure automatically generates constructor and selector functions. However, structures are not portable between Scheme implementations. Compare:

Scheme implementation syntax
PLT (define-struct structname (field1 field2))
ChezScheme (define-structure (structname field1 field2))
Gambit (define structure name field1 field2)

So, writing portable code which uses structures was not possible for me.

Later I understood that “define-struct” is often mplemented using Scheme macro facility. It automatically generates code for constructor and selector functions which you otherwise whould write yourself, so it is just a “productivity feature” on top of Lisp basic primitives. But, since “define-struct” is a macro, then I also can use macros to make a common definition for “define-struct”!

I decided to use PLT syntax as common. After that I needed some macroses which will convert my common syntax to syntax which already implemented on each platform. Of course, PLT doesn’t require any special transformer. So, I’ve started to learn macro language. I can’t say I became a master, but my aim is achieved.

Here is a R5RS-style macro for ChezScheme, which works fine:

(define-syntax define-struct
  (syntax-rules ()
    ((define-struct name (fields …)) (define-structure (name fields …)))
  )
)

Unfortunatelly, my macro for Gambit is not working:

(define-syntax define-struct
  (syntax-rules ()
    ((define-struct name (fields …)) (define-structure name fields …))
  )
)

Problem with Gambit is that it doesn’t support define-syntax natively. You need to (load “syntax-case”). But then aboveshown macro is executed, it reports error which I don’t know how to fix.

Gambit also has Common Lisp-like macro facility called “define-macro”. After some time I came with following, which seems to work:

(define-macro (define-struct name fields)
  (cons ‘define-structure (cons name fields))
)

Now I have unified syntax, and can write code which will work on any of those implementations.

Categories: Scheme

Spirit of Lisp

October 8, 2007 Leave a comment

While working on my “Scheme interpreter made in Java” I often thought about Lisp language itself. My interpreter is very small and simple, yet very powerfull, so I was trying to understand: where lies the core of it’s power. Since I’m learning OCaml now it is interesting to compare both languages, and to understand what makes Lisp different. At the beginning there were a lot of confusion in my head, so this article is an attempt to bring an order in my chaotic thoughts.

People often hear about very important feature of Lisp which is “code is data”. But, my impression is that this feature is rather secondary. My current interpreter doesn’t have any macro facility yet, it even doesn’t have car and cdr, and it’s still very capable. So, I think that core of the Lisp lies elsethere, not even in ability to process lists.

I think that Lisp was started as a functional programming language, so Lisp programs should be a set of function definitions and one expression to evaluate. Expression is a combination of function invocations, where result of one function is passed as argument to another. A distinguishing feature of Lisp is that function invocation is written as (func arg1 arg2 … arg3) instead of mathematical notation func(arg1, arg2, …, arg3). Mathematical notation is preferrable if used in expressions like y=func(arg1, …), but Lisp’s notation is preferrable if result of one function is directly passed to another : (func1 (func2 argX) argY). Lisp’s notation is also preferrable if function itself should be evaluated: ((lambda (x) (+ x 1)) 2). Compare with mathematical-like notation used in C for invocation of function referenced by pointer: (*func) (2, 5), which is more confusing for me.

Then I started learning Lisp, lambda looked like a magic for me: function is “produced”. Now it’s more simple: lambda is just a delayed expression. My implementation doesn’t analyze “expression” part of lambda, it is just stored as is. Then function returned by lambda is evaluated, it my implementation just maps function arguments to their values in environment, then evaluates lambda-expression. Other implementations may perform some optimizations, but they are not nesessity. If runtime itself supports recursion, then lambda will be automatically recursive.

Parentheses-based syntax, together with define, lambda and cond provide expressive, simple and clean functional core which is, in my opinion, a spirit of Lisp. My current inperpreter has implemented just that core, and I believe that provided language could already be called “Lisp”. It is definetely not OCaml because of syntax and also because expressions are evaluated dynamically. Dynamic evaluation makes implementation very simple. Note that there are no single mention of lists in language defined above: parentheses are used only to group functions and their arguments, in other words: to define structure of expression. Implementation decides how to represent parsed expression structure internally. I use ArrayLists, for example.

For more complex programs language should provide compound data structures. There are two basic choices: arrays and lists. For funcitonal programming lists are preferrable. “cons” creates lists, “car” returns content, “crd” advances to the tail of list. For functional programming sharing of data is safe. Introduction of lists also means that some memory management like garbage collection should be provided

Now I come to the main source of confusion. Then Lisp prints list objects, it uses the same parentheses-delimeted format as for entering expressions. I don’t know who made that choice, but it made Lisp both powerful and confusing. Next step was (quote), which allows to input list structure using the same syntax as used for output of lists, which is the same syntax as used for specifying program structure. From this very moment it becomes clear that single parser is used to create data lists and program, so internal representation of program and lists are the same. Starting from this moment it is possible to write “eval” in Lisp: parser will get parentheses-separated expression as data and will turn it into internal representation, then list-based functions will allow to traverse that expression. And this is a foundation of macros.

All written above is just my current opinion. Now I start to remember that good books like “HtDP” avoid mixing lists and expressions. Bad books such as “The Scheme programming language” start mixing lists and expressions very early. I think it is important to understand the difference, because it makes all things simple and straitforward. Paul Graham also thinks the same.

Categories: Lisp, Scheme

Scheme interpreter extension

September 27, 2007 1 comment

I’ve spent one more day on extending my Scheme interpreter. I’ve added booleans, cond, list, length, -, = , so now it is quite capable. For example, now I can write a recursive function which will calculate sum of numbers starting from 1 up to provided number. Internally code was refactored quite a bit, except of ListTools, which remained exactly the same. I’ve also made a tracing facility, which was surprisingly easy to implement.
Very important discovery for me is that now I understand Scheme language much better. It’s a general rule: to use something effectively you should understand how it works. Implementation and description are the same.
Now I want to summarize my results so far. Scheme interpreter consists of two parts: REPL (which includes reader and result printer) and Evaluator. Reader invokes ListParser, which parses user’s input presented as sequence of characters into list structure, so elements of list could be character strings or other lists. Then result of parsing is passed to evaluator. Evaluator evaluates all items in top-level list, and returns list of result values, which are printed by result printer. If error occurs in ListParser or Evaluator, it is also printed.
Since Evaluator is provided with representation of user’s input, it can distinguish raw symbolic data and lists. Raw symbolic data are evaluated by checking if they are numeric, symbolic or boolean literals, and if not they are considered variable names and their values is looked up in environments. Lists are first checked to be special syntax forms (like define, cond, lambda), and if not, then first item is evaluated. If first item is a procedure, then arguments are evaluated and then procedure body is evaluated.
I think that simplicity of language itself and it’s implementation comes from two things: simple syntax and dynamic nature.
Now I use ArrayList to represent lists. I can implement cons, but it will not provide true sharing, it will make copies instead. But users will see no difference until set-car! and set-cdr! are implemented. Later I will implement cons sells, and will make List implementation based on them.
Making Scheme in Java is a simple task, because both languages have garbage collection. I’m afraid that implementation in Z80 assembler will be more diffucult.

Categories: Java, Lisp, Scheme

Scheme interpreter in two days

September 26, 2007 Leave a comment

Then I started to learn Scheme, I often saw people writing that Scheme interpreter could be build in several days. Such writings always make me feel stupid, because I can’t even think where to start writing Scheme interpreter. Finally, I decided to try and make a Scheme interpreter in Java, and it was unexpectingly easy! In two days I was able to make a very limited implementation, which is a good base for further improvement.
I’ve started my implementation from REPL, a loop which reads a line, evaluates the expression and prints results. My first day was not very productive in terms of code, but I did lots of thinking. Most diffucult task was to design parser for Lisp notation, which was very buggy.
On second day I’ve re-written my parser, and implemented evaluator, which was quite easy. Here are my results.
ListTools class performes list parsing and printing. Main idea is that parentheses are not part of list which is parsed, they are part of parent expression.

public class ListTools {
public static void printList(List list, Appendable out) throws IOException {
out.append(‘(‘);
for (Iterator it = list.iterator(); it.hasNext();) {
out.append(it.next().toString());
out.append(‘ ‘);
}
out.append(‘)’);
}
public static String printList(List list) {
StringBuffer sb = new StringBuffer();
try {
printList(list, sb);
} catch (IOException e) {
throw new RuntimeException(e);
}
return sb.toString();
}
public static int parseList(List list, CharSequence cs, int start) throws ListSyntaxError {
boolean inWord = false;
int elementStart = start-1;
int elementEnd = start-1;
int i = 0;
for (i = elementStart + 1; i = cs.length() || cs.charAt(i) != ‘)’) throw new ListSyntaxError(“List incomplete”, -1);
} else if (c == ‘)’) {
if (inWord) {
list.add(cs.subSequence(elementStart, i ));
}
return i;
} else {
if (!inWord) {
elementStart = i;
}
inWord = true;
}
}
if (inWord) {
list.add(cs.subSequence(elementStart, i));
}
return ++i;
}
public static List parseList(CharSequence cs) throws ListSyntaxError {
List result = new ArrayList();
int i = parseList(result, cs, 0);
if (i < cs.length() && cs.charAt(i) != ‘)’) throw new ListSyntaxError(“Unexpected ‘)’ at:” + i, i);
return result;
}
public static boolean isWhitespace(char c) {
return c == ‘ ‘ || c == ‘\t’ || c == ‘\r’ || c == ‘\n’;
}
}

And this class uses dedicated exception:

public class ListSyntaxError extends Exception {
public final int position;
public ListSyntaxError(String message, int pos) {
super(message);
position = pos;
}
}

Now I’ll show an evaluator:

public class Evaluator {
public static List evalAll(CharSequence cs, Map environment) throws ListSyntaxError {
List results = new ArrayList();
List expressions;
expressions = ListTools.parseList(cs);
for (Iterator it = expressions.iterator(); it.hasNext();) {
Object elem = it.next();
try {
Object result = eval(elem, environment, true);
if (result != null) results.add(result);
} catch (EvalError e) {
System.out.println(e.getMessage());
}
}
return results;
}
public static Object eval(Object elem, Map environment, boolean topLevel) throws EvalError {
Object result;
if (elem instanceof CharSequence) {
result = eval((CharSequence)elem, environment, topLevel);
} else if (elem instanceof List) {
result = eval((List)elem, environment, topLevel);
} else {
throw new EvalError(“Unknown type”);
}
return result;
}
public static Object eval(CharSequence cs, Map environment, boolean topLevel) throws EvalError {
// try numeric conversion
try {
int val = Integer.parseInt(cs.toString());
return String.valueOf(val);
} catch (NumberFormatException e) {
// do nothing
}
// check if symbol
if (cs.charAt(0) == ‘\”) {
return cs.subSequence(1, cs.length());
}
// now try to lookup in environment
Object result = environment.get(cs);
if (result == null) {
throw new EvalError(“Unbound variable: ” + cs);
} else {
if (result instanceof CharSequence) {
return (CharSequence)result;
} else if (result instanceof List) {
return ListTools.printList((List)result);
} else if (result instanceof Procedure) {
return result;
} else {
throw new EvalError(“Unknown type”);
}
}
}
public static Object eval(List list, Map environment, boolean topLevel) throws EvalError {
if (list.size() == 0) {
// hack from CL, null list evals to itself
return “()”;
} else {
Object car = list.get(0);
if (car instanceof CharSequence) {
if (car.toString().equals(“define”)) {
if (!topLevel) throw new EvalError(“invalid context for definition”);
if (list.size() != 3) throw new EvalError(“Wrong number of arguments for define”);
Object value = eval(list.get(2), environment, false);
if (value == null) throw new EvalError(“Argument could not be evaluated”);
environment.put(list.get(1), value);
return null;
} else if (car.toString().equals(“lambda”)) {
if (list.size() != 3) throw new EvalError(“Wrong number of arguments for lambda”);
if (!(list.get(1) instanceof List)) throw new EvalError(“Arguments for lambda should be lists”);
if (!(list.get(2) instanceof List)) throw new EvalError(“Arguments for lambda should be lists”);
return new CustomProcedure(((List)list.get(1)), environment, ((List)list.get(2)));
}

}
Object result = eval(car, environment, false);
if (result instanceof Procedure) {
return ((Procedure)result).eval(list, environment);
} else {
throw new EvalError(“Object is not a procedure”);
}
}
}
}

Evaluator also has a special exception:

public class EvalError extends Exception {
public EvalError(String message) {
super(message);
}
}

As you can see, Scheme procedures are implemented as instances of class Procedure:

public abstract class Procedure {
public Object eval(List arguments, Map environment) throws EvalError {
return null;
}
}

A good example of procedure is a “+”:

public class AddProcedure extends Procedure {
public Object eval(List arguments, Map environment) throws EvalError {
if (arguments.size() < 3) throw new EvalError(“Wrong number of arguments for +”);
int result = 0;
for (int j = 1; j < arguments.size(); j++) {
Object o = arguments.get(j);
try {
String val = Evaluator.eval(o, environment, false).toString();
if (val == null) throw new EvalError(“Argument could not be evaluated”);
result += Integer.parseInt(val);
} catch (Exception e) {
throw new EvalError(“Wrong argument for +”);
}
}
return String.valueOf(result);
}
}

Evaluation of “lambda” leads to creation of new instance of CustomProcedure:

public class CustomProcedure extends Procedure {
private final Map environment;
private final List expression;
private final List argNames;
public CustomProcedure(List argNames, Map environment, List expression) {
this.argNames = argNames;
this.environment = environment;
this.expression = expression;
}
public Object eval(List funcApplicationExpr, Map environment) throws EvalError {
Map copyEnv = new HashMap(this.environment);
Map arguments = putArguments(funcApplicationExpr);
copyEnv.putAll(arguments); // they override
return Evaluator.eval(expression, copyEnv, false);
}
private Map putArguments(List funcApplicationExpr) throws EvalError {
Map arguments = new HashMap();
int i;
for (i = 0; i = (funcApplicationExpr.size() – 1)) throw new EvalError(“Number of arguments is too small!”);
CharSequence argName = (CharSequence)argNames.get(i);
Object argValue = funcApplicationExpr.get(i + 1);
arguments.put(argName, argValue);
}
if (i ‘);
while (true) {
int size = in.available();
if (size > 0) {
byte[] data = new byte[size];
in.read(data);
sb.append(UTF_8_CHARSET.decode(ByteBuffer.wrap(data)));
List results = null;
try {
results = Evaluator.evalAll(sb, environment);
} catch (ListSyntaxError e) {
if (e.position == -1) {
continue;
} else {
System.out.println(e.getMessage());
sb.setLength(0);
out.print(‘>’);
continue;
}
}
print(results);
if (isEnd(results)) break;
sb.setLength(0);
out.print(‘>’);

} else {
Thread.sleep(100);
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
public void print(List data) {
for (Iterator it = data.iterator(); it.hasNext();) {
out.println(it.next().toString());
}
}
public boolean isEnd(List data) {
for (Iterator it = data.iterator();it.hasNext();) {
if (it.next() instanceof ExitProcedure) {
return true;
}
}
return false;
}
public static Map getDefaultEnvironment() {
Map env = new HashMap();
env.put(“+”, new AddProcedure());
env.put(“exit”, new ExitProcedure());
return env;
}
}

This REPL could be started from main(), or from any Thread, which, for example, listens some socket.

This implementation is, of course, very limited and not optimal, but look what it is able to do!

>(+ 1 2)
3
>(+ 10 11 12)
33
>(define a 1)
>(+ a 2)
3
>(define b 2)
>(+ a b)
3
>(define b a)
>(+ a b)
2
>(define a 0)
>(+ a b)
1
>(define add (lambda (x) (+ x 5)))
>(add 1)
6
>(define add-gen (lambda () (lambda (y) (+ y 1))))
>((add-gen) 1)
2
>

My next steps whould be to implement =, eq?, quote, let, cond, booleans, cons, car and cdr. I also see how integers could be optimized.

By posting this I don’t claim that my code is perfect. To be honest, I just don’t have any place to put my code, so I’m putting it here. I’m not going to compete with other implementations of Scheme in Java, such as SISC or Kawa, I’m just learning and playing. My final goal will be to do my implementation in OCaml, and in Assembler for ZX Spectrum.

Categories: Java, Lisp, Scheme

Value capture in Lisp

September 14, 2007 2 comments

In one of my previous posts I’ve told about how (define) works and how (lambda) works. Main difference is that (define) evaluates second argument, and (lambda) doesn’t evaluate anything, it just binds all variables to specific scope. Variables could even be defined after definition of (lambda)! See following example (in Scheme):

>(define f (lambda (x) (+ x y)))

>(f 5)

Error: variable y is not bound.

>(define y 5)

>(f 5)

10

In this example, x is bound to scope of function’s arguments, and y is bound to local scope. Each time the function is applied, it tries to use (evaluate) variable in initially defined scope.

The story continues in OCaml. Now I’m reading a book about this language, and I was surprised to know that it is implemented exactly in the way I was mistakenly thinking about Lisp earlier. So, function definition in OCaml doesn’t capture the scope, it captures the value! Look:

# let a = 10;;
val a : int = 10
# let add2a x = a + x ;;
val add2a : int -> int = <fun>
# add2a 5;;
– : int = 15
# let a = 20;;
val a : int = 20
# add2a 5;;
– : int = 15

So, we have fundamental difference between OCaml and Lisp. I don’t know which behaviour is better, and I don’t bother. I just know how it will behave in both languages. But, my next idea was: is it possible to achieve value capture in Lisp, and scope capture in OCaml ?

Value capture in Lisp is, of course, possible. It is done by using (lambda) inside (let). Look:

>(define x 5)

>(define f
    (let
        ((x1 x))
      (lambda (y) (+ x1 y)))
    )

>(f 5)

10

>(define x 10)

>(f 5)

10

Let allows us to create local scope, and each clause in let is like (define), so it evaluates variable. In this way we achieve value capture in scope accessible only from body of function.

Regarding OCaml: I’m sure that “scope capture” will not work. The reason is that OCaml, unlike Lisp, is a language with static type checking. Since any variable could be re-defined with different type after funciton definition, OCaml implements value capture to avoid type errors.

Categories: Lisp, OCaml, Scheme