Archive for April, 2008

Critique of Java NIO frameworks

April 21, 2008

Starting from version 1.4 Java has a new library called NIO. Everyone who writes scalable network software uses this library because it supports non-blocking network operations. Without this library, you will have a separate Thread for each connection, and Java doesn’t scale well with threads. Of course, this library is not just about non-blocking network operations. It also accelerates access to files, and has whole architecture for byte-to-char conversion and back.

Non-blocking IO is slightly harder to understand than blocking IO, but not too much. However, there are people who think that NIO is too low-level. They also think that since NIO was created to have a threading model different than “one thread per socket”, so it is a good idea to provide a threading model out-of-box. These people have written NIO frameworks. Let’s name some of them:

All those project pretend that they are “easy to use” because they “hide the complexity of plain NIO”. Is it really so ? I don’t think.

All mentioned libraries try as much as possible to hide selector, channels and bytebyffers from you. Instead, they propose “chain of responsibility” pattern, consisting of diffrent “protocol stages”. Processing on each stage could be synchronious or asynchronious.

Main problem with these frameworks is that they are too generic. They wrap 5 NIO classes with 50 their own classes. Yes, authors have analyzed >20 use cases and have created libraries which supports them all. Now, instead of manually using plain NIO in a specific way you need, you’ll have to analyze how your particular case fits into this framework, then to find out a dozen of places where you need to put pieces of your code so they will be executed in a correct sequence. They provide all the “glue” for you, so just put your “diamonds”, and overall thing will pretend to be an “iron”.

My opinion is that these frameworks should be avoided. Using them makes the code bloated and hard to understand. Better to learn NIO once. Yes, I myself often forget to flip byte buffer after reading from channel, but I prefer plain code to raviolli code. All those libraries are perfect examples of Java developers loving to produce frameworks. And this is just a part of more generic case of OO developers to love design patterns and templates.

And, final advice: if possible, don’t use NIO, use plain IO instead. If you have a single socket, then there are no reason to use Selector. If your thread should wait for the answer, make it blocking on read operation, instead of making it wait on a queue.

Update: subsequent post shows that I’ve changed my preferences from plain IO to blocking NIO. But my dislike to NIO frameworks is still here.

Explanation of concurrent programming. Part 3: implementation

April 11, 2008

At this moment we have covered some of the theory of concurrent programming, and even given some examples of it. Now let’s discuss how does this work inside.

We’ll start from scheduler. There are two common approaches: cooperative tasks, which give away control themselves, and pre-emptive task switching, where scheduler has a power to stop execution of any task at any moment. For interpreted language a scheduler is easier to implement as a part of interpreter: it can just count instructions and switch to another task then number of processed instructions reached some threshold. Making scheduler a part of interpteter means that scheduler works on lower abstraction level (and on higher capability level) then logic of tasks. For tasks which are executed as machine code a scheduler is also implemented in machine code, because there are no lower abstraction level available. When scheduler and tasks work at same abstraction layer, then switching happens in following way: at some point an execution should be interrupted and transferred to scheduler itself, and scheduler will run another task.

Usually this is achieved using build-in facility of CPU for called “interrupt”. Originally this facility was designed for on-demand processing for events on external devices. Interruption takes place when device sends electrical signal to CPU. Since one of those devices could be a timer, interrupt facility could be used for interrupting a task. So, even a scheduler is programmed in machine code, an interruption facility is a part of CPU, so it is on lower abstraction level.

Low-level scheduler for CPU-native tasks is usually a part of OS. Main reason is that OSes already use interruption facility of CPU for I/O operations, so giving to a user program an ability to use (and misuse) interruption facility whould be dangerous. Another reason is that OSes follow a design philosophy that all ”system-level” code should be a part of an OS. As a side effect of including a scheduler, OS itself takes advantage of multi-tasking by splitting itself into several tasks.

Now let’s talk about task interaction facilities. Message-passing facilities are usually a part of OS, because it requires memory manipulations, which is usually done  by OS. However, it is not nesessary.

We alrady explained that If tasks use unprotected shared memory, then they should use some facilities to ensure that shared data are consistent.

Some shared objects of small size could be realized without any facility at all, by using “atomic” operations of CPU. To understand the importance of atomic operations, let’s use classical “array traversal” code as an example:

int[] a;  // shared
int i = 0; // shared
for (i = 0; i < size; i++) { /* do something with a[i] */ }

This code works well in single-tasking environment, but if we want to run several instances of this loop simultaneously, we’ll get in trouble. This code doesn’t protect several tasks from processing same array element several times. Let’s try to modify this code:

int[] a; // shared
int i = 0; // shared
for (i = 0; i++ < size; ) { /* do something with a[i-1]*/ }

Here we have tried to increase an index just after check so other instance of task will process next element. However, it will not always work as expected, because check and increase are two separate operations. The following sequence is possible: taskA checks i, task B checks i, taskA increases i, taskB increases i. To solve this problem, we need atomic “check and increase” operation. Modern CPUs provide such operations. Using them is a recommended way for small objects, because such operations are fast and don’t depend on OS.

Achieving atomic behaviour for larger objects is easy once we have atomic operations for small objects. We just need a small “flag” for each large object. Task which modifies an object should do those four steps:

  1. Check if flag is in “ready” state. Repeat until this condition is satisfied.
  2. If yes, then switch flag into “object is being modified” state. Note that checking and switching should be performed as single atomic operation.
  3. Read/modify change large object as needed.
  4. Finally put flag back into “ready” state.

This flag is usually called “the lock”, because it controls access to some other object. It’s states are called “locked” and “unlocked”, and operations are called “try to lock” and “unlock”.

In lots of multi-tasking implementations locks are used for ad-hoc critical sections. If task is unable to lock an object because it was locked by another task, then it voluntarily “spins around” the lock repeatedly trying to lock it. So, the task which has locked an object is virtually uninterruptible: scheduler will switch between tasks as usual, but other tasks virtually do nothing, they just spin around the lock.

To summarize abovementioned: usage of unprotected shared memory is possible without ineraction with scheduler, so without any OS-related facility. But sometimes using an OS could be good.

SInce spinning around a lock is just a waste of time, it is a good idea to put all waiting tasks to sleeping, and wake them up then object will be unlocked. A usual approach is to have a list of pending tasks for each lock. If object is locked, then current task is added to the list, and scheduler is asked to put this task asleep. Then lock is released, then the scheduler is asked to wake up all pending tasks. Since interaction with scheduler is nesessary, such “blocking locks” should be implemented as part of an OS.

An ability to wake up a particular sleeping task is useful regardless locks, so it is also available through OS.

Let’s summarize it all. A cornerstone of multi-tasking implementation is a scheduler, which is usually a part of an OS. Simplest form of inter-task communication is through unprotected shared memory. Data consistence could be achieved by using atomic operations or spinlocks without dealing with scheduler. If objects are locked for a long time, it is a good idea to use blocking locks, which are available from OS.

Further reading:

Explanation of concurrent programming. Part 2: examples

April 4, 2008

In the previous part we have covered a lot of theory, so in this part we will take a look at real-world examples of concurrent environments and will try to see how well they will fit with our theory.

Classical Unix environment could be considered as “message-passing”-class system. Each task is represented by “process”. All processes have access only to their local data. OS even protects memory of the process from being physically accessed from other processes. Processes can communicate through files: one process can write into a file, and another process can read from it. There are even special kinds of files, called “pipes”, which are designed specially for inter-process communication: if there are no data in pipe then reader will block until data will arrive or pipe will be closed. Unix guarantees that writing less then certain number of bytes into a file will happen atomically. There is an easy solution for this problem: you can have two files, one for data, and another for signalling. Subsequent versions of Unix-like OSes have also another message-passing facility called “message queue”.

A “semaphore” is a good example of guarded shared memory. However, it’s capacity is limited to single integer. More than that, it interacts with scheduler, so some operations like “get” can block the process.

Another well-known concurrent framework is called “POSIX threads”. It specifies that tasks have all their data memory shared, but each task has an isolated stack. For languages like C and C++ this means that “automatic” variables, which are stored on stack, will always be thread-local, but “global” and “heap” variables will be shared. This framework could be classified as “shared memory with critical code sections”. Critical section is started then thread “grabs” an object called “mutex”, and ends then thread “releases” this mutex. While grabbed by one thread, mutex cannot be grabbed by any other threads. There are two kinds of “grab” operation: “blocking” which puts thread asleep if mutex could not be grabbed, and “non-blocking” which returns error if mutex could not be grabbed. There is also a facility called condition variable. It is useful then thread doesn’t have any work to do, and wants to go asleep, but it doesn’t know when to wake up, so it will be woken up by another thread.

Threads in Java are very much like POSIX threads. The only difference is that it is implemented not as library, but as a part of language itself.

Explanation of concurrent programming. Part 1: theory

April 3, 2008

I’ll start an introduction to concurrent programming with example. Imagine a real-time multiplayer strategy game. Lots of things are going on simultaneously: your units move to destinated point, AI-controlled units also move, animated scenery is drawn, number of harvested resources increases, player’s mouse pointer moves across the screen. How is this program written ? Some time ago there was an approach to program’s structure called “main loop”: program first checks if mouse was moved and updates the pointer, then AI analyzes the situation and takes decisions, then all units move according to their orders, then screen update takes place, then again from the beginning. Several tasks are executed sequentally in loop. Loop itself is quite simple, main difficulty is to implement tasks themselves. They must be short, because otherwise it will be noticable that those actions happen sequentally. All tasks are coded along the same pattern: restore a context (like last mouse position, last drawn frame of unit, last amount of resource), perform action, which usually changes the context, then store a context back. Temporary data which were generated during execution of task are just thrown away. Writing of such games is not an easy task, because developer should remember what should be in context, and how to store/restore it. Making task too short is also bad: context will be bigger to hold data which otherwise would be temporary. For example, it is a good idea to have whole unit sprite drawn in one cycle and just increase unit’s frame number, otherwise it will be nesessary to remember screen position of last drawn pixel.

A concurrent programming is an attempt to perform storing/restoring of context automatic and transparent. Instead of having specific context for each task, some low-level “execution context” is introduced. Main loop is replaced with “scheduler”, which works on time basis and switches tasks. So, each task has a “slice” of CPU’s time for execution. Main benefit of concurrent programming is a transparency of context switching: code becomes much simpler.

Another benefit of concurrent programming is that it works transparently on several CPUs. Tasks are really executed in parallel, and automatic context switching allows to continue task execution on another CPU.

Introduction of concurrent programming rised several interesting issues. One of them is a consistence of common data. In sequental programming, task is continuous and could not be interrupted, so it can do any actions as long as it will leave a shared data in context in consistent state upon exit. In concurrent programming, tasks could be interrupted at any time, possibly leaving data in broken state.

Let’s stop at this point, and summarize that was already said. Concurrent programming is a solution for certain kind of programs. It works by moving a context outside of program’s scope. If program is compiled into machine language, then context is located at hardware level. Concurrent program works like it has several sequental tasks which are execuded in parallel by time-slicing single CPUs. There is an entity called “thread”, which is this sequential executor of some task. At this point a “thread” should look like as completelly artifitial entity, but there are reasons for it’s existence, and they will be revealed soon. Some tasks are completelly independent, some tasks work together, so there should be means for task isolation and for task interaction. And there is an entity called “scheduler” which chooses what task (or tasks, in multi-CPU machines) should be executed at each moment. 

If tasks are completelly isolated, then we just have a single machine which works as several independent machines. For example, if we will launch several inet server daemons (for FTP, HTTP, e-mail protocols), then we just have a single machine which is totally equivalent to several machines, each running a single daemon. There are absolutelly nothing which should be done to convert simple sequental task into “isolated task”. However, some actions are possible for isolated tasks to take benefit of concurrent environment.

Tasks are isolated from each other, but they can interact with scheduler. For example, if task is sure that there will be no work for it for some time, it can ask scheduler to “put her asleep” and use CPU for other tasks. Another possibility is to modify task’s “priority”, which defines how much of CPU should be given to this task compared to other tasks. All these actions use “thread” entity to indentify a particular task. So, “thread” is an indentification entity used then task should interact with scheduler, so scheduler can work with tasks as abstractions.

Now let’s discuss interaction between tasks. There are two known approaches. First is to have all task’s data isolated from other tasks, and have special facility for ineraction between tasks. For example, in a computer game a task which tracks mouse state can detect that mouse was moved, then it can calculate a rectangle between old and new position, then it can signal to task which paints on the screen to re-draw this rectangle because mouse pointer should be drawn in different place. Such approach is called “message passing”, because tasks pass messages between each other. Then some task sends a message, then delivery facility copies all the supplied data from local memory of sending task into internal buffer. Receiving task can copy data from facility’s buffer into it’s local memory.

This approach has shown us that task interaction adds a new entity into whole picture. Except tasks and scheduler we need a message delivery facility. But, should tasks and scheduler change in order to make interaction possible? No, it is not nesessary, but it could be very useful. For example, instead of having receving task to poll a facility in a loop waiting for a message, it is possible to make scheduler ”to put asleep” a task by not scheduling it for execution, so CPU time could be used by other tasks. Programmers say that “non-blocking” receive operation is turned into “blocking” operation, which allows to avoid “spinning” around it.

This approach solves data integrity problem by copying data two times. When data has been copied into internal buffer, nobody can change them. Access to buffer is available only then “send” operation has been completed.

Message passing is somehow similar to manual multi-tasking described at the beginning. Instead of context store/restore there are send/receive operations. However, this approach doesn’t restrict “task granularity”: tasks can send messages when they want, which is very convenient.

Now let’s duscuss a second approach to task interaction, called “shared memory”. Tasks still have some local data which are isolated to access from other tasks, and there are some common data, which could be accessed by all tasks simultaneously. For example, to signal a need of screen refresh, one task can set a common flag, and screen painter task will detect this and re-draw a screen.

Let’s investigate if this approach requires some special facility or not. First, a special facility may (or may not) be needed to define which memory is shared and which is not. Second, an access to shared memory may happen through some facility. Both of these statements use word “may”, meaning that “it is an implementation detail, not required by theory”.

The reason of possible data inconsistence is the uncertainty of order of read and write operations. Writer task could be interrupted in the middle of “write” operation, and reader task will try to read shared data. So, the problem should be solved by making access to shared data non-interruptible. Two solutions exist.

First solution is to make all shared memory access through special facility. This facility interacts with scheduler so all access operations are non-interruptible. It is obvious that this approach is very similar to message passing. Instead of “receive” operation which either blocks or returns nothing if there are no messages there is a “read” operation which returns old value of shared data.

Second solution is to have a facility which will just make some task non-interruptible, regardless if it accesses shared data or not. Part of the task during execution of which it will not be interrupted is called “critical section”. Simplest approach is to have this facility just to ask scheduler, but for multi-CPU machines it means that some of CPUs will waste their time. This problem is solved by having several shared areas, and specifying a specific shared area then entering non-interruptible section, so other CPUs can execute tasks which don’t access this shared area. Since dividing a task into interruptible and non-interruptible parts is a responsibility of a developer, this solution lacks unified mechanism for enforcing data integrity.

That’s all for today. Let’s summarize it all. Concurrent programming is an approach for solving following tasks: to emulate several sequental machines on one through “time sharing” of CPU, to simplify internal structure of certain kind of programs, and to take benefit of multi-CPU machines. Concurrent programming introduces following entities: tasks, scheduler and interaction facilities. Task interaction could be done using one of three approaches: message passing, guarded shared memory and shared memory with non-interruptible code sections. Facilities for task interaction are usually (but not always) internally interact with scheduler.