SIP stories, part 2: dialog forking

November 5, 2008 by kmatveev

When I changed the way my SIP stack deals with dialogs, I ran into a problem: two unit tests stopped working. These unit tests covered “corner cases” which are not very well described in RFC 3261, so I had implemented them based mostly on my own understanding of SIP. I had two options: to claim that those use cases are wrong and remove them, or to fix my implementation. Obviously, I started with an analysis of the use cases.

When a “200 OK” response is sent to an INVITE, the corresponding client and server transactions are terminated immediately on the UAC, the UAS and all proxies involved. The UAS should retransmit the “200 OK” until it receives an ACK, but the ACK may take a different path than the INVITE. Since reliable delivery of the “200 OK” response is specific to UA elements, it is implemented not in the transaction layer, but in the UA core of the TU layer.

Because of proxy forking, the sender of an INVITE can receive dialog-establishing provisional responses from several UASes. When one of them answers with a “200 OK” response, the proxy must cancel all other branches. However, some other UAS may also respond with “200 OK” before it receives the CANCEL. When this second “200 OK” response arrives at the UAC, the client transaction for the INVITE will already be terminated, but the response should still be delivered to the upper layers. RFC 3261 specifies (in chapter 13.2.2.4) that such responses should be matched against ongoing dialogs, and if no matching dialog is found then a new dialog must be constructed. Matching a response against a dialog has a purpose: to cut off retransmissions of “200 OK” responses. The logic is simple: if a response matches a dialog in the confirmed state, then it should not be reported to the application layer. However, this whole idea of matching responses against dialogs has lots of flaws:

  • The described logic means that for any “stray” response the UA must construct a dialog and then pass it to the application layer. There are no means to check whether the response corresponds to a request actually sent from this node. The idea is that only responses to sent requests should be processed, and processing should stop some time after the first “200 OK” response is received. But there are no means to ensure that.
  • An application is usually interested in the context of a response. For example, it may want to know the request for which the response was received. For normal responses the request can be obtained from the client transaction. The described procedure doesn’t explain how this could be solved for “stray” responses.
  • re-INVITEs are sent within existing dialogs. Retransmitted “200 OK” responses to a re-INVITE will always be reported to the application.

It is clear that RFC 3261 has a big flaw here. In my old implementation I had a workaround. Before sending an INVITE I prepared a “dialog prototype”. All “stray” responses were checked against dialog prototypes by comparing the “Call-ID” header, the “From” header and the URI of the “To” header, and if there was a match then I created a dialog based on the prototype. A prototype also held a reference to the INVITE, so I could provide the full context for a response. This homebrew solution worked, but I firmly decided to follow RFC 3261 to the letter. Searching for ideas, I took a look at open source Java implementations of SIP stacks.
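A minimal sketch of such a prototype lookup might look like the following. The class, field and method names are invented for illustration and headers are reduced to plain strings; this is not the code of the stack described here, only one way the comparison of Call-ID, From and To URI could be arranged.

```java
import java.util.ArrayList;
import java.util.List;

// All names here are hypothetical; headers are simplified to plain strings.
final class DialogPrototype {
    final String callId;  // Call-ID of the INVITE that was sent
    final String from;    // complete From header value, including the local tag
    final String toUri;   // URI of the To header (the To tag is not known yet)
    final String invite;  // the original INVITE, kept to give responses a context

    DialogPrototype(String callId, String from, String toUri, String invite) {
        this.callId = callId;
        this.from = from;
        this.toUri = toUri;
        this.invite = invite;
    }

    // A stray response belongs to this prototype if all three values are equal.
    boolean matches(String callId, String from, String toUri) {
        return this.callId.equals(callId)
                && this.from.equals(from)
                && this.toUri.equals(toUri);
    }
}

final class PrototypeRegistry {
    private final List<DialogPrototype> prototypes = new ArrayList<>();

    void register(DialogPrototype prototype) {
        prototypes.add(prototype);
    }

    /** Returns the prototype matching a stray response, or null if the response is unrelated. */
    DialogPrototype find(String callId, String from, String toUri) {
        for (DialogPrototype p : prototypes) {
            if (p.matches(callId, from, toUri)) {
                return p;
            }
        }
        return null;
    }
}
```

A response for which find() returns null was not caused by any INVITE sent from this node and can be dropped; otherwise the returned prototype supplies the original INVITE as context.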

The NIST implementation of the JAIN SIP API matches “stray” responses against ongoing dialogs based only on the RFC rules. If some early dialog matches a response, then the initial transaction of this dialog is used as the context for the response. If the response doesn’t match an existing dialog, it is just passed to the application layer. So, in this part the NIST SIP stack is not compliant with RFC 3261. Some may argue that an application can implement this functionality, but I don’t buy it: since there is no context for the response (the request is unknown, the transaction is null), an application can do very little.

Sailfin implements an interesting hack. They don’t actually terminate the client transaction after receiving a “200 OK” response. Instead, they move the transaction into an “established” state. This both helps to ensure that a response is received for a request which was actually sent, and provides a context for the response at the upper layers. Such an approach seems to be a very good idea, except that it is not RFC 3261-compliant.

That’s all for today. As I promised to M.Ranganathan, I’ve pointed out problems with RFC 3261 and the JAIN SIP API. In an upcoming article I’ll tell about the second use case, how I’ve solved these problems, and why Sailfin’s hack is not a hack after all.

SIP stories, part 1: dialogs

November 5, 2008 by kmatveev

This story is a long one. It tells about many things which I have learnt about SIP lately. Maybe it will be useful for fellow developers. It will be told in several articles. Here is the first part.

I must admit that for a long time I understood the concept of a SIP dialog incorrectly. I correctly understood that the only purpose of a dialog is that, once it is established, you can send subsequent messages within it. So, for end-points dialogs provide a session mechanism, while intermediate nodes can work statelessly. However, I wrongly believed that requests within a dialog could be sent only after a successful response to INVITE (or SUBSCRIBE). In other words, I thought that in-dialog requests are possible only for confirmed dialogs. Thus, I was wondering: why is an early dialog needed? Later I added reliable provisional responses to my list of means of dialog confirmation, thus solving the problem of PRACK and UPDATE.

In the process of optimizing memory usage I decided to shorten the lifecycle of a dialog by removing the early state altogether. Fortunately, before doing this I sat down and read very carefully about dialogs once more. And my point of view changed dramatically.

Dialogs in their present form were introduced because of just one protocol feature: proxy forking. Without forking life would be much easier: an INVITE would start a dialog. But with forking, each recipient of an INVITE has to distinguish itself to the sender by providing a tag in the “To” header. So, each end-to-end relationship (which is the definition of a dialog, by the way) can be defined only after the first response with a tag in the “To” header. The tag for the “From” header was added just for symmetry.

If a recipient of an INVITE has answered with a provisional response containing a tag in the “To” header, then later it must supply exactly the same header in other provisional and successful responses, because otherwise the sender will think that those responses come from a different UAS. When answering with an error response the tag is not necessary, because an error response terminates all dialogs started for the INVITE, no matter which UAS sent it.

Thus, all components used for matching a request against a dialog (Call-ID and tags) will not change after the first response, even if this response is a provisional one. It means that (contrary to my belief) there is no problem in sending in-dialog requests for an early dialog. Then, what are the differences between an early and a confirmed dialog? Here they are:

  1. An early dialog terminates when an error response to the INVITE is received. A confirmed dialog is terminated by sending or receiving a BYE request.
  2. When transitioning from the early state to the confirmed state, the route set can be re-computed. In the confirmed state, the route set is never re-computed.

After I learnt all these things, I changed my implementation. The code became much clearer, because I removed all methods like “matches()” which checked responses and requests against a dialog in some custom (and incorrect) ways. The only correct way of retrieving a dialog is by comparing the dialog ID calculated from the contents of the message with the dialog ID of the internal representation of a dialog. HashMaps work perfectly for this task.
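The lookup described above can be sketched as follows. This is an illustration under assumptions, not the actual code of my stack: the names DialogId and DialogTable are invented, and a String stands in for the dialog object. The only rule taken from RFC 3261 (chapter 12) is which tag is local and which is remote: for an incoming request the To tag is the local one, for an incoming response the From tag is the local one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical value object: Call-ID plus local and remote tags.
final class DialogId {
    private final String callId;
    private final String localTag;
    private final String remoteTag;

    private DialogId(String callId, String localTag, String remoteTag) {
        this.callId = callId;
        this.localTag = localTag;
        this.remoteTag = remoteTag;
    }

    // For a request we receive, the To tag is ours and the From tag is the peer's.
    static DialogId ofIncomingRequest(String callId, String fromTag, String toTag) {
        return new DialogId(callId, toTag, fromTag);
    }

    // For a response we receive, the From tag is ours and the To tag is the peer's.
    static DialogId ofIncomingResponse(String callId, String fromTag, String toTag) {
        return new DialogId(callId, fromTag, toTag);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DialogId)) return false;
        DialogId other = (DialogId) o;
        return callId.equals(other.callId)
                && localTag.equals(other.localTag)
                && remoteTag.equals(other.remoteTag);
    }

    @Override
    public int hashCode() {
        return Objects.hash(callId, localTag, remoteTag);
    }
}

final class DialogTable {
    // The String value is a placeholder for a real dialog object.
    private final Map<DialogId, String> dialogs = new HashMap<>();

    void put(DialogId id, String dialog) {
        dialogs.put(id, dialog);
    }

    /** Returns the dialog for this ID, or null if there is none. */
    String find(DialogId id) {
        return dialogs.get(id);
    }
}
```

With such a key the same table entry is found both for a response to our INVITE and for a later request sent to us by the peer, because both factory methods produce the same (Call-ID, local tag, remote tag) triple.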

However, the changes I made had some very interesting consequences.

Book review: “Internet multimedia communications using SIP”

July 21, 2008 by kmatveev

For a long time my only sources of information about SIP were RFC 3261 and other RFCs. However, recently we got a new book in the library. The book is called “Internet multimedia communications using SIP” by Rogelio Martinez Perea. I have now finished about 90% of the book, so I already have an opinion which I would like to share.

This book is interesting in its approach. It tells about lots of things, often oversimplifying them in order to show only the essential features.

The first part is called “Fundamentals”. It explains the principles of the signaling/media approach, then moves on to the basics of the TCP/IP protocol family. Then it explains the functionality of SIP on an abstract level (without syntax!). This part is intended for people who know absolutely nothing about networks.

The second part is called “core protocols”. First it explains the basics of SIP syntax and interaction. Then it contains Java practice with a free SIP stack. Next it moves to SDP, also with a small Java practice, and then to RTP, again with Java practice. Finally, all pieces are assembled together in a soft-phone application and a proxy application.

For me, such a composition of the book is not very good. If somebody needs an explanation of TCP/IP principles and has never used sockets, then he probably will not be able to troubleshoot an application. On the other hand, advanced people like me will find the code too amateurish and most explanations oversimplified.

The third part is called “Advanced topics” and tries to cover, to a small degree, many (I’ve found 10) areas of SIP usage. There is no programming here, just concepts. This part, in my opinion, is the best in the book.

This book is quite good for students: it contains the “main facts” about the technology, so a person will at least know what it is, and he can get deeper knowledge if necessary. Other people who can benefit from the book are curious engineers, programmers and managers. I suspect that nobody will actually use the practical part of the book.

To prove that I actually read the book and that my opinion has a basis, I’m posting here a list of small annoyances, errors and mistakes I’ve found so far. Unfortunately, there is no site or e-mail address where I could send this list, so I’m posting it here in the hope that the author finds it useful.

  • Figure 6.15 on page 91 has an error. The ACK request after a 200 OK response always follows the same path as the BYE request.
  • In chapter 8.6.7, Example 4 doesn’t explain at which moment the “Dialog” object was created. Compared with Example 3, there is no code which would create a dialog. I perfectly understand that the dialog is created automatically by the stack, because the default value of the property “javax.sip.AUTOMATIC_DIALOG_SUPPORT” is “on”, so dialogs are in fact created in both cases. I believe this is worth mentioning, or replacing with explicit creation of the dialog. I believe that in this case oversimplification is bad.
  • In chapter 12.5.5 the author explains that he uses a TimerTask to close a dialog in case the ACK is not received. Such an approach is totally wrong. Actually, a timeout for the ACK is handled by the stack, which will notify the SipListener through the processTimeout() method. Also, if the TimerTask is intended only for the ACK timeout, then the task should be cancelled when the ACK is received. I believe that the TimerTask should be removed altogether.
  • In chapter 13.2.1, exception #4 contains wrong information about CANCEL processing. In fact, a proxy doesn’t send the “487 Request Terminated” response to the INVITE. Such a response is sent by the UA. Chapter 13.3.3 contains correct information about processing a CANCEL. I believe that the author made this mistake because he wanted to make the proxy example simpler.
  • In chapter 13.6.5 the author explains the proxy example. He uses an ArrayList for context storage. A Map should be used instead, because the code will be simpler and will work faster.

I hope that the author will accept this list as a sign of respect from another professional. I believe that getting down to the matter is better than empty praise and stupid testimonials.

SIP gets more and more mindshare, and I feel that the demand for professionals will grow. That puts me in a good position.

Application of HTTP parsers to SIP

July 18, 2008 by kmatveev

SIP was intentionally designed to have the same syntax as HTTP. Apart from familiarity for network engineers, this allows using the same parser for both protocols. However, I have never seen a case of such a “common” parser, so I decided to try it myself. I took open source HTTP servlet containers, located their HTTP parsers, and tried to apply those parsers to SIP messages.

Jetty

The first container was Jetty. I used version 6.1.11. The parser functionality is easily located, because it is encapsulated in the class org.mortbay.jetty.HttpParser. This class accepts some buffers for internal use, an EndPoint which is the source of data to parse, and an EventHandler which is notified when some message component has been parsed. The main method is parseNext(), which tries to parse as much data as is available. On reaching the end of the available data, this method remembers the last “state” and the last position before returning, so next time it will continue from the place where it stopped. I will call such a parser “stateful non-blocking stream-oriented”.
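The idea of a parser that remembers its state between calls can be illustrated with a small sketch. It is not Jetty’s code and does not use any Jetty classes; the class IncrementalStartLineParser and its methods are invented for this example, and it handles only the start line. It consumes whatever bytes are available, keeps its position in fields, and never blocks waiting for more input.

```java
// Hypothetical illustration of a stateful, non-blocking parser for a start line.
final class IncrementalStartLineParser {
    // The three space-separated components of a request line or status line.
    private final StringBuilder[] fields = {
        new StringBuilder(), new StringBuilder(), new StringBuilder()
    };
    private int current = 0;      // index of the field being filled right now
    private boolean done = false; // set once the terminating line feed was seen

    /**
     * Consumes all bytes of the chunk that belong to the start line.
     * Returns true once the complete line has been seen; otherwise the
     * state is kept and the next call continues where this one stopped.
     */
    boolean feed(byte[] chunk) {
        for (byte b : chunk) {
            if (done) {
                break; // the rest of the chunk would belong to the headers
            }
            char c = (char) (b & 0xFF);
            if (c == '\n') {
                done = true;
            } else if (c == '\r') {
                // carriage return before the line feed carries no data
            } else if (c == ' ' && current < 2) {
                current++; // move on to the next field
            } else {
                fields[current].append(c); // the third field may contain spaces
            }
        }
        return done;
    }

    String field(int index) {
        return fields[index].toString();
    }
}
```

Feeding the bytes of “INVITE sip:bo” returns false; feeding the remainder “b@example.com SIP/2.0” followed by CRLF then returns true, and the three fields hold the method, the Request-URI and the protocol version. Like the parser discussed here, the sketch does not care whether the line is a request line or a status line.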

Applying the Jetty parser to SIP messages is pretty easy. First, an unparsed message should be wrapped either in a ByteArrayEndPoint (if the message is in the form of a byte array) or in a StringEndPoint (if the message is in the form of a character sequence). Second, an EventHandler should be implemented, which will store parsed data in an object which represents a SIP message. Finally, an instance of HttpParser should be created and its parse() method invoked.

The parser of Jetty is neither “complete” nor “strict”. For example, it doesn’t distinguish between requests and responses, it just parses the three components of the start line. Detailed parsing and validation of all specific headers can be done later.

The code of the parser itself is quite complex. The parser works at the byte level. It even provides message components as instances of org.mortbay.io.Buffer, which is very similar to java.nio.ByteBuffer. Such a buffer can be decoded into characters.

Tomcat

The second container was Tomcat. I used version 6.0.16. The HTTP parser here is located in a component called “coyote”. The class which performs parsing is called org.apache.coyote.http11.InternalInputBuffer. Tomcat uses this class from the method process() of the class org.apache.coyote.http11.Http11Processor, which accepts a Socket as an input parameter. Using Http11Processor is not convenient, so I used InternalInputBuffer directly, which is quite simple. The constructor accepts an instance of org.apache.coyote.Request which will store the parsed data. As a source of data InternalInputBuffer accepts any java.io.InputStream, so I just used a ByteArrayInputStream containing the whole SIP message. After invoking the methods parseRequestLine() and parseHeaders() it is possible to extract all necessary data from the Request, such as the method and the headers, and then put those data into any suitable form.

When used from Http11Processor, the parser is provided with the InputStream of a Socket, so the read() operation used by the parser will block if there is no data. I will call such a parser “blocking stream-oriented”. A big drawback of Tomcat is that it isn’t able to parse SIP responses. The parser works at the byte level, and all message components are stored in the Request class as instances of the MessageBytes class. This class performs byte-to-char conversion on demand.

Resin open-source

And finally, Resin open-source version 3.1.6. It is quite simple to locate the HTTP parser here. The class is called com.caucho.server.http.HttpRequest, and the particular method is handleRequest(). The constructor of HttpRequest requires an implementation of Connection which is used as the source of data. I created a fake connection and assigned it a readStream with a StringReader which took the SIP message in String format. Another option was to use ReaderStream, which obtains data through any java.io.Reader (like InputStreamReader). The parser doesn’t do real UTF-8 decoding. Instead, it takes a byte, simply widens it to a character, does all parsing using this character, then stores it into an object field.
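What “widening a byte to a character” means in practice can be shown with a few lines. This is only an illustration of the technique, not code taken from Resin; the class and method names are made up, and the masking with 0xFF is my assumption about how the sign of the byte is handled.

```java
// Hypothetical helper showing byte-to-char widening without charset decoding.
final class ByteWidening {
    /** Turns every byte into exactly one char, which is equivalent to ISO-8859-1 decoding. */
    static String widen(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            // Mask to avoid sign extension: byte 0xC3 becomes char U+00C3, not U+FFC3.
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }
}
```

For pure ASCII input, which covers most SIP messages, the result is the same as real decoding. A UTF-8 encoded “é” (bytes 0xC3 0xA9), for example in a display name, comes out as two characters instead of one.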

However, it seems to me that the parser is not re-usable, because the constructor of HttpRequest accepts an object of the Server class as an argument, and I cannot fake this Server.

Conclusion

It is quite possible to use some HTTP parsers for SIP. Jetty is the winner here: you don’t need to modify it (just put a jar file on the classpath), it will work perfectly with NIO, and the amount of code to write is small. The only drawback is that the code of the parser is quite complex. The parser of Tomcat is simpler, but works only for requests and only with synchronous I/O. Resin is not re-usable without modifications.

Lost layer of SIP

July 10, 2008 by kmatveev

SIP is defined by the RFC as a layered protocol. This means that all the functionality is grouped into several layers, which have “entry points” where they accept data from other layers. The whole protocol is described in terms of the internals of each layer and inter-layer communication. Layers are not organized in a strict sequence: for example, the transaction user layer sometimes interacts with the transport layer directly, like when sending an ACK.

Implementations of SIP usually follow the layered model. However, since layers are defined as “functional”, it is possible to move the implementation of a layer up or down, as long as the result is the same. For example, the parser layer by definition is located below the transport layer. If an implementation follows this approach, then it first opens a socket, then encodes a message and writes the resulting bytes into that socket, possibly in streaming mode. But other implementations first encode a message, then pass a byte array to common network code which will send those bytes, possibly re-using some existing connection. It is even possible to store the result of encoding in a transaction, thus avoiding encoding the message for each retransmission.

Unfortunately, the layered model of SIP is not perfect. It seems to me that there is an additional layer lost in the spec. This article is an attempt to point this fact out, and to show how existing SIP implementations deal with it.

Chapter 12.2.1.1 of RFC 3261 explains how to populate an outgoing request based on the route set of a dialog. Chapter 8.1.2 describes the actions a UAC should take to send an outgoing request. To find out which URI should be used to determine the request destination, it is not enough to have just the request: it should also be known whether the first element of the route set is a strict or a loose router. When a URI is chosen, it should be converted to a list of IP addresses using the procedures described in RFC 3263. After that the request together with IP+port+transport should be passed to the transaction layer for sending, unless the request method is ACK. Chapter 8.1.3 describes response processing.
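The decision between loose and strict routing can be sketched as below. This is a simplified illustration of the rules from chapters 12.2.1.1 and 8.1.2, with URIs kept as plain strings and all class and method names invented; a real implementation would work on parsed URIs, and in the strict-router case would also strip parameters that are not allowed in a Request-URI. Resolving the chosen URI into addresses (RFC 3263) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result object: what goes into the request and where it is sent first.
final class RoutedRequest {
    final String requestUri;
    final List<String> routeHeaders;
    final String nextHopUri; // the URI to resolve into IP, port and transport

    RoutedRequest(String requestUri, List<String> routeHeaders, String nextHopUri) {
        this.requestUri = requestUri;
        this.routeHeaders = routeHeaders;
        this.nextHopUri = nextHopUri;
    }
}

final class RouteSetProcessor {
    /** Crude check for the "lr" URI parameter; assumes a URI string without angle brackets. */
    static boolean isLooseRouter(String uri) {
        String[] parts = uri.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the URI without parameters
            String name = parts[i].split("=")[0];
            if (name.equalsIgnoreCase("lr")) {
                return true;
            }
        }
        return false;
    }

    static RoutedRequest build(String remoteTarget, List<String> routeSet) {
        if (routeSet.isEmpty()) {
            // No proxies on the path: send straight to the remote target.
            return new RoutedRequest(remoteTarget, new ArrayList<String>(), remoteTarget);
        }
        String first = routeSet.get(0);
        if (isLooseRouter(first)) {
            // Loose routing: Request-URI is the remote target, Route lists the whole set.
            return new RoutedRequest(remoteTarget, new ArrayList<String>(routeSet), first);
        }
        // Strict routing: the first route becomes the Request-URI, the remaining
        // routes go into Route, and the remote target is appended as the last Route.
        List<String> routes = new ArrayList<String>(routeSet.subList(1, routeSet.size()));
        routes.add(remoteTarget);
        return new RoutedRequest(first, routes, first);
    }
}
```

In both non-empty cases the next hop is the first element of the route set; what differs is where that URI ends up in the message. That is why the request alone is not enough to pick the destination.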

Exactly the same steps are described for proxies in chapter 16.6, steps 6 and 7. My observation is that there should be an additional layer called “Client”. This layer would be common for proxies and UACs. There are also other pieces of functionality which naturally fit into this layer, like sending a CANCEL.

Sailfin, an open source SIP servlet container, has a layer called “ResolverManager” which is located above TransactionManager and below DialogManager. This layer partially implements my logical “client layer”. When sending a request it does IP/port/transport resolution; the result is stored in the “_remote” field of the request and later used on the transport layer. ResolverManager also handles 503 responses by retrying with another IP address, if one exists. Treating client timeouts as “408” responses is done entirely on the transaction layer. Strict routing is not supported by Sailfin.

The reference implementation of the JAIN SIP API is much more obscure. There are two methods for sending requests: through SipProvider, and through ClientTransaction. Both methods use a “Router” to determine the IP address. If the application doesn’t define any special Router, a DefaultRouter will be used, which does route info post-processing and URI determination. Address resolution is not performed as described in RFC 3263; instead they just use the standard Java facility. So, the “client layer” here is hidden inside the transaction layer. Well, as I said at the beginning of this article, re-ordering of layers is possible if the whole picture is the same. But this implementation makes it impossible to implement high availability by trying several client transactions when a hostname has several IP addresses.

I wish RFC 3261 were clearer. I also wish the JAIN SIP API were much better than it is. The ClientTransaction interface is already suitable to be used as the interface for a “client layer”, we just need to throw away all methods like getBranchId(), getState(), createAck(). However, a proposal for a new, better and cleaner SIP API is a subject for another article. I’m open to suggestions!

Evaluation of Sailfin

July 3, 2008 by kmatveev

This article contains the results of my evaluation of a SIP Servlet container implementation named SailFin.

Just a little bit of background. SailFin is an open-source SIP Servlet container hosted as a java.net project. Big parts of it were written by Ericsson. SailFin is not a stand-alone container, it is built on top of Glassfish.

Using Sailfin

If you want to play with SailFin yourself, then you should download a binary package. I used “milestone 4”. The file which you download is an “unpacker”, which will unpack a large directory structure containing Glassfish with Sailfin. The second thing you should do is to run the “setup.xml” script using the pre-bundled ant. I had problems at this stage, but they were solved when I removed the ANT_HOME environment variable and also removed my existing ant from the CLASSPATH. This “setup.xml” script creates one “domain”, which is an entity administered by an instance of Glassfish. A domain is represented by a folder, and all configuration, deployment and logging data is stored there. Then I started Glassfish and connected to its management console through HTTP. Everything was working.

Next I tried to use some SIP servlet. The distribution contains a directory with samples. However, for my purposes those samples are too complex. It seems that those samples are designed to show the benefits of SIP-JEE integration, which was not interesting for me at the moment. So, I wrote my own servlet which does simple proxying. The samples have build.xml scripts for ant which do everything including deployment, but those scripts refer to other scripts, which in turn refer to other scripts, thus making it difficult for me to figure out which properties I should set. Instead of hacking those scripts I made a simple script myself just for compilation and packaging, and deployed my servlet manually by copying it into the auto-deployment directory. Then I checked through the management console that my servlet was successfully started.

I have a habit of testing intermediate SIP nodes by calling through them from my softphone to a media server, and if I can hear a recorded announcement then I assume that the node is working. Unfortunately, the attempt to call the media server through my servlet failed. As a developer of server software, I tried to find logs which would contain any clues. The only one I found was called “server.log”, and it didn’t contain anything useful. I thought “OK, maybe I need to set a more verbose log level”. After a long search in the management console I finally found a place where I could specify the correct log level. At the bottom of the page “Application Server/Logging/Log levels” there is a small table “additional properties”. The text fields are short, so you can’t see whole property names without scrolling through the text field with cursor keys (or the property names are too long), that’s why it took so long for me to find it. One of those “additional properties” is called “javax.enterprise.system.container.sip”, which corresponds to the log level of the SIP logger. The default value for this property is “INFO”, which I changed to “FINE” and pressed “save”. Then I made another call, and this time I got a much more verbose log. Now I could see that the request was received, but an exception was thrown while parsing the Request-URI:

ReqUri = sip:annc@10.50.3.83:5060;early=no;play=file:////opt/snowshore/prompts/generic/en_US/try_again.wav

javax.servlet.sip.ServletParseException: Unexpected exception while parsing URI: java.lang.StringIndexOutOfBoundsException: String index out of range: -20
        at com.ericsson.ssa.sip.SipURIImpl.<init>(SipURIImpl.java:84)

It seems that the SIP URI of the media server is probably too complex for Sailfin, so I tried to call another softphone instead. This time the attempt was successful. I changed the log level back to INFO, and called the softphone again. There was nothing about the new call in the logs. I think that it is wrong to log errors at “FINE” severity.

Another little annoyance was that the SIP ports shown in the configuration of the “SIP Container” were not the actual values used by the SIP stack. Instead, there is another configurable entity called “SIP Service”, which has “SIP Listeners”, and those are actually used for processing incoming requests.

Reading source code

The source code bundle contains sources for both Glassfish and Sailfin. The total size of Sailfin is 780 classes, which in my impression is quite a lot, so I suspect it is a little bit overengineered. For source code analysis I applied my usual technique: find some class whose purpose I understand, then backtrack to the “main” class which manages the whole stack (“the origin”). I quickly discovered the classes which do parsing and network I/O, then tried to backtrack using the “Find usages” feature of my IDE. Unfortunately, this did not work, because I couldn’t find references which made sense. After some hacking through the code I understood the reason for the failure: the code uses reflection. Anyway, I soon understood the whole architecture.

Container startup

The starting point of the stack is located in the module sailfin/integration. There are two interesting classes here: SipContainerLifecycle and SipServiceListener. Those classes are specified in separate places of the Glassfish configuration file (which is located in “domain1/config/domain.xml”). Both classes implement an interface named LifecycleListener, but in fact these are two different interfaces with the same name! SipContainerLifecycle implements com.sun.appserv.server.LifecycleListener, and SipServiceListener implements org.apache.catalina.LifecycleListener. I don’t know exactly why two interfaces are used, but I suspect that the first interface is used for the lifecycle of the whole cluster, and the second interface is for one node.

When the SipContainerLifecycle class starts, it reads a list of class names from the “stack configuration”. When SipServiceListener starts, it takes this list and instantiates those classes through reflection. Each class represents a processing stage for incoming and outgoing messages. Those classes are called “processing layers” and implement a common interface, Layer, used for bi-directional processing. Some of those layers also have lifecycle methods, which are also invoked by SipServiceListener through reflection. I would personally prefer having a special interface like LayerWithLifecycle which would extend Layer with lifecycle methods, but I’m not the author. SipServiceListener also connects the layers into a pipeline-like sequence.
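The alternative I have in mind could look roughly like this. It is a sketch of the idea only: the method names of Layer and LayerWithLifecycle, the Pipeline class and the RecordingLayer used for demonstration are all invented here and do not reproduce Sailfin’s actual interfaces. Messages are plain strings, and the pipeline calls every layer itself instead of letting each layer forward to the next one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bi-directional processing stage.
interface Layer {
    void incoming(String message); // message travelling from the network upwards
    void outgoing(String message); // message travelling towards the network
}

// Hypothetical extension for layers that need to be started and stopped.
interface LayerWithLifecycle extends Layer {
    void start();
    void stop();
}

final class Pipeline {
    private final List<Layer> layers; // index 0 is the lowest layer

    Pipeline(List<Layer> layers) {
        this.layers = new ArrayList<Layer>(layers);
    }

    // Lifecycle methods are found with a type check instead of reflection.
    void start() {
        for (Layer layer : layers) {
            if (layer instanceof LayerWithLifecycle) {
                ((LayerWithLifecycle) layer).start();
            }
        }
    }

    void stop() {
        for (int i = layers.size() - 1; i >= 0; i--) {
            Layer layer = layers.get(i);
            if (layer instanceof LayerWithLifecycle) {
                ((LayerWithLifecycle) layer).stop();
            }
        }
    }

    void receive(String message) {
        for (Layer layer : layers) {
            layer.incoming(message); // bottom-up
        }
    }

    void send(String message) {
        for (int i = layers.size() - 1; i >= 0; i--) {
            layers.get(i).outgoing(message); // top-down
        }
    }
}

// Demonstration layer that only records what happened to it.
final class RecordingLayer implements LayerWithLifecycle {
    private final String name;
    private final List<String> log;

    RecordingLayer(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    public void start() { log.add(name + ".start"); }
    public void stop() { log.add(name + ".stop"); }
    public void incoming(String message) { log.add(name + ".in:" + message); }
    public void outgoing(String message) { log.add(name + ".out:" + message); }
}
```

With a typed interface the compiler checks that lifecycle methods exist and have the right signature, and “Find usages” in an IDE leads straight to the place where they are called.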

By default there is nothing specified in the configuration file, so the stack uses a hard-coded default configuration (from StackConfig.defineStatic()). Being able to configure which layers will be present in the stack gives a lot of flexibility. For instance, you can choose at which level to have overload protection: between the parser layer and the transaction layer (the default) or before the application dispatcher. If you know for sure that your AS will host only proxy-style servlets, then you can remove DialogManager and get a small performance boost because your requests will not be checked against ongoing dialogs. Unfortunately, it doesn’t work as I thought (see later). However, such a generic layered mechanism allows you to create a configuration which doesn’t make sense; for example, you can put DialogManager before TransactionManager. Also, most layers simply cannot be removed without breaking the stack. So I would prefer having a specific interface at each inter-layer border.

Processing of incoming messages

The lowest layer is called NetworkManager and includes functionality which RFC 3261 calls the “parsing layer” and the “transport layer”. There are two implementations: a straightforward one (called OLDNetworkManager) and another based on Grizzly (called GrizzlyNetworkManager). Both of them implement the Runnable interface, so SipServiceListener will recognize this and run them in a dedicated thread. The implementation of the run() method in those managers does multiplexed reading from several network channels using a NIO Selector.

When NetworkManager receives some SIP data, it reads them into a ByteBuffer and hands them over for parsing. The parser is hand-written (as opposed to automatically generated parsers based on a BNF grammar), so it should be fast. The parser creates an object tree for a SIP message. The root object is either SipServletRequestImpl or SipServletResponseImpl, and it contains a collection of headers as a map of header name (String) onto a collection of values (com.ericsson.ssa.sip.Header). This collection of header values provides a common interface for two cases: a single value (SingleLineHeader) or a variable number of values (MultiLineHeader). I’ve used the same approach in my code for a long time, so this looks very familiar to me.

The next layer is TransactionManager. As I expected, it contains two ConcurrentHashMaps, one for client transactions and one for server transactions. Sailfin uses Strings as transaction identifiers, but I prefer using special objects which just correctly implement equals() and hashCode(). Correct String keys require concatenation of several message fields, which Sailfin doesn’t implement, so their implementation of transaction lookup is not 100% correct. However, they have // TODO marks saying that they will implement correct transaction lookup in the future. Also, I suspect they have a bug: receiving retransmissions of a CANCEL will create another NonInviteServerTransaction each time.
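A key object of the kind I prefer might be sketched like this. It follows the matching rule of RFC 3261 chapter 17.2.3 for requests whose branch starts with the “z9hG4bK” magic cookie: the branch parameter and the sent-by value of the top Via plus the method, where an ACK is matched to the INVITE transaction. The class names are invented, a String stands in for the transaction object, and this is not Sailfin’s code.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical key for server transactions.
final class ServerTransactionKey {
    private final String branch; // branch parameter of the top Via
    private final String sentBy; // host[:port] of the top Via
    private final String method; // request method, with ACK mapped to INVITE

    ServerTransactionKey(String branch, String sentBy, String method) {
        this.branch = branch;
        this.sentBy = sentBy;
        // An ACK for a non-2xx final response belongs to the INVITE transaction.
        this.method = "ACK".equals(method) ? "INVITE" : method;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ServerTransactionKey)) return false;
        ServerTransactionKey other = (ServerTransactionKey) o;
        return branch.equals(other.branch)
                && sentBy.equals(other.sentBy)
                && method.equals(other.method);
    }

    @Override
    public int hashCode() {
        return Objects.hash(branch, sentBy, method);
    }
}

final class ServerTransactionTable {
    // The String value is a placeholder for a real transaction object.
    private final Map<ServerTransactionKey, String> transactions =
            new ConcurrentHashMap<ServerTransactionKey, String>();

    /** Returns the existing transaction for the key, or stores and returns the new one. */
    String findOrCreate(ServerTransactionKey key, String newTransaction) {
        String existing = transactions.putIfAbsent(key, newTransaction);
        return existing != null ? existing : newTransaction;
    }

    int size() {
        return transactions.size();
    }
}
```

Because the method is part of the key, a CANCEL gets its own transaction even though it carries the same branch as the INVITE it cancels, and a retransmitted CANCEL finds that transaction instead of creating a new one.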

After several other layers an incoming message is processed by ApplicationDispatcher. For initial requests ApplicationDispatcher selects a servlet responsible for processing the request. Based on the actions of the servlet the container decides whether the servlet acts as a proxy or as a user agent, and inserts some kind of PathNode (either ProxyContext or UA) into the pipeline. If the servlet acts as a proxy then the forwarded request will also be processed by ApplicationDispatcher, leading either to handing it to another servlet (which will add another PathNode) or to sending the request into the network.

For subsequent requests ApplicationDispatcher merely passes the message to the first PathNode. Responses are processed by servlets in reverse order, so they are passed to the last PathNode.

Preliminary conclusion

I’ll continue describing the Sailfin sources next time. My overall impression of the source code is that it is well written and quite professional. Variable, class and method naming is quite good. What I don’t like is the overall bloat and the overuse of reflection. I also don’t like how logging is done. I prefer that all branching in message processing logic which depends on previous state (like processing on the transaction layer, which depends on whether a transaction exists or not) is logged.

Testing

A very bad thing about Sailfin is the absence of unit tests. I decided to apply my own suite of unit tests and see what would happen. My unit tests are written with the understanding that the “unit” is the whole container, so they treat the container as a “black box” with two interaction points: servlet and network.

I replaced the existing NetworkManagers with my own, which doesn’t actually send anything, but just remembers which bytes were sent to which addresses. I also replaced ApplicationDispatcher with my own simple one, which has only one particular Servlet that remembers all requests and responses it was given. Each test performs some actions on one side (either network or application) and then checks the outcome on the other side.
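The network side of such a test harness can be as small as the following sketch. It does not implement Sailfin’s Layer interface and is not the class I actually plugged into the container; the names RecordingNetwork and SentPacket are invented, and the address is kept as a plain string to stay independent of any stack.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical record of one "sent" message.
final class SentPacket {
    final String address; // destination, e.g. "192.0.2.10:5060/udp"
    final byte[] data;    // the encoded message

    SentPacket(String address, byte[] data) {
        this.address = address;
        this.data = data;
    }
}

// Hypothetical fake network: nothing leaves the process, everything is remembered.
final class RecordingNetwork {
    private final List<SentPacket> sent = new ArrayList<SentPacket>();

    synchronized void send(String address, byte[] data) {
        // Copy the bytes so that later changes by the caller do not alter the record.
        sent.add(new SentPacket(address, data.clone()));
    }

    synchronized List<SentPacket> sent() {
        return Collections.unmodifiableList(new ArrayList<SentPacket>(sent));
    }
}
```

A test then injects a request on the application side and asserts on the contents of sent(), or the other way round. The methods are synchronized because the code under test may send from a different thread than the one running the assertions.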

While adapting my tests to Sailfin, I found many interesting things. For example, sending is done asynchronously. I also learned that DialogManager and ResolverManager are mandatory layers.

Modifying Sailfin was not an easy task. It is not as modular as I expected, and the horrible singleton pattern is used too often. It seems that this bad design is what prevents it from having unit tests.

With my tests I found several minor bugs (for example, session.getRemoteParty() and session.getLocalParty() are reversed for incoming requests) and some annoyances (for example, an exception when getProxy() is invoked on subsequent requests). The thing I dislike most is that the request-URI or Route header of in-dialog requests must contain a special “dialog fragment identifier”. Yes, RFC 3261 specifies that a UA should put the remote target URI in the request-URI, but this is a requirement for another network element, and Sailfin should not depend on it. RFC 3261 clearly states that a dialog is identified only by Call-ID, remote tag and local tag, so a “fragment identifier” should not be used.

Conclusion

Sailfin is not yet ready for production. However, it looks quite promising. I think it is really hard to support such a huge pile of code without unit tests. Good luck, you guys!

Container approach in Java. Part 2: instances (version 2.0)

June 2, 2008 by kmatveev

In the previous article I defined a component/container architecture and explained why such an architecture is used. This article covers real examples of containers known to me, in historical perspective.

Servlet containers

The earliest containers known to me are servlet containers. These are typical IoC containers, used on servers which speak asymmetric request/response protocols like HTTP. This allows the IoC approach to be applied to the handling of protocol logic: servlets must implement the “Servlet” interface, which contains the method “service()”. The container invokes this method for each incoming request. For HTTP, the IoC principle also applies to the creation and sending of responses.

The “Servlet” interface also contains the methods “init()” and “destroy()”, which are used for lifecycle management. The “init()” method is also used for “context” injection. From this “context” a servlet can extract references to all components it depends on.
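The shape of this contract can be shown with a deliberately simplified sketch. The MiniServlet interface below is a hypothetical stand-in, not the real Servlet API: it keeps the three lifecycle methods but replaces the real config, request and response objects with a plain Map and Strings so that the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;

class MiniServletContainer {
    // Simplified stand-in for the real Servlet interface (an assumption made
    // for this sketch; the real one uses dedicated config/request/response types).
    interface MiniServlet {
        void init(Map<String, Object> context);
        String service(String request);
        void destroy();
    }

    // A component: it never calls the container, the container calls it.
    static class GreetingServlet implements MiniServlet {
        private String greeting;

        @Override
        public void init(Map<String, Object> context) {
            // Dependencies are pulled out of the injected context.
            greeting = (String) context.get("greeting");
        }

        @Override
        public String service(String request) {
            return greeting + ", " + request;
        }

        @Override
        public void destroy() {
            greeting = null;
        }
    }

    // The "container": owns the lifecycle and drives every call.
    static String run(MiniServlet servlet, String... requests) {
        Map<String, Object> context = new HashMap<>();
        context.put("greeting", "Hello");
        servlet.init(context);
        StringBuilder responses = new StringBuilder();
        for (String request : requests) {
            responses.append(servlet.service(request)).append('\n');
        }
        servlet.destroy();
        return responses.toString();
    }

    public static void main(String[] args) {
        System.out.print(run(new GreetingServlet(), "alice", "bob"));
    }
}
```

Note that GreetingServlet contains no loop and no startup code of its own. All control flow lives in run(), which is the inversion of control the paragraph above describes.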

Servlets are developed according to a special convention: each one is a collection of classes plus an XML descriptor.

Well-known examples of servlet containers are Apache Tomcat, Jetty, Resin.

EJB containers

The EJB spec is also quite old. It is a “generic” component/container architecture for the complex data-processing logic used in enterprise IT systems. Curiously enough, it was developed as a distributed architecture, meaning that each component would be located on a dedicated machine and the container would provide inter-component communication. The IoC principle is applied to lifecycle and persistence. Dependencies are resolved using lookup (in the JNDI facility), although binding in JNDI is done automatically.

Later the EJB spec changed to support “local” access between components. Anyway, it was and still is criticised for being complex and slow, and many other frameworks emerged to fix its flaws.

Well-known examples of EJB containers are GlassFish, Apache OpenEJB, JBoss, JOnAS, BEA WebLogic, IBM WebSphere, and lots of others.

Microcontainers

Microcontainers are “generic” component containers which focus only on local access and provide just lifecycle and late binding. All other features can be realized “on top” of the container, by implementing them as components.

Apache Avalon was the first attempt known to me to build a “lightweight” container. Later its developers split up, but they tried to maintain a common framework for containers. This framework follows the “interface injection” approach, which means that a dependency on something is declared by implementing a certain interface, and the same interface is used for injecting this dependency. Thus, the Avalon framework contains lots of interfaces, for example for lifecycle, logging and configuration. However, injection is used only for dependencies which are part of the framework. For resolving other dependencies there is also a lookup facility.

Apache HiveMind is another minimalistic container, with IoC principles applied to lifecycle, configuration and automatic dependency injection, although lookup is also supported. This container is best classified as “declarative de-centralized”. It uses a special format for its components (code + XML-based descriptor). Dependency injection is supported for components which follow a naming convention.

Other examples of microcontainers are PicoContainer, Butterfly and Guice.

Spring

The Spring framework is a set of components which aim to be simple, lightweight and cheap alternatives to all parts of JEE. As a replacement for EJB it provides a much simpler IoC container. However, this container is often used not just for binding user components together, but also for binding them with system components. This was new at the time, because JEE application servers didn’t allow customizing the “system” part. The Spring container is not just a “business logic integration point”, but a whole “application integration point”.

The IoC principle in Spring can be applied to lots of concerns, including lifecycle, dependency resolution via injection, and configuration. Usage of IoC is not mandatory and can be avoided, but avoiding it makes the whole architecture less consistent.

One interesting application of the IoC principle used in Spring is “aspect-oriented programming”: the container “wraps” a module with its own “proxy”, and injects this proxy into dependent modules. This “proxy” allows some functionality to be inserted before and after each invocation, so a component can affect the interaction between two other modules without modifying them.
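The wrapping trick itself can be tried with nothing more than the JDK’s java.lang.reflect.Proxy. The sketch below is not Spring code and says nothing about how Spring builds its proxies; the Greeter interface, the PlainGreeter class and the wrap() helper are hypothetical names used only for this illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class ProxySketch {
    interface Greeter {
        String greet(String name);
    }

    static class PlainGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    // Wraps the target in a JDK dynamic proxy which records one line before
    // and one line after every call, without touching the target's code.
    static Greeter wrap(Greeter target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("before " + method.getName());
            try {
                return method.invoke(target, args);
            } finally {
                log.add("after " + method.getName());
            }
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // A container would inject the proxy where the target is expected.
        Greeter greeter = wrap(new PlainGreeter(), log);
        System.out.println(greeter.greet("world"));
        System.out.println(log);
    }
}
```

The caller only sees a Greeter and cannot tell that it talks to a proxy, which is why this works only when modules depend on interfaces and receive their dependencies from outside.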

Spring is very popular, because it provides a large base for building custom server-side software, either complex or simple.

JEE application servers

After the success of Spring, vendors of many JEE application servers understood that their products should be more customizable, so those servers were re-designed as microcontainers. Apache Geronimo is a microcontainer which allows deployment of components called “GBeans”. JBoss also implements a microcontainer architecture. GlassFish and JOnAS stated that they will move to OSGi.

Apache Geronimo is an example of the “embedded container” architecture: the EJB container (OpenEJB) is itself a component in another container (the Geronimo microcontainer).

OSGi

The distinguishing feature of the OSGi framework is its complex classloading. This allows “hot upgrade” through dynamic loading and unloading of classes, and other interesting capabilities. The IoC principle is applied only to lifecycle. As a separate component there is a “service facility” which can be used for basic dependency lookup, and which also allows subscription to notifications about lifecycle events in other services. There is also a component for dependency injection called “Service Binder”, based on XML descriptors.

There are several implementations of OSGi framework, including Eclipse Equinox and Apache Felix.

JSLEE

JSLEE is another non-generic component architecture for Java. It is similar to EJB in the way the IoC principle is applied to persistence and lifecycle. JSLEE includes a scalable event-delivery facility, and the IoC principle is applied to some aspects of the interaction between components and this event-delivery facility. It could be viewed as a combination of a servlet container and an EJB container. Dependencies are resolved using lookup (in JNDI).

Container approach in Java (version 2.3)

May 28, 2008 by kmatveev

This article discusses the reasons for the existence of the software flavor called “container”, with emphasis on Java. For a long time I was wondering whether technologies like Spring, Apache HiveMind, Apache Fortress and OSGi are really useful. Here is the story of what I’ve learned. Please be warned that I have never used any of those technologies myself! I’m not an expert in any sense, and everything I write here is based only on reading documentation. This first part is theoretical, and the second part contains real-life examples.

Software modularity.

Software modularity has been known for a long time. It was invented as an approach to software analysis and design. The software as a whole is decomposed into pieces, and each piece implements certain functionality. The reason for modularity is to simplify analysis by controlling levels of detail. There are basically two means of expressing modularization: referring to something compound through a name (e.g. modules, packages, classes, procedures, functions), and preventing something from getting outside of its context (e.g. local variables). Another term for levels of detail is “levels of abstraction”, since detail is the complement of abstraction. The main point I want to emphasize is that modularization is logical: it is meant to bring logical order by keeping many things local and dependencies explicit.

The next level of modularity is called “source modularity”, which means that source code is divided into parts representing logical modules. Java is remarkable in expressing logical modularization strictly through source. An accidental benefit of source modularity is the possibility of dividing work, if analysis precedes implementation.

Another accidental side effect of modularity was the ability to re-use modules in several programs. A whole program is unlikely to be suitable for more than one purpose. However, the set of logical “blocks” which the human mind uses is limited, so program analysts often use the same concepts while designing different programs, which leads to re-use of the code implementing those concepts. Programmers started tearing programs apart. Some programming languages are proud of not just expressing modularization, but enforcing it. Some people started to write incomplete programs called “libraries”, with the idea that others would use them. Analysis of a program now includes research into which parts could be re-used from other programs.

For me, the most straightforward way to re-use an existing module is to include its source code in the program’s source code. However, whenever the program is compiled, the source code of the module is compiled too. Because the main program changes much more frequently than the module, and compilation takes time, separate compilation was invented, with a new “linking” stage which assembles modules together. An additional benefit of linking is that it expresses dependencies between coarse-grained modules very well.

It doesn’t matter whether modules are assembled at source level or at linking level: the result is a solid program. Compilation and linking are one-way processes, so it is not possible to remove a module from a program. In other words, the module itself is a prototype which is “cloned” at no cost, and each clone is “fused” into a program instance forever. Java is slightly different in this respect, because it doesn’t have a linking stage; JAR files just look solid.

Software modularity as a development practice is nowadays used in any non-trivial software. Yes, it can be misused, but the amount of experience in this area today allows developers to quickly learn how to use it in the right way.

Struggle for de-coupling and implementation lookup.

The next thing software developers noticed is that modules can be replaceable. Several modules can have the same interface and differ in implementation, performance, price and license, so the module user has a choice. Module users were so excited about the “flexibility” their programs suddenly achieved that they agreed to abandon more straightforward interfaces in favor of “standard” ones. The biggest selling point of most Java technologies is that the customer will not depend on a single vendor.

Programmers started inventing interfaces and using modules only through interfaces. So dependencies have not gone away, they have just changed. First, instead of depending on a particular module, the program now depends on an interface. Second, a complete program consists not of interfaces but of particular modules, so a dependency on an implementation is still introduced, it just happens later. With languages like C, the dependency on a real module is introduced at the linking stage. Unfortunately for Java, the dependency is specified in the source code, at the place where an instance of the implementation is created. So, for Java, “late binding” always means “run-time binding”.

I had this in my Java programs. The whole program used a module through an interface, and the single place where my program depended on a particular implementation was the place where I called “new InterfaceImpl();”. I extracted this expression into a factory method, and replaced “new” with creation through reflection, taking the class name from a system property. I was very proud of myself, because from that moment the code was “flexible”. I had pulled this dependency out into the script which launched the program. Such an approach to resolving the dependency is called “implementation lookup”, because the class name is obtained by searching in some single-instance location using a specified key.
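A minimal sketch of such a factory method might look as follows. The Greeter interface, its two implementations and the property name “greeter.impl” are hypothetical, chosen only for this illustration; the fallback to a default implementation is also my assumption rather than part of the story above.

```java
class LookupSketch {
    interface Greeter {
        String greet(String name);
    }

    public static class EnglishGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    public static class FrenchGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Bonjour, " + name;
        }
    }

    // Hypothetical property name; a launch script would pass it as
    // -Dgreeter.impl=<fully qualified class name>.
    static final String KEY = "greeter.impl";

    // Factory method: the only place which knows how an implementation is chosen.
    static Greeter newGreeter() {
        String className = System.getProperty(KEY, EnglishGreeter.class.getName());
        try {
            Class<?> impl = Class.forName(className);
            return (Greeter) impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // A wrong class name shows up only here, at run time.
            throw new IllegalStateException("Cannot create " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newGreeter().greet("world"));
    }
}
```

The catch block is the price of the flexibility: a misspelled class name compiles fine and fails only when the factory method runs.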

An obvious advantage of “late binding” is “deployment flexibility”, which means that you don’t need to re-compile your user module when you want to use another implementation. This can be useful, because some people are just afraid of programming, and other people don’t want to download source code. However, other people prefer a pure programming solution, so this concern is a purely human one. Another advantage, which is more important to developers, is that there is no dependency on a particular implementation at compile time, so the compiler enforces de-coupling. There are also disadvantages: the code becomes more complex and much less straightforward. If I specify an incorrect implementation, I get a runtime error. And instead of a single direct dependency, the code now has two dependencies: on the interface and on the binding policy.

Implementation injection and Inversion of control

Programmers who value compiler-enforced interface-based programming and don’t value deployment flexibility have invented another programming technique called “implementation injection”. The factory method is moved into a separate module (the “factory module”), and the user module receives an instance of the interface from this factory module through a setter method. Thus, by multiplying the number of modules and complicating the compilation process, a programmer gets compiler-enforced flexibility and still has a pure programming solution.
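Here is a small sketch of that arrangement. The names Greeter, EnglishGreeter, Client and Assembler are hypothetical; Assembler plays the role of the “factory module”.

```java
class InjectionSketch {
    interface Greeter {
        String greet(String name);
    }

    static class EnglishGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    // User module: compiles against the interface only and waits to be
    // handed an implementation through its setter.
    static class Client {
        private Greeter greeter;

        void setGreeter(Greeter greeter) {
            this.greeter = greeter;
        }

        String welcome(String name) {
            return greeter.greet(name) + "!";
        }
    }

    // Factory module: the only code which names a concrete implementation.
    static class Assembler {
        static Client assemble() {
            Client client = new Client();
            client.setGreeter(new EnglishGreeter());
            return client;
        }
    }

    public static void main(String[] args) {
        System.out.println(Assembler.assemble().welcome("world"));
    }
}
```

Unlike the reflection-based lookup, a wrong implementation type here is a compile error in Assembler. The cost is one more module and a rebuild of that module whenever the choice changes.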

Implementation injection is a particular case of a more general programming practice called “inversion of control”. Inversion of control is the process of changing your module so that it no longer executes some precise action to obtain a result (e.g. creating an instance), but instead just exposes some means of receiving that result (like a setter method). In other words, inversion of control is the process of turning an active component into a passive one. Another colorful name for this approach is the “Hollywood principle” (“don’t call us, we’ll call you”).

Inversion of control is quite a general principle. Another particular implementation of it is called “event-driven programming”, which means that instead of having an event-polling loop in each component there is one global event-polling and dispatching loop. Applying inversion of control makes some code more flexible but less capable (because it now does less), so it should be done only when extracting common behaviour from several places for sharing and unification.

Speaking of complexity, injection and lookup are similar. The only difference is that lookup is more imperative and injection is more structural.

Management containers

Sometimes you need to execute several totally independent programs in the same runtime environment, for example in the same JVM. This can be useful on equipment with limited memory, because common class files are shared. The most straightforward way would be to write a simple Launcher, which creates and starts all co-programs. Programmers who prefer to stick with a single language will write the Launcher in the same language as the co-programs (Java). But if the set of co-programs to be run changes, then updating such a class means a full cycle of editing, compiling and deploying. There are approaches with better “deployment flexibility”, such as:

  • a launcher with a shell for interactive specification of co-programs
  • a launcher which reads a configuration or a script in some special language
  • a launcher which automatically discovers co-programs. In this case, modules must follow some common convention, so that the launcher can discover and use them. This is called the “plug-in” concept: “if something exists, then it works; if you don’t want it working, remove it”.

A launcher is the simplest form of “management container”, because the only thing it controls is whether a co-program is started or not. Being able to manage co-programs from a single point seems good to lots of people, so the number of concerns to be managed has grown far beyond lifecycle.

Each co-program could be managed differently. To unify and automate management tasks, most containers introduce “policies” which co-programs must follow. For example, they must implement a certain interface which the container uses for lifecycle management. So making a co-program “manageable” often means inversion of control.

As an example of a lifecycle-management container, you can imagine a window which shows several applets, with controls which allow starting and stopping them independently.

The two benefits of container architecture (a single management point for users, and having common code in a single place for developers) really matter only if the number of co-programs is significant (for me, more than 10). For just a couple of co-programs, it is not worth the bother.

Component containers

We are coming to the most important point in the whole story. Somehow developers decided that the modules of which co-programs consist should be separate co-programs in the container. The main reason for that is the ability to share code and associated resources at runtime (for example, one window frame, one HTTP stack, one event-polling thread). As a result, co-programs are no longer complete: they use other co-programs. Let’s call such co-programs “components”, since they are the parts from which a complete solution is assembled.

To implement sharing, a used component should be created by the container (that’s why it is a separate component), and user modules should obtain a shared instance.

One approach for user modules to obtain a shared instance is “lookup”: a reference to the shared instance is bound to some “name” (or “key”) in a lookup facility, and user modules can obtain the instance using this “name” as a parameter. Since lookup could be done at any moment after the user module is created, it is important to create and bind the used module before creating the user module.
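A lookup facility of this kind can be sketched in a few lines. The Registry, Counter and Worker classes and the name “counter” are hypothetical; Registry merely plays the role that JNDI plays for EJB and is not modelled on any real API.

```java
import java.util.HashMap;
import java.util.Map;

class RegistrySketch {
    // Hypothetical lookup facility: a name-to-instance map.
    static class Registry {
        private final Map<String, Object> bindings = new HashMap<>();

        void bind(String name, Object instance) {
            bindings.put(name, instance);
        }

        <T> T lookup(String name, Class<T> type) {
            Object instance = bindings.get(name);
            if (instance == null) {
                // Typical symptom of a missing component or a wrong start order.
                throw new IllegalStateException("Nothing bound under '" + name + "'");
            }
            return type.cast(instance);
        }
    }

    // The shared ("used") component.
    static class Counter {
        private int value;

        int next() {
            return ++value;
        }
    }

    // A user component: finds the shared instance by name.
    static class Worker {
        private final Registry registry;

        Worker(Registry registry) {
            this.registry = registry;
        }

        int work() {
            return registry.lookup("counter", Counter.class).next();
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.bind("counter", new Counter()); // the used component goes first
        Worker a = new Worker(registry);
        Worker b = new Worker(registry);
        System.out.println(a.work()); // 1
        System.out.println(b.work()); // 2: both workers share one Counter
    }
}
```

If the bind() call is moved after the first work() call, the lookup fails at run time, which is exactly the ordering constraint described above.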

Binding of a shared module to a name in the lookup facility can be realized in different ways:

  • The “imperative” way. Binding is done in the code of the used module.
  • The “declarative” way. The used module declares that it provides some service, and binding is done automatically by the container after the used module is created, using the service name as the key.
  • The “container-managed” way. Binding is done by the container, either interactively or through configuration.

The second and third approaches are cases of inversion of control, because common binding code is moved from components into the container. Applying inversion of control to the user module turns “instance lookup” into “instance injection”: the user module provides a setter method through which it obtains a reference to the used module. Like binding, injection can be realized in a “declarative” per-component way and in a “container-managed” way.

Thus we get a “component container”, which is a special flavor of management container capable of “wiring up” or “linking” components into a complete application.

Dependency management in containers

You have probably already noticed that the same “lookup” and “injection” approaches are used for two different purposes: for de-coupling user code from an implementation, and for discovering a shared instance. Both tasks are cases of a general task called “runtime dependency resolving” or “late binding”. Accordingly, “lookup” and “injection” are general solutions to this task. For component containers this means that the existing linking facility can also be used for removing “hard” dependencies between components and replacing them with “managed” dependencies.

For simplicity, and because some components simultaneously play the roles of user of one interface and implementation of another (so-called “layers”), containers usually have a uniform approach to dependency management. Here are the typical cases.

An “imperative” container is not much more than a lifecycle manager plus a bind/lookup facility. Both binding and lookup are hidden in code, so dependencies are not known to the container. If a “dependency could not be found” error occurs during startup, the container admin must check the documentation and find out whether he has missed some component or should change the startup order. In other words, management is simple yet not flexible, and troubleshooting is hard. However, this approach appeals to developers who prefer to use a single language for everything and do things programmatically. Components depend on the container (lifecycle and bind/lookup facility) and on the interface naming policy.

A “declarative de-centralized” container with dependency injection is the most widespread approach. The separate declaration is explicit, which requires a special language for it (e.g. XML, annotations). This is a plug-in approach, with the container resolving dependencies automatically and printing helpful messages in case of errors. Start order is determined automatically by reversing the dependency order. Components depend on the container (lifecycle), on the interface naming policy (because an interface must have the same name in the declarations of implementations and of users) and on the dependency-declaration language. In other words, components are complex, while container management is simple yet not flexible.

A “flexible centralized” container. Components are “plain old Java objects” (“POJOs”), which themselves have no dependencies on the container or on interface names. The container has an interpreter of some language (declarative, like XML, or imperative, like BeanShell) for lifecycle management and for wiring up components. This is centralized and very flexible, but more complex, management.
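For the “declarative de-centralized” case, the automatic start order can be illustrated with a small depth-first walk over declared dependencies. This is only a sketch under my own assumptions: declarations are given as a plain map from component name to the names it depends on, and the component names are invented. Real containers read such declarations from XML or annotations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StartOrderSketch {
    // Orders components so that each one comes after everything it depends on.
    static List<String> startOrder(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> started = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String name : dependsOn.keySet()) {
            visit(name, dependsOn, started, inProgress, order);
        }
        return order;
    }

    private static void visit(String name, Map<String, List<String>> dependsOn,
                              Set<String> started, Set<String> inProgress,
                              List<String> order) {
        if (started.contains(name)) {
            return; // already placed in the order
        }
        if (!dependsOn.containsKey(name)) {
            // The "helpful message" a declarative container can print.
            throw new IllegalStateException("Missing component: " + name);
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Circular dependency at: " + name);
        }
        for (String dependency : dependsOn.get(name)) {
            visit(dependency, dependsOn, started, inProgress, order);
        }
        inProgress.remove(name);
        started.add(name);
        order.add(name);
    }

    public static void main(String[] args) {
        Map<String, List<String>> declared = new LinkedHashMap<>();
        declared.put("web", Arrays.asList("http", "logging"));
        declared.put("http", Arrays.asList("logging"));
        declared.put("logging", Collections.<String>emptyList());
        System.out.println(startOrder(declared)); // [logging, http, web]
    }
}
```

Because the container sees the declarations, it can report a missing component or a cycle by name before starting anything, which is what the “imperative” container cannot do.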

Container wars

Flexible containers are transparent for components. However, they require some manual work for wiring up components, and in this sense they are just another way of programming.

Other approaches imply that components are designed for a particular container. There were attempts to de-couple components from particular containers by designing a common standard for containers (a “container framework”). Examples of such efforts are EJB, Apache Avalon and OSGi. None of these frameworks prevailed, so developers have the following options:

  • Stick with single container
  • Support several containers (usually not many) directly, at the cost of harder maintenance
  • Use “wrapper” component for each container

There are still “container wars”. Some containers (like OSGi or JSLEE) have distinguishing features which they use as selling points. Other containers try to be “generic” and attempt to attract users by supporting many dependency-management approaches and by packaging more ready-to-use components. More about that will be explained in the second part.

Criticism

We have already mentioned that containers provide some benefits, like uniform management, runtime module sharing and implementation replaceability. However, these come at the cost of harder development and maintenance: the architecture is more complex, the code is less straightforward, a “second” language is needed, and runtime errors become possible.

My main criticism of containers comes from the fact that the drawbacks are never mentioned by container vendors. They promote their products and push users to create over-engineered systems.

I believe that runtime errors are bad and should be avoided where possible. By “avoided” I mean “checked at compile stage instead”. In this sense, a claim that “containers help by enforcing modularity” is absolutely false.

To promote containers, their vendors claim that it will become possible to do “rapid application assembly from known components” (taken from the azuki site). However, the number of ready components is very small (in my opinion, only Spring has some). I believe that containers appeal to non-programmers who think that they can avoid programming. But if your application is just an assembly of “known” components, what gives you an advantage over competitors? I never believed in just “assembly”; something always has to be written in code.

Then why are containers used?

In my opinion, the container approach is used mostly in server software. There are several reasons for that. Servers often host several independent “services” (co-programs). Server software clearly distinguishes between “service users” and “administrators” who manage everything from a single point. Services often share common modules which consume system resources (threads, memory pools, database connections). For server-side software there is a market for “system-level” components (message queues, network stacks, object-relational mapping, tracing/logging), each providing one of the standard JEE APIs. So you can see that all the requirements which led to the creation of containers often apply to server software.

Conclusion

To sum up, here is some short advice.

  • Avoid containers where possible.
  • Don’t listen to their propaganda. They want to complicate your life, because they need to justify their worthless efforts to build something from nothing.
  • If forced by circumstances, evaluate something existing.
  • Be prepared for harder maintenance.

Coming soon

The second part, which will try to map the theory onto real life by checking which of the existing containers use which approach.

Useful links:

  • The Apache container 2.0 paper is a good overall presentation. However, I don’t agree with the author’s definition of “Inversion of control”.
  • The great article by Martin Fowler assumes that you already believe in the importance of using de-coupled components, and discusses dependency injection and dependency lookup in depth without investigating the reasons for the emergence of containers.

Ultima Underworld

May 27, 2008 by kmatveev

Ultima Underworld. The first smooth first-person 3D RPG. An unbelievably advanced game for its time. I’ve just completed it, for the second time in my life. The first time was in 1996.

Back then, everything looked very impressive to me. Medieval style, 3D graphics, dark atmosphere, sword combat, water, spells, conversations, trade, skill system. It is good that I had forgotten most of the puzzles, so this time completing the game was still challenging. This time I was disappointed with the plot and the quality of the dialogs: they did not seem deep at all. But I understand now that for this game the plot is secondary. The focus is on exploration and combat.

I don’t think that character class matters in the game. I suspect that it only defines the main stats and the set of skills you have initially. Stats cannot be changed in the game, as opposed to skills, which can be trained. The most important stat for me is strength, because it determines both damage and the maximum weight you can carry. So probably the best choice would be a fighter or a paladin, because such a character is able to fight efficiently from the beginning and can carry many more belongings. Later in the game, if you want, you can train your magical abilities to become a good mage.

Training is done by “praying” at a shrine. You just choose a mantra, which determines which skill will be improved. Training reduces the number of free skill points. More skill points can be earned by advancing in experience. Unfortunately, there is no indicator of remaining skill points. Experience is given for exploration and for defeating enemies. So if you can’t defeat some monster, try a different route and come back later, when you have become stronger. Fortunately, the game is always non-linear.

Fighting is essential. There are four classes of weapons (swords, axes, maces and ranged) and you can also fight unarmed. In each class there are weapons with different damage. Each weapon has a “condition”, and frequent use will damage it. Some weapons are magical. To determine a magical property you need a high lore skill or a special spell. I don’t like ranged weapons myself, so I just use swords, axes and maces. I also don’t just stand still swinging my weapon: I run forward and back, so that I hit my enemy and get away when he strikes.

Magic is useful, but it is not essential. If you have low magical skill, you can still use wands, scrolls, potions and rings. However, I found “flameproof” extremely useful on the lower levels, so I would advise spending some skill points on training magical skills, especially mana capacity. Offensive spells can be powerful, but they require some space to be released, and they consume lots of mana.

Magic in the game is “rune-based”. This means that you assemble your spell from runes. When I first played this game, I had a pirate copy without any manuals. Just after the start I found some runes, and when I tried combining them I discovered some spells myself. Later in the game some characters revealed more spells. So I thought that all the game’s spells had to be discovered by the player, and was disappointed with the small number of clues. This approach seemed great to me, because it requires the kind of intellectual work which mages are supposed to do. Later I discovered that the game manual has a list of spells, which is much less fun for me. However, since the clues will help you discover only about a dozen spells, I suggest you download the official manuals from here

This game has a “rogue” aspect. Monsters can hear you and see you, but you can train to be less noticeable. This is not very practical, since the game has lots of narrow tunnels where this skill will not help you. The “lockpick” skill helps to unlock doors without keys, but this skill is also not practical, since almost all locked doors have keys for them. Another way of opening a door or chest is by force (just hit it), but this will not work for massive doors or a portcullis. And, of course, there is a magical spell for opening. The “search” skill reveals secret doors, but it is also replaceable with a spell.

So I use a half-fighter, half-mage character, and usually don’t train rogue skills. Spells replace all rogue skills very well and provide protection against fire and magic.

I very much like the fact that all the characters in the game are like me: humans and humanoids wear armor and fight in the same manner as the main character. Mages cast the same spells. So everything is fair.

So, enjoy the game! The plot and dialogs are stupid, as are some puzzles, but the exploration, fighting and atmosphere are great. Lots of attention has been put into details: there are skulls and broken weapons everywhere, fire blasts burn some items, an expired “fly” spell turns into a “slow fall” spell, a dead opponent drops all its items, and there are lots of other interesting moments. So it is very much like a living world.

Critique of Java NIO frameworks

April 21, 2008 by kmatveev

Since version 1.4, Java has had a new library called NIO. Everyone who writes scalable network software uses this library because it supports non-blocking network operations. Without this library, you need a separate Thread for each connection, and Java doesn’t scale well with threads. Of course, this library is not just about non-blocking network operations. It also accelerates access to files, and has a whole architecture for byte-to-char conversion and back.

Non-blocking IO is slightly harder to understand than blocking IO, but not by much. However, there are people who think that NIO is too low-level. They also think that since NIO was created to allow a threading model other than “one thread per socket”, it is a good idea to provide a threading model out of the box. These people have written NIO frameworks. Let’s name some of them:

All those projects claim that they are “easy to use” because they “hide the complexity of plain NIO”. Is it really so? I don’t think so.

All the mentioned libraries try as hard as possible to hide selectors, channels and byte buffers from you. Instead, they propose a “chain of responsibility” pattern, consisting of different “protocol stages”. Processing at each stage can be synchronous or asynchronous.

The main problem with these frameworks is that they are too generic. They wrap 5 NIO classes with 50 classes of their own. Yes, the authors have analyzed more than 20 use cases and have created libraries which support them all. Now, instead of manually using plain NIO in the specific way you need, you have to analyze how your particular case fits into the framework, and then find the dozen places where you need to put pieces of your code so that they are executed in the correct sequence. They provide all the “glue” for you, so you just put in your “diamonds”, and the whole thing will pretend to be an “iron”.

My opinion is that these frameworks should be avoided. Using them makes the code bloated and hard to understand. It is better to learn NIO once. Yes, I myself often forget to flip a byte buffer after reading from a channel, but I prefer plain code to ravioli code. All those libraries are perfect examples of Java developers’ love of producing frameworks. And this is just one part of the more general case of OO developers’ love of design patterns and templates.
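For readers who haven’t hit the flip mistake yet, here is a small sketch of what it looks like. To keep it self-contained there is no real channel: a put() call stands in for channel.read(buffer), and the FlipSketch class and its drain() helper are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class FlipSketch {
    // Reads out everything written into the buffer so far,
    // then prepares the buffer for the next write.
    static String drain(ByteBuffer buffer) {
        buffer.flip();                      // limit = position, position = 0
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        buffer.clear();                     // ready for the next channel.read()
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        // Stands in for channel.read(buffer), which also advances the position.
        buffer.put("PING".getBytes(StandardCharsets.US_ASCII));

        // Without flip(), the "readable" range is the 12 unused bytes after "PING".
        System.out.println("remaining before flip: " + buffer.remaining());
        System.out.println("drained: " + drain(buffer));
    }
}
```

Forgetting flip() does not raise an error. You simply read the empty tail of the buffer instead of your data, which is why the mistake is so easy to repeat.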

And a final piece of advice: if possible, don’t use NIO, use plain IO instead. If you have a single socket, there is no reason to use a Selector. If your thread has to wait for the answer, make it block on the read operation instead of making it wait on a queue.

Update: a subsequent post shows that I’ve changed my preference from plain IO to blocking NIO. But my dislike of NIO frameworks is still here.