Archive for November, 2008

SIP stories, part 3: INVITE retransmission

November 6, 2008

The second use case that broke after I implemented strict dialog matching was INVITE retransmission. When a UAS immediately answers an INVITE with a “200 OK” response, it terminates the server transaction. If this response is not delivered to the UAC, the UAC will retransmit the INVITE. However, the UAS will treat the retransmitted INVITE as a new request, because the server transaction is already terminated and the INVITE has no tag in the “To” header. This is another serious flaw in RFC 3261.

As in the previous case, I used custom matching of INVITEs against dialogs, based on the Call-ID, the tag of the “From” header and the URI of the “To” header. It was a dirty hack, but it worked. I looked at the NIST SIP stack and Sailfin to see how they handle this problem, and it seems that these stacks have not solved it. I couldn’t believe that I alone had encountered this problem in RFC 3261.

After some searching I discovered a document which addresses both this problem and the previous one. It’s an Internet-Draft which may later become another RFC. It’s the first document I know of which proposes fixing RFC 3261 instead of extending it. The main idea is to change the state machines for INVITE client transactions and INVITE server transactions. So, if this draft turns into an RFC, the hack used in Sailfin will not be a hack after all, because it is very similar to the approach proposed in the draft.

As for me, I totally agree with the state machine extension. After some analysis I came up with a situation where a proxy has the same problem with several successful responses to an INVITE as a UAC has. So the problem should be fixed at a layer common to UAs and proxies, which is the transaction layer.

I’ve already implemented the approach from that draft, because I believe that having a working solution is better than following the broken RFC 3261. I can’t believe that anyone will rely on the broken behaviour and demand that a SIP stack follow it. This new solution is nicer than my previous hacks because the code is cleaner. I don’t even have a method which matches responses against dialogs any more, because it is no longer necessary, and that can improve performance. The drawback is a longer lifetime of transactions, which consume memory. But I think it is acceptable.
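A minimal sketch of the idea on the server side, with assumptions stated loudly: this is neither the text of the draft nor the code of the stack described here. The class name, the ACCEPTED state name, the method names and the use of the Via branch as the only transaction key are all made up for illustration. The point is only that a transaction which survives the 2xx can recognise a retransmitted INVITE instead of letting it through as a new request.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: all names here are assumptions, not the draft's text.
final class LingeringInviteServerTransactions {
    enum State { PROCEEDING, ACCEPTED }

    // Simplification: transactions are keyed by the top Via branch only.
    private final Map<String, State> byBranch = new HashMap<>();

    /** Returns true if the INVITE is new and must be passed up to the TU. */
    boolean onInvite(String branch) {
        if (byBranch.containsKey(branch)) {
            return false; // retransmission: absorbed by the existing transaction
        }
        byBranch.put(branch, State.PROCEEDING);
        return true;
    }

    /** The TU sent a 2xx: keep the transaction instead of destroying it. */
    void on2xxSent(String branch) {
        byBranch.replace(branch, State.ACCEPTED);
    }

    /** A (hypothetical) linger timer fired: only now is the transaction removed. */
    void onLingerTimeout(String branch) {
        byBranch.remove(branch);
    }

    /** Current state, or null when no transaction exists for the branch. */
    State stateOf(String branch) {
        return byBranch.get(branch);
    }
}
```

The cost mentioned above is visible here: every entry stays in the map until the linger timer fires, rather than disappearing as soon as the 2xx is sent.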

If my opinion matters, I totally support this Internet-Draft and hope it becomes an RFC. My big “thank you” goes to Mr. Robert Sparks, who wrote it. I just wonder why this Internet-Draft was published only recently.

That’s the end of the long “dialogs, forking and races” story. I hope someone will find it useful. If I encounter other problems with the protocol, I’ll surely share them in another “SIP stories” article.

SIP stories, part 2: dialog forking

November 5, 2008

When I changed the way my SIP stack deals with dialogs, two unit tests stopped working. These unit tests covered “corner cases” which are not very well described in RFC 3261, so I had implemented them based mostly on my own understanding of SIP. I had two options: to declare those use cases wrong and remove them, or to fix my implementation. Naturally, I started with use case analysis.

When a “200 OK” response is sent to an INVITE, the corresponding client and server transactions are terminated immediately on the UAC, the UAS and all proxies involved. The UAS should retransmit the “200 OK” until it receives an ACK, but the ACK may take a different path than the INVITE. Since reliable delivery of the “200 OK” response is specific to UA elements, it is implemented not in the transaction layer but in the UA core of the TU layer.
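The retransmission of “200 OK” by the UA core can be sketched as a simple schedule. RFC 3261 (section 13.3.1.4) has the interval start at T1, double until it reaches T2, and the UAS give up after 64*T1 without an ACK. The class and method names below are hypothetical, and the sketch only computes send offsets; it does not arm real timers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes when a UAS would retransmit a 2xx to INVITE.
final class OkRetransmitSchedule {
    /** Offsets (ms after the first 2xx) of each retransmission while no ACK arrives. */
    static List<Long> offsetsMillis(long t1, long t2) {
        List<Long> offsets = new ArrayList<>();
        long interval = t1;        // first retransmission happens after T1
        long at = t1;
        long giveUpAt = 64 * t1;   // after this the UAS stops waiting for the ACK
        while (at < giveUpAt) {
            offsets.add(at);
            interval = Math.min(interval * 2, t2); // double, capped at T2
            at += interval;
        }
        return offsets;
    }
}
```

With the default T1 = 500 ms and T2 = 4 s, this gives retransmissions at 500, 1500, 3500 and 7500 ms, and then every 4 s until the 32 s limit.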

Because of proxy forking, the sender of an INVITE can receive dialog-establishing provisional responses from several UASes. When one of them answers with a “200 OK” response, the proxy must cancel all other branches. However, another UAS may also respond with “200 OK” before it receives the CANCEL. When this second “200 OK” response arrives at the UAC, the client transaction for the INVITE will already be terminated. But this response still has to be delivered to the upper layers. RFC 3261 specifies (in section 13.2.2.4) that such responses should be matched against ongoing dialogs, and if no matching dialog is found then a new dialog must be constructed. Matching a response against a dialog has a purpose: to cut off retransmissions of “200 OK” responses. The logic is simple: if a response matches a dialog in the confirmed state, it should not be reported to the application layer. However, this whole idea of matching responses against dialogs has lots of flaws:

  • The described logic means that for any “stray” response the UA must construct a dialog and then pass it to the application layer. There is no way to check whether the response corresponds to a request actually sent from this node. The intent is that only responses to sent requests should be processed, and that processing should stop some time after the first “200 OK” response is received. But there is no way to ensure that.
  • The application is usually interested in the context of a response. For example, it may want to know the request to which the response was received. For normal responses the request can be obtained from the client transaction. The described procedure doesn’t explain how this is solved for “stray” responses.
  • re-INVITEs are sent within existing dialogs. Retransmitted “200 OK” responses to a re-INVITE will always be reported to the application.
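The first flaw is easy to see in a sketch of the matching rule as described above. This is an illustration only: the class, the enum and the method names are invented, and a dialog is reduced to its ID (Call-ID, “From” tag, “To” tag as seen by the UAC) plus a confirmed flag.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of matching "stray" 2xx responses against dialogs on the UAC.
final class Stray2xxMatcher {
    enum Outcome { DROPPED_AS_RETRANSMISSION, EARLY_DIALOG_CONFIRMED, NEW_DIALOG_CREATED }

    // Dialog ID (Call-ID, From tag, To tag) -> true once the dialog is confirmed.
    private final Map<List<String>, Boolean> confirmedById = new HashMap<>();

    /** A provisional response with a To tag created an early dialog. */
    void earlyDialogCreated(String callId, String fromTag, String toTag) {
        confirmedById.put(Arrays.asList(callId, fromTag, toTag), false);
    }

    /** A 2xx arrived for which no client transaction exists any more. */
    Outcome onStray2xx(String callId, String fromTag, String toTag) {
        List<String> id = Arrays.asList(callId, fromTag, toTag);
        Boolean confirmed = confirmedById.get(id);
        confirmedById.put(id, true);
        if (confirmed == null) {
            // Nothing here verifies that this node ever sent a matching INVITE.
            return Outcome.NEW_DIALOG_CREATED;
        }
        return confirmed ? Outcome.DROPPED_AS_RETRANSMISSION : Outcome.EARLY_DIALOG_CONFIRMED;
    }
}
```

Calling `onStray2xx` with a Call-ID that was never used by this node still returns `NEW_DIALOG_CREATED`, and no original request is available to hand to the application along with it.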

It is clear that RFC 3261 has a big flaw here. In my old implementation I had a workaround. Before sending an INVITE I prepared a “dialog prototype”. All “stray” responses were checked against the dialog prototypes by comparing the “Call-ID” header, the “From” header and the URI of the “To” header, and if there was a match I created a dialog based on the prototype. A prototype also held a reference to the INVITE, so I could provide the full context for a response. This homebrew solution worked, but I had firmly decided to follow RFC 3261 to the letter. Searching for ideas, I took a look at open source Java implementations of SIP stacks.
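A rough reconstruction of the “dialog prototype” idea might look like the sketch below. It is not the original code: the class and method names are hypothetical, headers are plain strings, and a String stands in for the INVITE message object.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "dialog prototypes" registered before an INVITE is sent.
final class DialogPrototypes {
    // (Call-ID, From header, To URI) -> the INVITE that was sent.
    private final Map<List<String>, String> inviteByKey = new HashMap<>();

    /** Register a prototype just before the INVITE goes out. */
    void beforeSendingInvite(String callId, String from, String toUri, String invite) {
        inviteByKey.put(Arrays.asList(callId, from, toUri), invite);
    }

    /**
     * Returns the original INVITE as context for a "stray" response,
     * or null when no prototype matches (the response is unsolicited).
     * The To tag is deliberately ignored: every fork matches the same prototype.
     */
    String contextFor(String callId, String from, String toUri) {
        return inviteByKey.get(Arrays.asList(callId, from, toUri));
    }
}
```

Unlike plain dialog matching, a lookup like this both rejects responses to requests that were never sent and gives the application the request to which the response belongs.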

The NIST implementation of the JAIN SIP API matches “stray” responses against ongoing dialogs based only on the RFC rules. If some early dialog matches a response, the initial transaction of this dialog is used as the context for the response. If the response doesn’t match an existing dialog, it is just passed to the application layer. So, in this respect the NIST SIP stack is not compliant with RFC 3261. Some may argue that the application can implement this functionality, but I don’t buy it: since there is no context for the response (the request is unknown, the transaction is null), an application can do very little.

Sailfin implements an interesting hack. They don’t actually terminate the client transaction after receiving a “200 OK” response. Instead, they move the transaction into an “established” state. This both helps to ensure that a response is received for a request which was actually sent, and provides a context for the response at the upper layers. This approach seems to be a very good idea, except that it is not RFC 3261-compliant.
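The effect of such an “established” state can be sketched as a tiny client transaction. This is not Sailfin’s code: the class, the state names other than “established”, and the method are assumptions, and ACK handling, timers and the Completed state are left out.

```java
// Illustrative sketch of an INVITE client transaction that survives a 2xx.
final class LingeringInviteClientTransaction {
    enum State { CALLING, PROCEEDING, ESTABLISHED, TERMINATED }

    private final String invite; // the request that was actually sent
    private State state = State.CALLING;

    LingeringInviteClientTransaction(String invite) {
        this.invite = invite;
    }

    State state() {
        return state;
    }

    /** Returns the INVITE as context if the response goes to the TU, otherwise null. */
    String onResponse(int status) {
        switch (state) {
            case CALLING:
            case PROCEEDING:
                if (status < 200) {
                    state = State.PROCEEDING;
                } else if (status < 300) {
                    state = State.ESTABLISHED; // kept alive instead of terminated
                } else {
                    state = State.TERMINATED;  // simplification: no Completed state
                }
                return invite;
            case ESTABLISHED:
                // Further 2xx (another fork, or a retransmission) still find their context.
                return (status >= 200 && status < 300) ? invite : null;
            default:
                return null;
        }
    }
}
```

A second “200 OK” from another fork reaches the same transaction, so the upper layer gets the original INVITE with it; telling a new fork from a retransmission is then a matter of looking at the “To” tag.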

That’s all for today. As I promised to M. Ranganathan, I’ve pointed out problems with RFC 3261 and the JAIN SIP API. In the upcoming article I’ll tell about the second use case, how I solved these problems, and why Sailfin’s hack is not a hack after all.

SIP stories, part 1: dialogs

November 5, 2008

This story is a long one. It tells about many things which I have learnt about SIP lately. Maybe it will be useful for fellow developers. It will be told in several articles. Here is the first part.

I must admit that for a long time I understood the concept of a SIP dialog incorrectly. I correctly understood that the only purpose of a dialog is that, once it is established, you can send subsequent messages within it. So, for end-points dialogs provide a session mechanism, while intermediate nodes can work statelessly. However, I wrongly believed that requests within a dialog could be sent only after a successful response to INVITE (or SUBSCRIBE). In other words, I thought that in-dialog requests are possible only for confirmed dialogs. Thus, I was wondering: why is an early dialog needed? Later I added reliable provisional responses to my list of means of dialog confirmation, thus solving the problem of PRACK and UPDATE.

In the process of optimizing memory usage I decided to shorten the lifecycle of a dialog by removing the early state altogether. Fortunately, before doing this I sat down and read very carefully about dialogs once more. And my point of view changed dramatically.

Dialogs in their present form were introduced because of just one protocol feature: proxy forking. Without forking life would be much easier: an INVITE would start a dialog. But with forking, each recipient of an INVITE has to distinguish itself to the sender by providing a tag in the “To” header. So, each end-to-end relationship (which is the definition of a dialog, by the way) can be defined only after the first response with a tag in the “To” header. The tag in the “From” header was added just for symmetry.

If a recipient of an INVITE has answered with a provisional response containing a tag in the “To” header, then later it must supply exactly the same header in other provisional and successful responses, because otherwise the sender will think that those responses come from a different UAS. When answering with an error response a tag is not necessary, because an error response terminates all dialogs started for the INVITE, no matter which UAS sent it.

Thus, all components used for matching a request against a dialog (the Call-ID and the tags) do not change after the first response, even if this response is a provisional one. It means that (contrary to my belief) there is no problem in sending in-dialog requests for an early dialog. Then what are the differences between an early and a confirmed dialog? Here they are:

  1. An early dialog terminates when an error response to the INVITE is received. A confirmed dialog is terminated by sending or receiving a BYE request.
  2. When transitioning from the early state to the confirmed state, the route set can be re-computed. In the confirmed state, the route set is never re-computed.
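The two differences fit into a very small state machine. The sketch below is only an illustration of the rules just listed: the class and method names are invented, the route set is a plain list of strings that the caller has already computed, and BYE on an early dialog is deliberately left outside the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the early/confirmed dialog lifecycle.
final class DialogLifecycle {
    enum State { EARLY, CONFIRMED, TERMINATED }

    private State state = State.EARLY;
    private List<String> routeSet;

    DialogLifecycle(List<String> initialRouteSet) {
        this.routeSet = new ArrayList<>(initialRouteSet);
    }

    State state() {
        return state;
    }

    List<String> routeSet() {
        return routeSet;
    }

    /** A final response to the initial INVITE was received. */
    void onFinalResponse(int status, List<String> newRouteSet) {
        if (state != State.EARLY) {
            return; // confirmed (or terminated): the route set is never re-computed
        }
        if (status >= 200 && status < 300) {
            routeSet = new ArrayList<>(newRouteSet); // re-computed on early -> confirmed
            state = State.CONFIRMED;
        } else {
            state = State.TERMINATED; // an error response ends the early dialog
        }
    }

    /** A BYE was sent or received. */
    void onBye() {
        if (state == State.CONFIRMED) {
            state = State.TERMINATED;
        }
    }
}
```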

After I learnt all these things, I changed my implementation. The code became much clearer, because I removed all methods like “matches()” which checked responses and requests against dialogs in some custom (and incorrect) ways. The only correct way of retrieving a dialog is by comparing the dialog ID calculated from the contents of the message with the dialog ID of the internal representation of the dialog. HashMaps work perfectly for this task.
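A minimal sketch of such a lookup, assuming the dialog ID of RFC 3261 (Call-ID, local tag, remote tag): for a message received by a UA, the local tag is the “To” tag of an incoming request and the “From” tag of an incoming response. The class and method names are hypothetical, and a String stands in for the dialog object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical value class: a dialog ID usable as a HashMap key.
final class DialogId {
    private final String callId;
    private final String localTag;
    private final String remoteTag;

    DialogId(String callId, String localTag, String remoteTag) {
        this.callId = callId;
        this.localTag = localTag;
        this.remoteTag = remoteTag;
    }

    /** Incoming request: the peer is in From, we are in To. */
    static DialogId fromIncomingRequest(String callId, String fromTag, String toTag) {
        return new DialogId(callId, toTag, fromTag);
    }

    /** Incoming response to our request: we are in From, the peer is in To. */
    static DialogId fromIncomingResponse(String callId, String fromTag, String toTag) {
        return new DialogId(callId, fromTag, toTag);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DialogId)) {
            return false;
        }
        DialogId other = (DialogId) o;
        return callId.equals(other.callId)
                && localTag.equals(other.localTag)
                && remoteTag.equals(other.remoteTag);
    }

    @Override
    public int hashCode() {
        return Objects.hash(callId, localTag, remoteTag);
    }
}

// Hypothetical dialog table: one hash lookup replaces all custom matches() methods.
final class DialogTable {
    private final Map<DialogId, String> dialogs = new HashMap<>();

    void add(DialogId id, String dialog) {
        dialogs.put(id, dialog);
    }

    String find(DialogId id) {
        return dialogs.get(id);
    }
}
```

For example, a dialog stored under the ID computed from a “180 Ringing” response is found again when the peer later sends a request in the same dialog, because the tags swap headers but the (local, remote) pair stays the same.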

However, the changes I made had some very interesting consequences.