One way messages with federated partner and ID 504 in Communicator

Since my blog post on Live Meeting troubleshooting I have seen a lot of search queries leading people to the site for OCS troubleshooting. I’ve also seen a ton of questions on the subject in the MS forums. All this has led me to the conclusion that OCS troubleshooting isn’t that easy to get a handle on. With that in mind I’m writing this post on troubleshooting federation. First and foremost, this article is about troubleshooting a mistake I made during a recent deployment, and if you ask any PSS engineer they will tell you 80% of the problems they face with OCS come down to the same thing: human error/configuration error. As I said in my last troubleshooting post, I’m no expert on troubleshooting OCS, but hopefully this post will help someone out there. As always I encourage you to share your stories and methods if you think they may help someone else.

So recently, while working on an Enterprise Edition install of OCS 2007 R2, I ran into an issue with federation. The issue was that I could send an IM from the client’s environment, but an attempt to reply from our OCS environment ended with an ID 504 error in MOC. I just so happened to be federating the client with my own company, so I was able to trace from both sides and find the resolution.

Since 504 errors are typically routing, firewall or DNS related (boy, that really narrows it down, doesn’t it!) I started out with the standard DNS and telnet tests. I could resolve the Access Edge server correctly and could also telnet to it from my edge server on port 5061. Since the firewalls on both sides looked good and DNS was doing its job, I started a SIP Stack trace. I started with the edge server at my company, as we were the ones who couldn’t communicate, and most likely we would see the errors on our side.
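
For anyone who would rather script those two checks than run them by hand, here is a minimal Python sketch of the same idea: resolve the name, then try a TCP connection on 5061. It is not what I ran during the deployment (I used the usual DNS lookup and telnet), and the function name `check_endpoint` is my own invention for illustration. It only proves the port accepts a TCP connection; it says nothing about TLS or certificates.

```python
import socket


def check_endpoint(host, port=5061, timeout=5.0):
    """Resolve a host name and attempt a plain TCP connection to it.

    Roughly equivalent to a DNS lookup followed by "telnet host port".
    Returns a dict describing what happened instead of raising.
    """
    result = {"host": host, "port": port, "addresses": [],
              "reachable": False, "error": None}

    # Step 1: name resolution (the "DNS test").
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"DNS lookup failed: {exc}"
        return result
    result["addresses"] = sorted({info[4][0] for info in infos})

    # Step 2: TCP connection (the "telnet test").
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["reachable"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"
    return result
```

Run it from the edge server with the partner's Access Edge FQDN, for example `check_endpoint("sip.partner.example")` (a placeholder name, not a real host), and check the `addresses` and `reachable` fields in the result.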

On our edge server I started my SIP Stack trace and attempted to send an IM to my test account in the client’s environment. Keep in mind there is a lot of information in a SIP trace, so you want to be quick about this so you don’t overwhelm yourself with logs.

Here’s how I configured logging:

After the test message was sent and the error was received in MOC, I stopped logging and clicked the “Analyze Log Files” button.

I made sure only my SIP Stack was selected and clicked “Analyze”.

I followed the path listed in the “Output File” field and grabbed the text file that was created. Once the log file was on my machine I opened Snooper and examined the log. Here’s what I saw:

I selected the first red line that was relevant to my conversation with the test contact: a “Server Time-Out” error. From here I moved one line up to get the request right before the error and looked at the information in the right-hand column. Under the “Route” section I saw not only the pool name of the customer’s EE pool, but also the FQDN of the server. At this point I realized where my error was.

Since the edge server was behind a NAT, it had to be able to resolve the public IPs for the public-facing edge services (Sip., AV., and Meeting.). Also, to protect the network, we had not allowed the server to resolve internal names at all. To enable the edge server to talk to the pool I had created entries in the hosts file. However, by mistake I had only created an entry for the FQDN of the pool and not for the individual servers in the pool. I added an entry to my hosts file for the FQDN of the front end server and that corrected my issue.
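
To catch this kind of omission before it bites, a check along these lines might help. This is only a sketch in Python: the helper name `missing_hosts_entries`, the sample hosts content, and every host name and IP in it are made up for illustration and are not from the deployment described here. On Windows the real file normally lives at `C:\Windows\System32\drivers\etc\hosts`.

```python
def missing_hosts_entries(hosts_text, required_names):
    """Return the required names that have no entry in the hosts file text."""
    present = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        # First field is the IP address; the remaining fields are host names.
        present.update(name.lower() for name in fields[1:])
    return [name for name in required_names if name.lower() not in present]


# Hypothetical hosts file that lists the pool but not its front end servers.
SAMPLE_HOSTS = """\
# internal names the edge server must reach
10.0.0.10   pool01.corp.example
"""

# Hypothetical names: the pool FQDN plus each front end server in the pool.
REQUIRED = ["pool01.corp.example", "fe01.corp.example", "fe02.corp.example"]

print(missing_hosts_entries(SAMPLE_HOSTS, REQUIRED))
# prints ['fe01.corp.example', 'fe02.corp.example']
```

The point of listing the front end servers separately is exactly the mistake above: the pool name resolving is not enough if the Route header points at an individual server.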

Although this won’t cure every 504, hopefully the methods used help shed some light on troubleshooting.

Keep in mind 504s are usually routing, firewall, or DNS related, and it’s best to troubleshoot them from the end receiving the error. If anyone is interested I can provide a copy of the log file (names and IPs changed, of course).

About Kevin Peters

My name is Kevin Peters.

11 Responses to One way messages with federated partner and ID 504 in Communicator

  1. Rutha Bolles says:

    Thanks for posting, I very much enjoyed your newest post. I think you should post more often, you obviously have talent for blogging!

    • Kevin Peters says:

      Rutha,

      Thanks for reading and for the kind words! I’d love to post more often, but I try to make sure I only post when it’s something interesting. I also write for the blog outside of my already busy consulting job, whenever I can take a break from family life for an hour or two to write the articles. I’ll keep writing as much as I can, you keep reading!

      Thanks for reading!

      -kp

  2. UC Guy says:

    Hi Kevin,
    We are facing the exact issue described here. We have also tried the resolution suggested here, but it didn’t work for us.
    We have a Director in place. Everything is working fine except federation, which only works one way. We are not able to see presence or send IMs to federated partners. However, they can see our presence and can send us IMs.
    Please suggest.

    • Kevin Peters says:

      Hi UC Guy,

      Logging is going to be the trick on this one. The best bet is to set up another environment you have control of (if you can), or work with someone who will log from their side as well, and just dig in. Start from the bottom and work your way up. Check the routing at each step, and run port query or telnet to verify ports in both directions. Make sure every certificate’s CN matches the FQDN being connected to, make sure TCP 5061 is open in both directions, etc. If you can’t get it worked out, feel free to email me (info on contact page) and we can take a look at it.

      Hope this helps!

      -kp

  3. Rajeev says:

    Hi KP,

    I am facing a similar “one way federation” issue, and the SIP stack logs point to this error:

    504 Server time-out
    ms-diagnostics: 1007;reason=”Temporarily cannot route”;source=”Director Server”;ErrorType=”Connect Attempt Failure”;WinsockFailureDescription=”The peer actively refused the connection attempt”;WinsockFailureCode=”10061(WSAECONNREFUSED)”;Peer=”Internal edge interface”

    Can you suggest what else we can look at apart from what was already suggested? All telnet and DNS tests pass, and of course communication between LAN and WAN users is fine. However, a limited external calling error is appearing in OC.

    Thanks
    Rajeev

    • Kevin Peters says:

      Hi Rajeev,

      It appears there may be a firewall issue or the servers don’t trust each other. That is usually when you see that error. I can’t give you much more guidance than that without knowing the environment, but I would look hard at networking configurations, certificates and trusted servers lists.

      Hope this helps!
      -kp

  4. Victor Gameros says:

    I am having one way communication as well: they can communicate with us, but we cannot talk to them. The trace shows TLS errors:

    Text: The connection was closed before TLS negotiation completed. Did the remote peer accept our certificate?

    Their environment is OCS R2 and we are running Lync. I read something about enhanced federation. Is this something that needs to be set on the OCS side?

    • Kevin Peters says:

      Hi Victor, you do not need to configure enhanced federation in this case. Please check that both companies are using public certificates; most likely they don’t trust your certificate (as the error says).

      HTH

      -kp

  5. Rajeev says:

    Hello KP,

    We have a new issue now regarding federation. Our federated partner is able to see our presence and can initiate an IM session. Once the session is established, the presence of the federated contact still shows offline, and the globe sign (used to represent a federated partner contact) is missing.

    We only see their presence as offline and cannot initiate an IM. We can only reply once a session is initiated by the partner.

    They are using Lync and we are on OCS 2007 R2. Can you suggest something we can check? This is happening with only a few users at the other end.

    Thanks
    Rajeev

    • Kevin Peters says:

      Rajeev,

      Sounds like the issue is on their end; that is why you can only reply to them and not initiate (typically this is DNS or FW for them internally). You can get a SIP stack log of a user trying to get presence for them, or even a client uccapi log from one of your clients, to help them determine the issue. It would be good if they also had a SIP stack log on their side. My guess would be one of two things:
      1. They don’t have DNS entries for their pool name that the edge server can read – test by pinging the pool name from the edge server (also good to ping each FE server by name)
      2. A firewall configuration is preventing the edge pool from reaching the FE pool – test by telnetting to the pool name and all FE servers by name on port 5061

      HTH

      -kp

      • Victor Gameros says:

        I experienced the same issues on our end. We were able to communicate with users only if they started the conversation. The resolution was in the global properties in Topology Builder: I had to select the site federation route assignment, enable it, and point it to the Director server. After this change you need to publish the topology.
