Lync Backup Service Related Cmdlets Fail

While working on a Lync Server 2013 deployment I ran into an issue with the commands related to the Lync Backup Service:

  • Invoke-CsBackupServiceSync
  • Get-CsBackupServiceStatus

When I ran them I would receive the following error:

You don’t have required permission to perform Windows Communication Foundation (WCF) call to backup service instance on computer

After doing some digging I found that the account I was using had inadvertently been removed from the “RTCUniversalServerAdmins” group.  Once I added the account back and logged off and back on, I could run the commands.  Interestingly enough, the RBAC roles these cmdlets are assigned to are “CSAdministrator” and “CSServerAdministrator”, but holding those roles didn’t allow me to run them.  When I checked the TechNet information for the cmdlets, it listed RTCUniversalServerAdmins as required for Get-CsBackupServiceStatus but not for Invoke-CsBackupServiceSync (they both actually need it).

I have emailed the Lync Documentation team for more information and will post back here with any updates.

Posted in Uncategorized | Tagged , , , , , , , , | 1 Comment

Deploying Lync 2013 – Part 2

Today we’ll pick up where we left off in Part 1.  The focus for this part of the article is to add a second Lync Standard Edition Pool for disaster recovery purposes. When we finish, the environment will have two Lync Standard Edition Pools in two different sites as shown below.

Although this pool’s main purpose is disaster recovery, we can still use it to provide Lync services for our users.  If possible I’d recommend splitting your users evenly between the two pools; that way only half of the users are impacted by an outage, and you know the “backup” pool is working at all times.  This type of deployment is commonly referred to as “Active/Active”, meaning both servers are servicing users and handling part of the workload.  I prefer “Active/Active” to “Active/Passive” (where one pool just waits for workload) because you are getting some use out of the hardware and you know it is working if you need to fail over to it.

When you plan for “Active/Active” scenarios, it is critical to scale appropriately.  This means that each server/pool should be scaled to support all of the users from both pools at the same time, so it has the resources to carry the full load in the event of a failover.  If you’re deploying Enterprise pools, you should also scale to at least N+1.
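To make the sizing rule concrete, here is a small Python sketch of the arithmetic for an Enterprise pool in an Active/Active pair: each pool is sized for the combined user count of both pools, then one spare Front End is added for N+1. The per-server capacity is a hypothetical input, not a Lync figure; take the real number from the TechNet capacity planning guidance for your hardware. The function name is also hypothetical.

```python
import math


def front_ends_needed(users_pool_a: int, users_pool_b: int,
                      users_per_front_end: int, spare_servers: int = 1) -> int:
    """Front Ends one pool needs to carry both pools' users after a failover, plus spares (N+1)."""
    if min(users_pool_a, users_pool_b) < 0 or users_per_front_end <= 0:
        raise ValueError("user counts must be >= 0 and per-server capacity must be > 0")
    # After a failover, one pool hosts everyone, so size against the combined total.
    total_users = users_pool_a + users_pool_b
    return math.ceil(total_users / users_per_front_end) + spare_servers


# Hypothetical example: 10,000 users homed on each pool, assumed 6,000 users per Front End.
print(front_ends_needed(10_000, 10_000, 6_000))
```

With these made-up inputs it prints 5: four servers to carry all 20,000 users, plus one spare.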

Now that we’ve covered all of that, we’ll start by installing pre-requisites on our new server.  We’ll use the same command as before and then reboot the server:

Add-WindowsFeature MSMQ-Server,MSMQ-Directory,Web-Server,Web-Static-Content,Web-Default-Doc,Web-Scripting-Tools,Web-Windows-Auth,Web-Asp-Net,Web-Log-Libraries,Web-Http-Tracing,Web-Stat-Compression,Web-ISAPI-Ext,Web-ISAPI-Filter,Web-Http-Errors,Web-Http-Logging,Web-Net-Ext,Web-Client-Auth,Web-Filtering,Web-Dyn-Compression,Web-Mgmt-Console,Windows-Identity-Foundation,rsat-adds,telnet-client,net-wcf-http-activation45,net-wcf-msmq-activation45,Server-Media-Foundation

Once the server comes back up, we’ll need to log on and start the Lync install; we won’t jump into Step 1 though.  Our first step will be to run the “Prepare First Standard Edition server” option.  Although this isn’t our first SE server, we still need to run this to create the appropriate SQL instances for our new pool to host the CMS.  If you forget this step, adding it later isn’t much fun, so please make sure to run this first if you plan to host CMS on a Standard Edition server.

Next we’ll add our new site and new server in Topology Builder and pair it with the existing Front End.  I won’t go through everything step by step, but for the highlights, we’ll begin by clicking “Lync Server” at the top of the topology and then going to “Action” > “New Central Site”.

When you finish creating the second site, the “New Front End Pool” wizard will open and you can run through the steps to create your new pool.  Tip: Don’t forget to update your “External Base URL” to a public name that doesn’t match your pool name.

Once the pool is complete we can right click on either pool in our topology and choose “Edit Properties”

Under the “Resiliency” heading, we will check the box for “Associated backup pool” and select the server in the opposite location.  We’ll also want to check the “Automatic failover and failback for Voice” option.  This will populate the failure detection and fail back intervals with default values.

These values tell Lync how long a pool must be down before its users are allowed to register against their backup registrar (5 minutes/300 seconds by default), and how long the primary pool must be back up before they can fail back.  In most scenarios I shorten the failure detection interval so voice services are restored more quickly.  The minimum value you can enter here is 30 seconds (which is what I typically use).  I normally leave the failback interval at its default so that if a server is in some type of crash loop, users don’t get moved back to it before the Lync services can be disabled.
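A rough Python model of what the failure detection interval controls, under the simplifying assumption that the clock starts the moment the pool goes down. It only illustrates the timing described above (300-second default, 30-second minimum); it is not how Lync implements failure detection, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

# Values stated in this post; confirm them against the current documentation.
MIN_FAILURE_DETECTION_SECONDS = 30
DEFAULT_FAILURE_DETECTION_SECONDS = 300


def earliest_backup_registration(outage_start: datetime,
                                 detection_seconds: int = DEFAULT_FAILURE_DETECTION_SECONDS) -> datetime:
    """Earliest time voice users may register against their backup registrar."""
    if detection_seconds < MIN_FAILURE_DETECTION_SECONDS:
        raise ValueError(
            f"failure detection interval must be at least {MIN_FAILURE_DETECTION_SECONDS} seconds")
    return outage_start + timedelta(seconds=detection_seconds)


outage = datetime(2013, 1, 7, 9, 0, 0)
print(earliest_backup_registration(outage))      # default interval
print(earliest_backup_registration(outage, 30))  # shortened interval
```

For a 9:00:00 outage this prints 09:05:00 with the default and 09:00:30 with the 30-second setting, which is the difference the shorter interval buys for voice users.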

Once all the settings above are configured as desired, we can publish the topology and begin our server build on our new Standard Edition Front End Pool FE2.ocsguy.local.  If you happen to open your “To Do” list after publishing you will see that you need to run local Setup (Step 2) on the first front end, as well as all steps on the new front end.  Also, once you complete those steps you should run the Invoke-CSBackupServiceSync command against both pools to force an update.

I won’t walk through the rest of the setup steps on FE2 since they are the same as the last article with the normal “Next, Next, Finish” dance.  The one thing you may notice is the “OAuth” certificate will already be present on your new Front End in Step 3, which is expected.  Just make sure to run Step 2 on your first Front End as new services and database settings will be pushed to it during that step.

One other thing of note, we’ll want to create a second A record for SIP.ocsguy.us pointing to our new server.  This will allow clients to use DNS load balancing in case of a pool outage.
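A simplified Python model of why the second A record helps: the client receives both addresses for sip.ocsguy.us and moves on to the next one when a server doesn’t answer. The FE2 address below is hypothetical (the post doesn’t list it), and real clients also randomize and cache DNS results, which this sketch ignores.

```python
from typing import Callable, Optional, Sequence

FE1_IP = "10.255.106.81"   # FE1 address used in Part 1
FE2_IP = "10.255.107.81"   # hypothetical address for FE2


def pick_server(a_records: Sequence[str], is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first address that answers, or None if every server is down."""
    for address in a_records:
        if is_reachable(address):
            return address
    return None


sip_records = [FE1_IP, FE2_IP]   # two A records for sip.ocsguy.us
down = {FE1_IP}                  # simulate an FE1 outage
print(pick_server(sip_records, lambda ip: ip not in down))
```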

Now on to the real fun…

Lync Server 2013 added data replication for user information to the DR procedure. This means that when you checked the “Associated backup pool” box a few steps back, the server you selected as the backup began receiving a copy of the other server’s persistent user data.

Assuming all of the steps for the installer have been run on both servers, and the Invoke-CsBackupServiceSync commands have run, we can now test our failover. Prior to doing that though, I’m going to check my pools to verify they show in sync with the Get-CsBackupServiceStatus command:

I ran this command:

Get-CsBackupServiceStatus -PoolFqdn fe1.ocsguy.local | Select-Object -ExpandProperty BackupModules

As you can see from the screenshot above the data replication is working.

To test, I begin by turning off FE1.ocsguy.local.  At this point all users who were hosted on FE1 will disconnect for 30 seconds and then see the “limited services due to a server outage” banner in the Lync client.  Calls will still process to and from users, but the users’ contact lists will disappear and meetings hosted on FE1 will be down.

Now I RDP into FE2.ocsguy.local, open the Lync Management Shell and invoke a Management Server failover:

Invoke-CsManagementServerFailover -BackupSqlServerFQDN fe2.ocsguy.local -BackupSQLInstanceName RTC -Force

This will prompt us to continue.  I type “A”, and then press Enter:

That moves the CMS to the backup server.  If we had an Edge server and the failed pool was its next hop, we would need to run the Set-CsEdgeServer command to change its next hop to the backup pool (more info here in step 1)

Next I run the “Invoke-CsPoolFailOver” command to start the disaster recovery

This command will look like this:

Invoke-CsPoolFailover -PoolFQDN fe1.ocsguy.local -DisasterMode -Verbose

You will be prompted to continue. I typed in “A” and hit Enter.

As long as there aren’t any errors, you’re now failed over.  Next we need to update the internal DNS records for our simple URLs (meet.ocsguy.us and dialin.ocsguy.us) so they point to FE2 and users can join meetings again.

Last but not least I verify my Lync client can sign in from a Windows 8 machine without the red banner.

Once the problem is resolved and our original pool is back online, we can begin the failback process.  I’ll start by moving the user services back with the command below:

Invoke-CsPoolFailBack -PoolFqdn fe1.ocsguy.local

You will be prompted to continue. I typed in “A” and hit Enter.

You will see lots of information in the Lync Management Shell. Verify there are no errors (red text) and if there are, review logs to determine the cause.

Once all the users are moved back we can leave the CMS on FE2 as there is no real reason to move it back.  I have sent an email off to the Lync documentation team asking for more information on this and hope to have some more detail soon.

Summary:

I want to say a little about the process behind all of this before I wrap this part of the article up.

First of all, you can only pair like pools: Standard Edition with Standard Edition, and Enterprise Edition with Enterprise Edition.  You cannot pair a Standard Edition pool with an Enterprise Edition pool.  I’m not sure whether Topology Builder will stop you from doing this, but I do know the product group didn’t test it, so it isn’t supported.

Second, pairing is now reciprocal, meaning a pool can be paired only with one other pool, and that pairing is two-way.  In our case this means FE1 is paired only with FE2, and vice versa. They cannot pair with any other pools.

Third, once we pair pools, a new service named “Lync Backup Service” is installed that replicates persistent data from one pool to the other.  This is what allows things like the contact list, meeting information, and call forwarding settings to be restored on the backup pool if the primary pool is lost.  If the pool hosting the CMS is being paired, then the CMS will also be replicated to the reciprocal server.  This means you can fail over the CMS as well in the event of a pool or datacenter loss.
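The first two rules can be restated as a quick check. This Python sketch validates a proposed set of pairings: same edition on both sides, no self-pairing, and at most one partner per pool, recorded in both directions. It is only a checklist in code; the Pool class and validate_pairings function are hypothetical and not part of any Lync tooling.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Pool:
    fqdn: str
    edition: str  # "Standard" or "Enterprise"


def validate_pairings(pairs: Iterable[Tuple[Pool, Pool]]) -> Dict[str, str]:
    """Check the pairing rules and return each pool's backup partner."""
    partner: Dict[str, str] = {}
    for a, b in pairs:
        if a.fqdn == b.fqdn:
            raise ValueError(f"{a.fqdn} cannot back itself up")
        if a.edition != b.edition:
            raise ValueError(f"{a.fqdn} ({a.edition}) and {b.fqdn} ({b.edition}) are different editions")
        # Pairing is two-way, so record both directions and refuse a second partner.
        for pool, other in ((a, b), (b, a)):
            if partner.get(pool.fqdn, other.fqdn) != other.fqdn:
                raise ValueError(f"{pool.fqdn} is already paired with {partner[pool.fqdn]}")
            partner[pool.fqdn] = other.fqdn
    return partner


fe1 = Pool("fe1.ocsguy.local", "Standard")
fe2 = Pool("fe2.ocsguy.local", "Standard")
print(validate_pairings([(fe1, fe2)]))
```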

Since I’ve started talking about disaster recovery, I might as well mention Recovery Time Objective (RTO) and Recovery Point Objective (RPO).  An RTO of 15 minutes means that all services will be restored within 15 minutes of a disaster being declared and recovery work starting.  An RPO of 15 minutes means the data that is restored is no more than 15 minutes old.

Lync Server 2013 has a target RTO and RPO of 15 minutes for a pool with 40,000 concurrent users (which would be Enterprise Edition), so it’s reasonable to expect a similar RTO and RPO for a Standard Edition deployment.  Keep in mind that the RTO clock doesn’t start until a disaster is declared and administrators can start fixing the problem.
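A small Python sketch of how the two objectives are measured, using the 15-minute targets above. The timestamps are hypothetical; the point is that RTO runs from the disaster declaration while RPO is the age of the newest replicated data at the moment of failure.

```python
from datetime import datetime, timedelta

TARGET = timedelta(minutes=15)  # Lync Server 2013 target RTO and RPO cited above


def meets_objectives(failure: datetime, declared: datetime, restored: datetime,
                     last_replicated: datetime, rto: timedelta = TARGET,
                     rpo: timedelta = TARGET) -> dict:
    """RTO is measured from the disaster declaration; RPO from the last replication to the failure."""
    return {
        "rto_met": restored - declared <= rto,
        "rpo_met": failure - last_replicated <= rpo,
    }


# Hypothetical timeline: failure at 10:00, declared at 10:40, services back at 10:52,
# last successful replication at 09:50.
print(meets_objectives(failure=datetime(2013, 1, 7, 10, 0),
                       declared=datetime(2013, 1, 7, 10, 40),
                       restored=datetime(2013, 1, 7, 10, 52),
                       last_replicated=datetime(2013, 1, 7, 9, 50)))
```

Both objectives are met in this example even though users were without full service for 52 minutes, because the RTO clock only started at 10:40.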

That’s it for part 2, stay tuned for part 3 this coming week.


OCSGUY QuickUI Updated for Lync 2013 on Server 2012

Hi All,

I’ve updated the OCSGuy QuickUI to install prerequisites for Lync 2013 Front Ends on Server 2012.  You can download it from the Script Center or here:

http://bit.ly/o5w0h2

Enjoy!!

(Remember if you borrow the code it is polite to give credit where credit is due).


Welcome to Lync 2013 – Part 1

As I’ve done with Lync 2010 and OCS 2007 R2, I’ve decided to write an article on how to deploy Lync 2013.  As in previous years this will be a multi-part post, and this is part 1.  However, unlike the previous articles, I want to call out some things ahead of time:

  • This article covers doing a Standard Edition deployment for a small or medium size deployment (under 5,000 users)
  • I’ll cover redundancy in the environment; it is assumed that this redundancy is provided by a second location with equipment in it and an MPLS connection to the primary location.
  • The redundancy focus is specifically around Lync functionality, I won’t be covering how to plan DR or HA for other services/servers
  • It is assumed you will have PSTN and Internet connections at both locations
  • This series will cover IM, Presence, Voice Resiliency, Conferencing, and Unified Messaging
  • Because I don’t have enough subnets in my lab all servers will share the same subnet excluding edge.
  • Edge servers will utilize 1 DMZ interface and 1 Internal interface (2 NICs).  If you are deploying this to your production environment both edge interfaces should be in separate DMZ subnets from each other and all Lync servers and clients.  I’ll cover this more in the Edge section of the article

We’ll begin with a quick overview on the deployment.  This is a lab deployment containing 2 domain controllers (1 per location), 2 Lync Standard Edition servers, and 2 Lync Edge servers.  All servers in this deployment are utilizing Server 2012.  An Active Directory domain named “ocsguy.local” will hold all user accounts and the public SIP and SMTP domain will be “ocsguy.us”.

For server hardware and software specs on Lync 2013 please review the specifications on technet here:

http://technet.microsoft.com/en-us/library/gg398835(v=ocs.15).aspx

and

http://technet.microsoft.com/en-us/library/gg398588(v=ocs.15).aspx

So now that all of that is out of the way, let’s start with the fun stuff.  I’ve downloaded the Lync 2013 RTM bits and placed them on what will be my first front end server (FE1.ocsguy.local).  I’ll begin by logging into the Lync server with an account that is a member of “Domain Admins”, “Enterprise Admins” and “Schema Admins”.  Next I install the prerequisites using the command below from an administrative PowerShell window:

Add-WindowsFeature MSMQ-Server,MSMQ-Directory,Web-Server,Web-Static-Content,Web-Default-Doc,Web-Scripting-Tools,Web-Windows-Auth,Web-Asp-Net,Web-Log-Libraries,Web-Http-Tracing,Web-Stat-Compression,Web-ISAPI-Ext,Web-ISAPI-Filter,Web-Http-Errors,Web-Http-Logging,Web-Net-Ext,Web-Client-Auth,Web-Filtering,Web-Dyn-Compression,Web-Mgmt-Console,Windows-Identity-Foundation,rsat-adds,telnet-client,net-wcf-http-activation45,net-wcf-msmq-activation45,Server-Media-Foundation

Once this completes, restart the server.  Then we can start the Lync install by going to the “Setup\amd64” directory on the install media and running setup.exe

As usual you will be prompted to install Visual C++; click Yes here (unless you don’t want to install Lync)

Once the install completes you will see the Core Components Installer, click Install here:

Next we’ll accept the EULA (make sure to read it) and click OK:

And away we go

Once this installer completes the “Lync Deployment Wizard” opens and the real fun begins.

I like to start by clicking “Install Administrative Tools”

Next I’ll click “Prepare Active Directory”.  I won’t bore you with all of those screen shots, you’ll just run steps 1, 3 and 5 – each one requires you to do a version of the well-known “Next, Next, Finish” dance.  The other steps are just waiting for replication.

Now before we go any further we’ll need to make our administrator account a member of the “CSAdministrators” and “RTCUniversalServerAdmins” groups:

For this to take effect we’ll need to sign off of the server and back on.

Once we’ve signed back in we’ll launch the “Lync Deployment Wizard” again and choose “Prepare First Standard Edition Server”.  This time we only do the “Next, Finish” dance.

The wizard will take a few minutes to complete (you’ll notice it taking a while during the SQL parts).  It may be a good time to sip on more of that beverage or even get a refill.

Assuming you see “Complete” as shown in the screenshot above we can move on.

Next we’ll create a folder named “Share” on the C drive and share it out.  This will be used for our file share for this pool/server:

The default permissions will work for this as the Lync Topology Builder will update them with the appropriate settings when we publish our first topology.  Speaking of Topology Builder – now is the time to open it, since we have our shared folder ready we can start on our first topology.  When it opens click the radio button for “New Topology” and click “OK”

You’ll be prompted for a file name for the new topology file, I recommend using a descriptive name and storing all copies of topology in one place:

Now the “Create New Topology” wizard appears and we can start defining our topology, we’ll start with the primary sip domain “ocsguy.us”

We won’t be entering any other domains for this article so we can click next on the “Specify additional supported domains” box (Note – no screenshot shown for boxes that don’t require us to enter information).  Then we’ll define our site:

Specify the site details and click Next:

Now we’ll click “Finish”, notice the “Open the New Front End Wizard when this wizard closes” box is checked for us.

On the “Define the New Front End Pool” page we’ll click Next and be presented with options for our first pool.  Since we are deploying a Standard Edition pool select “Standard Edition Server” and put in the FQDN of our first Front End server (fe1.ocsguy.local) in the “FQDN” box and click Next

We’ll be enabling Conferencing, Dial-in (PSTN) Conferencing, Enterprise Voice and Call Admission Control

We’ll leave the “Collocate Mediation Server” box checked and click Next (not shown)

For now we won’t be deploying edge so leave the “Enable an Edge Pool…” box unchecked and click Next (not shown)

SQL information is already populated for us so we can click Next (not shown)

Also, because we used the default file share name of “share” we can click next on the “Define the file store” screen (not shown).

On the “Specify the Web Services URL” page I recommend using a name other than the server name in the “External Base URL” field.  I typically add “extws” (for “external web services”) to the pool name here; this was required when you were doing mobility in 2010, so I’ve stuck with it for the 2013 deployment.  Also, since our pool name is in an internal domain, we’ll need to change it to an external FQDN; I’ve used “fe1extws.ocsguy.us”.
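The naming convention is mechanical enough to express in a few lines. This Python sketch takes the host label of the internal pool FQDN, appends “extws”, and puts it in the public domain; the helper name is hypothetical and it is only a reminder of the convention, not something Topology Builder does for you.

```python
def external_web_services_fqdn(pool_fqdn: str, external_domain: str, suffix: str = "extws") -> str:
    """Build an external web services name from the pool host name, e.g. fe1 -> fe1extws.<domain>."""
    host = pool_fqdn.split(".", 1)[0].lower()  # "fe1" from "fe1.ocsguy.local"
    return f"{host}{suffix}.{external_domain.lower()}"


print(external_web_services_fqdn("fe1.ocsguy.local", "ocsguy.us"))
```

This prints fe1extws.ocsguy.us, matching the name used above.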

For now we won’t be deploying an “Office Web Apps” server so uncheck that box and click “Finish”

Now we’re ready to publish our topology so we can choose Action>Topology>Publish in Topology Builder

A little more of the “Next, Next, Finish” dance will happen.  Notice that during the wizard our Front End server “fe1.ocsguy.local” is automatically selected as the “Central Management Server”

You’ll see a bunch of text in the “Publishing Topology” box, as long as there is no Red/Errors the topology will be published and we can open the to do list by clicking on the “Click here to open to-do list”.

Our “To-Do” list tells us to set up our simple URLs and run Setup on our new server, fe1.

We’ll start by running the Lync Deployment Wizard and choosing “Install or Update Lync Server System”

We’re going to run through steps 1 through 4 in order.  Choose “Run” under Step 1 (it will be the only one not greyed out).

On the “Install Local Configuration Store” screen we’ll leave the defaults and click Next

We’ll see the install text fly by again, slowing down for all the SQL services of course (good time to sip on some more of that beverage)

Once it completes click Finish

Now we can click “Run” under Step 2, click “Next” under “Setup Lync Server Components” and the next phase of the install begins.  This will take more than a few minutes so be patient…

Now on to Step 3, we’ll start by requesting the “Default Certificate” which is used for most of the Lync services

When prompted choose “Send the request immediately” and click Next, a CA from your environment should show up in the drop down on the next screen

If your CA is there click Next

You shouldn’t need to specify alternate credentials or alternate certificate templates (unless your organization has special requirements around this) so click Next 2 more times

Now give your certificate a friendly name.  I like to make sure it is something meaningful like “FE1 Lync Pool Cert”, and I usually mark the certificate key as exportable just in case I need it for troubleshooting later.

Now put in your organization info and location info (not shown)

Make sure to place a check next to your SIP domain in the “Configured SIP domains” box, this adds sip.ocsguy.us to our certificate

Finish through the wizard by pressing Next a few more times and the certificate will be requested

The next screen will allow you to assign the certificate you just requested (Make sure the “Assign this certificate” box is checked).  Couple more “Next, Next, Finish” presses and our certificate is assigned.

Next we’ll request our OAuth certificate, this will be used for server to server communication for things like Exchange and SharePoint.

For the friendly name I typically use something like “OAuth Cert”, something important to note: This certificate will replicate to all of your Lync servers, so you will only have to request it once.

Notice above the only name on the cert is the sip domain. All other fields you can leave defaults and your settings for organization and location will be populated from the last request.

Once the request completes we can assign it automatically just like we did with the last one.

Now we can run Step 4 to start our Lync services. This is another “Next, Finish” type of thing so I haven’t included screenshots.

Once the service start completes, you can open the Services console (services.msc) and verify that every service whose name starts with “Lync” is running.  “Lync Server Front-End” may take a few minutes to go from Starting to Running.  If it doesn’t start, check Event Viewer (eventvwr) for errors.

Now that the Lync install is complete we can move on to DNS.  For this portion of the deployment we will create the following A records in the internal copy of our “ocsguy.us” zone (Note these are the internal IPs and will only be used by internal clients):

A Record            IP Address
sip.ocsguy.us       10.255.106.81
meet.ocsguy.us      10.255.106.81
dialin.ocsguy.us    10.255.106.81

We will also need to create the following SRV record:

This record will be used by our Lync clients for sign in.
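The internal records for this pool can be summarized as data. The A records come from the table above; the SRV record isn’t spelled out in the text, so the standard internal Lync sign-in record (_sipinternaltls._tcp on port 5061, pointing at sip.<SIP domain>) is an assumption in this Python sketch. Check it against your own DNS before relying on it; the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def internal_lync_records(sip_domain: str, front_end_ip: str) -> Tuple[Dict[str, str], List[Tuple[str, int, str]]]:
    """Return (A records, SRV records) for a single Standard Edition front end."""
    a_records = {f"{name}.{sip_domain}": front_end_ip for name in ("sip", "meet", "dialin")}
    # Assumption: standard internal sign-in SRV record, as (name, port, target).
    srv_records = [(f"_sipinternaltls._tcp.{sip_domain}", 5061, f"sip.{sip_domain}")]
    return a_records, srv_records


a_records, srv_records = internal_lync_records("ocsguy.us", "10.255.106.81")
for name, ip in a_records.items():
    print(f"A    {name:<20} {ip}")
for name, port, target in srv_records:
    print(f"SRV  {name:<32} {port} {target}")
```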

Now that DNS is ready, we’ll head over to the Lync Control Panel and enable our first user.  You will likely be prompted to install Silverlight (if you hadn’t done so earlier).  If you already have Exchange deployed with an email address policy, you can base the user’s SIP URI on that; otherwise you’ll want to create the SIP URI as I have below:

Last but not least, since everything is ready all that’s left to do is sign in from a domain joined machine.

Sign in is now complete and we can call it a day.  Check back next week for part 2, adding our second front end server and configuring pairing.

H/T to Pat Richard for pointing out a typo above.


Stare and Compare Utility

While working with a customer I found the need to regularly compare different client, voice and conferencing policies.  This meant a lot of time spent staring at two policies to see what was different.  Since that wasn’t a good use of time, I created a new tool, StareCompare.  This tool allows you to select a type of configuration within Lync and compare the settings for that configuration.

For example, if you would like to see the differences between two Voice Policies, you can run StareCompare.PS1 and choose “Voice Policy”

Select the first policy you want to compare (in my case Global)

Then select the second policy you want to compare (Tag:Tag Policy)

StareCompare.PS1 then uses the Compare-Object cmdlet to compare the items and displays the differences on top, followed by the items that are equal on the bottom.
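For readers curious how the comparison works, here is a rough Python equivalent of Compare-Object with -IncludeEqual applied to two policies represented as setting/value pairs. It mirrors the side indicators (“<=” only in the first policy, “=>” only in the second, “==” in both) but is not the StareCompare.PS1 code; the function name and the policy values below are hypothetical.

```python
from typing import Dict, List, Tuple


def stare_compare(first: Dict[str, str], second: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (setting, value, side indicator) rows: differences first, equal items last."""
    left = set(first.items())
    right = set(second.items())
    rows = [(k, v, "<=") for k, v in sorted(left - right)]   # only in the first policy
    rows += [(k, v, "=>") for k, v in sorted(right - left)]  # only in the second policy
    rows += [(k, v, "==") for k, v in sorted(left & right)]  # identical in both
    return rows


global_policy = {"AllowSimulRing": "True", "AllowCallForwarding": "True", "EnableDelegation": "True"}
tag_policy = {"AllowSimulRing": "False", "AllowCallForwarding": "True", "EnableDelegation": "True"}
for setting, value, side in stare_compare(global_policy, tag_policy):
    print(f"{side}  {setting} = {value}")
```

A setting whose value differs shows up twice, once with each indicator, which is the same way Compare-Object reports it.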

This is a first revision with a limited number of tests.  If you would like to see additional items added to StareCompare, please comment below and I will add them if I can in upcoming revisions.  The utility also works with text files, so you can use it to compare those as well.

***Hint: if you only see “==” under SideIndicator, everything is the same***

Download Here

Enjoy!


Lync Phone Edition: Connection to Microsoft Exchange is unavailable

Today I’d like to talk about certificates and a problem that seems to be becoming more and more common with Lync Phone Edition.  The problem is that Lync Phone Edition devices that are tethered receive the message “Connection to Microsoft Exchange is unavailable. Please contact your support team”.

The most common cause of this error I have seen is having the Lync pool certificates issued from a different Certificate Authority (CA) than the Exchange server certificates.  This problem has become more and more common as organizations move to using certificates from Public CA on their Exchange servers internally.  The root of the problem is the Lync Phone Edition devices will not trust a CA other than the one that issued the certificate to the Lync pool it is connecting to.

To reproduce this issue I have configured a lab environment as follows:

Lync Environment:

  • Enterprise Edition Pool: lyncpool.lyncguys.com
  • Certificate Issued By: Internal Certificate Authority (Lyncguys-LG-DC-CA)
    • Common Name: Pool FQDN (lyncpool.lyncguys.com)

Exchange Environment

  • Single Exchange 2010 Server: LG-Ex.lyncguys.com
  • Public and DNS name: mail.lyncguys.com
  • Certificate Issued By: DigiCert
    • Common Name: Mail.lyncguys.com

I created a test account (kevin@lyncguys.com) and signed in via a tethered CX600 (Aries) phone.  As expected, I immediately saw the error when I tried to access the calendar.

*Note – A Lync Phone Edition device (other than the CX700/Tanjay) requires the tethered connection between the PC/laptop and the phone to authenticate to Exchange and view the calendar.  Exchange does not currently support PIN authentication, so this article does not apply to devices that are not tethered.

To work around this limitation in Lync Phone Edition, there is a Lync Management Shell command that lets you add the public provider’s root certificate to the Web Services Configuration on the Lync servers.  The command requires you to know the “thumbprint” of the root certificate.

To find the thumbprint, open Outlook Web Access (OWA) from an internet browser, click on the “Lock” icon in the browser and choose “View Certificates”:

The certificate your Exchange server is using will be displayed:

Next, click on the “Certification Path” tab at the top of the certificate window, click on the top certificate in the list and choose “View Certificate”.

Now click on the “Details” tab and scroll down to “Thumbprint”.  Highlight the thumbprint and press CTRL+C to copy it to your clipboard.

Open notepad or another text editor and paste the thumbprint into the editor.  Next, remove all spaces from the thumbprint as shown in the screen shot below.
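If you would rather not edit the thumbprint by hand, the cleanup is easy to script. This Python sketch removes whitespace plus invisible formatting characters such as the left-to-right mark (U+200E), which can be copied along with the thumbprint and may show up as a stray “?” when pasted into the shell, then checks that 40 hex digits remain (the length of a SHA-1 thumbprint). The function name and sample value are hypothetical.

```python
import unicodedata

HEX_DIGITS = set("0123456789abcdefABCDEF")


def clean_thumbprint(raw: str) -> str:
    """Strip spaces and invisible formatting characters from a copied thumbprint and validate it."""
    cleaned = "".join(
        ch for ch in raw
        # Unicode category "Cf" covers U+200E and similar invisible marks.
        if not ch.isspace() and unicodedata.category(ch) != "Cf"
    )
    if len(cleaned) != 40 or not set(cleaned) <= HEX_DIGITS:
        raise ValueError(f"not a 40-character hex thumbprint: {cleaned!r}")
    return cleaned.upper()


copied = "\u200e" + " ".join(["a9"] * 20)  # hypothetical value with a hidden left-to-right mark
print(clean_thumbprint(copied))
```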

Take the thumbprint without any spaces and copy it into the command below, then run the command from Lync Management Shell:

$cert = New-CsWebTrustedCACertificate -Thumbprint "Thumbprint_Here" -CAStore TrustedRootCA


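Verify that pasting the command into the Lync Management Shell did not add a “?” to your command (shown below).  The “?” is how the shell displays an invisible character that can be copied along with the thumbprint; if it appears, delete it before running the command.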

Now that the certificate information has been stored as a variable ($cert), run the following command to add the certificate to the Web Service Configuration for the Lync servers:

set-cswebserviceConfiguration -trustedCACerts @{Add=$cert}

To verify the command completed successfully run the command:

Get-CsWebServiceConfiguration

The thumbprint of the newly added certificate will appear in the “TrustedCACerts” list.

After this process is complete, reboot the Lync Phone Edition devices and verify the calendar is functional.

*Tip – In my lab the intermediate certificates for DigiCert were not installed correctly on the Exchange server, causing the error to still display.  To correct this issue, download the DigiCert Certificate Utility and run it on all Exchange servers in the CAS array to verify the certificate chain is installed correctly.*

One other thing to note: there is no command to remove a single certificate from the “TrustedCACerts” list in the Lync Web Services Configuration.  However, you can use the Replace option with the Set-CsWebServiceConfiguration command to add a new CA certificate to the store and remove all others.

For example:

$cert = New-CsWebTrustedCACertificate -Thumbprint "Thumbprint_Here" -CAStore TrustedRootCA

set-cswebserviceConfiguration -trustedCACerts @{Replace=$cert}

If you need to remove all CA Certificates from the store and would not like to use a new one, you can use the command “Remove-CSWebServiceConfiguration” which will set all of the Web Service Configuration back to default.  This will remove all configured settings for the Web Service Configuration, so if you have modified any other settings, you will have to update them again after running this command.

H/T to my co-worker Randy Wintle for his help tracking this down

Reference Microsoft Article

Hope this helps!


Push Notification Fails with a 504 Server Time Out

While troubleshooting push notification failures with a client, I found an interesting problem.  The client had already configured the SRV record as required (http://blog.ucmadeeasy.com/) and disabled URL filtering as required (http://support.microsoft.com/kb/2664650), but push notification was still failing with a 504 error code.  To take it one step further, we completely disabled all IM filtering just in case.  However, we still received a 504 error (server timeout) from the Push service.

As background, the Push Notification Clearing House (PNCH) runs in the Office 365 cloud using Lync Edge servers and dynamic federation.  For more information on the three types of federation, refer to the article I wrote here: http://ocsguy.com/2011/04/20/a-few-words-on-federation/.

We were unable to troubleshoot the issue from the Office 365 side, so I decided to reconfigure my company’s Edge server with dynamic federation (it was configured with direct federation) and see if I could find any errors related to the customer’s configuration.

I began by removing the customer’s domain information from the Federated Domains tab within the Lync Server Control Panel in my Lync environment.  Next I signed into a test account on the customer’s Lync environment (jsmith@contoso.com) and attempted to IM an account in my environment (kevinp@tailspintoys.com).  The IM failed immediately and I began reviewing the UCCAPI log from my client and the SIP Stack logs from my Edge servers.

It didn’t take long to find the 504 error in the logs, including some useful diagnostic information:

In the “ms-diagnostics” line we see “No match for domain in DNS SRV results” followed by the domain name (contoso.com) and the A record usim.us.contoso.com.

The problem lies within the A record the Federation SRV record is using.  It doesn’t match the SIP domain.

Now you may be thinking “they are both contoso.com”, and if you are, you are not alone!  The catch however, is there is a subdomain in the A record (usim.US.contoso.com) that does not exist in the SIP domain.  This causes a failure to match the SIP domain with the SRV record.  Since they don’t match, you would have to use Direct Federation instead of Enhanced or Dynamic Federation to federate with this organization.  This would seem to be an easy fix, but since Office 365 only supports Dynamic Federation, the fix is a configuration change on the customer side.
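The matching rule as described in this post can be modeled in a few lines of Python: drop the host label from the SRV target, and what remains must equal the SIP domain exactly. This is a sketch of the behavior observed here, not the Edge server’s actual implementation, and the function name is hypothetical.

```python
def srv_target_matches(sip_domain: str, srv_target: str) -> bool:
    """Drop the host label from the SRV target; the remaining suffix must equal the SIP domain."""
    target = srv_target.rstrip(".").lower()
    _host, _, suffix = target.partition(".")
    return suffix == sip_domain.rstrip(".").lower()


print(srv_target_matches("contoso.com", "usim.us.contoso.com"))  # extra "us" label, no match
print(srv_target_matches("contoso.com", "sip.contoso.com"))      # matches
```

The first call prints False and the second prints True, which lines up with the fix below of publishing sip.contoso.com for federation.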

To resolve the issue we created a new DNS A record to use for Federation (sip.contoso.com).  We also updated the Access Edge certificate to include this name in the SAN field. Once these steps were completed, Push Notification began working on the mobile clients.

Lesson learned: As a best practice, make sure your DNS A records for Federation don’t have a subdomain unless your SIP domain does as well.
