Archive for the ‘Uncategorized’ Category

Office Communications Server Deployment, Day 2

Tuesday, May 13th, 2008

Edit (21 May 2008):

Apparently, hash algorithms stronger than SHA-1 will prevent any machine running Windows Server 2003 or earlier, or Windows XP or earlier, from obtaining a certificate.  I implemented my root CA with a SHA512 hash algorithm and my subordinate CAs with a SHA256 hash algorithm.  I now get to redeploy the entire PKI.

6:41 AM : Back to Work

I actually got here over an hour ago, but have been catching up on my e-mail and such.  It’s almost 7 AM, and I’m ready to tackle Configuration Manager again.  There will probably be fewer updates today as I think I need to spend some time watching a few Webcasts this morning.  I checked briefly into the error with the Management Point, and I seem to recall something I needed to do with permissions in Active Directory.  I’ll try to find that and fix that problem first.

7:31 AM : Coffee Break

Taking a breather from the Webcast I’m watching (on System Center Configuration Manager 2007 SP1 and R2 upcoming releases) to grab a small cup of coffee.

7:37 AM : PKI

Another tangent, but I received the private OID number I was waiting for in order to deploy our PKI.  I’m going to deviate from Configuration Manager long enough to get the PKI going, and then when I get around to it I can switch Configuration Manager over to Native Mode instead of Mixed Mode.  Again, I’m using Brian Komar’s Windows Server 2008 PKI and Certificate Security to make sure I follow updated best practices for deploying the PKI.

8:02 AM : Root CA capolicy.inf

I’m using the following configuration to initialize my enterprise root CA:

[Version]
Signature = "$Windows NT$"

[BasicConstraintsExtension]
PathLength = 3
Critical = true

[Certsrv_Server]
RenewalKeyLength = 4096
RenewalValidityPeriodUnits = 20
RenewalValidityPeriod = years
CRLPeriod = days
CRLPeriodUnits = 7
CRLDeltaPeriod = hours
CRLDeltaPeriodUnits = 4
DiscreteSignatureAlgorithm = 1

8:20 AM : Root CA Installed

I’m using the following script after installation to guarantee settings:

::Declare Configuration NC
certutil -setreg CA\DSConfigDN CN=Configuration,DC=extendhealth,DC=com

::Define CRL Publication Intervals
certutil -setreg CA\CRLPeriodUnits 52
certutil -setreg CA\CRLPeriod "Weeks"
certutil -setreg CA\CRLDeltaPeriodUnits 0
certutil -setreg CA\CRLDeltaPeriod "Days"
certutil -setreg CA\CRLOverlapPeriod "Weeks"
certutil -setreg CA\CRLOverlapUnits 2

::Apply the required CDP Extension URLs
certutil -setreg CA\CRLPublicationURLs "1:%windir%\system32\CertSrv\CertEnroll\%%3%%8%%9.crl\n10:ldap:///CN=%%7%%8,CN=%%2,CN=CDP,CN=Public Key Services,CN=Services,%%6%%10"

::Apply the required AIA Extension URLs
certutil -setreg CA\CACertPublicationURLs "1:%windir%\system32\CertSrv\CertEnroll\%%1_%%3%%4.crt\n2:ldap:///CN=%%7,CN=AIA,CN=Public Key Services,CN=Services,%%6%%11"

::Enable all auditing events for the Extend Health Root CA
certutil -setreg CA\AuditFilter 127

::Set Validity Period for Issued Certificates
certutil -setreg CA\ValidityPeriodUnits 10
certutil -setreg CA\ValidityPeriod "Years"

::Enable discrete signatures in subordinate CA certificates
certutil -setreg CA\csp\DiscreteSignatureAlgorithm 1

::Restart Certificate Services
net stop certsvc & net start certsvc

certutil -crl
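
The number in front of each URL in the CDP setting above is a sum of publication flags, which is easy to misread.  Here is a rough Python sketch that splits such a value and decodes each prefix.  The flag meanings are the ones I understand Microsoft to document for CRLPublicationURLs, and the function and constant names are my own, so verify the numbers before relying on them.  The sample string is the root CA value with the batch-file %% escapes already collapsed and %windir% assumed to be C:\Windows.

```python
# Flag values as I understand them for the CA's CRLPublicationURLs setting.
# The descriptions are my own wording (an assumption, not an official API).
CDP_FLAGS = {
    1: "publish CRLs to this location",
    2: "include in the CDP extension of issued certificates",
    4: "include in CRLs (lets clients find delta CRLs)",
    8: "include in all CRLs (AD DS location for manual publishing)",
    64: "publish delta CRLs to this location",
    128: "include in the IDP extension of issued CRLs",
}

# Root CA value from the script, with %% collapsed to % and an assumed %windir%.
SAMPLE = (
    r"1:C:\Windows\system32\CertSrv\CertEnroll\%3%8%9.crl"
    r"\n10:ldap:///CN=%7%8,CN=%2,CN=CDP,CN=Public Key Services,CN=Services,%6%10"
)


def decode_publication_urls(value):
    """Split '<flags>:<url>' entries joined by a literal backslash-n and decode the flags."""
    entries = []
    for item in value.split("\\n"):
        prefix, url = item.split(":", 1)
        number = int(prefix)
        meanings = [text for bit, text in sorted(CDP_FLAGS.items()) if number & bit]
        entries.append((number, url, meanings))
    return entries


if __name__ == "__main__":
    for number, url, meanings in decode_publication_urls(SAMPLE):
        print(number, url)
        for meaning in meanings:
            print("   ", meaning)
```

Read this way, the file entry (1) only publishes the CRL locally, while the LDAP entry (10 = 2 + 8) is the location that ends up referenced in issued certificates and CRLs.  The split is naive: it would break on a path containing a backslash followed by the letter n.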

8:26 AM : Root CA Configuration Complete

Everything seems good on the root CA, moving on to the policy CA.

10:21 AM : Back on Task

I was distracted for a couple of hours talking to Microsoft and taking care of some tasks around the office, but am back on task.  I just imported the root CA’s certificate and certificate revocation list onto the policy CA.  I wasn’t able to make Brian’s command line (page 125) work, so I just right-clicked each one and allowed it to import the way it wanted to.  I’m a bit concerned since that adds them to the user account’s stores, but we’ll see if it causes a problem.

11:02 AM : Policy CA capolicy.inf

I’m using the following configuration to initialize the policy CA:

[Version]
Signature = "$Windows NT$"

[PolicyStatementExtension]
Policies = ExtendHealthCPS

[ExtendHealthCPS]
OID = 1.3.6.1.4.1.31088.1.1
Notice = "By enrolling a certificate from this certificate server, you agree to the posted legal notice."
URL = "http://capolicies.extendhealth.com/defaultCps.aspx"

[Certsrv_Server]
RenewalKeyLength = 2048
RenewalValidityPeriodUnits = 10
RenewalValidityPeriod = years
CRLPeriod = days
CRLPeriodUnits = 7
CRLDeltaPeriod = hours
CRLDeltaPeriodUnits = 4
DiscreteSignatureAlgorithm = 1

I also just realized that I was supposed to save capolicy.inf to the %WINDIR% (usually C:\Windows) folder, not the system32 folder.  Maybe that’s why it didn’t work last time.

11:12 AM : Policy CA Installed

I’m using the following script after installation to guarantee settings:

::Declare Configuration NC
certutil -setreg CA\DSConfigDN CN=Configuration,DC=extendhealth,DC=com

::Define CRL Publication Intervals
certutil -setreg CA\CRLPeriodUnits 52
certutil -setreg CA\CRLPeriod "Weeks"
certutil -setreg CA\CRLDeltaPeriodUnits 0
certutil -setreg CA\CRLDeltaPeriod "Days"
certutil -setreg CA\CRLOverlapPeriod "Weeks"
certutil -setreg CA\CRLOverlapUnits 2

::Apply the required CDP Extension URLs
certutil -setreg CA\CRLPublicationURLs "1:%windir%\system32\CertSrv\CertEnroll\%%3%%8%%9.crl\n10:ldap:///CN=%%7%%8,CN=%%2,CN=CDP,CN=Public Key Services,CN=Services,%%6%%10"

::Apply the required AIA Extension URLs
certutil -setreg CA\CACertPublicationURLs "1:%windir%\system32\CertSrv\CertEnroll\%%1_%%3%%4.crt\n2:ldap:///CN=%%7,CN=AIA,CN=Public Key Services,CN=Services,%%6%%11"

::Enable all auditing events for the Extend Health Policy CA
certutil -setreg CA\AuditFilter 127

::Set Validity Period for Issued Certificates
certutil -setreg CA\ValidityPeriodUnits 5
certutil -setreg CA\ValidityPeriod "Years"

::Enable discrete signatures in subordinate CA certificates
certutil -setreg CA\csp\DiscreteSignatureAlgorithm 1

::Restart Certificate Services
net stop certsvc & net start certsvc

certutil -crl

11:32 AM : Publish to Active Directory Complete

I just finished publishing all CRLs and relevant certificates to Active Directory so that they are still available when I take the root and policy CAs offline.  I’m taking the root CA down and beginning installation of an issuing CA.

11:50 AM : Issuing CA capolicy.inf

I’m using the following configuration to initialize the issuing CA:

[Version]
Signature = "$Windows NT$"

[Certsrv_Server]
RenewalKeyLength = 2048
RenewalValidityPeriodUnits = 5
RenewalValidityPeriod = years
CRLPeriod = days
CRLPeriodUnits = 3
CRLOverlapPeriod = hours
CRLOverlapPeriodUnits = 4
CRLDeltaPeriod = hours
CRLDeltaPeriodUnits = 12
DiscreteSignatureAlgorithm = 1

11:59 AM : Issuing CA Installed

I’m using the following script after installation to guarantee settings:

::Declare Configuration NC
certutil -setreg CA\DSConfigDN CN=Configuration,DC=extendhealth,DC=com

::Define CRL Publication Intervals
certutil -setreg CA\CRLPeriodUnits 3
certutil -setreg CA\CRLPeriod "Days"
certutil -setreg CA\CRLDeltaPeriodUnits 12
certutil -setreg CA\CRLDeltaPeriod "Hours"
certutil -setreg CA\CRLOverlapPeriod "Hours"
certutil -setreg CA\CRLOverlapUnits 4

::Apply the required CDP Extension URLs
certutil -setreg CA\CRLPublicationURLs "1:%windir%\system32\CertSrv\CertEnroll\%%3%%8%%9.crl\n10:ldap:///CN=%%7%%8,CN=%%2,CN=CDP,CN=Public Key Services,CN=Services,%%6%%10"

::Apply the required AIA Extension URLs
certutil -setreg CA\CACertPublicationURLs "1:%windir%\system32\CertSrv\CertEnroll\%%1_%%3%%4.crt\n2:ldap:///CN=%%7,CN=AIA,CN=Public Key Services,CN=Services,%%6%%11"

::Enable all auditing events for the Extend Health Issuing CA
certutil -setreg CA\AuditFilter 127

::Set Validity Period for Issued Certificates
certutil -setreg CA\ValidityPeriodUnits 2
certutil -setreg CA\ValidityPeriod "Years"

::Enable discrete signatures in subordinate CA certificates
certutil -setreg CA\csp\DiscreteSignatureAlgorithm 1

::Restart Certificate Services
net stop certsvc & net start certsvc

certutil -crl
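
To see what the CRL interval settings in these scripts mean in calendar terms, here is a small Python sketch.  It uses a deliberately simplified model that I am assuming for illustration: the next CRL is published one period after the current one, and the current CRL stays valid for the overlap beyond that.  The real CA also applies clock-skew adjustments that I ignore here, and the function names are mine.

```python
from datetime import datetime, timedelta

# Hours per unit for the period names used in the registry settings above.
UNIT_HOURS = {"hours": 1, "days": 24, "weeks": 24 * 7}


def to_timedelta(units, period):
    """Convert a (units, period name) pair such as (3, "Days") to a timedelta."""
    return timedelta(hours=units * UNIT_HOURS[period.lower()])


def crl_schedule(published, period, overlap):
    """Return (next publication time, expiry of the current CRL) under the simplified model."""
    next_publish = published + period
    return next_publish, next_publish + overlap


if __name__ == "__main__":
    published = datetime(2008, 5, 13, 12, 0)
    # Offline root and policy CAs: 52-week period, 2-week overlap.
    print(crl_schedule(published, to_timedelta(52, "Weeks"), to_timedelta(2, "Weeks")))
    # Issuing CA: 3-day period, 4-hour overlap.
    print(crl_schedule(published, to_timedelta(3, "Days"), to_timedelta(4, "Hours")))
```

The contrast is the point: the offline CAs only need to be powered on about once a year to publish a fresh CRL, while the issuing CA leaves just a four-hour window if a publication fails.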

12:10 PM : PKI Complete

For all intents and purposes, I believe the PKI deployment to be complete.  Going to lunch and then back to Webcasts.
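
Looking back over the three tiers, the lifetimes are meant to nest: no CA should issue a certificate that outlives its own.  Here is a quick Python sanity check of the numbers used today.  I am assuming each CA certificate’s lifetime matches the RenewalValidityPeriodUnits from its capolicy.inf, which is a simplification; the tier table and function name are made up for this sketch.

```python
# (name, assumed lifetime of the CA's own certificate in years,
#  longest certificate it issues in years, from CA\ValidityPeriodUnits)
TIERS = [
    ("root CA", 20, 10),
    ("policy CA", 10, 5),
    ("issuing CA", 5, 2),
]


def nesting_problems(tiers):
    """Return a list of human-readable problems; an empty list means the tiers nest."""
    problems = []
    for name, lifetime, issues in tiers:
        if issues > lifetime:
            problems.append(
                f"{name} issues {issues}-year certificates but lives only {lifetime} years"
            )
    # Each subordinate CA certificate must fit within what its parent will issue.
    for (parent, _, parent_issues), (child, child_lifetime, _) in zip(tiers, tiers[1:]):
        if child_lifetime > parent_issues:
            problems.append(
                f"{child} wants {child_lifetime} years but {parent} issues at most {parent_issues}"
            )
    return problems


if __name__ == "__main__":
    print(nesting_problems(TIERS) or "lifetimes nest cleanly")
```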

1:50 PM : Finished Webcasts

Just finished watching a Webcast on Configuration Manager 2007 SP1 and R2, and a two-part series on using Configuration Manager to deploy operating systems.

4:09 PM : End of Day

Watched several more Webcasts and tried to fix some errors in Configuration Manager’s status view.  No luck tonight.  Will start again tomorrow.

Office Communications Server Deployment, Day 1

Monday, May 12th, 2008

And so it begins.  As promised, I plan to chronicle in detail my journey through deploying Office Communications Server 2007.  The first several days will be filled with deploying supporting infrastructure.  We have made the decision to cut over from a separate internal domain name to a domain name that aligns with our external e-mail address domain and what will be our SIP domain (extendhealth.com).  All times are in MST.

11:50 AM : Current State

I should probably start by detailing my starting environment.  As I said above, we are cutting over to a new domain.  As such, we have a completely clean domain to work with in a brand new forest.  The new domain is called extendhealth.com.  I haven’t done anything to the domain outside of creating a user account for installation.  The user account is a domain, enterprise, and schema admin in anticipation of the OCS deployment (where I’ll need rights from all three groups).  In general, that’s not the recommended action.  Various tasks should be delegated to different personnel, and the permissions should be locked down much more tightly than they currently are in this domain.

The domain controller is running Windows Server 2008 Standard x64 (RTM).  It is up-to-date with all patches.  The domain controller is also hosting four virtual machines (VMs) at this point, three of which will go offline shortly.  Two of the machines are an enterprise root certification authority (CA) and a policy server for the public key infrastructure (PKI).  Both of these machines will be taken offline and only be brought online when the issuing CAs need to have their certificates renewed.  [Side note: I knew quite a bit about setting up a PKI before starting this process, but am referencing Brian Komar's Windows Server 2008 PKI and Certificate Security for any questions I have.]  The PKI is currently pending a private OID request from the IANA.  This will allow us to use certificates in a manner that caters to external publishing of those certificates.  The certificates will not be used for public certificate chains – we have a wildcard certificate for that – but the official OID makes it easier to publish certificate policies and have them accepted by other parties.

The other machine that will go offline shortly is a temporary database server running SQL Server 2008 x64 February CTP.  When the next CTP comes out, we will install a database cluster and move our databases to the cluster.  In case I need to reference it, the name of this server is currently dbcluster1 (even though it’s not clustered).  The SQL Server install is complete and has all features installed.  I used a service account with a 64-character password generated by an online password generator.  Because these passwords are so random, we actually store them in a database that is particularly locked down.  Only one or two people in our entire organization have access to this database.

12:07 PM : Configuration Manager Intro

I’m currently looking at the pre-validation screen for System Center Configuration Manager 2007.  Configuration Manager is the fourth virtual machine (the only one that won’t go offline) on the domain controller.  I’ve allocated four processors and eight gigs of RAM for this server, which is currently named mgr1.  The Configuration Manager installation isn’t related to OCS; it’s more of a general infrastructure setup that I want to get out of the way.  We will be using Configuration Manager to deploy operating systems and updates.  I picked up a book on Configuration Manager the other day at the Microsoft Company Store.  (I was up in Redmond for the mid-sized market CIO summit.)  The plan is to set up a very simple deployment of Configuration Manager before deploying OCS.  I have to deploy operating systems all along the way, so there’s no telling how much time it will actually take.  Configuration Manager should facilitate the deployment of those operating systems and their updates.  My naive estimate would be that it will take today and tomorrow to install and play with Configuration Manager, and then deployment of OCS will start Wednesday.  I feel very well prepared on my OCS deployment.  Configuration Manager scares me, however.  The deployment doesn’t seem especially streamlined, meaning I may make a significant mistake and have to redeploy.  We’ll see how things actually turn out.

12:13 PM : Configuration Manager Pre-validation Run 1

I haven’t done anything at all to this server (other than updates, joining to the domain, and enabling Remote Desktop).  It’s listing two warnings (I haven’t run the schema extensions and I don’t have the WSUS SDK) and several errors related to IIS, BITS, and WebDAV not being installed/running.  The only error that actually surprises me is the SQL Server sysadmin rights error.  I’m going to fix the other errors before looking into that one.

12:22 PM : Fixing Validation Errors

I installed a default installation of the IIS and Application Server Roles, and am downloading/installing Windows Server Update Services (WSUS) 3.0 SP1 from http://www.microsoft.com/downloads/details.aspx?FamilyId=F87B4C5E-4161-48AF-9FF8-A96993C688DF&displaylang=en.  I’m also downloading and installing the 64-bit version of WebDAV for IIS7.  Last but not least, I’m deciding whether or not I need to extend the Active Directory schema by reading this article.

12:34 PM : Schema Extensions and WSUS

I’ve decided to extend the schema per Microsoft’s recommendation and am adding some functionality to IIS per WSUS installation requirements in the ReadMe.  This is what I need to verify is enabled/installed in IIS:

· Windows Authentication

· Static Content

· ASP.NET

· 6.0 Management Compatibility

· 6.0 IIS Metabase Compatibility

I also noticed the BITS Server Extensions weren’t enabled when I went into the Features part of Windows Server 2008, so I enabled them.  After installing those pieces, WSUS still alerts me that I don’t have the Microsoft Report Viewer 2005 Redistributable installed, but I don’t care about that until I need it.

12:55 PM : Installing and Updating WSUS

Still working on installing WSUS.  I had to provision a drive for updates on our SAN, which took a bit, and I’ve worked through all the other issues that I know of.  The installer is currently running.

1:26 PM : Lunch

Frustrated.  WSUS installed successfully and BITS and WebDAV certainly seem to be installed, but the Prerequisite Checker doesn’t seem to see them.  Rebooting and breaking for lunch.

2:02 PM : Back to Work

No change on reboot.

2:19 PM : Success!

Extended the Active Directory schema using ExtADSchema.exe in SMSSETUP/I386.  Installed a couple of additional IIS components (WMI compatibility, console) that cleared up the errors regarding BITS and WebDAV.  All systems are go at this point, but I’m a bit leery of what will be installed on dbcluster1 (my temporary SQL Server).  I had to turn off the firewall to get all checks to pass.  I’ll re-enable it after the install is complete, but having to turn it off to get the Prerequisite Checker to work doesn’t seem like a good sign.

2:22 PM : Configuration Manager Installation

Step by step:

  1. Selected “Install a Configuration Manager site server”
  2. Agreed to license terms
  3. Selected “Custom settings” (largely because the book recommends it)
  4. Selected “Primary site” since this is my first (and only) site
  5. Agreed to Customer Experience Improvement Program – I want Microsoft to improve installation environment awareness
  6. Product key was read-only
  7. Left default path (C:\Program Files (x86)\Microsoft Configuration Manager)
  8. Entered site code (DC1) and name
  9. Chose “Configuration Manager Mixed Mode”*
  10. Added NAP to selected client agents
  11. Specified SQL Server (dbcluster1) and database (sccm2007_dc1)
  12. Left default location (mgr1) for SMS provider – since the database will eventually be on a cluster, I can’t install the SMS provider there
  13. Left defaults for management point (install a management point on mgr1)
  14. Left defaults for port settings (HTTP/80 since I selected Mixed Mode)
  15. Allowed checking for updated prerequisite components
  16. Specified a download path for prerequisite components

2:33 PM : Settings Complete

After downloading a number of unnecessary prerequisites (multiple languages for Windows XP and Server 2003, neither of which we are running), settings are complete and installation is ready to begin.  The installer, however, complains that the machine account for mgr1 does not have admin privileges on the SQL Server.

2:36 PM : Settings Complete, Take 2

Added computer account for mgr1 to the Administrators group on dbcluster1.  Prerequisite check has passed.  Install began at 2:37 PM.

2:40 PM : Fatal Error

Fatal errors during database initialization.  Not sure what that means since it created the database and tables.  Some tables are also populated.  (I looked at dbo.Agents.)  Great.  I have a message that says: Setup has detected an incomplete primary site installation on this computer.  You must uninstall the incomplete installation before continuing.  Here we go.

2:48 PM : Fatal Error, Take 2

Again with the fatal error.  Log (C:\ConfigMgrSetup.log) says: <05-12-2008 14:38:58> ***SqlError: [42000][650][Microsoft][ODBC SQL Server Driver][SQL Server]You can only specify the READPAST lock in the READ COMMITTED (if not based on row versioning) or REPEATABLE READ isolation levels. : sp_SetupSDMPackage

Googling it.

2:52 PM : Not Good

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=329707

Starting over with a SQL Server 2005 SP2 database.  Back in a couple of hours.

5:24 PM : A New Error

After installing SQL Server 2005 as the SQL2005 instance and trying to bind it to the standard SQL port (1433), I couldn’t get the Configuration Manager installer to see the instance, so I uninstalled both SQL 2008 and SQL 2005, and then reinstalled SQL 2005 and SP2 for the second time today.  That means the bulk of my time today has been spent installing and uninstalling SQL Server.  I’m now on to a new error: the error message says “Setup failed to install SMS Provider.”  Logs give me the following errors:

<05-12-2008 17:24:14> CompileMOFFile: Failed to compile MOF C:\Program Files (x86)\Microsoft Configuration Manager\bin\i386\smsRprt.mof, error -1
<05-12-2008 17:24:14> Setup cannot compile MOF file C:\Program Files (x86)\Microsoft Configuration Manager\bin\i386\smsRprt.mof.  Do you want to continue?
<05-12-2008 17:24:14> Setup failed to install SMS Provider.  For more information about this error, see Microsoft Knowledge Base at http://microsoft.com or contact Microsoft Technical Support for further assistance.

Other .mof files apparently compiled successfully before this one.  Back to Google.

6:28 PM : Finished?

I’ve finally made it through the wizard (it only took most of the day).  I have some pretty serious complaints.  The first would be that things like extending the schema should be part of the wizard.  The second was the problem I just spent an hour on: Kerberos issues.  I did eventually find my answer at http://myitforum.com/cs2/blogs/rcrumbaker/archive/2007/10/12/system-center-configuration-management-with-remote-sql-installations.aspx.  That happens to be the clearest explanation of a couple of really complex issues – SPNs and delegation.  We’ve had a ticket open with Microsoft for over 18 months regarding a particular Kerberos issue and have had many, many people unsuccessfully try to fix the issue.  Anyway, I had to set up two SPNs, one each for the NetBIOS name and the FQDN of dbcluster1.  The commands I ran (from the domain controller) were:

  1. setspn -A MSSQLSvc/dbcluster1.extendhealth.com:1433 extendhealth\sqlservice
  2. setspn -A MSSQLSvc/dbcluster1:1433 extendhealth\sqlservice
  3. setspn -l extendhealth\sqlservice

Two notes: first, the last command runs setspn in “list” mode, so that you don’t have to run adsiedit.msc.  Don’t get me wrong, I actually think adsiedit.msc is much better (and faster) at editing SPNs – but I thought I didn’t have it available, which brings me to my second note.  Setspn is available from the command line on Windows Server 2008 domain controllers (more accurately, computers with the AD DS role installed).  Adsiedit is also apparently available there, but doesn’t bind to your directory root by default.
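
Since the two registration commands differ only in the host name, they are easy to generate.  Here is a small Python sketch that builds the same SPN strings and setspn lines from a server’s NetBIOS name and DNS domain.  The helper names are invented for illustration, and it only prints commands; it does not touch Active Directory.

```python
def mssql_spns(netbios_name, dns_domain, port=1433):
    """Build the MSSQLSvc SPNs for a server: one for the FQDN, one for the NetBIOS name."""
    fqdn = f"{netbios_name}.{dns_domain}"
    return [f"MSSQLSvc/{fqdn}:{port}", f"MSSQLSvc/{netbios_name}:{port}"]


def setspn_commands(spns, account):
    """Return the 'setspn -A' registration lines plus a final 'setspn -l' to list the result."""
    commands = [f"setspn -A {spn} {account}" for spn in spns]
    commands.append(f"setspn -l {account}")
    return commands


if __name__ == "__main__":
    spns = mssql_spns("dbcluster1", "extendhealth.com")
    for line in setspn_commands(spns, r"extendhealth\sqlservice"):
        print(line)
```

Run as-is, this prints the three commands listed above, which makes it a convenient template when the database eventually moves to the real cluster name.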

It seems to me that the Prerequisite Checker should have caught the problem if the SPNs weren’t configured properly.  Whining aside, I did make it the rest of the way through the wizard and only one thing had a red X by it: the management point.  After reviewing the log, it seems that just the monitoring of the management point failed, and when I open the console everything seems to be functional.  I think I’ll leave it at this point (when I can be optimistic) and pick it up again tomorrow.

Going underground

Monday, April 21st, 2008

Just a quick post to let readers know that I’m headed underground for the next couple of weeks as I thoroughly read the available documentation for OCS.  I plan to journal in near-real-time my OCS deployment experience, which will likely start on approximately May 1.  Stay tuned.

What is it about language?

Thursday, April 17th, 2008

Update: A commenter on the previous version of this post noted that I may have intended to use “notwithstanding” which is indeed spell checker approved.

I was writing a blog post the other day and used the word, “notwitholding.”  The spell checker immediately underlined the word as misspelled, so I checked what the suggestions were and the word I was looking for didn’t appear in the list.  A cursory check of dictionary.com and m-w.com didn’t turn up any results, but a Google search showed the word being used in multiple places, some of which seem to be pretty credible.  I don’t have an Oxford Unabridged handy, so the question hangs in my mind: is it, or is it not a word?

Language is a funny thing.  The English language, in particular, has a number of really strange artifacts.  Although we have a language largely descended from the same family as German and heavily influenced by Greek, we’ve lost the case endings that allow Greeks to move words around in the sentence to stress importance.  In English, the structure of the sentence largely dictates the case of the word.  Nominative case usually comes near the beginning of the sentence, dative frequently follows a preposition.  There are all sorts of complex rules that I remember having difficulty learning in grade school; I can’t imagine what it’s like for a foreigner.  Granted, we don’t have to learn any declensions, but I think we lose something at the same time.

Add to that our verb system.  In many languages (Romance languages, German, Greek, at least), verbs have different conjugations for different tenses.  In English, we largely use the same conjugation (or a very few conjugations) and fill in the gaps with auxiliaries (helping verbs).  Verbs are already one of the most difficult parts of language – most learners of a foreign language don’t struggle with vocabulary as much as they do proper verb conjugations.

What really gets my goat (we’re leaving idioms out of this rant), though, is words like inflammable.  Dictionary.com has a good note on usage, but at first blush the average person aware of prefixes in the English language would think that inflammable and flammable are opposites.  They are, in fact, synonyms – both mean combustible.  Other prefix problems like inhabiting a habitation have always befuddled me.

I suppose I’m just getting a little bit of this off my chest, but there are things that worry me: as I was growing up, my mother was very meticulous about grammar.  I read quite a bit and became comparatively articulate.  However, I am finding it increasingly difficult to communicate.  In many instances, I present a statement in what I believe to be very clear terms, and the statement is misunderstood.  Is it my presentation, or their understanding, or both that are at fault?  Or does language have some part to play in that miscommunication?  I’ve always been thankful that English doesn’t have a board of people admitting and dismissing words from the official language, but sometimes I wonder where we’ll be in 10, 20, 100 years.

Interact 2008 Summary, Day 3

Wednesday, April 16th, 2008

Thursday

Birds of a Feather

The third and final day of Interact 2008 kicked off with a Birds of a Feather session rather than a keynote.  This morning, I sat at the Blogging table for a while before moving to the Voice Infrastructure table.  At the blogging table, I met Scott Schnoll and Nino Bilic.  Scott writes technical documentation for Exchange Server and struck me as a very friendly person.  Nino manages the Exchange Team blog in addition to his day job, which apparently has to do with supportability for Exchange.  The discussion started with several people, and one of the questions we kicked around was whether anyone’s job description actually included blogging.  Although no one sitting at the table at that time had blogging in their job description, a Mindsharp representative showed up shortly thereafter to fulfill my prediction that even if it wasn’t happening today, it would happen.  We also talked about where to find inspiration for blog posts and other forms of collaboration before I moved on to the Voice Infrastructure table.

At the Voice Infrastructure table, there were representatives of some larger firm talking about call center uses for OCS, which I obviously found interesting.  Although I missed their introduction, I gathered from the conversation that they were using it in a very different way than we were: they wanted a telephony solution that would serve as a telephony solution; we want a telephony solution that integrates deeply into our existing software to bring together telephony and computing such that it is difficult to distinguish between them.  Wajih Yahyaoui was managing the table and asked a number of questions about what we (customers) would like to see in the voice infrastructure.  The one other noteworthy event was that I ran into a Steven White, who came out of Cisco and offered to answer some of my questions about the Cisco AS5400XM.

Topologies and Routing for Microsoft Exchange Server 2007 (Todd Luttinen)

I don’t have much to say about this particular session.  I thought that it would be more about single-site topologies (meaning configuration of server roles) and the routing through those different roles.  Rather, this session was about enterprise-level, multi-site routing and its intricacies.  Aside from finding the discussion interesting, it wasn’t particularly relevant to me.

High Availability in Microsoft Exchange Server 2007 SP1 Part 1 – CCR vs other HA solutions (Ayla Kol; Alex Wetmore)

I had a great chance to speak with Alex before this session, and was pleasantly surprised to find that he was a very down-to-earth guy.  Alex was the lead developer for cluster continuous replication, one of the high availability features in Microsoft Exchange.  After listening to my description of our needs, he suggested that standby continuous replication would probably be sufficient for our situation if our service agreement could tolerate 40 minutes’ worth of downtime.  Of course, that all revolves around the Service Level Agreement that we should have in place (but don’t).  What I really appreciated about this session was the atmosphere in the room.  There were several Exchange Services team members in the room, and there was actually a fair amount of friendly heckling that went on.  Ayla saw a couple of them come in and immediately told them (in a joking manner) that they weren’t allowed to ask any questions.  The community atmosphere contributed significantly to the discussion, and I found that there were a number of people who were able to speak to various situations using real-life stories that made it to top-level Exchange support.

High Availability in Microsoft Exchange Server 2007 SP1 Part 2 – Disaster Recovery and SCR Deep Dive (Scott Schnoll)

The last session I went to, this was a great way to close out the conference.  The room had the same atmosphere (which isn’t surprising considering it had the same people) for this session.  Scott had a ton of material to work through, but he made it through efficiently and delivered all of the knowledge necessary to recover from a disaster in the event that one should occur.  As I recall (and this is almost a week ago now, so it’s not word-for-word, but it does have the concept), Scott said that losing a datacenter was a good thing.  There was one other notable quote that fired up the hecklers, but what I appreciated most about this session was the anecdotal experience Microsoft had to share in testing their replication scenario between Singapore and Puget Sound.  Apparently many more things went wrong than they wanted to go wrong, but they were still able to recover within three hours.

Conclusion

When all is said and done, I’d go back to the comments I made when starting this summary: this has been one of the best-spent weeks of my life as far as my career is concerned.  I was able to network, dig deep, and find answers and vision for the solutions I need to design this year.  I sincerely hope to make it back to this conference next year, and highly recommend that anyone interested in Microsoft OCS do the same.

Interact 2008 Summary, Day 2

Tuesday, April 15th, 2008

Wednesday

Keynote (Terry Myerson)

In my opinion, this keynote was better than Tuesday’s keynote, which is an odd thing to say considering that Gurdeep overall did a better job of presenting.  I think what I liked about this presentation was the number of times I felt that Microsoft made themselves truly vulnerable.  No big company is perfect, and when you ship the amount of code that Microsoft ships, you probably ship significantly more bugs than the average company.  Terry was straightforward about a number of problems that have happened in the past year (they even showed the code that shipped that caused the leap year bug), but he also said that they were lessons learned for the Exchange team.  The overall feeling that I had walking out of the keynote was that the Exchange team treated me very much like an old friend: they were affable, and comfortable talking frankly about very sensitive matters.

Microsoft’s Quality of Experience: Defending, Deploying, and Succeeding (Neil Deason; Sam Chon)

Neil Deason has been involved in a number of the sessions I’ve attended, and I’ve been very impressed with him every time he speaks.  In contrast to my previous comment about an argument relating to ports on firewalls, Neil’s presentation today was very solid (the same argument re: ports notwitholding).  The emphasis of this session was not conveyed well by the title; the real thing we learned here was how to ensure that we have a consistent and good QoE.  There was at least one interesting discussion about why audio conferences use the Siren codec rather than the typical RTAudio codec.  The response was that computing power isn’t yet at the point where RTAudio streams can be decoded and encoded at scale.  When computing power reaches that point, Microsoft will definitely look at using the RTAudio codec in favor of Siren.  Until then, those of us who use audio conferencing and have a discerning ear will notice a drop in call quality when shifting to using Siren.

Probably the thing that I found most interesting was something anecdotal.  There was a conversation about ambient noise on the voice conversation, and Mu Han related that they had received so much feedback from customers stating that the ambient noise level was so low that they couldn’t tell whether or not the person was still on the other end of the line.  His statement, and I’m not at all qualified to rate the degree to which he was being facetious, was that they were seriously thinking about reintroducing a higher level of ambient noise in future versions of OCS.  I’ve never heard of a product purposefully introducing more noise – that speaks pretty highly for call quality and their existing noise filtering mechanisms.

Planning and Deploying Voice Routes in Microsoft Office Communications Server 2007 using Enterprise Voice Route Helper (Byron Spurlock)

This was another excellent session with Byron, this time about using the Enterprise Voice Route Helper.  The Route Helper tool is designed to help configure dial plans and other facets of outbound routing.  My development background has involved a fairly significant amount of experience with regular expressions, which was what Byron emphasized several times as one of the most important features of this tool.  If you have regular expressions experience (even if it’s only a little experience), you might find other facets of the tool more helpful.  I especially liked the ability to set up and save test cases, which may help to validate that changes to the outbound routing achieve the expected result.  Another tidbit from this session is the best practice of setting up your phone usages to align with actual phone usages.  Microsoft’s recommended best practice is to create a local, national, and international usage at a minimum.  Routes, on the other hand, should align with their “break-out point”, the physical location where a media gateway connects to the PSTN.  There should be at least one route for each distinct break-out point.

Diagnose and Solve Voice Quality Issues with Microsoft OCS 2007 Quality of Experience Monitoring Server (Wajih Yahyaoui; Jisun Park)

I sat and had lunch with Ji Sun before this session.  We talked about what he had done before coming to Microsoft (he had just finished his PhD in Texas) and what he was doing now.  My impression overall, which was only reinforced during the session, is that Ji Sun is profoundly intelligent but very quiet.  This session was primarily a further exposition on Microsoft’s philosophy of QoE, and an introduction to one of the tools available to diagnose QoE: the Monitoring Server.  The QoE Monitoring Server shipped after OCS was released to manufacturing.  At the most fundamental level, the QoE Monitoring Server is a database which collects metrics sent in by OCS endpoints.  These metrics primarily revolve around MOS scores.  In most cases, MOS scores are subjective opinions (hence the name) of call quality.  Microsoft made an attempt at a more objective score, but maintained the legacy name “MOS”.  Furthermore, Microsoft breaks MOS down into distinct categories of sending, receiving, network, et cetera.  Some basic analytics are run on the metrics after they are collected, but for the most part the OCS administrator should expect to spend some time deciphering MOS scores, dropped packets, and jitter by looking at lots of numbers.  The top layer for the QoE Monitoring Server consists of a report pack for SQL Server Reporting Services that gives a nicer visual indicator of overall QoE health.  As a personal anecdote, I’ll add that developing reports in SSRS is a headache, but once they are developed, the reports support e-mail based subscriptions so that a user may receive a report in his/her inbox every hour, day, etc.

Advanced Troubleshooting for Voice in Microsoft Office Communications Server 2007 (Byron Spurlock; Roy Kuntz)

This session was really a wrap-up of Byron’s other two sessions re: tools for managing and troubleshooting OCS problems.  We reviewed and went into more depth on Snooper, we looked more at the Enterprise Voice Route Helper, and we talked about the Office Communications Server 2007 VoIP Troubleshooting Guide.  The VoIP Troubleshooting Guide was authored in large part by Roy Kuntz and is an excellent document for identifying available tools for troubleshooting VoIP issues.

Birds of a Feather

Birds of a feather was designed to be a small roundtable discussion that ran from six to nine.  David and I spent most of our time at the QoE vs. QoS discussion.  Most of the people at the table noted that the session was pretty poorly named.  Microsoft as an entity does not see QoE as an alternative to QoS, but they do embrace the philosophy that you should only implement QoS if necessary.  In an ideal world, QoE and proactive monitoring will preempt the need for QoS.  The discussion at the table was largely driven by us – there were four or five Microsoft personnel, a Psytechnics representative, and an occasional conference attendee.  We were able to go through some of our concerns in detail and have a large amount of time for Microsoft to specifically address our concerns.

Overall, a great second day, but there was so much information and so little time.

Interact 2008 Summary, Day 1

Monday, April 14th, 2008

I am now wrapping up the third and final day of the Interact 2008 (not to be confused with Interact 2008) conference.  I don’t know what I can say about it other than to say that it has been time extraordinarily well spent.  There are many things that you do in life that are worthwhile days; as far as careers go, this ranks among the most useful days I’ve ever had.  I don’t intend that statement to be either hyperbole or summarily discardable.  There has been absolutely fantastic face time with Microsoft employees and a wonderful opportunity to interact (no pun intended) with key vendors and other attendees.  Although I’m completely saturated and exhausted, I’ll try to give a rundown of the sessions and events.  First up, the sessions that I attended.

Tuesday

Keynote (Gurdeep Singh Pall)

Overall a great keynote, but very similar in content to the keynote delivered at VoiceCon 2008.  I can’t hold that too much against him since I’m sure that writing new keynote speeches for every event he speaks at probably isn’t and shouldn’t be his priority.  That said, I recommend you watch the actual keynote from VoiceCon rather than reading my poor summary of what Gurdeep said.

Panel on Planning Voice Architecture and Deployment in Microsoft Office Communications Server 2007 (Mahendra Sekaran; Sean Olson; John Kenerson; Francois Doremieux; Russell Bennett; Jens Trier Rasmussen; Ken Ewert)

I tremendously enjoyed this session; it was an open forum for anyone with a question to address some of the brightest minds on the OCS team.  The people that stood out to me in particular were Mahendra Sekaran, who answered my question about topologies with 100-person outsource shops (and whether we needed to deploy a pool/enterprise voice equipment at that location); Sean Olson, who answered most of the questions about general vision; Francois Doremieux, who handled many questions about actual deployments; and Russell Bennett, who contributed intelligent comments to several questions.  I particularly enjoyed the range of knowledge available in the room; it seemed that there were answers for every question asked.

Microsoft Office Communications Server 2007 Edge Drill Down (Wajih Yahyaoui)

This was a very good session on details for the various edge servers.  Wajih has a very noticeable accent, but was obviously passionate about the subject matter, so it was a real pleasure to listen to him.  He was able to handle most of the questions in the room, but it was helpful that several other knowledgeable Microsoft personnel were there also.  The conversation had one major interruption from a guy who I considered to be acting very belligerently.  The contention was based around whether OCS’s requirement to open port ranges in external firewalls unnecessarily creates security vulnerabilities.  Neil Deason responded that a firewall is only ultimately secure if all ports are closed: it makes no difference whether there are 10,000 ports or one port open, your network is vulnerable if you have ports open in your firewall.  While I don’t think that response was a good response overall, the point of the response should be considered sufficient.  The point of what Neil was saying is that firewalls are only one element of a properly hardened network’s defenses.  Hardening a network involves hardening multiple elements, not just the firewall.  The firewall-only approach is typically described like an M&M: crunchy on the outside, chewy on the inside.  If you have a network defended only by a firewall, your network is vulnerable to internal attacks.  Although research shows that most attacks originate from outside the network, the same research shows that an alarming percentage of breaches originate from inside the network.  See http://answers.google.com/answers/threadview?id=15439 for links to credible sources.

Advanced Validation and Troubleshooting for OCS 2007 (Byron Spurlock, Tom Laciano)

Byron and Tom handled this session on how to ascertain the source of an OCS 2007 problem.  Both presenters were enjoyably humorous, considering the amount of time we’d been sitting that day.  We got a chance to see Byron use a number of tools such as the Snooper tool, validation wizards and more.  As I sat there, I realized how much I would have benefited from knowing that those tools existed a few weeks ago, when I spent a significant amount of time using Wireshark to diagnose what was wrong with OCS.  Afterwards, I went to one of the Coffee Chats with Tom and sat for a while as he explained in detail what subject names and subject alternative names are necessary for certificates in various scenarios.

Evening Event: Surfing @ Wave House

In the evening, we relaxed on the beach at the Wave House.  It was a great time breathing the (very) cool salt air, throwing back a few drinks, and doing some surfing!  I stunk (figuratively), but here’s a shot, courtesy of my colleague David DeWinter who went with me:

[Photo: Mark surfing]

OCS Roles Primer, Part 2

Tuesday, April 8th, 2008

In part 1 of this post, we examined the core pool roles for Microsoft Office Communications Server 2007.  Specifically, we covered front-end servers, directors, the three variants of conferencing servers, and the archiving and CDR server.  There are still several key roles to be covered to understand the full breadth of the OCS offering.  These roles fit into three key areas: edge servers, telephony servers, and “other”.  Before we get into the specifics of the roles, please take a brief moment to review the vocabulary from part 1:

  • Office Communications Server: A Microsoft product designed to facilitate communications both inside and outside the office.
  • Presence: A metric that takes into account both your availability (available, idle, away) and your willingness (available, busy, on a call) to communicate.
  • Endpoint: Any device (SIP phone) or software package that registers itself with Office Communications Server as belonging to a user, meaning that the user can be contacted through the device or software package.
  • Enterprise Voice: Probably the most noteworthy addition to the product since 2005; allows calls to enter and exit Office Communications Server.  This means that from any endpoint, users can make or receive calls to traditional phone numbers.
  • Public Switched Telephone Network (PSTN): The traditional telephone network that delivers telephone service over dedicated copper cables.

Edge Servers

Access Edge Server

The access edge server provides three very key services: authenticating and enabling connectivity for remote users, negotiating federated communications, and connecting to public IM services such as MSN, AOL, and Yahoo.  Authentication and communications with remote users is unequivocally the most common usage of access edge server.  This server is critical whenever an employee needs to use Communicator but is outside of the corporate LAN.  Traveling sales representatives with Communicator Mobile, home-based employees and other situations are supported when using Access Edge Server.  Federation is the term used to refer to two Active Directory domains that have set up a federated relationship.  Note that a federated relationship is not the same thing as a domain trust, but is similar.  Generally federation happens along corporate boundaries.  Two companies in a strategic alliance or other partnership will federate to allow key contacts greater visibility and easier access to communications.  Microsoft OCS can also allow connectivity with public IM services, enabling communications from Communicator to MSN Messenger, AIM, or Yahoo! Messenger.

Personal Side-note: Access edge server is one of the most amazing roles in my opinion.  I have been witness to quite literally taking an OCS endpoint, moving it outside of the network, and having it seamlessly connect back up to OCS without any additional configuration.  Imagine being able to grab your desk phone and go home for the day!  We currently have a Cisco UCCX system in place.  In order to take my phone home, I have to take a hardware VPN home, hook it up directly to my cable modem (in the basement) and then hook my phone straight to that.  With OCS, I was able to take my laptop home, turn on my wireless and connect immediately.  If I can say one thing that would be the most important thing for a Cisco customer to hear, it’s this:

Our Cisco system is technically capable of achieving everything we need it to achieve, but our experience with OCS has blown us away.  Actually getting your hands on a sample OCS setup is the best thing that you can do for yourself.

To summarize, the access edge server:

  • Authenticates and enables connectivity for remote users
  • Allows two entities to federate, which in turn allows greater visibility for communications
  • Allows connectivity to public IM networks

A/V Edge Server

The A/V edge server enables audio and/or video conferences to happen with users outside of the corporate LAN.  It is important to note that telephony conferences are considered distinct from this scenario and are covered by the telephony conferencing server (see part 1).  The A/V edge server allows remote users authenticated by access edge server to establish internal audio or video calls, or VoIP calls for enterprise telephony scenarios.

Web Conferencing Edge Server

Similar to the A/V edge server, the web conferencing edge server enables Live Meeting 2007 sessions to include users outside of the corporate LAN.  Many companies will use this role slightly differently than they will the other edge server roles.  Where access edge server and A/V edge server are deployed to allow external known users to connect and conference, Web conferencing edge server may arguably be used to conference in more anonymous users (who are still actually authenticated by digest authentication) than known users.  This allows companies an internally controlled, paid-for mechanism similar to WebEx that allows public sharing of desktops and other information.

Requirements** (for all edge servers):

  • Dual processor, dual core 3.0GHz+ processor
  • 2 x 18GB HDD
  • 4GB+ RAM
  • 2 x Gigabit NIC
  • Windows Server 2003 SP1+*

* I was not able to get the OCS primary installer to run successfully on Windows Server 2008 RTM.  It may be that the individual installers would run successfully, but I have not confirmed this.  The only role I have successfully installed on Windows Server 2008 is Speech Server 2007.

** The work of mixing audio channels is intense; A/V servers will benefit from more robust hardware.

Communicator Web Access

Communicator Web Access (CWA) is to Office Communications Server what Outlook Web Access is to Exchange Server 2007.  It provides an attractive, AJAX (slick update without refresh) based interface for internal or external users to use.  CWA functions much like the director role in that it proxies connections, but differs in that it also proxies internal connections.  Also, CWA is restricted to communicating via instant messaging.  There is no support for audio/video conferences, Live Meeting, or enterprise voice.

Requirements:

  • Dual processor, 3.2GHz+ processor
  • 1 x 36GB HDD
  • 4GB+ RAM
  • Gigabit NIC
  • Windows Server 2003 SP1+*

* I have not yet attempted to install this role on Windows Server 2008.

Web Components Server

This role has probably the least visible functionality of all server roles: its primary responsibilities are to allow users to join Web conferences by clicking a URL, allow download of Address Book data, and expand membership in distribution groups (in some ways, simply an expansion of the Address Book functionality).

Requirements:

  • Dual processor, dual core 2.6GHz+
  • 2 x 18GB HDD
  • 2GB+ RAM
  • Gigabit NIC
  • Windows Server 2003 SP1+*

* I have not yet attempted to install this role on Windows Server 2008.

Mediation Server

Contrary to the Web components server, the mediation server role has profound visibility and is arguably as important as a front-end, back-end, or edge server.  The mediation server is what makes enterprise voice possible.  When Microsoft implemented enterprise voice, they elected to use proprietary codecs (RTAudio and RTVideo) in order to overcome some significant hurdles such as inconsistent bandwidth.  However, their choice to use these proprietary codecs meant that right from the beginning, Microsoft wasn’t able to play nicely with many pieces of PSTN hardware.  In their defense, the enterprise voice market is very confused right now.  There are many competing standards such as ICE, SIP, and others that still aren’t fully or consistently supported.  Microsoft saw this and decided that it would be easier to simply draw a strong line between external and internal voice traffic.  That line is drawn right through mediation server.

Microsoft states that there are three ways to connect Office Communications Server to the PSTN.  The first is through a basic media gateway.  A basic media gateway is simply a piece of hardware that terminates PSTN lines (whether in FXS/FXO or T/E/DS form).  The media gateway’s responsibility is to accept incoming calls on the PSTN lines and hold the line open until the call is complete.  To know when the call is completed, the basic media gateway talks to the mediation server, generally via G.711.  The mediation server does the job of decoding G.711 voice traffic and encoding into RTAudio (and vice versa, for outbound voice traffic).

A basic hybrid gateway does essentially the same thing except that it merges the mediation server role directly onto the media gateway.  The benefit of a basic hybrid gateway over a basic media gateway fundamentally boils down to TCO: it’s cheaper and easier to manage one box than it is to manage two.

The final means of connecting Office Communications Server to the PSTN is for the media gateway itself to directly support the native OCS protocols (like RTAudio and ICE).  Microsoft calls this an advanced media gateway.  Please note that the difference between the advanced media gateway and the basic hybrid gateway is that in the basic hybrid scenario there are two functions coexisting on one box – they are still distinguishable functions.  With advanced media gateways, the functions are no longer distinguishable.  The media gateway natively speaks OCS’ language.

In the next post in this series, we’ll consider a final smattering of server roles that don’t always require a full server, consider coexistence scenarios and some final “gotchas” that I wish I’d known about when I started deploying OCS.

Queues are for the Brits, Part 2

Wednesday, April 2nd, 2008

In part 1 of this article, I raised some issues with traditional ACD algorithms.  The issues raised are best summarized by generalizing* ACD algorithms as FIFO queues with the only variances being skill levels and the actual agent allocation algorithm.  Agent allocation algorithms generally break down into something fairly simple such as which agent has been off the phone the longest, taken the fewest calls, or spoken to the person before.  There is a lack of intelligence when it comes to considering a many-to-many call/agent match.

* I really do understand that there are likely algorithms out there that do achieve 90% of what it is that we need to achieve.  The question is not, “How much does solution x achieve for company y out of the box?”, the question is, “To what level does solution x allow all companies to customize the algorithm to achieve 100% of their needs?”

A Proposed Solution

Before I propose a solution, I should make two things clear.  First, I am by no means an expert in contact center theory.  There is a world of things I don’t know – contact center theory is one of them.  Second, I am not fully educated on the existing solutions out there.  Due to the proprietary nature of algorithms and the obvious interests of the companies in protecting them, the best I can do is find descriptions of how algorithms currently work.  I can also see the flaws in our current UCCX system.

So how do we determine which is the best agent to assign to a call?  Skills-based routing excels at limiting the pool of agents to available agents who are capable of taking the call.  We want to break past that barrier to achieve the following:

A desirable algorithm for contact centers will create an optimal call-agent match, adjusted for time considerations.

Note that the statement above does not de facto consider availability.  Availability certainly should be part of the equation, but only part.  For contact centers that have frequent repeat callers, agent consistency may be a desirable trait.  If agent consistency is desirable, the algorithm may state that if the caller’s assigned agent will be available in less than a minute, the caller will be placed on hold until the agent is available.

Base Match Score

Skills-based routing has achieved the first part of the equation: making sure that someone capable of handling the call is assigned to the call.  Because there are only the few dimensions involved (call disposition, available agents, and agent skills), call match may not be entirely optimal.  The first step to creating an optimal call-agent match is to increase the number of dimensions that factor into the final assignment.  As was stated in the introduction to this post, we have designed an algorithm that takes a number of factors into consideration.  A sample list of considerations may include the following factors:

[Image: chart of match factors with their match and non-match multipliers]

** Note that unless an alternative receptor has been configured for calls with a zero match score for all agents, all multipliers should be greater than zero.

The above chart over-simplifies in one significant respect: it treats ranges as a flat match or non-match score.  In our actual algorithm, we support ranged multipliers, but this example needs to be easy to understand.  I also recommend this type of a chart for gathering requirements from the business; it’s easier to understand and easier to supply rows rapidly.  That said, this may be the match score matrix for a fictional company.  The first row indicates that, as with many companies, skill match is critical to agent assignment.  The non-match multiplier minimizes the chance of an agent without a skill match being assigned to this call.  Unless voice mail or an alternate “queue” is configured to handle calls with a zero match score, I highly recommend having all multipliers be greater than zero.  The second and third rows assign a significant priority to agents who have spoken with this person before.  The third to fifth rows flatten a ranged multiplier.  In a real implementation, I recommend attempting to use an actual mini-algorithm to deal with these types of issues as the result is a more accurate, less “bucketed” match score.  In this case, we are attempting to de-prioritize any agent that is not currently available.  The longer it will take the agent to become available, the less likely it is that they will be assigned to the call.

It is critical to note at this point that we feel this is one of the key differentiators of this algorithm.  Most algorithms only consider the available agent pool and rescan every few seconds if a match cannot be found.  In this case, a match can be made even before an agent finishes a call if the reason (previous contact, for instance) for assigning the call to that agent is compelling enough.  The IVR could even theoretically ask the person if they wanted to speak to their assigned agent and alter the match/non-match multipliers for assigned agent based upon their answer.

The final row of the table assigns a value to variable cost.  If our company handles many of the calls in-house, those calls probably have a very low variable cost (most of the costs would be considered fixed or sunk costs).  If additional calls can be routed to an outsourcer for $20/call, it is important to know the value of routing that call.  In general, the algorithm should prefer routing to agents with the lowest variable cost.

Once the match score criteria have been determined, an example matrix can be set up.  In this case, we are attempting to assign four incoming calls to three agents.  We consider each criterion for the match score and evaluate a base match score for all agents for all calls.  [We will continue to build on this same matrix when we introduce the timeline modifier.]

[Image: base match score matrix for four calls and three agents]

We calculated the base match score by starting at one and multiplying by the match or non-match value, as appropriate, for each match factor.
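To make the multiplication concrete, here is a minimal sketch of a base match score calculation.  The factor names and multiplier values below are hypothetical (they are not the values from our chart), and the flat match/non-match treatment is the simplified form described above rather than our ranged multipliers.

```python
from math import prod

# Hypothetical match factors: (match multiplier, non-match multiplier).
# These values are illustrative only; they are not taken from the chart.
FACTORS = {
    "skill_match": (10.0, 0.1),
    "previous_contact": (2.0, 1.0),
    "available_now": (1.5, 0.5),
}


def base_match_score(matches):
    """Start at one and multiply by the match or non-match value per factor.

    `matches` maps each factor name to True (match) or False (non-match).
    """
    return prod(
        match if matches[name] else non_match
        for name, (match, non_match) in FACTORS.items()
    )


# An agent with the right skill who is available but has never spoken
# with this caller: 10.0 * 1.0 * 1.5
score = base_match_score(
    {"skill_match": True, "previous_contact": False, "available_now": True}
)
```

Because every multiplier is greater than zero, no agent ever ends up with a score of exactly zero, which is the property the footnote above recommends.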

Timeline Modifier

Now that we have a concept of a base match score, we need to introduce a timeline modifier.  The timeline modifier ensures that calls with poor match scores across the board eventually get picked up.  How compressed the timeline modifier is depends upon your business model.  If you wish to have a good match and have a high call volume, a longer timeline may make sense.  If you don’t care about match and just want to ensure that calls are picked up, a more compressed timeline may make sense.  You could even replicate a FIFO queue by using an extremely compressed timeline modifier.  We currently use this equation to calculate the timeline modifier: if the call is not yet ready to be transferred, we divide 1.25 by the number of minutes until transfer.  If the call is ready to be transferred, we add three to the number of minutes it has been ready for transfer.  If we treat negative numbers as calls that are ready to be transferred and positive numbers as calls that are not ready to be transferred, extending our example above yields the following timeline modifiers for our calls.

[Image: timeline modifiers for the four example calls]

One of the most interesting things about the timeline modifier is that it really boils down to just another match factor, but we treat it differently for two reasons: first, it’s more critical that we have a smooth range rather than buckets when referring to the amount of time a call has been on hold.  Second, business people understand timelines to be distinct from other match criteria.
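The timeline modifier equation described above is small enough to sketch directly.  The function name is my own, and treating a value of exactly zero as "ready for transfer" is an assumption; the description above does not say how that boundary is handled.

```python
def timeline_modifier(minutes_until_transfer):
    """Timeline modifier using the sign convention from the text.

    Positive: the call is not yet ready (minutes until transfer).
    Negative: the call has been ready for that many minutes.
    Zero is treated as "ready" here, which is an assumption.
    """
    if minutes_until_transfer > 0:
        # Not yet ready: divide 1.25 by the minutes until transfer.
        return 1.25 / minutes_until_transfer
    # Ready: add three to the minutes the call has been waiting.
    return 3 + abs(minutes_until_transfer)


two_minutes_out = timeline_modifier(2)        # 1.25 / 2
waiting_90_seconds = timeline_modifier(-1.5)  # 3 + 1.5
```

The modifier for a waiting call keeps growing the longer it waits, which is what eventually forces a poorly matched call to be picked up.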

Modified Match Score

We then use the timeline modifier to calculate the modified match score.  Multiplying the base match score by the timeline modifier yields the modified match score, as shown:

[Image: modified match score matrix]

Call Assignment

Finally, we assign calls by maximizing the sum of agent-call matches.  In this case, our sum is maximized at 55.8125 by assigning Call 4 to Agent 1, Call 2 to Agent 2, and Call 3 to Agent 3.
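As a rough illustration of maximizing the sum, the sketch below tries every way of handing distinct calls to agents and keeps the best total.  The score matrix is hypothetical (it is not the matrix from our example), the function name is my own, and the exhaustive search is only meant to show the idea; it assumes there are at least as many calls as agents.

```python
from itertools import permutations


def assign_calls(scores):
    """Return (best_total, {agent_index: call_index}) by exhaustive search.

    scores[a][c] is the modified match score of agent a for call c.
    Each agent takes one call and each call goes to at most one agent.
    Assumes there are at least as many calls as agents.
    """
    n_agents, n_calls = len(scores), len(scores[0])
    best_total, best_assignment = float("-inf"), {}
    # Every ordered selection of distinct calls, one per agent.
    for calls in permutations(range(n_calls), n_agents):
        total = sum(scores[a][c] for a, c in enumerate(calls))
        if total > best_total:
            best_total, best_assignment = total, dict(enumerate(calls))
    return best_total, best_assignment


# Hypothetical modified match scores: rows are Agents 1-3, columns are Calls 1-4.
SCORES = [
    [2.0, 1.0, 9.0, 30.0],
    [1.0, 12.0, 3.0, 4.0],
    [0.5, 2.0, 6.0, 5.0],
]

total, assignment = assign_calls(SCORES)
# Indexes are zero-based: Agent 1 gets Call 4, Agent 2 gets Call 2,
# and Agent 3 gets Call 3, leaving Call 1 unassigned for now.
```

Exhaustive search grows factorially with the number of agents, so a real implementation would more likely use a dedicated assignment-problem solver such as the Hungarian algorithm.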

Implications and Conclusion

There are a number of implications to how we have assigned calls.  Note that calls 2 and 3, which are not ready to be transferred, are already assigned to an agent by the algorithm.  We distinguish between optimal match and actual assignment.  For actual assignment, we prevent calls more than two minutes from transfer from being assigned.  We could achieve the same thing by tweaking our timeline modifier equation to yield a different timeline modifier.  This brings up perhaps the most important differentiator of this algorithm, however.  Because we consider calls that aren’t yet ready to be transferred, we have some level of predictive ability that allows a better call-agent match.  If we go back to part 1 of this post, it is easy to see why predictive ability is important.  Rather than just looking at the here-and-now, we look at the soon-to-be and are able to reserve agents whose skills match with calls that will soon be ready to transfer.  The key to getting this information is to have the IVR send the ACD periodic notifications of calls en route, their current disposition and probable end state.  Finally, it is important that we consider not only the best agent for the call, but also the best call for the agent.  Looking at the call assignment grid above, you’ll note that we assigned Call 3 to Agent 3.  However, Call 3 had a higher match score with Agent 1.  The reason we assign Call 3 to Agent 3 is because Agent 1 has a better match with a different call, meaning that Agent 3 gets Call 3 by process of elimination.

The result of our experimentation with OCS has been this: last year, we struggled for months to hook into UCCX’s ACD in order to direct calls to the right destination based solely on one piece of information.  In our pilot with OCS, we were able to achieve multi-factor routing in a matter of days for a small fraction of the cost we incurred last year.  It is entirely accurate to say that Office Communications Server does not ship with an ACD.  The sleeper, however, is that they do ship a platform that is simple to hook into and allows development of a very complex and highly customized ACD that fits your business model.  For us, unless we find a blocking problem with OCS the choice is simple.

In future posts, we will start to lay out a high-level architectural diagram of how the various pieces work, where the messaging links are, and any gotchas that we find.

Queues are for the Brits, Part 1

Thursday, March 27th, 2008

First off, I need to say that I sincerely hope that no one takes offense to the title of this post.  The title is meant to be a play on the phrase, “for the birds,” not an ethnocentric slur.  The alert reader will also pick up on the double entendre that in Britain, queue is a much more common word.

I have stated in earlier posts that I lead a team of top-notch developers.  My primary responsibility is early research and architecture; my team are the ones to transform the vision into reality.  We recently completed our pilot project, which I plan to blog about at some length.  One of the requirements that we met was the ability to design an advanced ACD that takes into account any number of variables and implements a “best-match” algorithm.  We believe this to be a significant advance over traditional skills-based routing algorithms.  In fact, a multiple-hour search yielded an awfully slim amount of relevant data.

The Problem with Skills-Based Routing

The primary problem with queues is implicit in the name: most queues operate as first-in-first-out mechanisms.  Before we can address why FIFO is a bad idea for contact centers, we need to have some context.

There is an even bigger problem that is a fundamental part of the queue problem.  This problem is in the existing call distributor algorithms.  A simple example will illustrate:

Assume we have three available agents.  Agent 1 is licensed to sell insurance in Michigan and Indiana.  Agent 2 is licensed to sell insurance in Indiana and Ohio.  Agent 3 is licensed to sell insurance in Indiana only.  Assume also that 80% of our inbound calls are from Michigan residents, 10% from Indiana residents, and 10% from Ohio residents.  If a call comes in from a resident of Indiana, any of the three may be able to sell an insurance policy to the caller.  However, only one is best suited in this context to sell the insurance policy to the caller: Agent 3.  There is a 90% chance that the next call will be from either Michigan or Ohio, meaning that Agent 3 would not be able to handle the call.  That means that we want to reserve those agents whose skills are most in demand for the calls that demand those skills.
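One way to express the "reserve the in-demand agent" idea is sketched below, using the licenses and call mix from the example.  The selection rule (pick the capable agent least likely to be needed for the next call) is a simple heuristic of my own for illustration, not a description of any vendor's algorithm.

```python
# Data from the example above: licensed states per agent and call mix by state.
LICENSES = {
    "Agent 1": {"Michigan", "Indiana"},
    "Agent 2": {"Indiana", "Ohio"},
    "Agent 3": {"Indiana"},
}
CALL_MIX = {"Michigan": 0.80, "Indiana": 0.10, "Ohio": 0.10}


def chance_of_handling_next_call(agent):
    """Probability that the next inbound call is from a state the agent covers."""
    return sum(CALL_MIX[state] for state in LICENSES[agent])


def pick_agent(call_state):
    """Among capable agents, pick the one least likely to be needed next."""
    capable = [agent for agent in LICENSES if call_state in LICENSES[agent]]
    return min(capable, key=chance_of_handling_next_call)


chosen = pick_agent("Indiana")  # Agent 3 covers only 10% of expected calls
```

Agent 1 can handle 90% of the expected calls and Agent 2 can handle 20%, so sending the Indiana call to Agent 3 keeps both of them free for the calls only they can take.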

Many contact and call center software providers trumpet their algorithms for call distribution, but most of the algorithms are some variant of, “How do I spread the load of calls evenly over my agent pool?”  The question that should be asked is, “How do I find the best possible agent to handle this call?”  Some software providers have started to deal with this question and have introduced a second factor into skills-based routing.  By assigning a rating to an agent’s skill, there is indeed additional intelligence available to the call distributor.  Consider the following example:

Assume we have three available agents, with skills rated as shown in the matrix.

Agent      Windows skill level   Office skill level   Internet skill level
Agent 1    70                    40                   10
Agent 2    10                    70                   40
Agent 3    40                    10                   70

Upon receipt of an inbound call, the skill matrix is analyzed and the agent with the highest rating in the skill necessary is assigned to the call.  Therefore, if we receive a Windows call, the call is assigned to Agent 1.  If we receive another Windows call while Agent 1 is still occupied, Agent 3 is assigned.  If all agents become available again and an Internet call is received, the call is assigned to Agent 3.
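The rated routing just described can be sketched in a few lines using the ratings from the matrix.  The function name and the `busy` parameter are my own, chosen for illustration.

```python
# Skill ratings from the matrix above.
SKILLS = {
    "Agent 1": {"Windows": 70, "Office": 40, "Internet": 10},
    "Agent 2": {"Windows": 10, "Office": 70, "Internet": 40},
    "Agent 3": {"Windows": 40, "Office": 10, "Internet": 70},
}


def route(skill, busy=()):
    """Assign the available agent with the highest rating in the needed skill."""
    available = [agent for agent in SKILLS if agent not in busy]
    return max(available, key=lambda agent: SKILLS[agent][skill])


first_windows = route("Windows")                      # Agent 1
second_windows = route("Windows", busy={"Agent 1"})   # Agent 3
internet_call = route("Internet")                     # Agent 3
```

Note that the router only ever looks at the call in front of it; nothing in it knows what kind of call is likely to arrive next.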

In many senses, the second example is still vulnerable to the problem noted in the first example.  Assume our call load is 90% Windows, 5% Office and 5% Internet.  If a call comes in for Windows, it should be rightly assigned to Agent 1.  However, if a subsequent call comes in for Internet support, it might be more appropriate to assign Agent 2 to the Internet call and reserve Agent 3 for the next Windows call.

The fundamental problem with traditional skills-based routing is that it takes so few factors into account.  In part 2, we will consider a potential solution to this problem.