Nodes in a load balancer cannot connect to themselves MIM/FIM service fails

Another interesting caveat in an Azure Load Balancer configuration if you have a say a Load Balanced server address of portal.mim.ninja and that within the fim service configuration and web.config you have the dns record portal.mim.ninja you will probably find this doesn’t work in an Azure portal configuration. You will get service unavailable page and the event of:

The Portal cannot connect to the middle tier using the web service interface. This failure prevents all portal scenarios from functioning correctly.

The cause may be due to a missing or invalid server url, a downed server, or an invalid server firewall configuration.

Ensure the portal configuration is present and points to the resource management service.

This is fixed by adding an entry to the hosts file of portal.mim.ninja with the servers own ip address…

But why is this happening?

Well, when the server send a syn packet out it expects a synack packet back but it gets sent its own syn back, which it doesn’t like and keeps sending this out…why this behaviour only seem to happen on the default MS Azure Load Balancer I’m not sure…but it does. So the workaround is modifying hosts as above this will however only ever use the service instance on that server but the actual connection will will be load balanced so it shouldn’t cause any problems.

Azure Load Balancer client Session Persistence setting for FIM/MIM

Since Azure is becoming more and more relevant when it comes to deployment solutions I thought I would give a quick overview on the settings to enable FIM/MIM portals to work ok behind an Azure NLB. Most of the settings are straightforward, load balance ports 80,443,5725,5726

Now 80 may or may not be needed if you are using my redirect script they are needed even though no client actually connects on port 80 the redirection needs to occur so the port needs to be open for the initial redirection.

5725 and 5726 are used by the FIM Service now as stated in the very succinct MS article on Load Balancers Here it explains that there needs to be client affinity between these 2 ports:

For password reset client it is important as well to keep session on the same server across the ports 5725 and 5726.
Why is that?
Simply because when password reset client connect to the QA gate and after successful user identification gets token from the Security Token Service on the 5126 port it has to request for password reset thru the Resource Management Service on the same server (but on the port 5725). If it will go to different server password reset will be unsuccessful.

Ok so what is that setting in Azure there is no “Sticky” term? well Azure LB uses tuples to work out which server to load balance the request to its explained well here.

But in essence

None:
5 tuple hash destination is based on Source ip, Source port, Destination ip, Destination port, Protocol
A good distribution so the client will only stay on the same server if all the above remain the same

ClientIP,Protocol:
3 tuple hash Source ip, Destination ip, protocol
Less distribution, connection will remain on same server if source ip and destination ip and protocol remain the same

ClientIP:
2 tuple hash Source IP dest IP
The least distribution affinity will remain as long as source and destination ip don’t change

So for FIM/MIM we need to setup ClientIP will will ensure the connection doesn’t jump between nodes when the protocol changes from 5725 to 5726…..

Make sure you check out my other posts for potential caveats:
Here and here.

FIM/MIM times out or SSL connection error behind Azure Load Balancer

Recently I had been setting up a load balanced MIM Portal solution in the Azure cloud and the setup seemed to work ok, after a while I got reports it was working for some people. It turned out if you were connected via WIFI it worked fine but if you connected via LAN it didn’t…….How bizarre! All the networks, gateways and site to site VPN’s were setup correctly and indeed if you tried to access any of the portal servers directly it worked only access via the Load Balancer was problematic.

The solution ended up being the setting “Large Send Offload” and I will explain why…

I collected the traces both on the Azure Server and my client pc. Initially I saw the 3-way handshake was successful, which means the port is opened and packet is reaching the backend VM. I then noticed that to complete SSL handshake, my client PC sends Client Hello, but never receives Server Hello.
then the Azure Server logs were checked, I noticed that the Server Sends Server hello but with a payload larger than 1350 MTU……but why is that a problem?

Below is the architecture

Azure Server –> Load balancer –> Azure gateway–> Source client PC

When the Azure Sever sends payload greater than 1350 , the Azure gateway sends the Destination Unreachable message to the Load balancer. Stating that it should decrease the Payload and send it again. The Destination Unreachable message is sent to Load balancer, but the load balancer never sends to the Backend VM, because it never knows the packet is for the particular target VM. It always thinks that the packet is for itself. Due to which the target backend VM never reduces the payload and subsequently doesn’t re-send it to the gateway.

Finally the Solution!

• Go to the Network adaptor
• Go to properties
• Click on configure

• Go to Advanced option and Disable the below 2 options

Disable the “Large Send Offload”
Check the status of “Jumbo frames” also but this usually disabled by default anyway

After disabling those settings everything started working apparently this mechanism is by design, can only be resolved by disabling the above 2 features. Apparently Microsoft are working on making the Azure Load Balancers accept and forward large packets.