Today I made a mistake and renamed a server without first removing it from the AppFabric cache cluster. This made it hard to get the cache service back to a running, healthy state and produced some ugly error messages in the ULS logs, such as:
Unexpected Exception in SPDistributedCachePointerWrapper::InitializeDataCacheFactory for usage 'DistributedLogonTokenCache' - Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode:SubStatus:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.). Additional Information : The client was trying to communicate with the server : net.tcp://old_server_name:22233 at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody) at Microsoft.ApplicationServer.Caching.DataCacheFactory.GetCacheProperties(RequestBody request, IClientChannel channel) at Microsoft.ApplicationServer.Caching.DataCacheFactory.GetCache(String cacheName) at Microsoft.SharePoint.DistributedCaching.SPDistributedCachePointerWrapper.InitializeDataCacheFactory()'.
or
Token Cache: Failed to initialize SPDistributedSecurityTokenCache Exception: 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode:SubStatus:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.). Additional Information : The client was trying to communicate with the server : net.tcp://old_server_name:22233 at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody) at Microsoft.ApplicationServer.Caching.DataCacheFactory.GetCacheProperties(RequestBody request, IClientChannel channel) at Microsoft.ApplicationServer.Caching.DataCacheFactory.GetCache(String cacheName) at Microsoft.SharePoint.DistributedCaching.SPDistributedCachePointerWrapper.InitializeDataCacheFactory() at Microsoft.SharePoint.DistributedCaching.SPDistributedCache..ctor(String name, TimeSpan timeToLive, SPDistributedCacheContainerType containerType, Boolean encryptData) at Microsoft.SharePoint.IdentityModel.SPDistributedSecurityTokenCache..ctor(String name, TimeSpan timeToLive, SPDistributedCacheContainerType containerType, Boolean encrptyData, TimeSpan minimumTokenExpirationWindow) at Microsoft.SharePoint.IdentityModel.SPDistributedSecurityTokenCacheInitializer.Init(Object state)'.
Checking the available AppFabric cache hosts showed that the old name was still listed:
PS C:\Users\sp_farm> get-cachehost
get-cachehost : ErrorCode:SubStatus:Cache host old_server_name.domain.local is not reachable.
At line:1 char:1
+ get-cachehost
+ ~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Get-AFCacheHostStatus], DataCacheException
    + FullyQualifiedErrorId : ERRCAdmin039,Microsoft.ApplicationServer.Caching.Commands.GetAFCacheHostStatusCommand

HostName : CachePort                Service Name            Service Status Version Info
--------------------                ------------            -------------- ------------
old_server_name.domain.local:22233  AppFabricCachingService UNKNOWN        0 [0,0][0,0]
I tried to remove the old host but that’s not that easy when the machine is not reachable because its name has changed.
I found several blog posts that advise exporting the cache configuration to a file, manually removing the old AppFabric host from that file, and re-importing it. That didn’t work for me right away. When I tried to import the edited configuration file, I got an error like this:
PS C:\Users\sp_farm> Import-CacheClusterConfig c:/config.xml
Import-CacheClusterConfig : Object reference not set to an instance of an object.
At line:1 char:1
+ Import-CacheClusterConfig c:/config.xml
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Import-AFCacheClusterConfiguration], NullReferenceException
    + FullyQualifiedErrorId : System.NullReferenceException,Microsoft.ApplicationServer.Caching.Commands.ImportAFCacheClusterConfigurationCommand
Finally I came up with a solution. Some steps might be unnecessary, but I was too happy to have this problem finally fixed to investigate it further. So here you go:
1. Edit your HOSTS file and add:
- ‘IP_of_the_machine old_server_name’
2. Add the current computer as a host:
- Add-CacheHost -providertype …* -connectionstring …*
3. Register the current computer as a host:
- Register-CacheHost -providertype …* -connectionstring …*
4. Stop cluster:
- Stop-CacheCluster
5. Export configuration:
- Export-CacheClusterConfig "c:/config.xml"
6. Edit the configuration file:
- remove the ‘host’ tag that contains the old server name and save the file as c:/config_edited.xml (the name used in step 9)
<hosts>
<host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
hostId="808274813" size="8191" leadHost="true" account="domain\username"
cacheHostName="AppFabricCachingService" name="new_server_name.domain.local"
cachePort="22233" />
<host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
hostId="238287772" size="819" leadHost="true" account="domain\username"
cacheHostName="AppFabricCachingService" name="old_server_name.domain.local"
cachePort="22233" />
</hosts>
7. Update configuration settings:
- Set-CacheHostConfig -HostName new_server_name.domain.local -Port 22233
8. Change the context:
- Use-CacheCluster
9. Import edited configuration file:
- Import-CacheClusterConfig -file "c:/config_edited.xml"
10. Start cluster:
- Start-CacheCluster
11. Remove HOSTS entry
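Steps 1 and 11 could also be scripted instead of editing the file by hand. This is only a minimal sketch: the helper names New-HostsLine and Remove-HostsLine and the IP address 192.0.2.10 are made up for illustration, and the lines that touch the real HOSTS file are commented out because they need an elevated prompt.

```powershell
# Hypothetical helpers; the names and the sample IP are illustrative only.
function New-HostsLine {
    param([string]$IpAddress, [string]$HostName)
    # HOSTS format: IP address, whitespace, host name
    "$IpAddress`t$HostName"
}

function Remove-HostsLine {
    param([string[]]$Lines, [string]$HostName)
    # Drop only active (non-comment) entries that map exactly this host name
    $pattern = '^\s*[^#\s]+\s+' + [regex]::Escape($HostName) + '\s*$'
    @($Lines | Where-Object { $_ -notmatch $pattern })
}

$hostsPath = "$env:SystemRoot\System32\drivers\etc\hosts"
$entry = New-HostsLine -IpAddress '192.0.2.10' -HostName 'old_server_name'

# Step 1 (run elevated): append the temporary entry
# Add-Content -Path $hostsPath -Value $entry

# Step 11 (run elevated): write the file back without the entry
# $cleaned = Remove-HostsLine -Lines (Get-Content $hostsPath) -HostName 'old_server_name'
# Set-Content -Path $hostsPath -Value $cleaned
```

Replace 192.0.2.10 with the actual IP of the renamed machine before uncommenting anything.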
*: Provider- and Connectionstring-attribute can be found here:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\AppFabric\V1.0\Configuration
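If you would rather read those values from PowerShell than from regedit, something like the following should do. It is a minimal sketch that assumes the values under that key are named Provider and ConnectionString (check in regedit if yours differ); the function name Get-AppFabricClusterSettings is made up for this example.

```powershell
# Hypothetical helper: reads the provider and connection string from the registry.
# Assumption: the value names under the key are 'Provider' and 'ConnectionString'.
function Get-AppFabricClusterSettings {
    param(
        [string]$KeyPath = 'HKLM:\SOFTWARE\Microsoft\AppFabric\V1.0\Configuration'
    )
    if (-not (Test-Path -Path $KeyPath)) {
        Write-Warning "Registry key not found: $KeyPath"
        return $null
    }
    $key = Get-ItemProperty -Path $KeyPath
    [pscustomobject]@{
        Provider         = $key.Provider
        ConnectionString = $key.ConnectionString
    }
}

$settings = Get-AppFabricClusterSettings
if ($settings) {
    # Format-List prints both values untruncated so they can be copied
    $settings | Format-List
}
```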
Now the old AppFabric cache host should be gone and your cache up and running.
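Step 6 is easy enough in a text editor, but it could also be scripted. The following is a minimal sketch, assuming the exported file contains host elements with a name attribute as in the excerpt above. The function name Remove-CacheHostNode is hypothetical, and the sample XML is a trimmed-down stand-in for a real export.

```powershell
# Hypothetical helper: removes every <host> element whose name attribute matches.
function Remove-CacheHostNode {
    param([xml]$Config, [string]$HostName)
    # local-name() keeps the XPath working whether or not the file declares a namespace
    $xpath = "//*[local-name()='host' and @name='$HostName']"
    # Copy the matches into an array so nodes can be removed while looping
    $nodes = @($Config.SelectNodes($xpath))
    foreach ($node in $nodes) {
        [void]$node.ParentNode.RemoveChild($node)
    }
    $nodes.Count   # number of removed host entries
}

# Trimmed-down stand-in for the exported configuration
[xml]$config = @'
<configuration>
  <dataCache>
    <hosts>
      <host name="new_server_name.domain.local" cachePort="22233" />
      <host name="old_server_name.domain.local" cachePort="22233" />
    </hosts>
  </dataCache>
</configuration>
'@

$removed = Remove-CacheHostNode -Config $config -HostName 'old_server_name.domain.local'
"Removed $removed host entry/entries"

# With a real export, load and save the file instead:
# [xml]$config = Get-Content 'c:\config.xml' -Raw
# $config.Save('c:\config_edited.xml')
```

Saving under a new file name keeps the original export around in case the import fails again.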
Hope it helps!
Thank you for this genius post, it worked like a charm. For the record: I didn't have to do step 7 (Set-CacheHostConfig).
Well done and thanks! This issue was really bothering me until I found your site.
"*: Provider- and Connectionstring-attribute can be found here:"
Are these the full strings found in the registry or part of the string? I've been struggling to get beyond this step (meaning I don't see the new server in config.xml when exported).
Thank you for the post :-)
Ian
Thanks for this post. I was able to follow it right up until step 10 when I was unable to start the cluster. My server had been moved from a different domain rather than just being renamed. Not sure if that was the difference. However, since I had renamed the server in the config file, I was able to delete and then reprovision the service after I deleted the hosts file entry. All is good now. Cheers!
This saved my weekend after the migration to the new domain.