Issues with Site Resolving or: Six Stages of Debugging

By Pascal Mathys on 8. February 2017

After upgrading a Sitecore solution from 7.2 to 8.2 initial release, we saw strange behavior on the delivery servers in the customer environments. Site resolving of internal links to other sites did not work on some machines. It didn't happen on all delivery servers and it never happened on authoring servers. It didn't happen all the time, and it never happened on my local environment or the dev environment.

Links to items in other site trees were resolved like <domain>/<contextVirtualFolder>/sitecore/content/path/to/other/item instead of <domain>/<targetVirtualFolder>/other/item, which was really odd, as it "works on my machine".

While digging into this issue, I literally went through the Six Stages of Debugging.

1. That can't happen.

The issue occurred after some time in the testing phase of the upgrade. I had never seen this before in other solutions with similar settings applied. Why does this happen only on specific environments and only on delivery systems? And why not every time? The same server that had issues one day worked without any issue the next day.

2. That doesn’t happen on my machine.

After stubbornly trying to reproduce the issue on my local machine, I went through all link-related settings (Rendering.ResolveSite and friends) and checked the site configurations and everything else that could possibly differ between the environments and server roles. Everything looked good, and there was no relevant difference between the production delivery environment and my local settings.

3. That shouldn’t happen.

As there was no evident configuration issue, I fired up JetBrains dotPeek and went down the rabbit hole that is the LinkManager.GetItemUrl method. Already knowing most of the resolving process, I expected the site resolving table to be wrong. This table is built once per AppPool recycle, the first time the URL of an item outside the root path of Sitecore.Context.Site is requested. It contains all sites that take part in site resolving, and GetItemUrl uses it to resolve the target site of a given item based on its item path.

My co-worker told me it was impossible for the table to be wrong. Damn, I should have placed a bet ;-)

Since the site resolving table is a private static Dictionary<LinkProvider.LinkBuilder.SiteKey, SiteInfo> within the LinkBuilder class, which is nested inside the LinkProvider class, it is not trivial to debug. It didn't help either that the dictionary key is of type LinkProvider.LinkBuilder.SiteKey, which is declared as internal. With the help of some reflection magic, I was able to create a dirty .aspx file with the following code snippet:

// Needs the namespaces System.Collections, System.Reflection, Sitecore.Links and Sitecore.Web.
// Read the private static table via reflection; "_siteResolvingTable" is the field name found with dotPeek.
var siteResolvingTableField = typeof(LinkProvider.LinkBuilder).GetField("_siteResolvingTable", BindingFlags.Static | BindingFlags.NonPublic);
if (siteResolvingTableField == null)
{
    Response.Write("siteResolvingTableField is null<br />");
}
else
{
    // Static field, so no instance is needed. The value is null until the table has been built.
    var siteResolvingTableFieldValue = siteResolvingTableField.GetValue(null);
    if (siteResolvingTableFieldValue == null)
    {
        Response.Write("siteResolvingTableFieldValue is null<br />");
    }
    else
    {
        // Use the non-generic IDictionary because the key type (SiteKey) is internal.
        var collection = (IDictionary)siteResolvingTableFieldValue;
        Response.Write($"Number of entries: {collection.Count}<br /><br /> ");

        foreach (var element in collection)
        {
            var entry = (DictionaryEntry)element;

            // The key's properties are only reachable via reflection as well.
            Response.Write($"Key.Path: {entry.Key.GetType().GetProperty("Path").GetValue(entry.Key, null)}<br />");
            Response.Write($"Key.Language: {entry.Key.GetType().GetProperty("Language").GetValue(entry.Key, null)}<br />");
            Response.Write($"Site: {((SiteInfo)entry.Value).Name}<br />");
            Response.Write("<br /><br />");
        }
    }
}

As expected, the output of this code was a list of all custom sites we defined in this solution. To get included in this list, a site must have a valid hostName and targetHostName attribute combination and some basic settings like a root path.

After copying this script to one of the affected servers, I finally saw what was going wrong: the table only contained the "website" site of Sitecore. This explained the behavior of the LinkManager class, but not the root cause.

4. Why does that happen?

At this stage, I had already invested quite some time in finding the culprit. What causes the issue? Now that I knew what was happening, I extended my .aspx file with my own override of the LinkProvider and LinkBuilder classes (thanks, Sitecore, for this experience) to be able to debug on the first hit. Strange thing: my version with the same logic produced the correct content of the site resolving table, side by side with the wrong one on the same server. That made no sense at all.

After some further code inspection, I found the PreviewLinkBuilder class, which inherits from the LinkBuilder class and provides its own implementation of the SiteCantBeResolved method (negated logic, yay). This version uses the enablePreview attribute of the site to decide whether to include it in the site resolving table. Wait, the same static site resolving table as generated by the parent LinkBuilder class?? Oh my. On a delivery system, the only site that defines enablePreview = true is the website site. Gotcha!

5. Oh, I see.

What causes this PreviewLinkBuilder to generate the site resolving table instead of the default LinkBuilder? It turns out that it is used by the (not overridable) LinkManager.GetPreviewSiteContext method. This method is called in several places, but most of them are within the Content Editor and Experience Editor. Nothing that should be triggered on a delivery server.

The only suspect left was the ItemVisualization.Layout property, which can be called through item.Visualization.Layout. It is called in multiple places throughout the code base, and after some debugging I found that Sitecore.Pipelines.HttpRequest.LayoutResolver is the one that calls this property for the first time after Sitecore starts. If any logic before this httpRequestBegin processor calls LinkManager.GetItemUrl for an item that is not a child of the current Sitecore.Context.Site root path, the default LinkBuilder class is used to build the not yet populated site resolving table. If not, the LayoutResolver triggers the PreviewLinkBuilder, which then creates the table for the first time.
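The "first caller wins" effect described above can be reproduced in isolation. The following is a toy model only, not Sitecore's code: all type and member names (ToySite, ToyLinkBuilder, ToyPreviewLinkBuilder, Qualifies, Reset) are hypothetical, and the inclusion rules are reduced to a single boolean each. It only shows how a static table shared between a base class and a subclass ends up with different content depending on which of them builds it first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy stand-in for a site definition (hypothetical, heavily simplified).
public sealed class ToySite
{
    public ToySite(string name, string rootPath, bool hasHostName, bool enablePreview)
    {
        Name = name;
        RootPath = rootPath;
        HasHostName = hasHostName;
        EnablePreview = enablePreview;
    }

    public string Name { get; }
    public string RootPath { get; }
    public bool HasHostName { get; }
    public bool EnablePreview { get; }
}

public class ToyLinkBuilder
{
    // One static table shared by the base class and every subclass, built at most once.
    private static Dictionary<string, ToySite> _table;

    // Default rule (assumption for this sketch): a site qualifies if it has host name information.
    protected virtual bool Qualifies(ToySite site) => site.HasHostName;

    public IReadOnlyCollection<string> GetTableSiteNames(IEnumerable<ToySite> sites)
    {
        if (_table == null)
        {
            // Whichever builder gets here first decides the content for everyone else.
            _table = sites.Where(Qualifies).ToDictionary(s => s.RootPath);
        }
        return _table.Values.Select(s => s.Name).ToList();
    }

    // Stands in for an AppPool recycle.
    public static void Reset() => _table = null;
}

public sealed class ToyPreviewLinkBuilder : ToyLinkBuilder
{
    // Preview rule (assumption for this sketch): only sites with enablePreview qualify.
    protected override bool Qualifies(ToySite site) => site.EnablePreview;
}

public static class Demo
{
    public static readonly ToySite[] Sites =
    {
        new ToySite("website", "/sitecore/content/home", hasHostName: false, enablePreview: true),
        new ToySite("siteA", "/sitecore/content/a", hasHostName: true, enablePreview: false),
        new ToySite("siteB", "/sitecore/content/b", hasHostName: true, enablePreview: false),
    };

    // Returns what a default builder sees after the first request was served
    // by either the preview builder or the default builder.
    public static string Run(bool previewFirst)
    {
        ToyLinkBuilder.Reset();
        ToyLinkBuilder first = previewFirst ? new ToyPreviewLinkBuilder() : new ToyLinkBuilder();
        first.GetTableSiteNames(Sites);

        var names = new ToyLinkBuilder().GetTableSiteNames(Sites);
        return string.Join(",", names.OrderBy(n => n, StringComparer.Ordinal));
    }

    public static void Main()
    {
        Console.WriteLine(Run(previewFirst: false)); // siteA,siteB
        Console.WriteLine(Run(previewFirst: true));  // website
    }
}
```

In the second run the default builder never gets a chance to apply its own rule, which matches the symptom seen on the affected servers: a table that only contains the "website" site.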

6. How did that ever work?

I'm not sure in which version of Sitecore this PreviewLinkBuilder logic was introduced. We never experienced this behavior before, and we have other 8.2 solutions that never had this problem.

The reason this issue happened only on the customer delivery servers is that those servers are load-balanced and provide a status page to tell the world that they are functional. This page is a Sitecore item and contains a very simple MVC "NoLayout". It is called every second by the load balancer and is likely the first hit Sitecore receives after an AppPool recycle. Unlike most other requests to our solution, this one makes no call to GetItemUrl that would create the "correct" site resolving table. Sometimes the external search server hit a page before the first status call was made. In those cases, the correct table was built and the solution resolved the correct sites.

How to make the behavior reliable?

After looking into LinkProvider.GetPreviewSiteContext, there were two possible solutions for this issue:

  1. Set the setting Preview.ResolveSite to false on delivery systems. This prevents the usage of the PreviewLinkBuilder class and returns the Preview.DefaultSite instead. This ensures that the default LinkBuilder is the first one to use the site resolving table.

  2. Leave the enablePreview attribute of all custom sites set to true, as this adds all sites to the table regardless of which link builder is used. This also adds the website site to the table, but that is negligible as it is the last one in the resolving chain.

As Sitecore was able to reproduce the behavior in a playground, they acknowledged this as a bug in the current version of Sitecore and told me that solution #1 is the best way to go in this case.
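Solution #1 could be applied with a config patch along these lines. This is a minimal sketch that assumes the standard Sitecore include-patch syntax. Where the file lives and how it is restricted to delivery servers is up to your deployment and not shown here.

```xml
<!-- Sketch of a patch file for delivery servers only (file name and location are up to you). -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Keep the PreviewLinkBuilder from being the first to build the site resolving table. -->
      <setting name="Preview.ResolveSite">
        <patch:attribute name="value">false</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```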

In my opinion, the PreviewLinkBuilder class should use its own site resolving table or create it on the fly each time, as it is unlikely to be used that often. This was one of my biggest debugging odysseys with Sitecore ever. If the classes and methods were more accessible, it would have taken much less time to debug. Please, Sitecore, make all provider methods virtual!

TL;DR

Set Preview.ResolveSite to false on delivery systems. It CAN cause trouble otherwise.
