The key to tracking long pans or dolly shots is to "unlink" the shape (the search area) from the tracking data to create an offset. This is a very useful technique: as new pixels move into the search area, they offset the track. It covers an important concept: the search area (shape) and the surface (track data) have a linked relationship, but the search area stays flexible and can be adjusted on its own with keyframes and similar tools.
You can apply the same technique to a 3D solve, because mocha's 3D camera solver is based on 2D tracked planes.
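
The relationship between the shape and the track data can be pictured with a toy one-dimensional model of a pan. This is only an illustrative sketch, not mocha's tracking algorithm or its scripting API: the function `simulate_pan`, the constant `FRAME_WIDTH`, and the assumption that the tracker measures a constant per-frame motion are all hypothetical. It shows a linked shape following the track out of frame and losing the track, while an unlinked shape stays on screen and the surface keeps accumulating motion as an offset.

```python
FRAME_WIDTH = 1920  # hypothetical frame width in pixels


def simulate_pan(num_frames, pan_per_frame, start_x, linked):
    """Toy 1D pan: return (surface_x, shape_x, tracking_ok) for each frame.

    surface_x : position of the surface (the accumulated track data)
    shape_x   : position of the shape (the search area)
    linked    : if True, the shape is moved by the track data
    """
    surface_x = start_x
    shape_x = start_x
    history = []
    for _ in range(num_frames):
        # Assumption: motion can only be measured while the search area
        # is still inside the frame.
        tracking_ok = 0 <= shape_x < FRAME_WIDTH
        if tracking_ok:
            # Motion of whatever pixels are currently under the shape.
            measured_motion = pan_per_frame
            surface_x += measured_motion
            if linked:
                # A linked shape rides along with the track data.
                shape_x += measured_motion
        history.append((surface_x, shape_x, tracking_ok))
    return history


if __name__ == "__main__":
    for linked in (True, False):
        surface_x, shape_x, ok = simulate_pan(60, -40, 960, linked)[-1]
        print(f"linked={linked}: surface={surface_x} shape={shape_x} "
              f"tracking_ok={ok} offset={shape_x - surface_x}")
```

In this sketch the unlinked shape never moves, so new pixels keep entering it and the gap between the shape and the surface grows every frame. In practice the shape would also be keyframed to stay over suitable detail.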