Pre-migration compatibility checks
It's not possible, or at least not wise, to allow migration from any host to any other host. Many things need to be set up precisely the same on both the source and destination side for migration to have any chance of working. Libvirt should have a pre-migration call that, given a destination host, tells the caller whether a migration from the local host to that destination is likely to succeed.
For such a call to be widely useful, the libvirt API would have to let the caller specify which of the available checks it is interested in, and return a value that indicates whether there are show-stopper problems or only problems that may make things sub-optimal on the remote side. The caller can then decide what action it wants to take.
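As a rough illustration of the shape such a call could take, here is a minimal Python sketch. It is not an existing libvirt API: the names `Check`, `Severity`, `Finding`, `premigration_check` and `worst`, and the dictionary keys used to describe a host, are all hypothetical. Only two checks are implemented, to show how a caller selects checks and how show-stoppers are kept apart from warnings.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto
from typing import List


class Check(Flag):
    """Checks a caller may ask for (hypothetical names)."""
    HYPERVISOR = auto()
    CPU_COUNT = auto()


class Severity(Enum):
    OK = 0
    WARNING = 1  # migration may work, but be sub-optimal on the remote side
    ERROR = 2    # show-stopper


@dataclass
class Finding:
    check: Check
    severity: Severity
    message: str


def premigration_check(source: dict, dest: dict, checks: Check) -> List[Finding]:
    """Run only the requested checks and return one Finding per problem."""
    findings = []
    if Check.HYPERVISOR in checks and source["hypervisor"] != dest["hypervisor"]:
        findings.append(Finding(Check.HYPERVISOR, Severity.ERROR,
                                "source and destination hypervisors differ"))
    if Check.CPU_COUNT in checks and dest["pcpus"] < source["vcpus"]:
        findings.append(Finding(Check.CPU_COUNT, Severity.WARNING,
                                "fewer physical CPUs than guest virtual CPUs"))
    return findings


def worst(findings: List[Finding]) -> Severity:
    """Collapse a list of findings into the most severe level found."""
    return max((f.severity for f in findings),
               key=lambda s: s.value, default=Severity.OK)
```

A caller would pass something like `Check.HYPERVISOR | Check.CPU_COUNT` and then decide, based on `worst(...)` or on the individual findings, whether to proceed, warn, or abort.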
The corollary to this "will migration succeed" call is a call which, given two hosts A and B, figures out the lowest common denominator a guest needs to be run at so that a future migration between them is likely to succeed.
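One way to picture the lowest-common-denominator call is as an intersection of host capabilities. The sketch below assumes each host is described by a dictionary with a `cpu_flags` entry (CPU flags are one of the checks listed further down); the function name `common_cpu_flags` and the host description format are hypothetical, and a real implementation would have to cover more than CPU flags.

```python
def common_cpu_flags(host_a: dict, host_b: dict) -> set:
    """Return the CPU flags that both hosts provide.

    A guest started with only these flags visible could, as far as
    CPU flags are concerned, run on either host.
    """
    return set(host_a["cpu_flags"]) & set(host_b["cpu_flags"])


# Example with made-up host descriptions.
host_a = {"cpu_flags": {"sse2", "sse3", "ssse3"}}
host_b = {"cpu_flags": {"sse2", "sse3"}}
baseline = common_cpu_flags(host_a, host_b)
```

Here `baseline` contains `sse2` and `sse3` but not `ssse3`, because host B does not offer it.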
There are a lot of things that can/should be checked on both the source and destination side; the list includes:
- Matching hypervisors - obvious, but there are no checks today. Make sure we don't migrate between Xen and KVM (for instance). Warn when migrating from a "new" hypervisor (say, Xen 3.2) to an "old" hypervisor (say, Xen 3.1). That should work, in theory, but the caller may prefer to avoid it if possible. KVM and Xen are currently *accidentally* incompatible, but the incompatibility should be explicit.
- Matching CPU architectures - Make sure the architecture of the destination can run a guest of the type that is currently running on the source. For instance, an x86_64 hypervisor might be able to run i386 guests (subject to hypervisor versioning, of course), but an ia64 hypervisor can't run an x86_64 guest.
- CPU flags - the CPU flags of the destination must be a superset of the CPU flags that were presented to the guest at startup. Many OSes and applications check for CPU flags once at startup to choose optimized routines, and then never check again. If they happen to select SSE3 routines, and SSE3 is not available on the destination, they will take an unrecoverable fault (an invalid-opcode exception on x86) the next time they execute one of those instructions, and crash.
This is where CPU masking technology and the "lowest common denominator" API can make a difference. If a guest is started with some CPU flags masked off, it widens the potential migration pool for that guest.
- Number of CPUs - for performance reasons, the destination must usually have one physical CPU for each guest virtual CPU. However, for temporary or emergency situations, this may not be a hard requirement, so the caller would choose whether to check for and act on this.
- non-NUMA -> non-NUMA
- non-NUMA -> NUMA
- NUMA -> non-NUMA
- NUMA -> NUMA
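The CPU flags and number-of-CPUs checks from the list above are simple enough to sketch in a few lines of Python. The function names, the `strict` parameter and the `"ok"` / `"warning"` / `"error"` return values are hypothetical, chosen only to show the superset test and the caller-controlled severity the text describes.

```python
def check_cpu_flags(guest_flags, dest_flags):
    """Destination flags must be a superset of what the guest saw at startup.

    Returns (status, missing), where missing lists the flags the guest
    was given at startup that the destination does not have.
    """
    missing = sorted(set(guest_flags) - set(dest_flags))
    return ("error" if missing else "ok", missing)


def check_cpu_count(guest_vcpus, dest_pcpus, strict=True):
    """Compare guest virtual CPUs with destination physical CPUs.

    With strict=False (temporary or emergency migrations) a shortfall
    is reported as a warning instead of an error.
    """
    if dest_pcpus >= guest_vcpus:
        return "ok"
    return "error" if strict else "warning"


# A guest that was started with sse3 visible cannot safely move to a
# destination without it; masking sse3 at startup avoids the problem.
status, missing = check_cpu_flags({"sse2", "sse3"}, {"sse2"})
masked_status, _ = check_cpu_flags({"sse2"}, {"sse2"})
```

In this example `status` is `"error"` with `missing` equal to `["sse3"]`, while the guest started with `sse3` masked off passes the check.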