POWER9 PowerVM Incomplete State During I/O Concurrent Repair or Dynamic LPAR of IBM i-Owned Hardware

Pete Massiello, President, iTech Solutions
Problem
This problem occurs only on POWER9 servers. The fix requires updated FSP firmware as well as PTFs for IBM i. If you don't have both, do not use concurrent maintenance or dynamic LPAR on POWER9 servers when all of the following conditions apply:
- The resource is a physical PCIe adapter, an EMX0 cable, fanout module, chassis management card, or midplane.
- The resource is physically owned by an IBM i partition.
- The firmware fix is not applied.
Without the fix applied, the server can go to an incomplete state during the operation, and a server-level IPL is required to recover.
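The three conditions above can be sketched as a simple pre-check to run before a concurrent maintenance or DLPAR operation. This is only an illustration: the `Resource` class, the resource-type labels, and the `is_risky` function are hypothetical names, not part of any HMC or IBM i interface, and the inputs would have to be gathered by hand or from your own inventory.

```python
from dataclasses import dataclass

# Resource types the article lists as affected (labels are hypothetical).
AFFECTED_RESOURCE_TYPES = {
    "pcie_adapter",
    "emx0_cable",
    "fanout_module",
    "chassis_management_card",
    "midplane",
}


@dataclass
class Resource:
    kind: str      # one of the hypothetical labels above
    owner_os: str  # operating system of the owning partition, e.g. "IBM i"


def is_risky(resource: Resource, server_is_power9: bool,
             firmware_fix_applied: bool) -> bool:
    """Return True when all conditions from the article hold."""
    return (
        server_is_power9
        and not firmware_fix_applied
        and resource.kind in AFFECTED_RESOURCE_TYPES
        and resource.owner_os == "IBM i"
    )


adapter = Resource(kind="pcie_adapter", owner_os="IBM i")
print(is_risky(adapter, server_is_power9=True, firmware_fix_applied=False))
```

If the check returns `True`, the safer choice is to postpone the operation until the firmware fix is installed.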
An incomplete state can occur only on a POWER9 PowerVM managed server at some time after one of the following operations:
- Dynamic Logical Partition (DLPAR) operation to remove or move a physical PCIe adapter from a logical partition (LPAR).
- Concurrent replacement of a physical PCIe adapter assigned to an LPAR.
- Concurrent replacement of a Cable Card, cable, fanout module, mid-plane, or chassis management card from an EMX0 I/O drawer.
The condition is set up when one of these operations is blocked by the hypervisor because the resource is still in use. In the concurrent maintenance case, the hypervisor ends the operation with a return code of 0x0300. If the partition is then powered down and the operation is retried, the server can go to an incomplete state. Currently, the only known trigger for this problem resides in the IBM i operating system.
Once the server is in an incomplete state from this defect, the server must be re-IPLed to recover full management capability.
Resolving the Problem
A firmware fix can be applied to prevent this incomplete state from occurring. This fix is provided in the following system firmware fix levels:
FW Level | Released or Planned | Date |
FW910.50 Vx910_xxx | Planned | |
FW920.50 Vx920_118 | Released | November 25, 2019 |
FW930.20 Vx930_xxx | Planned | |
FW940.00 Vx940_027 | Released | November 22, 2019 |
Future firmware releases will also contain this fix.
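As a rough aid for reading the table, the following sketch compares an installed firmware level against the fix levels listed above. It rests on several assumptions that the article does not state: that levels are written as `FWnnn.nn`, that later service packs within the same stream are cumulative, and that "future firmware releases" means streams newer than FW940. The function names are hypothetical. Note that FW910.50 and FW930.20 were still planned, not released, when this was written.

```python
import re

# Minimum service pack carrying the fix in each firmware stream, taken
# from the table above. 910 and 930 were only "Planned" at the time.
FIX_SERVICE_PACK = {910: 50, 920: 50, 930: 20, 940: 0}


def parse_level(level: str) -> tuple[int, int]:
    """Split a level such as 'FW920.50' into (stream, service_pack)."""
    match = re.fullmatch(r"FW(\d{3})\.(\d{2})", level)
    if match is None:
        raise ValueError(f"unrecognized firmware level: {level!r}")
    return int(match.group(1)), int(match.group(2))


def fix_status(level: str) -> str:
    """Return 'included', 'missing', or 'unknown' for a firmware level."""
    stream, service_pack = parse_level(level)
    if stream > max(FIX_SERVICE_PACK):
        # Assumption: streams newer than FW940 ship with the fix.
        return "included"
    if stream not in FIX_SERVICE_PACK:
        return "unknown"
    # Assumption: service packs within a stream are cumulative.
    if service_pack >= FIX_SERVICE_PACK[stream]:
        return "included"
    return "missing"


for installed in ("FW920.40", "FW920.50", "FW940.00"):
    print(installed, fix_status(installed))
```

Always confirm the result against IBM's current fix documentation before relying on it.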
The following IBM i APARs must be applied to partitions owning PCIe hardware in order to prevent the problem from being triggered.
APAR MA47837 LIC-OTHER-SRCB6005120-INCORROUT POWER9 DLPAR-add fails
R740 In progress
R730 MF66695 Not yet in a cumulative package
R720 MF66544 Not yet in a cumulative package
APAR MA47943 LIC-OTHER-SRCB6006965-INCORROUT DLPAR remove after add
R740 In progress
R730 MF66865 Not yet in a cumulative package
R720 In progress
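The APAR listing above can be read as a small lookup table keyed by APAR and IBM i release. The sketch below records it as of the time of writing; `None` marks the entries listed as "In progress", for which no PTF number had been published. The names `PTFS_BY_APAR` and `ptfs_for` are hypothetical and used only for illustration.

```python
from typing import Optional

# PTF numbers per APAR and IBM i release, copied from the listing above.
# None means the PTF was still "In progress" when the article was written.
PTFS_BY_APAR: dict[str, dict[str, Optional[str]]] = {
    "MA47837": {"R740": None, "R730": "MF66695", "R720": "MF66544"},
    "MA47943": {"R740": None, "R730": "MF66865", "R720": None},
}


def ptfs_for(release: str) -> dict[str, Optional[str]]:
    """Return the PTF listed for each APAR on the given release."""
    return {apar: by_release.get(release)
            for apar, by_release in PTFS_BY_APAR.items()}


print(ptfs_for("R730"))
```

For a release where an entry is `None`, check with IBM for the current PTF status.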
If you need help upgrading your firmware or applying the PTFs for IBM i, please contact us. We also offer a subscription package in which we apply 3 sets of PTFs and perform 1 OS upgrade over a 2-year period, all for only $295 a month. Take the worry out of PTFs.
Pete, nice article, but we have this issue with the latest vHMC connected to a brand-new S914 with the latest server firmware. Any clue?