A variant of the M.2 RAID enablement kit does not manifest with the nvme
driver. Address this by allowing the 'nvm' subsystype to have a blank driver.
Also, to be on the safe side, have self.driver always be a string,
so it can be 'falsey' but still work as a string.
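
A minimal sketch of the idea, not the actual implementation: the class
name, constructor arguments, and the is_nvme_like() helper are
hypothetical. It shows a driver attribute that is always a string, so a
missing driver is falsey yet still supports string methods, and a check
that accepts a blank driver on the 'nvm' subsystype.

```python
class DiskInfo:
    """Hypothetical, trimmed-down disk record with only the fields discussed."""

    def __init__(self, name, driver=None, subsystype=None):
        self.name = name
        # Normalize to a string: a missing driver becomes '' (falsey),
        # but calls such as self.driver.startswith(...) still work.
        self.driver = driver or ''
        self.subsystype = subsystype or ''

    def is_nvme_like(self):
        # Accept the nvme driver, or a blank driver on the 'nvm' subsystype.
        if self.driver == 'nvme':
            return True
        return not self.driver and self.subsystype == 'nvm'
```

With this shape, DiskInfo('nvme0n1', subsystype='nvm') is accepted even
though no driver was reported.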
Provide a mechanism for the administrator to place a custom
key for potential interactive recovery into
/var/lib/confluent/private/os/<profile>/pending/luks.key
If not provided, generate a unique one for each install.
Either way, persist the key in /etc/confluent/luks.key to
facilitate later resealing if the user wants (neither clevis nor
systemd prior to 256 supports unlock via TPM2, so a keyfile is
required for now).
Migrating to otherwise escrowed passphrases and/or sealing to
specific TPMs will be left to operators and/or third parties.
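
A rough sketch of the key selection described above, under stated
assumptions: the function name stage_luks_key and its return values are
hypothetical, and the server-side lookup and node-side persistence are
collapsed into a single local function purely for illustration.

```python
import os
import secrets
import shutil


def stage_luks_key(custom_key_path, dest_path):
    """Use the administrator's key if one exists, else generate a unique one.

    Returns 'custom' or 'generated' so the caller can log which path was taken.
    """
    os.makedirs(os.path.dirname(dest_path) or '.', exist_ok=True)
    if os.path.exists(custom_key_path):
        shutil.copyfile(custom_key_path, dest_path)
        source = 'custom'
    else:
        # No key was provided, so create a random one for this install only.
        with open(dest_path, 'w') as keyfile:
            keyfile.write(secrets.token_urlsafe(32))
        source = 'generated'
    # The key unlocks the disk, so restrict it to the owner.
    os.chmod(dest_path, 0o600)
    return source
```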
A stateful install can sometimes fail if vgchange -a n is run after dd.
Use wipefs instead and fix the order of the two commands.
Furthermore, use the $INSTALLDISK variable.
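
For illustration only, the intended ordering might look like the
following. The helper name wipe_commands is hypothetical, and both the
exact ordering (deactivating volume groups before wiping) and the use of
the wipefs -a flag are assumptions inferred from the description, not
taken from the actual script.

```python
def wipe_commands(installdisk):
    """Return the cleanup commands in the order they are assumed to run."""
    return [
        # Deactivate volume groups while their metadata is still intact.
        ['vgchange', '-a', 'n'],
        # Only then erase the signatures on the install disk.
        ['wipefs', '-a', installdisk],
    ]
```

A caller could hand each entry to subprocess.run() in sequence.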
One issue is that there are multiple NetworkManager connections;
clean this up, though it does not seem to be a functional issue.
However, the lldpad usage sometimes breaks the network configuration.
Disable the facility by forcibly disabling fcoe, since that is what
triggers lldpad.
If syncfiles fails, keep it retrying.
Also, slow down sync checking to avoid hammering the system.
Further, add a randomized delay to spread out highly synchronized requestors.
Block attempts to do multiple concurrent syncfile runs.
Some versions start manifesting nvme devnames containing 'c'. These
are used to interact with multipath, serving as the raw devices
backing a traditional nvme device.
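
One way such names could be recognized is sketched below. The pattern
assumes names of the form nvme0c0n1, which goes beyond what is stated
here, and the helper name is hypothetical; what the caller does with a
match (for example, skipping it as an install target) is also an
assumption.

```python
import re

# Assumed form: nvme<subsystem>c<controller>n<namespace>, e.g. nvme0c0n1.
_NVME_PATH_DEVICE = re.compile(r'^nvme\d+c\d+n\d+$')


def is_nvme_path_device(devname):
    """Return True for the per-path 'c' names, False for ordinary devices."""
    return bool(_NVME_PATH_DEVICE.match(devname))
```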
When udev is populating the disk hierarchy, it can be a long time
before the 'by-label' is specifically ready.
Wait for that specific entry to come along before continuing to
check if there's an identity image.
confignet is special: it is designed to work when networking isn't
right. So have it run during firstboot in case post fouled up the
network for firstboot.
With significant firstboot output, there was a tendency
for tail to be killed before it relayed all the content.
Change to run the firstboot in a subshell in the background,
and have tail explicitly run until that subshell naturally
exits, at which point tail exits cleanly.
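
The same pattern, expressed as a Python sketch rather than the shell
that the change itself uses: start the work in the background, relay its
log, and stop relaying only after the worker has exited and the log has
been drained. The function name and parameters are hypothetical.

```python
import subprocess
import sys
import time


def run_and_relay(command, logpath, out=sys.stdout, poll=0.2):
    """Run command in the background and relay its log until it exits."""
    with open(logpath, 'wb') as logfile:
        # The child inherits the file descriptor, so closing our copy is safe.
        proc = subprocess.Popen(command, stdout=logfile,
                                stderr=subprocess.STDOUT)
    with open(logpath, 'r', errors='replace') as reader:
        while True:
            chunk = reader.read()
            if chunk:
                out.write(chunk)
                continue
            if proc.poll() is not None:
                # The worker is done; read once more so that output written
                # just before exit is not lost, then stop.
                out.write(reader.read())
                break
            time.sleep(poll)
    return proc.returncode
```

The relay is tied to the lifetime of the worker instead of being killed
on a timer, which is what keeps the tail end of the output from being
dropped.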
This enables a more manual approach
to indicate the deployment server.
This carries the assumption that a
normal OS autonetwork config
will get the node to the right network.
This is one step toward enabling a scenario where the target is remote
and the DHCP is not going to relay, but instead the deployment feeds
the DHCP a confluent URL entry point to get going.
Using this parameter precludes:
-Enhanced NIC auto-selection. If the OS auto-selection fails to
identify the correct interface, the profile will need the NIC name baked in.
-Auto-selection of the deployment server from several. This means that any
HA will require IP takeover to be handled externally.
This is of course on top of the manual process of
indicating confluent in kernelargs.
It is likely that a client connects from fe80::, which
is explicitly omitted from ssh principals.
This time, have the client provide all currently set IP addresses
and the server will make a determination.
There remains the possibility it misconfigures a NIC and tries to use that,
inducing failure. One strategy would be to filter the addresses and
only provide those from the 'current' interface. Another is to just take
the hit as the node is likely going to suffer a lot from such a
misconfiguration anyway.
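
How the server makes its determination is not spelled out here. As a
sketch under that caveat, a server-side filter might drop link-local and
loopback entries from the client-supplied list before treating the rest
as candidate principals. The function name is hypothetical.

```python
import ipaddress


def usable_principals(addresses):
    """Keep client-reported addresses that could serve as ssh principals."""
    result = []
    for addr in addresses:
        # Strip any zone suffix such as '%eth0' before parsing.
        ip = ipaddress.ip_address(addr.split('%')[0])
        # fe80::/10 (and 169.254/16) are link-local and excluded, as is
        # loopback, since neither identifies the node to the server.
        if ip.is_link_local or ip.is_loopback:
            continue
        result.append(str(ip))
    return result
```

For example, a client reporting fe80::1%eth0, 10.0.0.5, and 2001:db8::1
would be left with the last two.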
Some profiles may have all disk support suppressed through a blacklist
until %pre comes along to fix it. This prevents /dev/disk from ever existing.
Wait up to 10 seconds before giving up. This gives the disk subsystem a
fair chance to come up quickly and avoid a wait, with a worst-case
fallback of 10 seconds.
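
A bounded wait of this kind could be sketched as below. The function
name, the polling interval, and the injectable exists/sleep/clock
parameters are hypothetical and exist mainly to make the sketch testable;
only the 10 second ceiling comes from the description above.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.5,
                  exists=os.path.exists, sleep=time.sleep,
                  clock=time.monotonic):
    """Return True as soon as path exists, or False once timeout elapses."""
    deadline = clock() + timeout
    while True:
        if exists(path):
            return True  # no wait at all when the path is already there
        if clock() >= deadline:
            return False  # worst case: give up after the full timeout
        sleep(interval)
```

A call such as wait_for_path('/dev/disk') returns immediately on systems
where the disk subsystem is already up.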