With significant firstboot output, there was a tendency
for tail to be killed before it relayed all the content.
Change to run firstboot in a subshell in the background,
and have tail explicitly run until that subshell naturally
exits, at which point tail will cleanly exit.
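As a minimal sketch of the pattern (the real fix lives in the
firstboot shell script; the function and log path here are
hypothetical), a relay loop can keep draining the log until the
producer has exited and the log is fully consumed:

    import subprocess
    import sys
    import time

    def run_and_relay(cmd, logpath):
        # Start the long-running task (e.g. firstboot) in the
        # background, with its output captured to a log file.
        with open(logpath, 'wb') as logout:
            proc = subprocess.Popen(cmd, stdout=logout,
                                    stderr=subprocess.STDOUT)
        # Relay the log until the producer has exited *and*
        # everything it wrote has been drained, so no trailing
        # output is lost.
        with open(logpath, 'rb') as login:
            while True:
                chunk = login.read(4096)
                if chunk:
                    sys.stdout.write(chunk.decode(errors='replace'))
                    continue
                if proc.poll() is not None:
                    # Final drain for output that landed between the
                    # empty read and the exit check.
                    remainder = login.read()
                    if remainder:
                        sys.stdout.write(
                            remainder.decode(errors='replace'))
                    break
                time.sleep(0.1)
        return proc.returncode

In the shell script itself, GNU tail's --pid option provides the
same "run until the producer exits" behavior once the subshell's
PID is known.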
This allows the user to designate certain networks to be treated as
if they were local.
This enables the initial token grant to be allowed from a remote
network. It still requires that the API be armed (which should
generally be a narrow window of opportunity) and that the request be
privileged; it just allows remote networks to be elevated to the same
level of trust as local ones.
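A hedged sketch of how such a designation could be evaluated; the
trusted-networks list and function name are illustrative, not
confluent's actual attribute or API:

    import ipaddress

    def is_effectively_local(peer_addr, trusted_networks):
        # trusted_networks: CIDR strings the admin has designated
        # to be treated as if they were local.
        peer = ipaddress.ip_address(peer_addr)
        return any(peer in ipaddress.ip_network(net)
                   for net in trusted_networks)

    # A requester inside a designated network is treated as local,
    # so the initial token grant can proceed (the arming window and
    # the privilege check above still apply).
    print(is_effectively_local('10.20.30.7', ['10.20.0.0/16']))  # True
    print(is_effectively_local('192.0.2.9', ['10.20.0.0/16']))   # False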
This enables a more manual approach to indicating the deployment
server. It carries the assumption that a normal OS autonetwork config
will get the node onto the right network.
This is one step toward enabling a scenario where the target is
remote and DHCP is not going to relay; instead, the deployment
process feeds the DHCP server a confluent URL entry point to get
going.
Using this parameter precludes:
- Enhanced NIC auto-selection. If the OS auto-selection fails to
  identify the correct interface, the profile will need the NIC name
  baked in.
- Auto-selection of the deployment server from several. This means
  that any HA will require IP takeover to be handled externally.
This is of course on top of the manual process of
indicating confluent in kernelargs.
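To illustrate the manual process, a sketch of reading a deployment
server hint from the kernel command line; the parameter name
confluent_deployserver is hypothetical, chosen only for the example:

    def deployserver_from_cmdline(path='/proc/cmdline'):
        # Scan the kernel command line for a manually indicated
        # deployment server (parameter name is illustrative).
        with open(path) as cmdline:
            for arg in cmdline.read().split():
                if arg.startswith('confluent_deployserver='):
                    return arg.split('=', 1)[1]
        return None  # fall back to discovery/auto-selection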
Add a setting to allow the user to suppress all DHCP offers during
PXE/HTTP activity. This enables configurations where users want to
manage the filename field explicitly in their own DHCP configuration.
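A minimal sketch of where such a setting would take effect in the
responder; the setting name and surrounding functions are
hypothetical:

    def build_offer(request):
        # Stand-in for constructing a real PXE/DHCP offer.
        return {'filename': 'boot.ipxe', 'request': request}

    def respond_to_pxe(request, config):
        # If the filename field is managed in an external DHCP
        # configuration, suppress our offer entirely and stay silent.
        if config.get('deployment.suppress_dhcp_offers'):
            return None
        return build_offer(request)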
In some environments, there is a desire to manage DHCP configuration
manually. In such a case, provide a URL that can be given to the DHCP
server, allowing confluent to control the profile without updating
that DHCP service.
With this change, a node can be told to boot:
http://confluentserver/confluent-api/booturl/by-node/n123/boot.ipxe
and be redirected to the currently applicable OS profile.
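A hedged sketch of the redirect behavior (not confluent's actual
handler; the redirect target path and profile name are assumptions):
a stable per-node URL answers with a 302 to whatever profile
currently applies, so the DHCP-side entry never needs to change.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical lookup of the currently applicable profile; in
    # confluent this would come from the node's deployment state.
    CURRENT_PROFILE = {'n123': 'rhel-9-x86_64-default'}

    class BootURLHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            prefix = '/confluent-api/booturl/by-node/'
            if (self.path.startswith(prefix)
                    and self.path.endswith('/boot.ipxe')):
                node = self.path[len(prefix):].split('/')[0]
                profile = CURRENT_PROFILE.get(node)
                if profile:
                    # Redirect the stable per-node URL to the
                    # profile that currently applies.
                    self.send_response(302)
                    self.send_header(
                        'Location',
                        '/profiles/{0}/boot.ipxe'.format(profile))
                    self.end_headers()
                    return
            self.send_error(404)

    if __name__ == '__main__':
        HTTPServer(('', 8080), BootURLHandler).serve_forever()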
The default timeout is overkill in the nodediscover scenario.
Notably, we can receive replies from unreachable IP addresses, and
those will extend the rescan to the full timeout. The devices should
comfortably reply within 3 seconds, so scans can exit in a timely
fashion.
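A sketch of a scan loop bounded by a 3 second deadline, assuming a
UDP discovery socket (names illustrative; confluent's actual scan
logic may differ):

    import select
    import time

    def collect_replies(sock, timeout=3.0):
        # Bound the entire scan by one deadline rather than
        # resetting a timer per reply, so stray replies from
        # unreachable addresses cannot stretch the scan out.
        deadline = time.monotonic() + timeout
        replies = []
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            ready, _, _ = select.select([sock], [], [], remaining)
            if ready:
                data, peer = sock.recvfrom(9000)
                replies.append((peer, data))
        return replies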
Various parts of confluent that try to use all the interfaces will
now skip bond members.
One example problem is that joining the multicast group for SSDP
would cause the kernel to send IGMPv6 out on bond members as well as
on the bond itself. This change ensures that only the bond interface
is used and it is never bypassed in favor of its members.
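On Linux, bond membership is visible through standard sysfs files; a
sketch of skipping members when enumerating interfaces (function
names illustrative, sysfs paths are standard kernel interfaces):

    import os

    def bond_member_interfaces():
        # /sys/class/net/bonding_masters lists bond devices; each
        # bond's bonding/slaves file lists its member NICs.
        members = set()
        try:
            with open('/sys/class/net/bonding_masters') as mfile:
                masters = mfile.read().split()
        except OSError:
            return members  # no bonding in use
        for bond in masters:
            slavepath = '/sys/class/net/{0}/bonding/slaves'.format(bond)
            try:
                with open(slavepath) as sfile:
                    members.update(sfile.read().split())
            except OSError:
                continue
        return members

    def usable_interfaces():
        # Enumerate interfaces, skipping bond members so multicast
        # joins (e.g. for SSDP) happen only on the bond itself.
        skip = bond_member_interfaces()
        return [nic for nic in os.listdir('/sys/class/net')
                if nic not in skip]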
It is likely that a client connects from fe80::, which is explicitly
omitted from ssh principals.
This time, have the client provide all currently set IP addresses and
let the server make the determination.
There remains the possibility that it misconfigures a NIC and tries
to use that, inducing failure. One strategy would be to filter the
addresses and only provide those from the 'current' interface.
Another is to just take the hit, as the node is likely going to
suffer a lot from such a misconfiguration anyway.
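A sketch of the client side of the exchange, gathering every
currently set address via iproute2's JSON output; filtering out
link-local scope reflects the fe80:: omission noted above:

    import json
    import subprocess

    def current_addresses():
        # Collect all addresses currently set on any interface,
        # excluding link-local ones (e.g. fe80::), which are omitted
        # from ssh principals and cannot be validated by the server.
        output = subprocess.check_output(['ip', '-j', 'addr'])
        addresses = []
        for iface in json.loads(output):
            for addr in iface.get('addr_info', []):
                if addr.get('scope') == 'link':
                    continue
                addresses.append(addr['local'])
        return addresses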
When a node installs, it may not have its node-mapped address up, or
may not have one at all. Try to use the IP if it would be in the same
set that produced its ssh certificate.
There remains a gap if a system has no static addressing *and*
doesn't map the nodename to an IP, but we have an impasse, as the
situation is too fuzzy to grant a principal in an SSH cert, and
without that we can't securely attempt rsync. For now, this scenario
would still fail, and I will just hope that doesn't come up.
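A hedged sketch of the selection rule: only pick an address that
falls within the same set of networks that produced the
certificate's principals (the helper and its inputs are
illustrative):

    import ipaddress

    def address_for_rsync(candidate_addrs, principal_networks):
        # principal_networks: the networks whose addresses were
        # granted as principals in the node's ssh certificate. Only
        # an address inside that set can be verified against the
        # cert; anything else is skipped.
        for addr in candidate_addrs:
            ip = ipaddress.ip_address(addr)
            if any(ip in ipaddress.ip_network(net)
                   for net in principal_networks):
                return addr
        return None  # no safe address; the remaining gap noted above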
Some profiles may have all disk support suppressed through a
blacklist until %pre comes along to fix it. This prevents /dev/disk
from ever existing.
Wait up to 10 seconds before giving up. This gives the disk subsystem
a fair chance to come up quickly and avoid a wait, with a worst-case
fallback of 10 seconds.
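The wait amounts to a bounded poll; a sketch, with the path and
ceiling taken from the description above and the function name
illustrative:

    import os
    import time

    def wait_for_disks(path='/dev/disk', timeout=10.0, interval=0.25):
        # Return as soon as the disk subsystem shows up, giving up
        # after the 10 second worst case.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if os.path.exists(path):
                return True
            time.sleep(interval)
        return os.path.exists(path)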