The data length of a log entry must not exceed 65k. If an attempt is
made to log more than that, break it up and duplicate the records. It
may make sense to indicate a continuation explicitly, but for now just
extend.
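
A minimal sketch of the splitting described above. The exact limit of
65535 bytes and the names split_log_data and build_records are
assumptions for illustration, not the actual logging code:

```python
MAX_DATA_LEN = 65535  # assumed exact limit; the message only says "65k"


def split_log_data(data: bytes, limit: int = MAX_DATA_LEN) -> list:
    """Split data into consecutive chunks no longer than limit bytes."""
    if limit <= 0:
        raise ValueError('limit must be positive')
    if not data:
        return [data]  # an empty entry is still logged as one record
    return [data[i:i + limit] for i in range(0, len(data), limit)]


def build_records(metadata: dict, data: bytes) -> list:
    """Duplicate the record metadata once per chunk of data."""
    return [dict(metadata, data=chunk) for chunk in split_log_data(data)]
```

Each record carries the same metadata, so a reader that simply
concatenates the data of consecutive records recovers the original
entry without an explicit continuation marker.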
str tends to present a more normal-looking error string. Use it so
that a user does not get the impression that there is a code issue
when an expected error occurs.
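
As an illustration, assuming the previous rendering was something like
repr(), the difference looks like this. describe_error is a
hypothetical helper and the error text is made up:

```python
def describe_error(exc: Exception) -> str:
    """Return a user-facing message for an expected error."""
    message = str(exc)
    # Some exceptions carry no message; fall back to the class name
    return message if message else type(exc).__name__


try:
    raise ValueError('Invalid bay specified')
except ValueError as e:
    friendly = describe_error(e)  # plain message text
    raw = repr(e)                 # includes the exception class name
```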
The bay number can be opportunistically grabbed, so provide
that info in the discovery api. In future, should add 'by-bay'
once we have enclosure data as well.
If an administrator clears the cert fingerprint, they will
likely set it to ''. In such a case, go down the 'no fingerprint'
path rather than reject it.
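
A sketch of the intended check, under the assumption that a stored
fingerprint is a string compared against a hash of the certificate.
verify_cert and the 'sha256$' prefix are illustrative, not the actual
implementation:

```python
import hashlib


def verify_cert(cert_der: bytes, stored_fingerprint):
    """Return 'unpinned' if no fingerprint is stored, else check the match."""
    # None and '' both mean "no fingerprint"; '' is what an administrator
    # is likely to set when clearing the value.
    if not stored_fingerprint:
        return 'unpinned'
    actual = 'sha256$' + hashlib.sha256(cert_der).hexdigest()
    return 'match' if actual == stored_fingerprint else 'mismatch'
```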
enclosure.bay is now an integer rather than a string. Fix the filter
to use format, which is more robust to numeric versus string values
anyway. Also, consistently make the underlying data an integer rather
than sometimes a string.
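
A small sketch of a format-based comparison. bay_matches and
normalize_bay are hypothetical names, and the real filter may differ:

```python
def bay_matches(value, wanted) -> bool:
    """Compare bay values regardless of whether each is int or str."""
    return '{0}'.format(value) == '{0}'.format(wanted)


def normalize_bay(raw):
    """Store the bay as an integer whenever it is available."""
    return None if raw is None else int(raw)
```

Formatting both sides means a filter value that arrives as a string
still matches data that is stored as an integer.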
Sometimes, when IP addresses are likely mismatched, some SLP
responders still manage to reply and slow things down. For now, when
mismatched IPv4 is likely, provide a mode fixated on link local.
Provide a different scheme that does not involve a wait(), in case
the flow dies without getting back to our thread. wait() has no timeout,
so this is a strategy to cope by making sure we hang for no longer than
3 minutes, which is well beyond any time a login should possibly take.
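
One way to bound the wait is to poll against a deadline. This is only
a sketch: wait_for_result and the 'done' key are assumptions, and the
actual scheme may be different:

```python
import time


def wait_for_result(result: dict, timeout: float = 180.0,
                    interval: float = 0.5):
    """Poll until 'done' appears in result or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while 'done' not in result:
        if time.monotonic() >= deadline:
            raise TimeoutError('login flow did not report back in time')
        time.sleep(interval)
    return result['done']
```

The caller gives up after at most timeout seconds plus one polling
interval, even if the other side never reports back.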
While it may not have been possible in eventlet for this to happen,
strictly speaking, if it were a thread, it could exit during the check
for liveness and leave data on the queue.
To be careful, also drain the queue after all children are dead.
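
The race and the final drain can be sketched with plain threads.
collect_results is a hypothetical name, and the real code is not
necessarily built on the queue module:

```python
import queue
import threading


def collect_results(workers, results: queue.Queue, poll: float = 0.1) -> list:
    """Gather items until every worker has exited, then drain leftovers."""
    collected = []
    while any(w.is_alive() for w in workers):
        try:
            collected.append(results.get(timeout=poll))
        except queue.Empty:
            pass
    # A worker may have queued data and exited between the get() and the
    # liveness check above, so empty the queue one final time.
    while True:
        try:
            collected.append(results.get_nowait())
        except queue.Empty:
            break
    return collected
```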
Provide a more concrete measurement of children, rather than relying
upon a sentinel value on the queue. It seems that even using 'finally'
didn't assure that we always get that sentinel value before a worker
dies. The sentinel value is still used to avoid a long wait in the
usual case.
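
A sketch of combining the two signals, using threads for simplicity.
DONE, worker, and gather are illustrative names. The sentinel is the
fast path and the liveness check is the fallback:

```python
import queue
import threading

DONE = object()  # sentinel a worker queues as its last act


def worker(results: queue.Queue, items) -> None:
    try:
        for item in items:
            results.put(item)
    finally:
        results.put(DONE)


def gather(workers, results: queue.Queue, poll: float = 1.0) -> list:
    """Collect until each worker has sent DONE or is confirmed dead."""
    collected = []
    pending = len(workers)
    while pending > 0:
        try:
            item = results.get(timeout=poll)
        except queue.Empty:
            # No sentinel arrived in time: measure the children directly.
            if not any(w.is_alive() for w in workers):
                break
            continue
        if item is DONE:
            pending -= 1  # fast path, no waiting on the poll interval
        else:
            collected.append(item)
    # Pick up anything queued between the last get() and the liveness check.
    while True:
        try:
            item = results.get_nowait()
        except queue.Empty:
            return collected
        if item is not DONE:
            collected.append(item)
```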
If something went completely off the rails, it could easily fill up
lots of memory with log entries in the 2 seconds it would buffer. For
now disable the buffering on key debug logs, as the main purpose was
reducing IOPs in the per-node console logs anyway. A future behavior
may be to also limit the size and/or number of outstanding log entries
before committing to disk.
Most of the time, we don't need this pool. Create when needed,
and clean up after 30 seconds of inactivity. This avoids a slow
shutdown that was due to core python hanging in help_finish_stuff,
and as a bonus means most of the time, one only sees one confluent
process, which has been a source of questions already.
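
A rough sketch of an on-demand pool with an idle timeout. LazyPool and
its parameters are hypothetical, and the use of
concurrent.futures.ProcessPoolExecutor is an assumption about the kind
of pool involved:

```python
import threading
from concurrent.futures import ProcessPoolExecutor


class LazyPool:
    """Create an executor on first use; shut it down after idle seconds."""

    def __init__(self, factory=ProcessPoolExecutor, idle=30.0):
        self._factory = factory
        self._idle = idle
        self._lock = threading.Lock()
        self._pool = None
        self._active = 0  # jobs submitted but not yet finished
        self._gen = 0     # bumped on every submit to void stale timers

    def submit(self, fn, *args, **kwargs):
        with self._lock:
            self._gen += 1
            if self._pool is None:
                self._pool = self._factory()
            self._active += 1
            future = self._pool.submit(fn, *args, **kwargs)
        future.add_done_callback(self._job_done)
        return future

    def _job_done(self, _future):
        with self._lock:
            self._active -= 1
            if self._active == 0:
                timer = threading.Timer(self._idle, self._reap,
                                        args=(self._gen,))
                timer.daemon = True
                timer.start()

    def _reap(self, gen):
        with self._lock:
            # Skip if anything was submitted since this timer was armed.
            if gen != self._gen or self._active or self._pool is None:
                return
            pool, self._pool = self._pool, None
        pool.shutdown(wait=False)

    @property
    def running(self):
        return self._pool is not None
```

With the default factory, no worker processes exist until the first
submit, and they go away again once the pool has been idle for the
configured time.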
A redacted dump will not have a keys.json file, which
is natural. Replace 'file not found' with a message
indicating the possibility of a redacted dump.
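
A sketch of the friendlier error. load_keys, the exception type, and
the message wording are assumptions, not the actual restore code:

```python
import json
import os


def load_keys(dumpdir: str) -> dict:
    """Load keys.json from a dump directory, hinting at redaction if absent."""
    path = os.path.join(dumpdir, 'keys.json')
    try:
        with open(path) as keyfile:
            return json.load(keyfile)
    except FileNotFoundError:
        raise RuntimeError(
            'keys.json not found in {0}; this may be a redacted dump, '
            'which does not include keys'.format(dumpdir)) from None
```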