If a node is deleted, behave as if it were defined with no console.method, to
avoid superfluous trace output. In the future, it may make sense to filter out
nodes with no console.method earlier, since a fair amount of startup work is
done that is ultimately thrown away when console is not enabled.
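A minimal sketch of that earlier filtering, assuming a confluent-style
attribute lookup; the function name and the exact lookup shape here are
illustrative, not the actual confluent code:

    def wants_console(configmanager, node):
        # Hypothetical early filter: a node with no console.method (or a
        # deleted node) should never reach the connection startup work.
        attribs = configmanager.get_node_attributes(node, 'console.method')
        method = attribs.get(node, {}).get('console.method', {}).get('value')
        return bool(method)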
Do a better job of cleanly handling scenarios
where a disconnect arrives for a session that is
already disconnected. Inside the ipmi plugin,
suppress a disconnect event if one has already
been sent. Inside consoleserver, suppress logging
a disconnect when already disconnected.
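A sketch of the suppression pattern, assuming a boolean guard on the
plugin's console object; the class and attribute names are illustrative:

    class IpmiConsole:
        """Illustrative stand-in for the plugin's console object."""

        def __init__(self, datacallback):
            self.datacallback = datacallback
            self._disconnect_sent = False  # hypothetical guard flag

        def handle_connect(self):
            # A successful connect re-arms disconnect notification.
            self._disconnect_sent = False

        def handle_disconnect(self):
            # Suppress the event if a disconnect was already reported for
            # this connection, so consoleserver sees at most one.
            if self._disconnect_sent:
                return
            self._disconnect_sent = True
            self.datacallback('disconnected')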
Originally the plan was to skip the reconnect, but that would
impede recovery. Hopefully suppressing the duplicate
disconnect in the ipmi plugin, along with some fixes in pyghmi,
will avoid a 'double connect' scenario.
Occasionally, systems were observed to be stuck in 'connect'; provide a
backup mechanism to detect and forcibly kick the console in such a case.
In theory, pyghmi should be doing a self-health check. It has been discovered
at scale that this self-health check may encounter issues. For now, work
around that by having another health check at the confluent level, deferred
by console activity. It is also spaced far apart so it should not
significantly add to idle load (one check every ~5 minutes, spread out).
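A rough sketch of such a deferred watchdog using eventlet; the interval,
jitter range, and method names here are assumptions for illustration:

    import random
    import time

    import eventlet

    class ConsoleWatchdog:
        """Backup liveness check at the confluent level (sketch)."""

        def __init__(self, console):
            self.console = console
            self.last_activity = time.monotonic()
            eventlet.spawn_after(self._interval(), self._check)

        def _interval(self):
            # Spread checks out (~5 minutes apart, randomized) so many
            # idle sessions do not wake up in lockstep.
            return random.uniform(240, 360)

        def note_activity(self):
            # Called on console traffic; activity defers the next kick.
            self.last_activity = time.monotonic()

        def _check(self):
            # Forcibly kick the console only if nothing has been seen for
            # a full interval; otherwise the session is demonstrably alive.
            if time.monotonic() - self.last_activity > 300:
                self.console.force_reconnect()  # hypothetical recovery hook
            eventlet.spawn_after(self._interval(), self._check)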
Previously, offline nodes would be rechecked automatically every 45 seconds
on average. Extend this to 180 seconds on average, to reduce ARP traffic
significantly when there is a large volume of undefined nodes. The 'try to
connect on open' behavior is retained, so this means a longer loss of
connectivity only in a background-monitored session.
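One way to read those averages is a uniformly drawn delay whose mean is
half the upper bound; the exact ranges here are an assumption:

    import random

    # A uniform draw over 0-90s averages ~45s; over 0-360s it averages
    # ~180s, cutting background recheck (and ARP) traffic roughly 4x.
    def next_recheck_delay():
        return random.uniform(0, 360)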
When exiting from a shell session, the databuffer will contain info the
client needs to work properly. Preserve the databuffer; responsibility
for deleting the object should be in the hands of the caller.
This provides a method for the client to request that a session be closed
down, giving more immediate responsiveness in the client count when closing
such a terminal. With this, both closing a single window and doing a
'logout' immediately impact clientcount.
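A sketch of how such a request might be dispatched; the operation name
and session API are hypothetical, not the actual confluent protocol:

    def handle_session_request(session, request):
        # The client can now explicitly ask for its terminal session to
        # be torn down, instead of dropping the socket and waiting for
        # the server to notice.
        if request.get('operation') == 'close':
            session.destroy()  # detach immediately...
            return True        # ...so clientcount reflects the logout
        return False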
Previously, counters were used to track the relation, with distinct tracking
of users versus callbacks. Unify the callback and user into a single
'session' attachment, then use the size of the set of sessions and their
declared users rather than trying to maintain a counter on the side. This
change simplifies the relationship, replaces the logging and clientcount
counters with a more robust strategy, and paves the way for the dependent
ShellHandler to terminate connected sessions when the shell session dies.
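A condensed sketch of the set-based accounting; the session object shape
and method names are assumptions:

    class ConsoleHandler:
        """Sketch of set-based session accounting (names assumed)."""

        def __init__(self):
            # One entry per attached session; each session carries its
            # own callback and declared user, replacing side counters.
            self.livesessions = set()

        def attachsession(self, session):
            self.livesessions.add(session)

        def detachsession(self, session):
            self.livesessions.discard(session)

        def clientcount(self):
            # Derived from the set itself, so it cannot drift the way a
            # manually maintained counter could.
            return len(self.livesessions)

        def connected_users(self):
            return set(s.username for s in self.livesessions
                       if s.username)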
When a shell session is initiated, it registers
a recipient at the same time it tries to
establish the session, since it is not a 'wait
for recipient' session. Aggressively mark the
state as connecting to avoid the recipient
erroneously thinking things have not been set
into motion yet. Additionally, have the ssh
plugin avoid a traceback when disconnecting
before the connection completes.
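A sketch covering both points; the class and attribute names are assumed
for illustration:

    import eventlet

    class SshShell:
        """Sketch of the two fixes; names are assumptions."""

        def __init__(self):
            self.ssh = None
            # Mark connecting before anything else runs, so a recipient
            # registered at session start never concludes that nothing
            # has been set into motion.
            self.connectstate = 'connecting'
            eventlet.spawn(self._connect)

        def _connect(self):
            ...  # establish the ssh connection, then set self.ssh

        def close(self):
            # Tolerate a disconnect that races the connect: before the
            # connection completes, self.ssh is still None.
            if self.ssh is not None:
                self.ssh.close()
                self.ssh = None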
Console logging assumptions are not valid for shell sessions.
Correct this by making the buffer init code conditional
and adding a stub 'log' method to the ShellHandler class.
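The stub can be as small as this, building on the ConsoleHandler sketch
above; the inheritance is assumed from the text:

    class ShellHandler(ConsoleHandler):
        def log(self, *args, **kwargs):
            # Shell sessions are not logged like consoles; accept and
            # discard whatever the shared code tries to record.
            pass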
Provide a common 'shellserver' capability cloned from 'consoleserver'.
This will enable the concept of per-user shells, with the option of
multiple shells per user. Each user will have their own set of shell
sessions rather than sharing them across users. This can be revisited
in the future if sharing between users is desired.
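A sketch of the per-user keying, assuming a module-level registry; the
names and key shape are illustrative:

    activesessions = {}

    def get_shell_session(tenant, node, user, shellid=1):
        # Keyed per user: each user gets an independent set of shell
        # sessions, with the option of multiple numbered shells each.
        usersessions = activesessions.setdefault((tenant, node, user), {})
        if shellid not in usersessions:
            usersessions[shellid] = ShellHandler()  # per the sketch above
        return usersessions[shellid]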
While the client can handle it now, have the server
avoid needless processing of '' data from a console
provider. Address it at the deepest level (the
tlvdata implementation) and at a place higher up the
stack, to avoid hits to the log and such.
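The filter itself is trivial at both levels; function names here are
assumed, not the actual confluent entry points:

    def send_data(connection, data):
        # Deepest level (the tlvdata-style framing): emit nothing at all
        # for '' rather than an empty frame.
        if not data:
            return
        write_frame(connection, data)  # hypothetical framing helper

    def feed_console_data(handler, data):
        # Higher up the stack: drop '' before it reaches logging and any
        # attached sessions.
        if data == '':
            return
        handler.log(data)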
The root cause of the negative value has not been determined;
however, reduce the hypothetical exposure to the issue
in the hope of filtering out extraneous problems.
The buffer age was not working as intended.
The fix to exit on error exited overly eagerly.
The log replay failed to report a third value if the file did not exist.
When logging was changed from none to full, the connection would always
start and immediately abort, only to start again. Change this by
deciding which connection-liveness strategy to use based on how many
settings changed. If just logging changes, then connect only if not
already connected or connecting. If more changes, then skip that kinder
strategy and go straight to reconnecting.
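A sketch of that decision, assuming the handler can diff old and new
settings into a set of changed names; the names are illustrative:

    def apply_settings_change(handler, changed):
        # 'changed' is the set of setting names that differ (assumed).
        if changed == {'console.logging'}:
            # Only logging changed: the kinder strategy. Leave a live or
            # in-progress connection alone; otherwise start one.
            if handler.connectstate not in ('connected', 'connecting'):
                handler.connect()
        elif changed:
            # Anything beyond logging invalidates the connection
            # parameters, so go straight to a reconnect.
            handler.reconnect()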
It turns out that eventlet.green.threading.Event() doesn't behave very
efficiently in this context, for whatever reason. Use
eventlet.event.Event() instead. It was not used before due to its lack of
timeout and clear support, but that is overcome by disposing of the event
rather than reusing it, and by using eventlet.Timeout() to add a timeout
to the wait that has no built-in timeout.
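A sketch of the pattern described: a fresh eventlet.event.Event per wait
(since it cannot be cleared), with eventlet.Timeout supplying the missing
timeout; the wrapping class is hypothetical:

    import eventlet
    import eventlet.event

    class DataWaiter:
        """Sketch of the disposable-event pattern described above."""

        def __init__(self):
            # eventlet.event.Event cannot be cleared, so a fired event is
            # discarded and replaced rather than reused.
            self._event = eventlet.event.Event()

        def notify(self):
            # send() may only be called once per Event instance.
            if not self._event.ready():
                self._event.send()

        def wait(self, timeout=30):
            # Event.wait() has no timeout of its own; a surrounding
            # eventlet.Timeout adds one (False means exit silently
            # instead of raising on expiry).
            with eventlet.Timeout(timeout, False):
                self._event.wait()
            # Dispose of the used event and arm a fresh one.
            self._event = eventlet.event.Event()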