Previously, it would register 2**x attribute watchers by mistake. Exponential
growth of threads trying to talk to one BMC is evidently a bad thing. Fix
this by correctly tracking and cancelling previous attribute watchers.
Additionally, mask a harmless exception brought on by the death of orphaned
pyghmi console objects by having them yell into the endless void rather
than trip on an exception.
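A minimal sketch of the tracking-and-cancelling idea, not the actual confluent code. `WatcherRegistry`, `watch`, and `start_watcher` are hypothetical names for illustration; the only assumption is that starting a watcher hands back something callable that cancels it.

```python
class WatcherRegistry:
    """Keeps at most one active attribute watcher per node (hypothetical helper)."""

    def __init__(self):
        self._active = {}  # node name -> cancel callable for its current watcher

    def watch(self, node, start_watcher):
        # Cancel whatever watcher a previous call left behind before adding
        # a new one; skipping this step is what lets watchers pile up.
        cancel_previous = self._active.pop(node, None)
        if cancel_previous is not None:
            cancel_previous()
        self._active[node] = start_watcher()

    def active_count(self):
        return len(self._active)


# Hypothetical usage: count live watchers across repeated registrations.
live = []


def start_watcher():
    token = object()
    live.append(token)
    return lambda: live.remove(token)


registry = WatcherRegistry()
for _ in range(5):
    registry.watch('n1', start_watcher)
```

After five registrations for the same node, only one watcher remains live.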
The ipmi plugin, at least, is not yet quite right. Still need to
debug the case of having a console session open, then changing
the bmc to a bad address, then changing it back. I fixed some of
the easier exceptions, but it is clearly still getting quite confused
to the point where 3 or 4 cycles guarantee the console cannot easily heal.
Require the user to indicate 'console.method' rather than trying to guess.
Notably, console.method might not be desired in a configuration
that wishes only to use remote video.
The strategy was going to allow for a distinct IPMI account for automation
from other protocols. However, this is pretty complicated to explain to
people. The thought before was that the HTTPS/SSH type access could use
a passphrase that is easy to remember whilst ipmi accounts would tend
to be randomized. Instead, use the software-managed authentication
info across all protocols and rely on the endpoint's user management
to add human-friendly accounts if needed (disabling IPMI/SNMP by default
in such cases).
Implement 'everything' group behavior
precheck group and node settings
do not create groups or nodes by default
Have httpapi preserve the original query, in case the plugin modifies it, so that API
explorer output is accurate
Firmware fixes obsolete the need. The bad behavior on older firmware
is sufficiently tolerable that the workaround code, which could have bad
side effects, can reasonably be removed.
To do performance optimization in this sort of application, this is
about as well as I have been able to manage in Python. I will say Perl with
NYTProf seems to be significantly better for data, but this is serviceable.
I tried yappi, but it goes wildly inaccurate with this codebase. Because of
the eventlet plumbing, cProfile is still pretty misleading. The best strategy
seems to be to review cumulative time with a healthy grain of salt around the
top items until you get down to info that makes sense. For example, trampoline
unfairly gets a great deal of the 'blame' by taking on nearly all the activity.
Internal time seems to miss a great deal of important information.
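For reference, a small sketch of pulling a cumulative-time report out of cProfile using only the standard library. `busy_work` and `profile_cumulative` are hypothetical stand-ins, not anything in this codebase, and the eventlet caveats above still apply to whatever numbers come out.

```python
import cProfile
import io
import pstats


def busy_work(n):
    # Hypothetical workload standing in for real application code.
    return sum(i * i for i in range(n))


def profile_cumulative(func, *args, limit=20):
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # 'cumulative' sorts by time including callees; 'tottime' would sort
    # by internal time instead.
    stats.sort_stats('cumulative').print_stats(limit)
    return out.getvalue()


report = profile_cumulative(busy_work, 100000)
```

Printing `report` gives the usual pstats table, ordered so the entries with the most cumulative time come first.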
Previously, the state would be seen as 'connected' and then 'disconnected' in the event of
a connection failing. Rework things such that the console session stays in the 'connecting' state
until timeout or success occurs and, on failure, raise an exception instead of sending a disconnect.
This makes the connection action a bit more intuitive to the user, who would assume a 'connected'
console means the endpoint was reachable. This may not always be possible in a console plugin,
but it's a nice pattern when possible. If a console plugin cannot tell when 'connected' happens, then
the previous behavior of this plugin makes sense as a 'best effort': return 'connected', send
disconnect event when the console turns out to be bad. For example, executable consoles are most
likely going to follow this pattern. An option could be for an executable to have a certain
signature to print to show 'connected' though...
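A rough sketch of the 'stay in connecting until success or timeout' pattern, under stated assumptions: `ConsoleSession`, `ConsoleUnreachable`, `probe`, and `notify` are all hypothetical names, and the polling loop stands in for whatever a real plugin does to learn that the endpoint answered.

```python
import time


class ConsoleUnreachable(Exception):
    """Raised when the endpoint never answers (hypothetical exception)."""


class ConsoleSession:
    def __init__(self, probe, notify, timeout=1.0, interval=0.01):
        self.probe = probe        # returns True once the endpoint answers
        self.notify = notify      # receives each state change
        self.timeout = timeout
        self.interval = interval
        self.state = 'disconnected'

    def _set_state(self, state):
        self.state = state
        self.notify(state)

    def connect(self):
        self._set_state('connecting')
        deadline = time.monotonic() + self.timeout
        while time.monotonic() < deadline:
            if self.probe():
                self._set_state('connected')
                return
            time.sleep(self.interval)
        # No 'connected'/'disconnected' pair is emitted on failure; the
        # caller learns about it from the exception.
        raise ConsoleUnreachable('timed out waiting for console endpoint')


events = []
ConsoleSession(lambda: True, events.append).connect()
try:
    ConsoleSession(lambda: False, events.append, timeout=0.05).connect()
    failed = False
except ConsoleUnreachable:
    failed = True
```

The reachable session reports 'connecting' then 'connected'; the unreachable one reports only 'connecting' and then raises.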
This change causes cfg change notifications to more accurately reflect atomic
expectations. If multiple fields are changed on multiple nodes that a watcher has
registered for, the watcher will now get that data in one chunk instead of many.
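To illustrate the 'one chunk instead of many' behavior, here is a sketch under assumed data shapes. `notify_watchers` and the dictionary layouts are hypothetical and do not reflect the real internal interface.

```python
def notify_watchers(changes, watchers):
    """Deliver one aggregated callback per watcher.

    changes:  {node: set of changed attribute names}
    watchers: list of (callback, {node: set of watched attribute names})
    """
    for callback, interests in watchers:
        batch = {}
        for node, attribs in changes.items():
            hits = attribs & interests.get(node, set())
            if hits:
                batch[node] = hits
        if batch:
            # One call carrying every relevant node and attribute, rather
            # than one call per node or per attribute.
            callback(batch)


calls = []
watchers = [(calls.append, {'n1': {'bmc', 'console.method'}, 'n2': {'bmc'}})]
notify_watchers({'n1': {'bmc', 'groups'}, 'n2': {'bmc'}, 'n3': {'bmc'}}, watchers)
```

Here the watcher is called once, with both nodes it cares about in the same payload.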
Add ability for code to add watchers on nodes and their attributes. This is likely to
be reworked internally to better aggregate requests, but the code interface
is potentially complete.
It has been expressed that plural forms for collection names are preferred. Additionally, tab
completion is nicer if names do not share so many leading characters.
The facility was incorrectly reassembling the text records in reverse order. With this change,
the set buffer from log function seems to be working as intended.
Before, it would look further back to set state if recent entries indicated not to
assert the tracked states. Correct that behavior so only the last entry counts.
ESXi requires a distinctly different keypad mode and 'shift in' character set. Track
requests for those states. Reset on 'null' character, which seems to only be emitted by UEFI
so far. Ideally, things change such that we can remove that workaround.
Since the log analysis merely needs to know if a connect/disconnect is redundant,
only report 0, 1, or '2' connections, with '2' indicating 2 or greater. Log analysis
then would want to seek out a connect with eventdata of '1' and a disconnect with
eventdata of '0' and mostly ignore the '2' info. More data could be
obtained by actually counting the connects and disconnects; this is
just to provide a fast path to finding the 'first connection' and 'last disconnect'
signatures.
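A sketch of the capped count, with `SessionCounter` as a hypothetical name. The real count is kept internally and only the value recorded as eventdata is clamped to 2.

```python
class SessionCounter:
    """Tracks live console clients, reporting at most 2 (hypothetical helper)."""

    def __init__(self):
        self.count = 0

    def connect(self):
        self.count += 1
        return min(self.count, 2)   # eventdata for the connect entry

    def disconnect(self):
        self.count = max(self.count - 1, 0)
        return min(self.count, 2)   # eventdata for the disconnect entry


counter = SessionCounter()
eventdata = [
    counter.connect(),     # first connection
    counter.connect(),
    counter.connect(),
    counter.disconnect(),
    counter.disconnect(),
    counter.disconnect(),  # last disconnect
]
```

A reviewer scanning for a connect with 1 and a disconnect with 0 finds the first and last entries of this sequence without counting anything.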
For a log reviewer tool to unambiguously understand whether a given user is conceivably watching,
more data is needed. This doesn't keep track of which disconnect goes with which connection, but
it at least provides a way of detecting whether the user is truly disconnected or not.
Implement the bits and pieces that are at least required for conserver-like logging.
This has a plaintext file and a binary metadata file. The plaintext file basically
resembles a conserver log, while the binary file facilitates faster seeking to points
of interest within the file and provides much more precise timestamp information.
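The two-file idea can be sketched roughly as below. The record layout (a timestamp plus an offset into the text file) and the function names are assumptions for illustration only; the actual metadata format is not described here.

```python
import os
import struct
import tempfile
import time

# Hypothetical record layout: float timestamp, offset into the text log.
RECORD = struct.Struct('!dQ')


def append_entry(textpath, metapath, data, timestamp=None):
    timestamp = time.time() if timestamp is None else timestamp
    with open(textpath, 'ab') as text, open(metapath, 'ab') as meta:
        offset = text.seek(0, os.SEEK_END)
        text.write(data)
        meta.write(RECORD.pack(timestamp, offset))


def read_from(textpath, metapath, since):
    """Return console text logged at or after the timestamp 'since'."""
    with open(metapath, 'rb') as meta:
        raw = meta.read()
    # Fixed-size records: scan the small metadata file, then seek the text.
    for pos in range(0, len(raw), RECORD.size):
        timestamp, offset = RECORD.unpack_from(raw, pos)
        if timestamp >= since:
            with open(textpath, 'rb') as text:
                text.seek(offset)
                return text.read()
    return b''


logdir = tempfile.mkdtemp()
textlog = os.path.join(logdir, 'n1')
metalog = os.path.join(logdir, 'n1.meta')
append_entry(textlog, metalog, b'booting\n', timestamp=100.0)
append_entry(textlog, metalog, b'login: ', timestamp=200.5)
tail = read_from(textlog, metalog, since=150.0)
```

Because the records are fixed size, a binary search over the metadata would also work; the linear scan just keeps the sketch short.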
'nodepower' is not yet 'rpower'-like, but it's a quick
demonstration of using the python confluent client library. It
only supports a unix socket as written.
The pickling would get horrendously slow as total node count increased. This meant a very long time to sync
to disk for just one change out of 65,000. This strategy changes things to be more selective and only
pickle the dirty keys rather than everything. Large changes to a small number of nodes will take
more time (because of more calls to dump pickle), but small changes to a small subset of nodes will take much
less time.
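A sketch of dirty-key tracking, assuming a per-key store. `DirtyTrackingStore` is a hypothetical name and its `persisted` dict is a stand-in for the on-disk storage, which is not shown here.

```python
import pickle


class DirtyTrackingStore:
    def __init__(self):
        self.data = {}
        self.persisted = {}   # stand-in for on-disk key/value storage
        self._dirty = set()
        self.dump_calls = 0

    def set(self, key, value):
        self.data[key] = value
        self._dirty.add(key)

    def sync(self):
        # One pickle.dumps per changed key instead of one over everything.
        for key in self._dirty:
            self.persisted[key] = pickle.dumps(self.data[key])
            self.dump_calls += 1
        self._dirty.clear()


store = DirtyTrackingStore()
for i in range(1000):
    store.set('n%d' % i, {'bmc': '10.0.0.%d' % (i % 250)})
store.sync()

store.dump_calls = 0
store.set('n7', {'bmc': '10.0.1.7'})
store.sync()
```

The second sync touches only the one changed key, which is the case this change is meant to speed up, while the initial bulk load pays one dump call per node.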
There was an optimization to skip examination of groups if it was determined
that the group membership had not changed. However, this erroneously
masked the examination in the case of reordered groups. Skip the
optimization to cover that case at the expense of at least some needless churn.
This only happens when something goes to change group membership in some way, so
this shouldn't be too expensive.
Now when expressions cannot be completed, the reason is presented as 'broken'.
Additionally, when unsetting a value that would affect expressions,
perform appropriate changes.