Enable a WebUI to request an auth token. This allows it to indicate that it is running in a browser and have the server implement protections such that
other software in the browser cannot send arbitrary requests to the server API.
This is implemented in a backward-compatible fashion, allowing, for example, purely non-browser clients to ignore the CSRF protection, as
it doesn't apply to that use case.
Consoles starting up could delay API availability. Change this
by giving the API ample time to start up, then commencing the
busy work of starting console sessions.
Do a better job of cleanly handling scenarios
where a disconnect would come from a session that is already
disconnected. Inside the ipmi plugin, suppress a
disconnect event if one has already been sent. Inside
consoleserver, suppress logging a disconnect when
already disconnected.
Originally this was going to skip the reconnect, but that would
hinder recovery. Hopefully suppressing the duplicate
disconnect in the ipmi plugin, along with some fixes in pyghmi, will
avoid a 'double connect' scenario.
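
The duplicate-disconnect suppression described above can be sketched roughly as follows. This is only an illustration of the idea: the class name, method names, and the list used as a log are hypothetical and are not taken from the ipmi plugin or consoleserver.

```python
class ConsoleState:
    """Hypothetical tracker that reports each disconnect only once."""

    def __init__(self):
        self.connected = False
        self.log = []  # stands in for the real console log

    def on_connect(self):
        self.connected = True
        self.log.append('connected')

    def on_disconnect(self):
        # A disconnect for a session that is already disconnected is
        # dropped rather than logged a second time.
        if not self.connected:
            return False
        self.connected = False
        self.log.append('disconnected')
        return True
```

Note that the sketch only suppresses the duplicate event; whether to attempt a reconnect remains a separate decision, as in the text above.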
If a Python system module had a name that conflicted in some way
with a plugin, the plugin load would fail. Fix this by prioritizing
the plugin path over system locations. Also, to avoid the breakage
going the other way, remove the plugindir from the system path when
loading from that particular directory is done.
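
A minimal sketch of that path handling, assuming plugins are plain .py files in a single directory. The function name load_plugins_from is hypothetical. Note that a module already present in sys.modules is not re-resolved, so path order only affects names that have not been imported yet.

```python
import importlib
import os
import sys


def load_plugins_from(plugindir):
    """Import every .py file in plugindir, preferring it over system paths."""
    loaded = {}
    sys.path.insert(0, plugindir)  # plugin directory is searched first
    try:
        for filename in sorted(os.listdir(plugindir)):
            name, ext = os.path.splitext(filename)
            if ext != '.py' or name.startswith('_'):
                continue
            loaded[name] = importlib.import_module(name)
    finally:
        # Take the directory back out so plugin names cannot shadow
        # system modules for imports that happen later.
        sys.path.remove(plugindir)
    return loaded
```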
The HTTP console API did not have a means to send break
or request session reopen. Rectify this discrepancy
by adding an 'action' key to request certain
console-specific actions. In retrospect, closing the session
should have just been an 'action', but leave things
as-is.
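
As a rough illustration of dispatching on an 'action' key, consider the sketch below. The action names 'break' and 'reopen', the session class, and its method names are all assumptions made for the example; the commit text does not specify the actual values or the real session interface.

```python
class FakeConsoleSession:
    """Hypothetical stand-in for a console session object."""

    def __init__(self):
        self.events = []

    def send_break(self):
        self.events.append('break')

    def reopen(self):
        self.events.append('reopen')


def handle_console_request(session, body):
    """Dispatch a decoded request body that may carry an 'action' key."""
    action = body.get('action')
    if action is None:
        return {'status': 'no action requested'}
    handlers = {'break': session.send_break, 'reopen': session.reopen}
    if action not in handlers:
        return {'error': 'unknown action: ' + action}
    handlers[action]()
    return {'status': 'ok', 'action': action}
```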
In the common case, we were falling through the bottom
without an explicit return. Restructure things to both
explicitly return and look a bit more sane.
The Attributes management class was making shared shallow
copies. This caused a problem when the attributes class assumed
it could modify the result. Correct by providing a deep copy
of that node's data when it is requested.
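
The difference between the two kinds of copy can be seen in a small sketch. The NodeStore class and its sample data are hypothetical; only the use of copy.deepcopy reflects the fix described above.

```python
import copy


class NodeStore:
    """Hypothetical store of per-node attribute data."""

    def __init__(self):
        self._nodes = {'n1': {'groups': ['everything']}}

    def get_shallow(self, node):
        # New outer dict, but the inner list is still shared with the store.
        return dict(self._nodes[node])

    def get_deep(self, node):
        # Fully independent copy that the caller may modify freely.
        return copy.deepcopy(self._nodes[node])

    def groups_of(self, node):
        return list(self._nodes[node]['groups'])
```

Appending to the 'groups' list from get_shallow changes the stored data as well, while the same change on the result of get_deep stays local to the caller.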
This brings things right to the level of xCAT in
terms of underlying capability. Each mac address has both
an all-inclusive list of the ports it is found on and any nodes
that it matches. It goes another step further by logging errors
when ambiguity is detected (either verbatim config conflict or
ambiguous result based on 'namesmatch' and the switch config).
One, include a number of 'fellow' mac addresses on the same port.
Another, allow a mac to appear on multiple ports and have that
reflected in the data structure. Also capture errors to trace
log rather than hanging up on unexpected cases.
Wire up the singleton switch search function to a function that
extracts the list of switches and relevant auth data from the config
engine. Add attributes to allow indication by hardware management
port connection. The OS nics will be added later for in-band discovery,
but that's of limited value until PXE support anyway.
This time, the update function is a generator that yields as a sign to the caller
that the mac map has had at least a partial update to be considered.
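
The generator pattern can be sketched as below. The function signature, the fetch_table callable, and the shape of the map (mac address to a list of switch/port pairs) are assumptions for illustration only.

```python
def update_macmap(switches, fetch_table, macmap):
    """Fill macmap one switch at a time, yielding after each switch.

    fetch_table is a hypothetical callable returning (mac, port) pairs
    for one switch.
    """
    for switch in switches:
        for mac, port in fetch_table(switch):
            macmap.setdefault(mac, []).append((switch, port))
        # Each yield tells the caller that macmap now holds at least
        # a partial update worth looking at.
        yield switch
```

A caller can iterate over the generator and re-check the map after every yield, instead of waiting for all switches to be scanned.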
As we implement internal processes with automation,
provide a hook for code to convey information about
situations encountered during background activity.
Ultimately, it is intended to hook event forwarders
for things like syslog, email, etc.
Refactor the snmputil to be object oriented to simplify upstream code. Implement
a method to generate a mac address to ifName/ifDescr mapping for a given switch.
On the path to instrumenting network switches, first
we'll add some framework for SNMP. Given that we are
using eventlet and thus we need a patchable SNMP,
we employ PySNMP, despite it being a bit peculiar.
This commit tucks away the oddness and makes it
pretty easy to use for our purposes.
This is a resource consumption problem. Defer such measures until later.
Investigation uncovered that there may have been another culprit anyway.
We will see if the other change alone (to kick a zoned out connection attempt)
suffices.
Occasionally it was observed that systems would be just stuck in 'connect'.
Provide a backup system to detect and forcibly kick the console in such a case.
In theory, pyghmi should be doing a self-health check. It has been discovered at scale that
this self-health check may encounter issues. For now, try to work around this by having another
health check at the confluent level, deferred by console activity. It's also spaced far apart
so it should not significantly add to idle load (one check every ~5 minutes, spread out).
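
One way to picture a health check that is deferred by activity is a simple deadline that gets pushed back whenever the console sees traffic. The class below is a hypothetical sketch, not the confluent implementation; the 300 second default mirrors the roughly five minute spacing mentioned above, and the injectable clock exists only to make the behavior easy to exercise.

```python
import time


class ConsoleHealthTimer:
    """Hypothetical timer for an activity-deferred health check."""

    def __init__(self, interval=300, clock=time.time):
        self._interval = interval
        self._clock = clock
        self._next_check = clock() + interval

    def note_activity(self):
        # Console traffic is evidence of health, so push the check back.
        self._next_check = self._clock() + self._interval

    def check_due(self):
        return self._clock() >= self._next_check
```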
Many sensors in nodesensors are not useful except when
evaluated as part of nodehealth. Provide an option to allow people
to skip such sensors. Particularly useful in generating time series CSV
data.
Previously, offline nodes would be rechecked automatically on average every 45 seconds. Extend this
to an average of 180 seconds, to reduce ARP traffic significantly when there are a large number of
undefined nodes. The 'try to connect on open' behavior is retained, so this would mean a longer loss
of connectivity only in a background monitored session.
Knowing ahead of time that confluent is the sort of app that, despite
best efforts, is filehandle heavy, automatically attempt to raise the soft
limit to equal the hard limit. A sufficiently large cluster (i.e. more than 2000
nodes) would still need to have the limit adjusted at system level for now.
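
A minimal sketch of raising the soft limit, assuming the standard library resource module and the RLIMIT_NOFILE limit on a POSIX system. The function name is hypothetical, and failures are deliberately ignored since the adjustment is best effort.

```python
import resource


def raise_nofile_soft_limit():
    """Try to raise the open-file soft limit to the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms reject the request; keep the current limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

The hard limit itself is left alone, which is why a very large cluster still needs a system level adjustment.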
Sometimes a collection will be slow. Don't inflict the slowness on 'cd'; defer it until actually
asked to do something that would enumerate said collection. Accomplish this by checking for
the 'cd' target in its parent collection, rather than asking to list its contents.
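
The idea of validating a 'cd' against the parent can be sketched as follows. The Collection class, its counter, and change_directory are hypothetical names used only to show that the target itself is never enumerated.

```python
class Collection:
    """Hypothetical collection whose listing may be expensive."""

    def __init__(self, children=None):
        self.children = children or {}
        self.enumerations = 0

    def list_contents(self):
        self.enumerations += 1  # stands in for a slow operation
        return sorted(self.children)


def change_directory(parent, target):
    """Validate target against the parent's listing only."""
    if target not in parent.list_contents():
        raise KeyError(target)
    # The target's own contents are not listed here; that cost is
    # deferred until something actually enumerates it.
    return parent.children[target]
```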
The gdbm backend does not support the 'iterkeys' interface directly,
requiring manual traversal instead. Unfortunately, dbhash
does not implement the gdbm interface for this, so we have
to have two codepaths.
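
The two code paths might look roughly like this. The helper name iter_db_keys is hypothetical; firstkey and nextkey are the traversal methods that gdbm objects provide, while the iterkeys branch assumes a dict-like database object such as the one described above.

```python
def iter_db_keys(db):
    """Yield every key of db, whichever traversal style it supports."""
    if hasattr(db, 'iterkeys'):
        # dict-like backends can be iterated directly
        for key in db.iterkeys():
            yield key
        return
    # gdbm style: walk the keys manually
    key = db.firstkey()
    while key is not None:
        yield key
        key = db.nextkey(key)
```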