mirror of https://github.com/xcat2/confluent.git synced 2025-01-17 21:23:18 +00:00

776 Commits

Author SHA1 Message Date
Jarrod Johnson
7b160bd99c Fix namesmatch to actually return True
In the common case, we were falling through the bottom
without an explicit return.  Restructure things to both
explicitly return and look a bit more sane.
2016-07-15 16:47:42 -04:00
Jarrod Johnson
9516efd74a Merge branch 'master' into switchsupport 2016-07-14 11:01:07 -04:00
Jarrod Johnson
5410b394f2 Fix 'unset' on noderange
The Attributes management class was making shared shallow
copies.  This caused a problem when attributes class assumed
it could modify the result.  Correct by providing a deep copy
of that node's data when it is requested.
2016-07-14 09:56:40 -04:00
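
The shallow versus deep copy problem described in this commit can be illustrated with a minimal sketch. `AttributeStore` and its method names are hypothetical stand-ins, not confluent's actual attribute classes.

```python
import copy


class AttributeStore:
    """Hypothetical stand-in for an attribute store; not confluent's class."""

    def __init__(self):
        self._nodes = {}

    def set_node(self, node, attribs):
        self._nodes[node] = attribs

    def get_node_shallow(self, node):
        # dict() copies only the top level; nested dicts remain shared
        return dict(self._nodes[node])

    def get_node_deep(self, node):
        # deepcopy duplicates nested structures, so callers may modify freely
        return copy.deepcopy(self._nodes[node])


store = AttributeStore()
store.set_node('n1', {'console': {'method': 'ipmi'}})

# A caller "unsets" a nested value on a shallow copy: the store is altered too
shallow = store.get_node_shallow('n1')
del shallow['console']['method']
leaked = 'method' not in store._nodes['n1']['console']

# The same operation on a deep copy leaves the stored data untouched
store.set_node('n1', {'console': {'method': 'ipmi'}})
deep = store.get_node_deep('n1')
del deep['console']['method']
intact = store._nodes['n1']['console'] == {'method': 'ipmi'}
```

Here `leaked` ends up true and `intact` ends up true, which is the difference the commit message describes.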
Jarrod Johnson
801a4c4b1e Merge branch 'switchsupport' of github.com:jjohnson42/confluent into switchsupport 2016-07-14 09:28:00 -04:00
Jarrod Johnson
29da853bcf Add mac map lookup against config to get node
This brings things right to the level of xCAT in
terms of underlying capability.  mac addresses have both
an all inclusive list of ports it is found on, and any nodes
that it matches.  It goes another step further by logging errors
when ambiguity is detected (either verbatim config conflict or
ambiguous result based on 'namesmatch' and the switch config).
2016-07-14 09:27:15 -04:00
Jarrod Johnson
7a72de6033 Improve behavior of mac map
One, include a number of 'fellow' mac addresses on the same port.
Another, allow a mac to appear on multiple ports and have that
reflected in the data structure.  Also capture errors to trace
log rather than hanging up on unexpected cases.
2016-07-14 09:27:15 -04:00
Jarrod Johnson
b9733b3e0e Provide config enabled switch mapping
Wire up the singleton switch search function to a function that
extracts list of switches and relevant auth data from the config
engine.  Add attributes to allow indication by hardware management
port connection.  The OS nics will be added later for in-band discovery,
but that's of limited value until PXE support anyway.

This time, the update function is a generator that yields as a sign to caller
that the mac map has had at least a partial update to be considered.
2016-07-14 09:27:15 -04:00
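
As a rough illustration of an update function that yields to signal partial progress, the sketch below fills a mac map one switch at a time. The names `update_macmap`, `switches`, and `macmap`, and the data layout, are assumptions for illustration, not the code from this commit.

```python
def update_macmap(switches, macmap):
    """Refresh macmap one switch at a time, yielding after each switch."""
    for switch, table in switches.items():
        for mac, port in table.items():
            # A mac may be seen on several ports, so keep a list per mac
            macmap.setdefault(mac, []).append((switch, port))
        yield  # signal to the caller: macmap has at least a partial update


# Hypothetical per-switch tables mapping mac address to port name
switches = {
    'switch1': {'00:11:22:33:44:55': 'Ethernet1'},
    'switch2': {'66:77:88:99:aa:bb': 'Ethernet7'},
}
macmap = {}
seen = []
for _ in update_macmap(switches, macmap):
    # The caller can consult macmap here without waiting for every switch
    seen.append(len(macmap))
```

The caller regains control after each switch, so a lookup can succeed as soon as the relevant switch has been walked.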
Jarrod Johnson
4aeb7e1df5 Provide a simple global 'log' function
As we implement internal processes with automation,
provide a hook for code to convey information about
situations encountered during background activity.
Ultimately, it is intended to hook event forwarders
for things like syslog/email/etc
2016-07-14 09:27:15 -04:00
Jarrod Johnson
147b3952e0 Implement the next layer of switch discovery
Refactor the snmputil to be object oriented to simplify upstream code.  Implement
a method to generate a mac address to ifName/ifDescr for a given switch.
2016-07-14 09:27:15 -04:00
Jarrod Johnson
54e135f210 Add a util function for SNMP
On the path to instrumenting network switches, first
we'll add some framework for SNMP.  Given that we are
using eventlet and thus we need a patchable SNMP,
we employ PySNMP, despite it being a bit peculiar.
This commit tucks away the oddness and makes it
pretty easy to use for our purposes.
2016-07-14 09:27:15 -04:00
Jarrod Johnson
958be7d004 Fix 'cd' to /noderange/nr in confetty
The cd performance optimization caused a problem.  This
commit recognizes /noderange/ as special auto-vivifying
directory that must be 'gotten'.
2016-07-14 09:15:49 -04:00
Jarrod Johnson
7a4c9a1fc0 Add mac map lookup against config to get node
This brings things right to the level of xCAT in
terms of underlying capability.  mac addresses have both
an all inclusive list of ports it is found on, and any nodes
that it matches.  It goes another step further by logging errors
when ambiguity is detected (either verbatim config conflict or
ambiguous result based on 'namesmatch' and the switch config).
2016-07-14 08:55:50 -04:00
Jarrod Johnson
9764a02419 Improve behavior of mac map
One, include a number of 'fellow' mac addresses on the same port.
Another, allow a mac to appear on multiple ports and have that
reflected in the data structure.  Also capture errors to trace
log rather than hanging up on unexpected cases.
2016-06-30 15:54:18 -04:00
Jarrod Johnson
f539a4e4b6 Provide config enabled switch mapping
Wire up the singleton switch search function to a function that
extracts list of switches and relevant auth data from the config
engine.  Add attributes to allow indication by hardware management
port connection.  The OS nics will be added later for in-band discovery,
but that's of limited value until PXE support anyway.

This time, the update function is a generator that yields as a sign to caller
that the mac map has had at least a partial update to be considered.
2016-06-29 16:32:46 -04:00
Jarrod Johnson
6b5f437a1c Provide a simple global 'log' function
As we implement internal processes with automation,
provide a hook for code to convey information about
situations encountered during background activity.
Ultimately, it is intended to hook event forwarders
for things like syslog/email/etc
2016-06-29 11:29:05 -04:00
Jarrod Johnson
8387f0e13e Implement the next layer of switch discovery
Refactor the snmputil to be object oriented to simplify upstream code.  Implement
a method to generate a mac address to ifName/ifDescr for a given switch.
2016-06-29 11:26:46 -04:00
Jarrod Johnson
ee679b745e Add a util function for SNMP
On the path to instrumenting network switches, first
we'll add some framework for SNMP.  Given that we are
using eventlet and thus we need a patchable SNMP,
we employ PySNMP, despite it being a bit peculiar.
This commit tucks away the oddness and makes it
pretty easy to use for our purposes.
2016-06-28 14:21:21 -04:00
Jarrod Johnson
3c876566a6 Switch to green DNS host resolution
The stock getaddrinfo can hang up all of confluent if misbehaving.
Patch pyghmi and switch to using greendns in confluent internal lookups.
1.3.0
2016-06-01 09:15:16 -04:00
Jarrod Johnson
f85ee82df3 Revert last portion of attempt at console auto-health
For now, it's causing more problems than it solved.  Back out until
a more appropriate time to investigate.
2016-05-25 13:09:10 -04:00
Jarrod Johnson
20abffdbee Revert "After 60 seconds of 'connect' limbo, kick a connection attempt"
This reverts commit e4aa8731413b736584e282470b41b35afd75a25d.
There may be some memory consumption issues with this feature.
2016-05-25 10:59:22 -04:00
Jarrod Johnson
2dd44b1725 Correct typo 2016-05-24 14:38:44 -04:00
Jarrod Johnson
f4e8dd497f Add missing utility commands to manifest 2016-05-24 14:36:36 -04:00
Jarrod Johnson
9a93baed0e Fix handling of unicode data in inventory
It is possible for unicode data to appear in some data values.  Use a unicode
string to hold the value, in case of unicode data coming from server.
2016-05-23 15:36:30 -04:00
Jarrod Johnson
41e84c7c47 Remove explicit console health check
This is a resource consumption problem.  Defer such measures until later.
Investigation uncovered that there may have been another culprit anyway,
will see if only the other change (to kick a zoned out connection attempt)
suffices.
2016-05-23 13:49:46 -04:00
Jarrod Johnson
a046e4939f Fix ping before connection error
If ping() was called before connect, an exception was raised.  Fix this by
returning false in such an event.
2016-05-19 16:32:56 -04:00
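
A minimal sketch of the guard this commit describes follows. The `Console` and `_StubSession` classes are hypothetical and only show the pattern of returning False when no connection exists yet.

```python
class _StubSession:
    """Hypothetical session object used only for this illustration."""

    def ping(self):
        return True


class Console:
    """Hypothetical console handler; not the class touched by the commit."""

    def __init__(self):
        self._session = None  # nothing connected yet

    def connect(self):
        self._session = _StubSession()

    def ping(self):
        # Without a session there is nothing to ping: report failure
        # instead of raising AttributeError on None
        if self._session is None:
            return False
        return self._session.ping()


console = Console()
before = console.ping()   # called before connect
console.connect()
after = console.ping()    # called after connect
```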
Jarrod Johnson
e4aa873141 After 60 seconds of 'connect' limbo, kick a connection attempt
Occasionally it was observed that systems would be just stuck in 'connect',
provide a backup system to detect and forcibly kick the console in such a case.
2016-05-19 15:39:04 -04:00
Jarrod Johnson
ec02097b52 Explicitly check IPMI console health
In theory, pyghmi should be doing a self-health check.  It has been discovered at scale that
this self-health check may encounter issues.  For now, try to workaround by having another
health check at the confluent level, deferred by console activity.  It's also spaced far apart
so it should not significantly add to idle load (one check every ~5 minutes, spread out).
2016-05-19 14:44:28 -04:00
Jarrod Johnson
5d105c43e5 Add option to skip numberless
Many sensors in nodesensors are not useful except when
evaluated as part of nodehealth.  Provide an option to allow people
to skip such sensors.  Particularly useful in generating time series CSV
data.
2016-05-12 15:53:55 -04:00
Jarrod Johnson
ca91cfb220 Add nodefirmware command
This command currently enumerates current firmware on the target.  In the future it may be extended to update.
2016-05-12 11:04:26 -04:00
Jarrod Johnson
b328c53d91 Fix error handling for nodeinventory command
Cleanly handle error messages from server
2016-05-12 10:25:35 -04:00
Jarrod Johnson
129f034c07 Provide some more friendly string values
Some keys from the API are a little weird, provide a mapping for them.
2016-05-12 09:19:30 -04:00
Jarrod Johnson
b5fbfe730d Add nodeinventory command
Provide a native confluent client alternative to 'rinv'.
Also add missing flags to nodesetboot.
2016-05-11 17:03:05 -04:00
Jarrod Johnson
d9e47824a4 Backoff automatic reconnect interval
Previously, offline nodes would be rechecked automatically on average every 45 seconds.  Extend this
to on average 180 seconds, to reduce ARP traffic significantly when there are a large volume of
undefined nodes.  The 'try to connect on open' behavior is retained, so this would mean a longer loss
of connectivity only in a background monitored session.
2016-05-11 13:33:36 -04:00
Jarrod Johnson
96670784f9 Automatically increase limits
Knowing ahead of time that confluent is the sort of app that, despite
best efforts, is filehandle heavy, auto-attempt to raise soft to
be equal to hard limit.  A sufficiently large cluster (i.e. more than 2000
nodes) would still need to have limit adjusted at system level for now.
2016-05-10 14:44:52 -04:00
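
Raising the soft file handle limit to match the hard limit can be sketched with the standard `resource` module (Unix only). The function name `raise_nofile_limit` is an assumption, and this is not necessarily how the commit implements it.

```python
import resource


def raise_nofile_limit():
    """Try to raise the soft open-file limit to the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse the request; keep the existing limit
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


limits = raise_nofile_limit()
```

An unprivileged process can raise its soft limit only as far as the hard limit, which is why larger clusters would still need the limit adjusted at the system level.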
Jarrod Johnson
16c7429900 Improve interactive performance of 'cd' to slow collections
Sometimes a collection will be slow.  Don't inflict the 'cd' with the slowness, defer until actually
asked to do something that would enumerate said collection.  Accomplish this by checking for
the 'cd' target in its parent collection, rather than asking to list its contents.
2016-05-09 15:39:05 -04:00
Jarrod Johnson
14f6fabe0a Do not trigger AttributeError on Null event
In the scenario where event is present but 'None', handle the situation more gracefully, by ignoring its existence.
2016-05-09 13:59:50 -04:00
Jarrod Johnson
d5e833480e Tolerate gdbm
gdbm backend does not support the 'iterkeys' interface directly,
requiring instead to manually traverse.  Unfortunately, dbhash
does not implement the gdbm interface for this, so we have
to have two codepaths.
2016-05-02 10:44:12 -04:00
Jarrod Johnson
e949ee932a Implement 'persistent' option for nextdevice
Some systems provide the functionality, provide the message support
to do that.
2016-04-28 13:11:25 -04:00
Jarrod Johnson
b524af08b3 Add back explicit patching of portions of pyghmi
The previous commit produced significant problems.  pyghmi
late binds those values, so they must be explicitly patched.
2016-04-22 17:15:17 -04:00
Jarrod Johnson
df74753908 Patch import of pyghmi
Now that the problematic use of an os pipe is no more,
go ahead and patch pyghmi in a straightforward way.  This
was needed for the sake of pyghmi plugins that use a webclient.
2016-04-22 17:01:56 -04:00
Jarrod Johnson
bb0e256a98 Convert datetime objects to ISO8601 on the way out
If a plugin iterates a datetime object, decode to ISO-8601 string
on the way out.  This allows plugins to work directly with datetime
objects and allow the messaging layer to normalize it to ISO-8601
2016-04-20 16:51:01 -04:00
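
One way to normalize datetime objects at the messaging boundary is a `default` hook for `json.dumps`. This is a sketch under the assumption that messages are serialized as JSON; the helper name `_normalize` is hypothetical.

```python
import datetime
import json


def _normalize(obj):
    """Convert datetime objects to ISO 8601 strings during serialization."""
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    raise TypeError('Cannot serialize %r' % (obj,))


# A plugin can hand over a datetime object directly
payload = {'timestamp': datetime.datetime(2016, 4, 20, 16, 51, 1)}
encoded = json.dumps(payload, default=_normalize)
# encoded == '{"timestamp": "2016-04-20T16:51:01"}'
```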
Jarrod Johnson
26da687dc3 Do not organize 'databynode' when not node
Messages that were not a node (e.g. confluent users) erroneously
had data put into 'databynode'.  Correct the mistake by omitting
the insertion of databynode when the message is clearly not a node
related thing.
2016-04-14 13:31:54 -04:00
Jarrod Johnson
fa3a402708 Provide some shortcuts for nodelist
Allow nodelist to request view of a category at a time.
Also recognize 'hm' as shorthand for 'hardwaremanagement'.
2016-04-12 15:18:31 -04:00
Jarrod Johnson
0672666e42 Assure that get_health always updates inhealth
If an unforeseen circumstance occurs while trying to get health,
make sure we recognize that scenario.
2016-04-12 14:16:15 -04:00
Jarrod Johnson
d4357c6984 Avoid double-removal of attrib watcher in ipmi
IPMI plugin was issuing redundant calls to remove the same
watcher.  Track that a session has already unhooked to
avoid double unhook (which runs at least a slight risk
of unhooking the wrong handler, *if* it were allowed).
2016-04-12 13:04:02 -04:00
Jarrod Johnson
4ba8a7a997 Dedupe concurrent ipmi health requests
IPMI health requests are relatively expensive.  It's
also pretty popular and therefore prone to be the target of
inadvertently aggressive concurrent requests.  Mitigate the harm
by detecting concurrent usage and having callers share an answer.
2016-04-12 10:28:01 -04:00
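
The idea of having concurrent callers share one answer can be sketched as follows. This uses standard library `threading` as a stand-in (confluent itself is built on eventlet), and the `SharedHealth` and `_Pending` names are hypothetical.

```python
import threading
import time


class _Pending:
    """Holds the outcome of one in-flight request."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None


class SharedHealth:
    """Run at most one fetch at a time; concurrent callers share its result."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._lock = threading.Lock()
        self._pending = None

    def get_health(self):
        with self._lock:
            pending = self._pending
            leader = pending is None
            if leader:
                pending = self._pending = _Pending()
        if leader:
            try:
                pending.result = self._fetch()
            finally:
                with self._lock:
                    self._pending = None
                pending.done.set()  # wake followers even if fetch raised
        else:
            pending.done.wait()
        return pending.result


calls = []


def slow_fetch():
    # Stand-in for an expensive health query
    calls.append(1)
    time.sleep(0.5)
    return 'ok'


shared = SharedHealth(slow_fetch)
results = []
threads = [threading.Thread(target=lambda: results.append(shared.get_health()))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this sketch, followers receive `None` if the leader's fetch raises; a fuller version would record and re-raise the exception for each waiter.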
Jarrod Johnson
fa0c0ce81a Add paramiko and update package names in server 2016-04-11 13:06:05 -04:00
Jarrod Johnson
d3bda4217c Add paramiko to the requirements 2016-04-11 11:51:11 -04:00
Jarrod Johnson
22509946c0 Reduce verbosity of audit log
There are a number of pretty innocuous requests that
need not be individually tracked.  For such requests,
we'll abstain from putting it into the log.
2016-04-08 16:51:32 -04:00
Jarrod Johnson
f8b878b5f4 Unhook attribute watch on dead sessions
When a session is dead, it need not be told about
changes to config.  Save time and sanity by reaping
when discarding a dead session.
2016-04-05 13:57:47 -04:00