2
0
mirror of https://github.com/xcat2/confluent.git synced 2025-01-17 21:23:18 +00:00

764 Commits

Author SHA1 Message Date
Jarrod Johnson
9764a02419 Improve behavior of mac map
One, include a number of 'fellow' mac addresses on the same port.
Another, allow a mac to appear on multiple ports and have that
reflected in the data structure.  Also capture errors to trace
log rather than hanging up on unexpected cases.
2016-06-30 15:54:18 -04:00
Jarrod Johnson
f539a4e4b6 Provide config enabled switch mapping
Wire up the singleton switch search function to a function that
extracts list of switches and relevant auth data from the config
engine.  Add attributes to allow indication by hardware management
port connection.  The OS nics will be added later for in-band discovery,
but that's of limited value until PXE support anyway.

This time, the update function is a generator that yields as a sign to caller
that the mac map has had at least a partial update to be considered.
2016-06-29 16:32:46 -04:00
Jarrod Johnson
6b5f437a1c Provide a simple global 'log' function
As we implement internal processes with automation,
provide a hook for code to convey information about
situations encountered during background activity.
Ultimately, it is intended to hook event forwarders
for things like syslog/email/etc
2016-06-29 11:29:05 -04:00
Jarrod Johnson
8387f0e13e Implement the next layer of switch discovery
Refactor the snmputil to be object oriented to simplify upstream code.  Implement
a method to generate a mac address to ifName/ifDescr for a given switch.
2016-06-29 11:26:46 -04:00
Jarrod Johnson
ee679b745e Add a util function for SNMP
On the path to instrumenting network switches, first
we'll add some framework for SNMP.  Given that we are
using eventlet and thus we need a patchable SNMP,
we employ PySNMP, despite it being a bit peculiar.
This commit tucks away the oddness and makes it
pretty easy to use for our purposes.
2016-06-28 14:21:21 -04:00
Jarrod Johnson
3c876566a6 Switch to green DNS host resolution
The stock getaddrinfo can hang up all of confluent if misbehaving.
Patch pyghmi and switch to using greendns in confluent internal lookups.
1.3.0
2016-06-01 09:15:16 -04:00
Jarrod Johnson
f85ee82df3 Revert last portion of attempt at console auto-health
For now, it's causing more problems than it solved.  Back out until
a more appropriate time to investigate.
2016-05-25 13:09:10 -04:00
Jarrod Johnson
20abffdbee Revert "After 60 seconds of 'connect' limbo, kick a connection attempt"
This reverts commit e4aa8731413b736584e282470b41b35afd75a25d.
There may be some memory consumption issues with this feature.
2016-05-25 10:59:22 -04:00
Jarrod Johnson
2dd44b1725 Correct typo 2016-05-24 14:38:44 -04:00
Jarrod Johnson
f4e8dd497f Add missing utility commands to manifest 2016-05-24 14:36:36 -04:00
Jarrod Johnson
9a93baed0e Fix handling of unicode data in inventory
It is possible for unicode data to appear in some data values.  Use a unicode
string to hold the value, in case of unicode data coming from server.
2016-05-23 15:36:30 -04:00
Jarrod Johnson
41e84c7c47 Remove explict console health check
This is a resource consumption problem.  Defer such measures until later.
Investigation uncovered that there may have been another culprit anyway,
will see if only the other change (to kick a zoned out connection attempt)
suffices.
2016-05-23 13:49:46 -04:00
Jarrod Johnson
a046e4939f Fix ping before connection error
If ping() was called before connect, an exception was raised.  Fix this by
returning false in such an event.
2016-05-19 16:32:56 -04:00
Jarrod Johnson
e4aa873141 After 60 seconds of 'connect' limbo, kick a connection attempt
Occasionally it was observed that systems would be just stuck in 'connect',
provide a backup system to detect and forcibly kick the console in such a case.
2016-05-19 15:39:04 -04:00
Jarrod Johnson
ec02097b52 Explicitly check IPMI console health
In theory, pyghmi should be doing a self-health check.  It has been discovered at scale that
this self-health check may encounter issues.  For now, try to workaround by having another
health check at the confluent level, deferred by console activity.  It's also spaced far apart
so it should not significantly add to idle load (one check every ~5 minutes, spread out).
2016-05-19 14:44:28 -04:00
Jarrod Johnson
5d105c43e5 Add option to skip numberless
Many sensors in nodesensors are not useful except when
evaluated as part of nodehealth.  Provide an option to allow people
to skip such sensors.  Particularly useful in generating time series CSV
data.
2016-05-12 15:53:55 -04:00
Jarrod Johnson
ca91cfb220 Add nodefirmware command
This command currently enumerates current firmware on the target.  In the future it may be extended to update.
2016-05-12 11:04:26 -04:00
Jarrod Johnson
b328c53d91 Fix error handling for nodeinventory command
Cleanly handle error messages from server
2016-05-12 10:25:35 -04:00
Jarrod Johnson
129f034c07 Provide some more friendly string values
Some keys from the API are a little weird, provide a mapping for them.
2016-05-12 09:19:30 -04:00
Jarrod Johnson
b5fbfe730d Add nodeinventory command
Provide a native confluent client alternative to 'rinv'.
Also add missing flags to nodesetboot.
2016-05-11 17:03:05 -04:00
Jarrod Johnson
d9e47824a4 Backoff automatic reconnect interval
Previously, offline nodes would be rechecked automatically on average every 45 seconds.  Extend this
to on average 180 seconds, to reduce ARP traffic significantly when there are a large volume of
undefined nodes.  The 'try to connect on open' behavior is retained, so this would mean a longer loss
of connectivity only in a background monitored session.
2016-05-11 13:33:36 -04:00
Jarrod Johnson
96670784f9 Automatically increase limits
Knowing ahead of time that confluent is the sort of app that, despite
best efforts, is filehandle heavy, auto-attempt to raise soft to
be equal to hard limit.  A sufficiently large cluster (i.e. more than 2000
nodes) would still need to have limit adjusted at system level for now.
2016-05-10 14:44:52 -04:00
Jarrod Johnson
16c7429900 Improve interactive performance of 'cd' to slow collections
Sometimes a collection will be slow.  Don't inflict the 'cd' with the slowness, defer until actually
asked to do something that would enumerate said collection.  Accomplish this by checking for
the 'cd' target in it's parent collection, rather than asking to list its contents.
2016-05-09 15:39:05 -04:00
Jarrod Johnson
14f6fabe0a Do not trigger AttributeError on Null event
In the scenario where event is present but 'None', handle the situation more gracefully, by ignoring it's existance.
2016-05-09 13:59:50 -04:00
Jarrod Johnson
d5e833480e Tolerate gdbm
gdbm backend does not support the 'iterkeys' interface directly,
requiring instead to manually traverse.  Unfortunately, dbhash
does not implement the gdbm interface for this, so we have
to have two codepaths.
2016-05-02 10:44:12 -04:00
Jarrod Johnson
e949ee932a Implement 'persistent' option for nextdevice
Some systems provide the functionality, provide the message support
to do that.
2016-04-28 13:11:25 -04:00
Jarrod Johnson
b524af08b3 Add back explicit patching of portions of pyghmi
The previous commit produced significant problems.  pyghmi
late binds those values, so they must be explicitly patched.
2016-04-22 17:15:17 -04:00
Jarrod Johnson
df74753908 Patch import of pyghmi
Now that the problematic use of an os pipe is no more,
go ahead and patch pyghmi in a straightforward way.  This
was needed for the sake of pyghmi plugins that use a webclient.
2016-04-22 17:01:56 -04:00
Jarrod Johnson
bb0e256a98 Convert datetime objects to ISO8601 on the way out
If a plugin iterates a datetime object, decode to ISO-8601 string
on the way out.  This allows plugins to work directly with datetime
objects and allow the messaging layer to normalize it to ISO-8601
2016-04-20 16:51:01 -04:00
Jarrod Johnson
26da687dc3 Do not organize 'databynode' when not node
Messages that were not a node (e.g. confluent users) erroneously
had data put into 'databynode'.  Correct the mistake by omitting
the insertion of databynode when the message is clearly not a node
related thing.
2016-04-14 13:31:54 -04:00
Jarrod Johnson
fa3a402708 Provide some shortcuts for nodelist
Allow nodelist to request view of a category at a time.
Also recognize 'hm' as shorthand for 'hardwaremanagement'.
2016-04-12 15:18:31 -04:00
Jarrod Johnson
0672666e42 Assure that get_health always updates inhealth
If an unforseen circumstance occurs while trying to get health,
make sure we recognize that scenario.
2016-04-12 14:16:15 -04:00
Jarrod Johnson
d4357c6984 Avoid double-removal of attrib watcher in ipmi
IPMI plugin was issuing redundant calls to remove the same
watcher.  Track that a session has already unhooked to
avoid double unhook (which runs at least a slight risk
of unhooking the wrong handler (*if* it were allowed).
2016-04-12 13:04:02 -04:00
Jarrod Johnson
4ba8a7a997 Dedupe concurrent ipmi health requests
IPMI health requests are relatively expensive.  It's
also pretty popular and therefore prone to be the target of
inadvertantly aggressive concurrent requests.  Mitigate the harm
by detecting concurrent usage and having callers share an answer.
2016-04-12 10:28:01 -04:00
Jarrod Johnson
fa0c0ce81a Add paramiko and update package names in server 2016-04-11 13:06:05 -04:00
Jarrod Johnson
d3bda4217c Add paramiko to the requirements 2016-04-11 11:51:11 -04:00
Jarrod Johnson
22509946c0 Reduce verbosity of audit log
There are a number of pretty innocuous requests that
need not be individually tracked.  For such requests,
we'll abstain from putting it into the log.
2016-04-08 16:51:32 -04:00
Jarrod Johnson
f8b878b5f4 Unhook attribute watch on dead sessions
When a session is dead, it need not be told about
changes to config.  Save time and sanity by reaping
when discarding a dead session.
2016-04-05 13:57:47 -04:00
Jarrod Johnson
91a1c0ef7d Fix key registration to happen on success
Key registration was attempted either way, causing bad targets
to fail to return timely error data.
2016-04-05 11:34:23 -04:00
Jarrod Johnson
419fcf1577 Defer key registration until login
Part of key registration is giving the OEM handler
a crack at it.  For that reason, defer the registration
until after login process has occurred.
2016-04-05 10:59:20 -04:00
Jarrod Johnson
06e767e70e Fix handling of error messages in async
ConfluentNodeError branch of messages were not recognized.  Correct the oversight.
2016-03-28 08:54:33 -04:00
Jarrod Johnson
94d2be4a87 Merge pull request #59 from jjohnson42/multiplex
Background HTTP Function
2016-03-26 13:43:10 -04:00
Jarrod Johnson
2ea7ee0dcb Add thread traces to USR1 handler
When receiving a USR1 signal, it did usefully provide
'the' current stack, useful for diagnosing really hard
hangs.  However, it's frequently informative to see all
the thread stack traces, so add that data to the diagnostic
feature.
2016-03-26 13:34:21 -04:00
Jarrod Johnson
417e70e5c1 Tolerate terminal closure
When a terminal closes and notifies server, it was
pulling the rug out from asyncsession consoles.
Make asyncsession aware that the console may be gone
and discard tracking it rather than give a 500.
2016-03-26 10:45:47 -04:00
Jarrod Johnson
03b2cdab5a Assure console sessions get reaped
When an error (to be fixed) happened while updating expiry,
an asyncsession failed to have a reaper scheduled for cleanup.
Correct this by putting the reaper schedule right after the
cancellation.

Further, an async being destroyed did not reap related console
sessions.  Add code to reap related console sessions when
the async session gets destroyed.
2016-03-26 10:26:17 -04:00
Jarrod Johnson
79b1268a75 Tolerate cp437 format text
UEFI output may still be cp437.  Tolerate through
attempting to use it.  UTF-8 continues to be preferred.
2016-03-26 10:02:16 -04:00
Jarrod Johnson
50aefee728 Correct a number of issues
There were a number of careless mistakes in the feature, correct
the bad usage and typos.
2016-03-26 09:34:46 -04:00
Jarrod Johnson
44a5c2b464 Merge branch 'master' into multiplex 2016-03-25 16:47:23 -04:00
Jarrod Johnson
2dd6c31513 Fix deleted logs breaking partial buffer rebuild
When the read_recent_text ran off a cliff looking for buffer data,
it left the current textfile handle in a bad state.  This caused
the buffer rebuild to fail completely in a scenario where all the
current logs put together don't have enough data to satisfy the
buffer.  Fix this by making the handle more obviously broken, and
repairing while seeking out data.
2016-03-25 16:44:28 -04:00
Jarrod Johnson
d753ac2833 Add terminal sessions to async http
This functionality enables a browser to hold more terminals open
than their max connection rating would normally allow.
2016-03-25 14:50:47 -04:00