mirror of https://github.com/xcat2/confluent.git synced 2025-01-17 21:23:18 +00:00

315 Commits

Author SHA1 Message Date
Jarrod Johnson
d19fdad0ba Avoid double-disconnect behavior
Do a better job of cleanly handling scenarios
where disconnect would come from a session currently
disconnected.  Inside the ipmi plugin, suppress a
disconnect event if one has been sent.  Inside
consoleserver, suppress logging a disconnect when
already disconnected.

Originally was going to skip the reconnect, but that would
hinder recovery.  Hopefully suppressing the duplicate
disconnect in ipmi plugin, and some fixes in pyghmi will
avoid a 'double connect' scenario.
2016-09-12 14:35:27 -04:00
Jarrod Johnson
9e4ee7bb31 Fix conflicts with system python modules
If a Python system module had a name that conflicted in some way
with a plugin, the plugin load would fail.  Fix this by prioritizing
the plugin path over system locations.  Also, to avoid the breakage
going the other way, remove the plugindir from the system path when
that particular directory is done.
2016-08-29 09:56:21 -04:00
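
The path handling described in this commit can be sketched roughly as follows. This is a minimal illustration, not confluent's actual loader: `load_plugins_from` and the flat directory of `.py` files are assumptions made for the example.

```python
import importlib
import os
import sys


def load_plugins_from(plugindir):
    """Import every .py module in plugindir, preferring it over system paths.

    Hypothetical helper for illustration; not confluent's actual loader.
    """
    loaded = {}
    # Put the plugin directory first so a plugin whose name collides with
    # a system module wins the import lookup.
    sys.path.insert(0, plugindir)
    try:
        for filename in sorted(os.listdir(plugindir)):
            name, ext = os.path.splitext(filename)
            if ext != '.py' or name.startswith('_'):
                continue
            loaded[name] = importlib.import_module(name)
    finally:
        # Remove the directory again so plugin names cannot shadow
        # system modules for the rest of the process.
        sys.path.remove(plugindir)
    return loaded
```

Note that a module already present in `sys.modules` under the same name would still be returned from the import cache, so this sketch only helps for names that have not been imported yet.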
Jarrod Johnson
4d04c1fb18 Add break and reopen to http consoles
HTTP console API did not have a means to send break
or request session reopen.  Rectify this discrepancy
by adding an 'action' key to request certain console
specific actions.  In retrospect, closing the session
should have just been an 'action', but leaving things
as-is.
2016-08-23 14:04:20 -04:00
Jarrod Johnson
1085e342fd Make missing NTP server return 404
Before it was returning 500 because of index out of range
if a client was pulling an index unconditionally.
2016-08-08 09:11:59 -04:00
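
As a rough sketch of the behavior change, assuming a handler that receives a list of configured servers and a requested index (the name `lookup_ntp_server` and the return shape are hypothetical, not confluent's API):

```python
def lookup_ntp_server(servers, index):
    """Return an (HTTP status, body) pair for the requested NTP server.

    Hypothetical handler: a missing entry maps to 404 instead of letting
    an IndexError surface as a 500.
    """
    if index < 0 or index >= len(servers):
        return 404, {'error': 'no NTP server at index %d' % index}
    return 200, {'value': servers[index]}
```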
Jarrod Johnson
05e642ada5 Do not overwrite 'login' prompt in ssh plugin
ssh plugin was sending backspaces without bound, causing
deletion of the login prompt.
2016-08-04 16:44:18 -04:00
Jarrod Johnson
00da61b981 Enable backspace for ssh user/pass prompt
When prompting for username and password,
make backspace work fine.
2016-08-03 13:49:27 -04:00
Jarrod Johnson
786a1ec93e Fix a couple of formatting issues 2016-07-19 09:15:45 -04:00
Jarrod Johnson
7b160bd99c Fix namesmatch to actually return True
In the common case, we were falling through the bottom
without an explicit return.  Restructure things to both
explicitly return and look a bit more sane.
2016-07-15 16:47:42 -04:00
Jarrod Johnson
9516efd74a Merge branch 'master' into switchsupport 2016-07-14 11:01:07 -04:00
Jarrod Johnson
5410b394f2 Fix 'unset' on noderange
The Attributes management class was making shared shallow
copies.  This caused a problem when attributes class assumed
it could modify the result.  Correct by providing a deep copy
of that node's data when it is requested.
2016-07-14 09:56:40 -04:00
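
The difference between a shared shallow copy and a deep copy can be illustrated with a toy store. `AttributeStore` and the attribute layout below are assumptions for the example, not the real Attributes management class.

```python
import copy


class AttributeStore:
    """Toy stand-in for a per-node attribute store (hypothetical)."""

    def __init__(self):
        self._nodes = {}

    def set_node(self, node, attribs):
        self._nodes[node] = attribs

    def get_node_shallow(self, node):
        # Problem case: the nested dicts are still shared with the caller.
        return dict(self._nodes[node])

    def get_node(self, node):
        # Fixed case: the caller gets an independent copy it may modify.
        return copy.deepcopy(self._nodes[node])


store = AttributeStore()
store.set_node('n1', {'console.method': {'value': 'ipmi'}})

shallow = store.get_node_shallow('n1')
del shallow['console.method']['value']   # also mutates the stored data
leaked = 'value' not in store._nodes['n1']['console.method']

store.set_node('n1', {'console.method': {'value': 'ipmi'}})
deep = store.get_node('n1')
del deep['console.method']['value']      # stored data is untouched
intact = store._nodes['n1']['console.method'] == {'value': 'ipmi'}
```

Here `leaked` ends up `True` for the shallow copy and `intact` ends up `True` for the deep copy.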
Jarrod Johnson
29da853bcf Add mac map lookup against config to get node
This brings things right to the level of xCAT in
terms of underlying capability.  mac addresses have both
an all inclusive list of ports it is found on, and any nodes
that it matches.  It goes another step further by logging errors
when ambiguity is detected (either verbatim config conflict or
ambiguous result based on 'namesmatch' and the switch config).
2016-07-14 09:27:15 -04:00
Jarrod Johnson
7a72de6033 Improve behavior of mac map
One, include a number of 'fellow' mac addresses on the same port.
Another, allow a mac to appear on multiple ports and have that
reflected in the data structure.  Also capture errors to trace
log rather than hanging up on unexpected cases.
2016-07-14 09:27:15 -04:00
Jarrod Johnson
b9733b3e0e Provide config enabled switch mapping
Wire up the singleton switch search function to a function that
extracts list of switches and relevant auth data from the config
engine.  Add attributes to allow indication by hardware management
port connection.  The OS nics will be added later for in-band discovery,
but that's of limited value until PXE support anyway.

This time, the update function is a generator that yields as a sign to caller
that the mac map has had at least a partial update to be considered.
2016-07-14 09:27:15 -04:00
Jarrod Johnson
4aeb7e1df5 Provide a simple global 'log' function
As we implement internal processes with automation,
provide a hook for code to convey information about
situations encountered during background activity.
Ultimately, it is intended to hook event forwarders
for things like syslog/email/etc
2016-07-14 09:27:15 -04:00
Jarrod Johnson
147b3952e0 Implement the next layer of switch discovery
Refactor the snmputil to be object oriented to simplify upstream code.  Implement
a method to generate a mac address to ifName/ifDescr for a given switch.
2016-07-14 09:27:15 -04:00
Jarrod Johnson
54e135f210 Add a util function for SNMP
On the path to instrumenting network switches, first
we'll add some framework for SNMP.  Given that we are
using eventlet and thus we need a patchable SNMP,
we employ PySNMP, despite it being a bit peculiar.
This commit tucks away the oddness and makes it
pretty easy to use for our purposes.
2016-07-14 09:27:15 -04:00
Jarrod Johnson
3c876566a6 Switch to green DNS host resolution
The stock getaddrinfo can hang up all of confluent if misbehaving.
Patch pyghmi and switch to using greendns in confluent internal lookups.
2016-06-01 09:15:16 -04:00
Jarrod Johnson
f85ee82df3 Revert last portion of attempt at console auto-health
For now, it's causing more problems than it solved.  Back out until
a more appropriate time to investigate.
2016-05-25 13:09:10 -04:00
Jarrod Johnson
20abffdbee Revert "After 60 seconds of 'connect' limbo, kick a connection attempt"
This reverts commit e4aa8731413b736584e282470b41b35afd75a25d.
There may be some memory consumption issues with this feature.
2016-05-25 10:59:22 -04:00
Jarrod Johnson
41e84c7c47 Remove explicit console health check
This is a resource consumption problem.  Defer such measures until later.
Investigation uncovered that there may have been another culprit anyway,
will see if only the other change (to kick a zoned out connection attempt)
suffices.
2016-05-23 13:49:46 -04:00
Jarrod Johnson
a046e4939f Fix ping before connection error
If ping() was called before connect, an exception was raised.  Fix this by
returning false in such an event.
2016-05-19 16:32:56 -04:00
Jarrod Johnson
e4aa873141 After 60 seconds of 'connect' limbo, kick a connection attempt
Occasionally it was observed that systems would be just stuck in 'connect',
provide a backup system to detect and forcibly kick the console in such a case.
2016-05-19 15:39:04 -04:00
Jarrod Johnson
ec02097b52 Explicitly check IPMI console health
In theory, pyghmi should be doing a self-health check.  It has been discovered at scale that
this self-health check may encounter issues.  For now, try to work around by having another
health check at the confluent level, deferred by console activity.  It's also spaced far apart
so it should not significantly add to idle load (one check every ~5 minutes, spread out).
2016-05-19 14:44:28 -04:00
Jarrod Johnson
d9e47824a4 Backoff automatic reconnect interval
Previously, offline nodes would be rechecked automatically on average every 45 seconds.  Extend this
to on average 180 seconds, to reduce ARP traffic significantly when there are a large volume of
undefined nodes.  The 'try to connect on open' behavior is retained, so this would mean a longer loss
of connectivity only in a background monitored session.
2016-05-11 13:33:36 -04:00
Jarrod Johnson
96670784f9 Automatically increase limits
Knowing ahead of time that confluent is the sort of app that, despite
best efforts, is filehandle heavy, auto-attempt to raise soft to
be equal to hard limit.  A sufficiently large cluster (i.e. more than 2000
nodes) would still need to have limit adjusted at system level for now.
2016-05-10 14:44:52 -04:00
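
Raising the soft file handle limit to the hard limit can be done with the standard `resource` module. The following is a minimal sketch of the idea; the function name `raise_nofile_limit` is an assumption and not taken from confluent.

```python
import resource


def raise_nofile_limit():
    """Try to raise the soft open-file limit to match the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse certain values; keep the old limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

An unprivileged process cannot go beyond the hard limit, which is why larger clusters would still need the limit adjusted at the system level.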
Jarrod Johnson
d5e833480e Tolerate gdbm
gdbm backend does not support the 'iterkeys' interface directly,
requiring instead to manually traverse.  Unfortunately, dbhash
does not implement the gdbm interface for this, so we have
to have two codepaths.
2016-05-02 10:44:12 -04:00
Jarrod Johnson
e949ee932a Implement 'persistent' option for nextdevice
Some systems provide the functionality, so provide the message support
to do that.
2016-04-28 13:11:25 -04:00
Jarrod Johnson
b524af08b3 Add back explicit patching of portions of pyghmi
The previous commit produced significant problems.  pyghmi
late binds those values, so they must be explicitly patched.
2016-04-22 17:15:17 -04:00
Jarrod Johnson
df74753908 Patch import of pyghmi
Now that the problematic use of an os pipe is no more,
go ahead and patch pyghmi in a straightforward way.  This
was needed for the sake of pyghmi plugins that use a webclient.
2016-04-22 17:01:56 -04:00
Jarrod Johnson
bb0e256a98 Convert datetime objects to ISO8601 on the way out
If a plugin iterates a datetime object, decode to ISO-8601 string
on the way out.  This allows plugins to work directly with datetime
objects and allow the messaging layer to normalize it to ISO-8601
2016-04-20 16:51:01 -04:00
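
One common way to normalize datetime objects at the serialization boundary is a `default` hook for `json.dumps`. This is a sketch under that assumption; `serialize_message` is a hypothetical name and confluent's messaging layer may do this differently.

```python
import datetime
import json


def _json_default(obj):
    # Convert datetime objects to ISO 8601 strings on the way out.
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    raise TypeError('%r is not JSON serializable' % (obj,))


def serialize_message(payload):
    return json.dumps(payload, default=_json_default, sort_keys=True)


example = serialize_message(
    {'timestamp': datetime.datetime(2016, 4, 20, 16, 51, 1)})
# example == '{"timestamp": "2016-04-20T16:51:01"}'
```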
Jarrod Johnson
26da687dc3 Do not organize 'databynode' when not node
Messages that were not a node (e.g. confluent users) erroneously
had data put into 'databynode'.  Correct the mistake by omitting
the insertion of databynode when the message is clearly not a node
related thing.
2016-04-14 13:31:54 -04:00
Jarrod Johnson
0672666e42 Assure that get_health always updates inhealth
If an unforeseen circumstance occurs while trying to get health,
make sure we recognize that scenario.
2016-04-12 14:16:15 -04:00
Jarrod Johnson
d4357c6984 Avoid double-removal of attrib watcher in ipmi
IPMI plugin was issuing redundant calls to remove the same
watcher.  Track that a session has already unhooked to
avoid double unhook (which runs at least a slight risk
of unhooking the wrong handler, *if* it were allowed).
2016-04-12 13:04:02 -04:00
Jarrod Johnson
4ba8a7a997 Dedupe concurrent ipmi health requests
IPMI health requests are relatively expensive.  It's
also pretty popular and therefore prone to be the target of
inadvertently aggressive concurrent requests.  Mitigate the harm
by detecting concurrent usage and having callers share an answer.
2016-04-12 10:28:01 -04:00
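
The "share an answer" idea can be sketched as follows. This example uses plain `threading` for self-containment, whereas confluent itself runs on eventlet; `SharedRequest` and its internals are hypothetical names, not the plugin's actual code.

```python
import threading


class _Call:
    """State of one in-flight request."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SharedRequest:
    """Let concurrent callers share the result of one expensive call."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._lock = threading.Lock()
        self._inflight = None

    def get(self):
        with self._lock:
            call = self._inflight
            leader = call is None
            if leader:
                # First caller becomes the leader and does the real work.
                call = self._inflight = _Call()
        if leader:
            try:
                call.result = self._fetch()
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    self._inflight = None
                call.done.set()
        else:
            # Later callers wait for the leader's answer.
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

Callers that arrive after the in-flight request has finished start a fresh request, so nothing is cached beyond the duration of a single call.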
Jarrod Johnson
fa0c0ce81a Add paramiko and update package names in server 2016-04-11 13:06:05 -04:00
Jarrod Johnson
d3bda4217c Add paramiko to the requirements 2016-04-11 11:51:11 -04:00
Jarrod Johnson
22509946c0 Reduce verbosity of audit log
There are a number of pretty innocuous requests that
need not be individually tracked.  For such requests,
we'll abstain from putting it into the log.
2016-04-08 16:51:32 -04:00
Jarrod Johnson
f8b878b5f4 Unhook attribute watch on dead sessions
When a session is dead, it need not be told about
changes to config.  Save time and sanity by reaping
when discarding a dead session.
2016-04-05 13:57:47 -04:00
Jarrod Johnson
91a1c0ef7d Fix key registration to happen on success
Key registration was attempted either way, causing bad targets
to fail to return timely error data.
2016-04-05 11:34:23 -04:00
Jarrod Johnson
419fcf1577 Defer key registration until login
Part of key registration is giving the OEM handler
a crack at it.  For that reason, defer the registration
until after login process has occurred.
2016-04-05 10:59:20 -04:00
Jarrod Johnson
06e767e70e Fix handling of error messages in async
ConfluentNodeError branch of messages were not recognized.  Correct the oversight.
2016-03-28 08:54:33 -04:00
Jarrod Johnson
2ea7ee0dcb Add thread traces to USR1 handler
When receiving a USR1 signal, it did usefully provide
'the' current stack, useful for diagnosing really hard
hangs.  However, it's frequently informative to see all
the thread stack traces, so add that data to the diagnostic
feature.
2016-03-26 13:34:21 -04:00
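
Collecting a stack trace for every thread is possible with `sys._current_frames()`. The sketch below shows one way to wire that to SIGUSR1; the function names and the output format are assumptions, not confluent's actual handler.

```python
import signal
import sys
import threading
import traceback


def format_all_stacks():
    """Return a text dump of the current stack of every OS-level thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append('Thread %s (%s):\n' % (ident, names.get(ident, 'unknown')))
        lines.extend(traceback.format_stack(frame))
    return ''.join(lines)


def dump_on_usr1(signum, frame):
    sys.stderr.write(format_all_stacks())


def install_handler():
    # Must be called from the main thread; SIGUSR1 is not available
    # on every platform.
    if hasattr(signal, 'SIGUSR1'):
        signal.signal(signal.SIGUSR1, dump_on_usr1)
```

Calling `install_handler()` at startup and then sending `kill -USR1 <pid>` would print the traces to stderr. `sys._current_frames()` only reports OS-level threads, so a green-thread environment would need additional handling.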
Jarrod Johnson
417e70e5c1 Tolerate terminal closure
When a terminal closes and notifies server, it was
pulling the rug out from asyncsession consoles.
Make asyncsession aware that the console may be gone
and discard tracking it rather than give a 500.
2016-03-26 10:45:47 -04:00
Jarrod Johnson
03b2cdab5a Assure console sessions get reaped
When an error (to be fixed) happened while updating expiry,
an asyncsession failed to have a reaper scheduled for cleanup.
Correct this by putting the reaper schedule right after the
cancellation.

Further, an async being destroyed did not reap related console
sessions.  Add code to reap related console sessions when
the async session gets destroyed.
2016-03-26 10:26:17 -04:00
Jarrod Johnson
50aefee728 Correct a number of issues
There were a number of careless mistakes in the feature, correct
the bad usage and typos.
2016-03-26 09:34:46 -04:00
Jarrod Johnson
44a5c2b464 Merge branch 'master' into multiplex 2016-03-25 16:47:23 -04:00
Jarrod Johnson
2dd6c31513 Fix deleted logs breaking partial buffer rebuild
When the read_recent_text ran off a cliff looking for buffer data,
it left the current textfile handle in a bad state.  This caused
the buffer rebuild to fail completely in a scenario where all the
current logs put together don't have enough data to satisfy the
buffer.  Fix this by making the handle more obviously broken, and
repairing while seeking out data.
2016-03-25 16:44:28 -04:00
Jarrod Johnson
d753ac2833 Add terminal sessions to async http
This functionality enables a browser to hold more terminals open
than their max connection rating would normally allow.
2016-03-25 14:50:47 -04:00
Jarrod Johnson
3cd96a4f59 Force asyncresponse http to be JSON array
Rather than let it be ambiguous, force it to provide a JSON array.
2016-03-21 10:22:41 -04:00
Jarrod Johnson
2b3d5f7b62 Have async sessions detect logout 2016-03-21 10:22:41 -04:00