mirror of https://github.com/xcat2/confluent.git synced 2025-01-18 05:33:17 +00:00

267 Commits

Author SHA1 Message Date
Jarrod Johnson
4aeb7e1df5 Provide a simple global 'log' function
As we implement internal processes with automation,
provide a hook for code to convey information about
situations encountered during background activity.
Ultimately, it is intended to hook event forwarders
for things like syslog, email, etc.
2016-07-14 09:27:15 -04:00
Jarrod Johnson
147b3952e0 Implement the next layer of switch discovery
Refactor the snmputil to be object oriented to simplify upstream code.  Implement
a method to generate a MAC address to ifName/ifDescr mapping for a given switch.
2016-07-14 09:27:15 -04:00
Jarrod Johnson
54e135f210 Add a util function for SNMP
On the path to instrumenting network switches, first
we'll add some framework for SNMP.  Given that we are
using eventlet and thus we need a patchable SNMP,
we employ PySNMP, despite it being a bit peculiar.
This commit tucks away the oddness and makes it
pretty easy to use for our purposes.
2016-07-14 09:27:15 -04:00
Jarrod Johnson
3c876566a6 Switch to green DNS host resolution
The stock getaddrinfo can hang up all of confluent if misbehaving.
Patch pyghmi and switch to using greendns in confluent internal lookups.
2016-06-01 09:15:16 -04:00
Jarrod Johnson
f85ee82df3 Revert last portion of attempt at console auto-health
For now, it's causing more problems than it solved.  Back out until
a more appropriate time to investigate.
2016-05-25 13:09:10 -04:00
Jarrod Johnson
20abffdbee Revert "After 60 seconds of 'connect' limbo, kick a connection attempt"
This reverts commit e4aa8731413b736584e282470b41b35afd75a25d.
There may be some memory consumption issues with this feature.
2016-05-25 10:59:22 -04:00
Jarrod Johnson
41e84c7c47 Remove explicit console health check
This is a resource consumption problem.  Defer such measures until later.
Investigation uncovered that there may have been another culprit anyway;
we will see whether the other change alone (to kick a zoned out connection attempt)
suffices.
2016-05-23 13:49:46 -04:00
Jarrod Johnson
a046e4939f Fix ping before connection error
If ping() was called before connect, an exception was raised.  Fix this by
returning false in such an event.
2016-05-19 16:32:56 -04:00
Jarrod Johnson
e4aa873141 After 60 seconds of 'connect' limbo, kick a connection attempt
Occasionally it was observed that systems would be just stuck in 'connect'.
Provide a backup system to detect and forcibly kick the console in such a case.
2016-05-19 15:39:04 -04:00
Jarrod Johnson
ec02097b52 Explicitly check IPMI console health
In theory, pyghmi should be doing a self-health check.  It has been discovered at scale that
this self-health check may encounter issues.  For now, try to work around this by having another
health check at the confluent level, deferred by console activity.  It's also spaced far apart
so it should not significantly add to idle load (one check every ~5 minutes, spread out).
2016-05-19 14:44:28 -04:00
Jarrod Johnson
d9e47824a4 Backoff automatic reconnect interval
Previously, offline nodes would be rechecked automatically on average every 45 seconds.  Extend this
to on average 180 seconds, to reduce ARP traffic significantly when there is a large volume of
undefined nodes.  The 'try to connect on open' behavior is retained, so this would mean a longer loss
of connectivity only in a background monitored session.
2016-05-11 13:33:36 -04:00
Jarrod Johnson
96670784f9 Automatically increase limits
Knowing ahead of time that confluent is the sort of app that, despite
best efforts, is filehandle heavy, automatically attempt to raise the soft
limit to equal the hard limit.  A sufficiently large cluster (i.e. more than 2000
nodes) would still need to have the limit adjusted at the system level for now.
2016-05-10 14:44:52 -04:00
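
The limit adjustment described in this commit can be sketched with the standard `resource` module. This is a minimal illustration, not the code from the commit; the function name `raise_nofile_limit` is hypothetical, and the sketch assumes the relevant limit is `RLIMIT_NOFILE` (open file descriptors).

```python
import resource


def raise_nofile_limit():
    """Try to raise the soft open-file limit to the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            # An unprivileged process may raise its soft limit up to,
            # but not beyond, the hard limit.
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse certain values; keep the old limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Raising the hard limit itself needs privileges, which is consistent with the note that very large clusters still need a system-level adjustment.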
Jarrod Johnson
d5e833480e Tolerate gdbm
The gdbm backend does not support the 'iterkeys' interface directly,
requiring manual traversal instead.  Unfortunately, dbhash
does not implement the gdbm interface for this, so we have
to have two codepaths.
2016-05-02 10:44:12 -04:00
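
A rough sketch of the two codepaths follows. It assumes the gdbm-style handle exposes `firstkey()` and `nextkey(key)`, as Python's gdbm bindings do, and that any other backend offers a dict-like `keys()`; the helper name `iter_db_keys` is hypothetical and is not taken from the commit.

```python
def iter_db_keys(db):
    """Yield every key from a dbm-style handle, whichever interface it has."""
    if hasattr(db, 'firstkey'):
        # gdbm style: walk the key chain manually until it runs out.
        key = db.firstkey()
        while key is not None:
            yield key
            key = db.nextkey(key)
    else:
        # dict-like backends can simply be asked for their keys.
        for key in db.keys():
            yield key
```

The database should not be modified while the gdbm-style traversal is in progress, since that can cause keys to be skipped.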
Jarrod Johnson
e949ee932a Implement 'persistent' option for nextdevice
Some systems provide the functionality, so provide the message support
to use it.
2016-04-28 13:11:25 -04:00
Jarrod Johnson
b524af08b3 Add back explicit patching of portions of pyghmi
The previous commit produced significant problems.  pyghmi
late binds those values, so they must be explicitly patched.
2016-04-22 17:15:17 -04:00
Jarrod Johnson
df74753908 Patch import of pyghmi
Now that the problematic use of an os pipe is no more,
go ahead and patch pyghmi in a straightforward way.  This
was needed for the sake of pyghmi plugins that use a webclient.
2016-04-22 17:01:56 -04:00
Jarrod Johnson
bb0e256a98 Convert datetime objects to ISO8601 on the way out
If a plugin iterates a datetime object, decode to ISO-8601 string
on the way out.  This allows plugins to work directly with datetime
objects and allows the messaging layer to normalize them to ISO-8601.
2016-04-20 16:51:01 -04:00
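
The normalization step can be illustrated as below. This is only a sketch of the idea; the function name `normalize` and the recursive walk over dicts and lists are assumptions, not the structure of confluent's messaging layer.

```python
from datetime import datetime


def normalize(value):
    """Replace datetime objects with ISO 8601 strings, recursively."""
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(item) for item in value]
    return value
```

With this in place a plugin can hand over `{'timestamp': datetime(...)}` and the outgoing message carries a plain string that any JSON encoder accepts.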
Jarrod Johnson
26da687dc3 Do not organize 'databynode' when not node
Messages that were not a node (e.g. confluent users) erroneously
had data put into 'databynode'.  Correct the mistake by omitting
the insertion of databynode when the message is clearly not a node
related thing.
2016-04-14 13:31:54 -04:00
Jarrod Johnson
0672666e42 Assure that get_health always updates inhealth
If an unforeseen circumstance occurs while trying to get health,
make sure we recognize that scenario.
2016-04-12 14:16:15 -04:00
Jarrod Johnson
d4357c6984 Avoid double-removal of attrib watcher in ipmi
IPMI plugin was issuing redundant calls to remove the same
watcher.  Track that a session has already unhooked to
avoid double unhook (which runs at least a slight risk
of unhooking the wrong handler, *if* it were allowed).
2016-04-12 13:04:02 -04:00
Jarrod Johnson
4ba8a7a997 Dedupe concurrent ipmi health requests
IPMI health requests are relatively expensive.  They are
also pretty popular and therefore prone to be the target of
inadvertently aggressive concurrent requests.  Mitigate the harm
by detecting concurrent usage and having callers share an answer.
2016-04-12 10:28:01 -04:00
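
One way to have concurrent callers share an answer is sketched below. It uses the standard `threading` module purely for illustration (confluent itself runs on eventlet), and the class name `SharedCall` is hypothetical; this is not the code from the commit.

```python
import threading


class SharedCall(object):
    """Let concurrent callers of an expensive function share one result."""

    def __init__(self, func):
        self._func = func
        self._lock = threading.Lock()
        self._inflight = None  # state of the call currently running, if any

    def __call__(self):
        with self._lock:
            pending = self._inflight
            leader = pending is None
            if leader:
                # First caller in: create the shared state and do the work.
                pending = {'done': threading.Event(),
                           'value': None, 'error': None}
                self._inflight = pending
        if leader:
            try:
                pending['value'] = self._func()
            except Exception as exc:
                pending['error'] = exc
            finally:
                with self._lock:
                    self._inflight = None
                pending['done'].set()
        else:
            # Another caller is already fetching; wait for its answer.
            pending['done'].wait()
        if pending['error'] is not None:
            raise pending['error']
        return pending['value']
```

Callers that arrive after the leader has finished start a fresh request, so results are shared only among genuinely overlapping calls and are never cached.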
Jarrod Johnson
22509946c0 Reduce verbosity of audit log
There are a number of pretty innocuous requests that
need not be individually tracked.  For such requests,
we'll abstain from putting it into the log.
2016-04-08 16:51:32 -04:00
Jarrod Johnson
f8b878b5f4 Unhook attribute watch on dead sessions
When a session is dead, it need not be told about
changes to config.  Save time and sanity by reaping
when discarding a dead session.
2016-04-05 13:57:47 -04:00
Jarrod Johnson
91a1c0ef7d Fix key registration to happen on success
Key registration was attempted either way, causing bad targets
to fail to return timely error data.
2016-04-05 11:34:23 -04:00
Jarrod Johnson
419fcf1577 Defer key registration until login
Part of key registration is giving the OEM handler
a crack at it.  For that reason, defer the registration
until after login process has occurred.
2016-04-05 10:59:20 -04:00
Jarrod Johnson
06e767e70e Fix handling of error messages in async
The ConfluentNodeError branch of messages was not recognized.  Correct the oversight.
2016-03-28 08:54:33 -04:00
Jarrod Johnson
2ea7ee0dcb Add thread traces to USR1 handler
When receiving a USR1 signal, it did usefully provide
'the' current stack, useful for diagnosing really hard
hangs.  However, it's frequently informative to see all
the thread stack traces, so add that data to the diagnostic
feature.
2016-03-26 13:34:21 -04:00
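
A diagnostic of this kind can be approximated with the standard library alone. The sketch below covers OS threads only (it does not know about eventlet green threads), and the names `format_all_stacks` and `install_usr1_handler` are hypothetical rather than taken from confluent.

```python
import signal
import sys
import threading
import traceback


def format_all_stacks():
    """Return a text dump of the current stack of every OS thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append('Thread %s (%s):\n' % (ident, names.get(ident, 'unknown')))
        chunks.extend(traceback.format_stack(frame))
    return ''.join(chunks)


def install_usr1_handler():
    """Dump all thread stacks to stderr on SIGUSR1 (call from the main thread)."""
    def handler(signum, frame):
        sys.stderr.write(format_all_stacks())
    signal.signal(signal.SIGUSR1, handler)
```

After calling `install_usr1_handler()`, sending `kill -USR1 <pid>` to the process prints one stack per thread, which helps locate which thread is stuck.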
Jarrod Johnson
417e70e5c1 Tolerate terminal closure
When a terminal closes and notifies server, it was
pulling the rug out from asyncsession consoles.
Make asyncsession aware that the console may be gone
and discard tracking it rather than give a 500.
2016-03-26 10:45:47 -04:00
Jarrod Johnson
03b2cdab5a Assure console sessions get reaped
When an error (to be fixed) happened while updating expiry,
an asyncsession failed to have a reaper scheduled for cleanup.
Correct this by putting the reaper schedule right after the
cancellation.

Further, an async being destroyed did not reap related console
sessions.  Add code to reap related console sessions when
the async session gets destroyed.
2016-03-26 10:26:17 -04:00
Jarrod Johnson
50aefee728 Correct a number of issues
There were a number of careless mistakes in the feature; correct
the bad usage and typos.
2016-03-26 09:34:46 -04:00
Jarrod Johnson
44a5c2b464 Merge branch 'master' into multiplex
2016-03-25 16:47:23 -04:00
Jarrod Johnson
2dd6c31513 Fix deleted logs breaking partial buffer rebuild
When the read_recent_text ran off a cliff looking for buffer data,
it left the current textfile handle in a bad state.  This caused
the buffer rebuild to fail completely in a scenario where all the
current logs put together don't have enough data to satisfy the
buffer.  Fix this by making the handle more obviously broken, and
repairing while seeking out data.
2016-03-25 16:44:28 -04:00
Jarrod Johnson
d753ac2833 Add terminal sessions to async http
This functionality enables a browser to hold more terminals open
than their max connection rating would normally allow.
2016-03-25 14:50:47 -04:00
Jarrod Johnson
3cd96a4f59 Force asyncresponse http to be JSON array
Rather than let it be ambiguous, force it to provide a JSON array.
2016-03-21 10:22:41 -04:00
Jarrod Johnson
2b3d5f7b62 Have async sessions detect logout
2016-03-21 10:22:41 -04:00
Jarrod Johnson
75a747a6a2 Amend structure of AsyncMessage
This is an easier structure to traverse for a client.
2016-03-21 10:22:41 -04:00
Jarrod Johnson
8fac1ce5da Fix up the async http to actually function
Still need to review the return data to determine best format
2016-03-21 10:22:41 -04:00
Jarrod Johnson
7d67ea0685 Refine asyncsupport
Asyncsupport progress continues.  Renaming from 'multiplex'
as 'async' seems to describe the pattern better.
2016-03-21 10:22:41 -04:00
Jarrod Johnson
bcb9c2660f Implement a multiplex facility (WIP)
Allow an arbitrary number of HTTP requests using a
small pool of connections, as is likely in a
common web browser.
2016-03-21 10:22:41 -04:00
Jarrod Johnson
6504acecad Change default log retention to be indefinite
Users have complained that log data was lost and that old data was unavailable.  This changes
the default behavior to be indefinite retention.  Users who notice logs consuming a lot of space have a nice
intuitive indication of old files to delete, and the option remains for those who want to request a log expiration.
2016-03-21 09:57:23 -04:00
Jarrod Johnson
d1247cfb37 Restore disconnect notification to ssh plugin
The disconnect notification was erroneously removed in
the previous checkin; this restores it.
2016-03-16 11:20:14 -04:00
Jarrod Johnson
c5e19fe474 Have ssh plugin report on connection error
Before, the connection would fail and log to trace without anything
particularly informative for the client (they just saw 'unexpected error').
Provide a more informative behavior for the client.
2016-03-16 09:50:46 -04:00
Jarrod Johnson
58bf72d5aa Do not remove databuffer on close
If exiting from a shell session, the databuffer will contain needed info for the client
to work properly.  Preserve databuffer existence.  Responsibility for deleting the
object should be in the hands of the caller.
2016-03-16 09:09:24 -04:00
Jarrod Johnson
f15cf014e9 Avoid changing hash size during loop
Coerce iterator into a list so that for loop does not
raise an exception.
2016-03-16 08:40:39 -04:00
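
The pattern behind this fix looks roughly like the following. The function and variable names are hypothetical; the point is only that `list(...)` takes a snapshot of the keys before the loop starts deleting entries.

```python
def drop_expired(sessions, now):
    """Remove entries whose expiry time has passed; returns the same dict."""
    # Iterating over the dict directly while deleting from it raises
    # "RuntimeError: dictionary changed size during iteration".
    for session_id in list(sessions):
        if sessions[session_id] <= now:
            del sessions[session_id]
    return sessions
```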
Jarrod Johnson
fb1e20906e Do not worry over nonexistent debug socket
If the socket was not created, do not error on exit because it isn't there to be cleaned up.
2016-03-15 11:15:15 -04:00
Jarrod Johnson
1bf124494e Add location attributes
Provide data that may be used to track system
locations.
2016-03-14 09:16:46 -04:00
Jarrod Johnson
9d40c67974 Support walking back through multiple logs
The rollback support and replay did not follow more than one log back.  Do the work to recurse
into older and older files, until the buffer is big enough or the files run out.
2016-03-13 19:50:02 -04:00
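
The walk-back can be pictured as follows. This sketch is iterative rather than recursive and works on in-memory strings instead of log files to stay self-contained; the name `read_recent` and its parameters are assumptions, not confluent's API.

```python
def read_recent(logs_newest_first, wanted):
    """Collect up to `wanted` trailing characters across several logs.

    `logs_newest_first` holds the contents of each log, newest first.
    """
    pieces = []
    needed = wanted
    for text in logs_newest_first:
        if needed <= 0:
            break  # the buffer is already big enough
        piece = text[-needed:]  # take only the tail that is still needed
        pieces.append(piece)
        needed -= len(piece)
    # Pieces were gathered newest first; restore chronological order.
    return ''.join(reversed(pieces))
```

If every log together holds less than `wanted` characters, the function simply returns all the data there is.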
Jarrod Johnson
f75f2cae51 Correct sockapi behavior when user authorize returns None
If a user can connect, but gets removed mid session, traces were
being generated.  Correct by recognizing the circumstance and returning
the appropriate error to the client.
2016-03-13 18:57:27 -04:00
Jarrod Johnson
5ae0f37f97 Do not generate trace on request to delete nonexistent session
2016-03-13 18:51:18 -04:00
Jarrod Johnson
0e42e83c50 Restore intended per-user ssh sessions
Each user should have their own ssh sessions, as originally
intended.
2016-03-13 18:43:57 -04:00