This brings things right to the level of xCAT in
terms of underlying capability. Each MAC address now has both
an all-inclusive list of ports it is found on and any nodes
that it matches. It goes another step further by logging errors
when ambiguity is detected (either a verbatim config conflict or an
ambiguous result based on 'namesmatch' and the switch config).
One, include a list of 'fellow' MAC addresses seen on the same port.
Another, allow a MAC to appear on multiple ports and have that
reflected in the data structure. Also, capture errors to the trace
log rather than hanging up on unexpected cases.
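As a rough illustration of the map described above (the names and layout here
are assumptions for the sake of example, not the actual structure):

    macmap = {
        '00:11:22:33:44:55': {
            'ports': [('switch1', 'Ethernet1/14'),    # every port the MAC was seen on
                      ('switch2', 'Ethernet1/3')],
            'nodes': ['n1'],                          # nodes matched via the switch config
            'fellows': ['00:11:22:33:44:56'],         # other MACs seen on the same port
        },
    }
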
Wire up the singleton switch search function to a function that
extracts the list of switches and relevant auth data from the config
engine. Add attributes to allow indicating the hardware management
port connection. The OS NICs will be added later for in-band discovery,
but that's of limited value until PXE support lands anyway.
This time, the update function is a generator that yields to signal to the
caller that the MAC map has received at least a partial update worth considering.
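A minimal sketch of the idea, with hypothetical helper names standing in for
the real ones:

    def update_macmap(configmanager):
        # Walk the switches one at a time; yield after each so the caller
        # can consult the partially updated map without waiting for the
        # whole scan to finish.
        for switch, auth in get_switches(configmanager):  # hypothetical helper
            refresh_switch(switch, auth)                   # hypothetical per-switch update
            yield

    # Caller side: take the first usable (partial) result, e.g.
    #   for _ in update_macmap(cfg):
    #       if wanted_mac in macmap:
    #           break
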
As we implement internal processes with automation,
provide a hook for code to convey information about
situations encountered during background activity.
Ultimately, it is intended to hook up event forwarders
for things like syslog/email/etc.
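A sketch of the sort of hook intended (names are illustrative); forwarders such
as a syslog or email sender would register themselves as listeners:

    _event_listeners = []

    def add_event_listener(callback):
        # Register a forwarder (syslog, email, etc.) for background events.
        _event_listeners.append(callback)

    def log_event(info):
        # Called by background activity to convey a situation it encountered.
        for callback in _event_listeners:
            callback(info)
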
Refactor the snmputil to be object oriented to simplify upstream code. Implement
a method to generate a MAC address to ifName/ifDescr map for a given switch.
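A sketch of the shape this takes, assuming pysnmp's hlapi and the standard
BRIDGE-MIB/IF-MIB OIDs; the class and method names here are illustrative, not
the actual snmputil interface:

    from pysnmp import hlapi

    class Session(object):
        def __init__(self, server, secret='public'):
            self.server = server
            self.secret = secret

        def walk(self, oid):
            # Hide the pysnmp ceremony behind a simple generator of (oid, value).
            for errind, errstat, erridx, binds in hlapi.nextCmd(
                    hlapi.SnmpEngine(), hlapi.CommunityData(self.secret),
                    hlapi.UdpTransportTarget((self.server, 161)),
                    hlapi.ContextData(),
                    hlapi.ObjectType(hlapi.ObjectIdentity(oid)),
                    lexicographicMode=False):
                if errind or errstat:
                    raise Exception(errind or errstat.prettyPrint())
                for name, val in binds:
                    yield name, val

        def get_mac_to_ifname(self):
            # dot1dTpFdbPort: MAC (encoded in the OID suffix) -> bridge port
            # dot1dBasePortIfIndex: bridge port -> ifIndex
            # ifName: ifIndex -> interface name
            portbymac = {}
            for name, val in self.walk('1.3.6.1.2.1.17.4.3.1.2'):
                macoctets = tuple(name.getOid())[-6:]
                mac = ':'.join('%02x' % octet for octet in macoctets)
                portbymac[mac] = int(val)
            ifidxbyport = dict(
                (tuple(name.getOid())[-1], int(val))
                for name, val in self.walk('1.3.6.1.2.1.17.1.4.1.2'))
            namebyifidx = dict(
                (tuple(name.getOid())[-1], val.prettyPrint())
                for name, val in self.walk('1.3.6.1.2.1.31.1.1.1.1'))
            return dict(
                (mac, namebyifidx.get(ifidxbyport.get(port)))
                for mac, port in portbymac.items())
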
On the path to instrumenting network switches, first
we'll add some framework for SNMP. Given that we are
using eventlet, we need a patchable SNMP implementation,
so we employ PySNMP despite it being a bit peculiar.
This commit tucks away the oddness and makes it
pretty easy to use for our purposes.
This is a resource consumption problem; defer such measures until later.
Investigation uncovered that there may have been another culprit anyway;
we will see if the other change alone (to kick a zoned-out connection attempt)
suffices.
Occasionally it was observed that systems would be stuck in 'connect';
provide a backup mechanism to detect such a case and forcibly kick the console.
In theory, pyghmi should be doing a self-health check. It has been discovered at scale that
this self-health check may encounter issues. For now, try to work around it by having another
health check at the confluent level, deferred by console activity. It's also spaced far apart
so it should not significantly add to idle load (one check every ~5 minutes, spread out).
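Roughly the shape of the workaround, as a sketch with assumed names (the real
code hangs off the console session objects):

    import random
    import time
    import eventlet

    class Watchdog(object):
        # Confluent-level backup check: only probe when the console has been
        # idle, and jitter the interval so checks spread out over time.
        def __init__(self, console):
            self.console = console
            self.lastdata = time.time()
            eventlet.spawn_after(300 + random.random() * 60, self.check)

        def note_activity(self):
            # Called whenever console traffic is seen; activity defers the probe.
            self.lastdata = time.time()

        def check(self):
            if time.time() - self.lastdata >= 300:
                self.console.ping()  # hypothetical liveness probe / forced kick
            eventlet.spawn_after(300 + random.random() * 60, self.check)
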
Previously, offline nodes would be rechecked automatically every 45 seconds on average. Extend this
to every 180 seconds on average, to significantly reduce ARP traffic when there is a large volume of
undefined nodes. The 'try to connect on open' behavior is retained, so this means a longer loss
of connectivity only in a background-monitored session.
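For instance, a randomized delay keeps the average at 180 seconds while spreading
the attempts out (the exact bounds here are an assumption):

    import random

    def next_recheck_delay():
        # Uniform over [120, 240) seconds: averages 180 and avoids having
        # many offline nodes retry (and ARP) at the same moment.
        return 120 + random.random() * 120
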
Knowing ahead of time that confluent is the sort of app that, despite
best efforts, is filehandle heavy, automatically attempt to raise the soft limit
to equal the hard limit. A sufficiently large cluster (i.e. more than 2000
nodes) would still need to have the limit adjusted at the system level for now.
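The standard library makes this straightforward; a minimal sketch of the intent:

    import resource

    def raise_filehandle_limit():
        # Raise the soft RLIMIT_NOFILE up to the hard limit.  Going beyond the
        # hard limit still requires a system-level (e.g. limits.conf) change.
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        if soft < hard:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
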
The gdbm backend does not support the 'iterkeys' interface directly,
requiring manual traversal instead. Unfortunately, dbhash
does not implement the gdbm interface for this, so we have
to have two codepaths.
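The two codepaths look roughly like this (the helper name is illustrative):

    def iterdbkeys(db):
        # gdbm: no iterkeys(), so traverse with firstkey()/nextkey()
        if hasattr(db, 'firstkey'):
            key = db.firstkey()
            while key is not None:
                yield key
                key = db.nextkey(key)
        else:
            # dbhash and friends: key listing is available directly
            for key in db.keys():
                yield key
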
Now that the problematic use of an os pipe is no more,
go ahead and patch pyghmi in a straightforward way. This
was needed for the sake of pyghmi plugins that use a webclient.
If a plugin iterates a datetime object, decode it to an ISO-8601 string
on the way out. This allows plugins to work directly with datetime
objects and lets the messaging layer normalize them to ISO-8601.
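The normalization itself is just datetime.isoformat(); a sketch of what the
messaging layer does on the way out (the function name is illustrative):

    import datetime

    def normalize(value):
        # Plugins may hand over datetime objects; the wire format is ISO-8601.
        if isinstance(value, datetime.datetime):
            return value.isoformat()
        return value
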
Messages that were not about a node (e.g. confluent users) erroneously
had data put into 'databynode'. Correct the mistake by omitting
the insertion of databynode when the message is clearly not
node related.
The IPMI plugin was issuing redundant calls to remove the same
watcher. Track that a session has already unhooked to
avoid a double unhook (which would run at least a slight risk
of unhooking the wrong handler, *if* it were allowed).
IPMI health requests are relatively expensive. They are
also pretty popular and therefore prone to be the target of
inadvertently aggressive concurrent requests. Mitigate the harm
by detecting concurrent usage and having callers share an answer.
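A sketch of the sharing scheme using an eventlet Event (names such as
_poll_health are stand-ins, not the plugin's real functions):

    from eventlet import event

    _inflight = {}  # node -> Event for a health request already in progress

    def get_health(node):
        # If another greenthread is already polling this node, wait for and
        # share its answer instead of issuing a second expensive IPMI request.
        if node in _inflight:
            return _inflight[node].wait()
        completion = _inflight[node] = event.Event()
        try:
            result = _poll_health(node)  # hypothetical expensive IPMI call
            completion.send(result)
        except Exception as exc:
            completion.send_exception(exc)
            raise
        finally:
            del _inflight[node]
        return result
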
When receiving a USR1 signal, confluent did usefully provide
'the' current stack, useful for diagnosing really hard
hangs. However, it's frequently informative to see all
the thread stack traces, so add that data to the diagnostic
feature.
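Getting every thread's stack is a matter of sys._current_frames(); a minimal
sketch of such a handler:

    import signal
    import sys
    import traceback

    def dump_stacks(signum, frame):
        # On SIGUSR1, emit a traceback for every thread, not just the current one.
        for thread_id, stack in sys._current_frames().items():
            sys.stderr.write('Thread %s:\n' % thread_id)
            sys.stderr.write(''.join(traceback.format_stack(stack)))

    signal.signal(signal.SIGUSR1, dump_stacks)
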
When a terminal closes and notifies the server, it was
pulling the rug out from under asyncsession consoles.
Make asyncsession aware that the console may be gone
and stop tracking it rather than give a 500.
When an error (to be fixed) happened while updating expiry,
an asyncsession failed to have a reaper scheduled for cleanup.
Correct this by scheduling the reaper right after the
cancellation.
Further, an asyncsession being destroyed did not reap related console
sessions. Add code to reap related console sessions when
the async session gets destroyed.
When read_recent_text ran off a cliff looking for buffer data,
it left the current textfile handle in a bad state. This caused
the buffer rebuild to fail completely in a scenario where all the
current logs put together don't have enough data to satisfy the
buffer. Fix this by making the handle more obviously broken, and
repairing it while seeking out data.
Users have noted and complained that log data was lost and old data was unavailable. This changes
the default behavior to indefinite retention. Users noticing a lot of logs using space have a nice,
intuitive indication of old files to delete, and the option remains to request a log expiration.
Before, the connection would fail and log to trace without anything
particularly informative for the client (they just saw 'unexpected error').
Provide a more informative behavior for the client.
If exiting from a shell session, the databuffer will contain information the client
needs to work properly. Preserve the databuffer's existence; responsibility for deleting
the object should be in the hands of the caller.