mirror of https://github.com/xcat2/confluent.git synced 2024-11-29 13:00:03 +00:00
2298 Commits

Author SHA1 Message Date
Jarrod Johnson
a1ac234b73 Enhance error message for authentication issue during syncfiles 2023-10-27 15:31:14 -04:00
Jarrod Johnson
d082610678 Add more deep checking of node networking
Whether due to the management node or node IP addresses,
check if deployment can reasonably proceed using IPv4 or IPv6,
and give a warning with some suggestions to check.

Also, add nodeinventory <node> -s as an example resolution for missing
uuid.
2023-10-27 13:34:52 -04:00
Jarrod Johnson
0857716f64 Add support for normalized sensors
This opens the door for normalized common sensors
for clients that care about the semantics but
cannot keep track of inconsistent sensor names from
implementation to implementation.
2023-10-26 08:58:37 -04:00
Jarrod Johnson
9c9d71882c Disable keepalive
Unfortunately, apache can get a bit odd over how it
reports a non-viable open socket for keepalive, which
can happen in certain windows.

Disable the keepalive feature and take some performance penalty in
browsers for the sake of more consistent return behavior and
fewer idle greenthreads doing nothing.
2023-10-19 15:51:40 -04:00
Jarrod Johnson
8b150a9047 Fix for post group failures
A node failure after group failure would
erase the group from range.

Further, correct an issue where an empty nodeset
would trigger bad behavior.
2023-10-19 09:25:57 -04:00
Jarrod Johnson
b91a194184 Improve selfservice performance with yaml
The yaml python default behavior is 'pure python' and is
torturously slow.

As a test, yaml dump of a 17,000 element list took 70 seconds in default configuration.

Opting into the C functions, that time comes down to 10 seconds, a
nice and easy improvement for generic yaml.

For dumping a simple dumb list (e.g. the nodelist for ssh), a special
case yaml-looking result is done, which hits 0.4 seconds on that same
test. So this special case is added to nodelist, which can be very long
and very in demand at the same time.
2023-10-17 16:29:30 -04:00
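
The two ideas in the message above can be sketched roughly as follows. This is an illustration, not the code from the commit: `dump_generic` and `dump_simple_list` are hypothetical names, and the fast path assumes every item is a plain string that needs no YAML quoting (for example, node names).

```python
try:
    import yaml
    # Prefer the libyaml-backed dumper when PyYAML was built with it,
    # otherwise fall back to the pure python implementation.
    _Dumper = getattr(yaml, "CSafeDumper", yaml.SafeDumper)
except ImportError:
    # PyYAML is not installed; only the fast path below is usable.
    yaml = None
    _Dumper = None


def dump_generic(obj):
    """Dump arbitrary data, using the C functions when available."""
    if yaml is None:
        raise RuntimeError("PyYAML is required for generic dumps")
    return yaml.dump(obj, Dumper=_Dumper, default_flow_style=False)


def dump_simple_list(items):
    """Emit a YAML-looking block sequence for a flat list of plain names.

    Assumption: no item needs quoting or escaping.
    """
    return "".join("- {0}\n".format(item) for item in items)
```

The special case skips the YAML library entirely, which is where the large speedup for long, flat lists would come from.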
Jarrod Johnson
06d18cec63 Fix abbreviation when pad decreases
This is a bizarre way to work, but it should be valid.
2023-10-16 08:29:45 -04:00
Jarrod Johnson
bfbb7c2843 Handle mid-range pad changing, and identical names with only pad difference
This would be painful to operate, but if done at least
reverse noderange will faithfully honor it now.
2023-10-12 16:09:40 -04:00
Jarrod Johnson
3a6932ea6d Start tracking padding during abbreviation
This will take care of padding when
padding is consistent across a range.

However, we still have a problem with a progression like:
01
02
...
98
099
100

Where numbers in the middle start getting padding unexpectedly without a leading digit.
2023-10-12 15:28:54 -04:00
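
As a rough illustration of the padding question described above, the sketch below classifies the numeric parts of a range. `common_pad` is a hypothetical helper, not a function from confluent; it only reports whether one pad width can describe every element.

```python
def common_pad(numbers):
    """Return the pad width shared by a list of digit strings.

    Returns 0 when no element is zero-padded, the common width when
    every element has the same length, and None when padding changes
    partway through the list.
    """
    if all(n == str(int(n)) for n in numbers):
        return 0
    width = len(numbers[0])
    if all(len(n) == width for n in numbers):
        return width
    return None
```

With the progression from the message, `common_pad(["01", "02", "98", "099", "100"])` yields `None`, which marks it as a case that a single pad width cannot represent.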
Jarrod Johnson
6e4d9d9eb4 Address potential slowdowns by misbehaving DNS
For one, shorten the DNS timeout; if the DNS server is completely out, give up quickly.

For another, if a host has a large number of net.X.hostnames, the sequential
lookups were intolerably slow.
Have each network be evaluated concurrently in its own greenthread so that
the DNS latency overlaps rather than accumulates.
2023-10-12 14:46:09 -04:00
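
The concurrency idea can be sketched as below. The commit uses greenthreads; this sketch swaps in a standard library thread pool so it is self-contained, and `resolve_all` and its `resolver` parameter are hypothetical names. The shortened DNS timeout is not shown.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve_all(hostnames, resolver=socket.getaddrinfo):
    """Look up every hostname concurrently.

    Returns a dict mapping each name to the resolver result, or to
    None when the lookup fails.
    """
    def lookup(name):
        try:
            return name, resolver(name, 0)
        except OSError:
            # socket.gaierror is a subclass of OSError
            return name, None

    if not hostnames:
        return {}
    # One worker per name, so slow lookups overlap instead of queueing.
    with ThreadPoolExecutor(max_workers=len(hostnames)) as pool:
        return dict(pool.map(lookup, hostnames))
```

With this shape, the total wait is close to the slowest single lookup rather than the sum of all of them.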
Jarrod Johnson
e9a2f57ad8 Simplify the noderange abbreviation
Since the multi-iterator ambition is out,
ditch the expensive set wrangling step.

Now the procedure is:
-Suck nodes into groups, as possible
-Separately for groups and nodes:
     -Sort the elements
     -Chunk the elements based on 'non-numerical' situation matching
     -analyze the iterators to apply [] to shorten the name
     -Multi-iterator will cause a discontinuity, and a new ',' delimited name gets constructed
2023-10-10 16:56:32 -04:00
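
A simplified sketch of the sort, chunk, and bracket steps listed above follows. It is not the confluent implementation: `abbreviate` is a hypothetical name, the group step is omitted, only a single trailing number is treated as the iterator, and leading zeros are kept as part of the non-numeric prefix so that padding never has to be reconstructed.

```python
import re


def abbreviate(names):
    """Collapse runs such as n1,n2,n3 into n[1:3]."""
    plain = []
    numbered = {}
    for name in set(names):
        # Split into a prefix and a trailing number without leading zeros.
        match = re.fullmatch(r"(.*?)([1-9]\d*|0)", name)
        if match:
            numbered.setdefault(match.group(1), []).append(int(match.group(2)))
        else:
            plain.append(name)
    parts = sorted(plain)
    for prefix in sorted(numbered):
        nums = sorted(numbered[prefix])
        start = prev = nums[0]
        # The trailing None flushes the final run.
        for num in nums[1:] + [None]:
            if num is not None and num == prev + 1:
                prev = num
                continue
            if start == prev:
                parts.append("{0}{1}".format(prefix, start))
            else:
                parts.append("{0}[{1}:{2}]".format(prefix, start, prev))
            start = prev = num
    return ",".join(parts)
```

For example, `abbreviate(["n1", "n2", "n3", "n5", "r1"])` gives `n[1:3],n5,r1`; each discontinuity starts a new ',' delimited element.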
Jarrod Johnson
c254564f02 Fully give up on multi-iterator abbreviation
There are too many cases that can go wrong.

Note that with this lower ambition, it would be possible to
significantly streamline the implementation.

Notably, the 'find discontinuities' approach was selected to *try* to
support multiple iterators, but since that didn't pan out, a more
straightforward numerical strategy can be used from the onset.
2023-10-10 12:47:19 -04:00
Jarrod Johnson
fe27cdea4a Abbreviate harder, using brackets
Add a round that collapses to a bracketed range
where convenient.
2023-10-09 17:18:44 -04:00
Jarrod Johnson
a4ea5e5c4b Abbreviate sequential nodes
When we have sequential nodes, collapse to ':' delimited range.
2023-10-07 09:51:32 -04:00
Jarrod Johnson
79e3ad53f8 Add server side rack layout organization
The info is hard to put together client side, but
supremely easy server side.

Provide a nice call to get the layout for a noderange, similar to
(but better than) current GUI code.

Now GUI can get a nice canned JSON description of the layout.
2023-09-29 16:23:59 -04:00
Jarrod Johnson
d613d0f546 Add openbmc plugin for console 2023-09-18 16:03:48 -04:00
Jarrod Johnson
47fc233cce Fix debian packaging for confluent 2023-09-18 15:48:38 -04:00
Jarrod Johnson
8f80add0f1 Enhance debian packaging for confluent 2023-09-18 15:19:10 -04:00
Jarrod Johnson
37b75ba777 Correct variable name on commit clear 2023-09-15 15:54:35 -04:00
Jarrod Johnson
aa5de3c6a3 Suspend handling of new socket connections while configmanager down 2023-09-15 15:48:37 -04:00
Jarrod Johnson
d4c535d038 Halt autonomous discovery handling while configmanager is down
This avoids triggering a potentially large amount of churn on transiently
"unknown" systems that are actually discovered.
2023-09-15 15:32:33 -04:00
Jarrod Johnson
94b8559777 Declare ready on becoming leader
Provide for leader scenario to correctly
flag configmanager as ready.
2023-09-15 15:28:16 -04:00
Jarrod Johnson
f2f25fe912 Implement ready tracking
When going through the dramatic scenario of initializing collective,
take _ready down so that other code can pause operation appropriately.
2023-09-15 15:25:26 -04:00
Jarrod Johnson
c0629fcce5 Fix invocation of json restore change 2023-09-15 11:41:12 -04:00
Jarrod Johnson
4952e87309 Undo collective manager changes
Abort the attempt to avoid duplicate startups; it was incorrect.
2023-09-15 10:52:13 -04:00
Jarrod Johnson
533244458d Do not count as 'initting' until collective starts. 2023-09-15 10:37:51 -04:00
Jarrod Johnson
20f02b5ef7 Avoid searching switches for foreign nodes
Consult collective.manager to decide to skip consideration of a node,
if that node shouldn't be managed anyway.

This should avoid "cross-island" behavior for such
environments.
2023-09-15 10:07:14 -04:00
Jarrod Johnson
df47c6d0fd Disable attribute notify during json restore
This is guaranteed to be a lot of churn very quickly; disable it for
now.
2023-09-13 17:03:05 -04:00
Jarrod Johnson
97ee8e2372 Correct the logic of duplicate discovery protection 2023-09-13 10:50:21 -04:00
Jarrod Johnson
74c6848a0b Avoid redundant setting of known data
Setting attributes can be a touch expensive. Since there's a high risk
of this being old news, check that discovery hasn't already set values
before trying to set them again.
2023-09-13 09:59:03 -04:00
Jarrod Johnson
b75979f3ec Insulate confluent from fatal errors from discovery subscription errors 2023-09-12 16:59:53 -04:00
Jarrod Johnson
00eb9e3c9d Fix full_net_config with missing address info 2023-09-12 16:49:15 -04:00
Jarrod Johnson
9441221150 Have cooltera plugin adapt
As new sensors appear, be more adaptive
to continue tracking existing sensors.
2023-09-08 11:30:57 -04:00
Jarrod Johnson
691d92f735 Avoid calling implicit nic config if nowhere to put it
If 'None' attributes are in use, we'd have nowhere to
stick implicit configuration anyway.
2023-09-07 14:41:16 -04:00
Jarrod Johnson
8ca1f80ef6 Fix implicit nic in confignet
If the implicit IP is not in any of the attribute groups of net,
then auto-vivify from the normal place.
2023-09-07 14:36:56 -04:00
Jarrod Johnson
22cb2bdc40 Handle Ubuntu hardcoded grub cfg
Ubuntu hardcodes grub.cfg to
another location.

Make a stub file as a flag to guide osimage
to know where grub.cfg goes.
2023-08-29 10:57:25 -04:00
Jarrod Johnson
b14b34bdbd Add limited sensor support for Eaton PDUs 2023-08-22 12:28:07 -04:00
Jarrod Johnson
189ba525d3
Merge pull request #91 from sjtstg/ansible-play-fix
fix ansible support when multi stage plays are in playbook
2023-08-15 08:38:56 -04:00
Jarrod Johnson
9a1c9eb43f Improve ssh concurrency on websocket
ssh module was pausing input for the entire websocket while doing
the simple 'write' operation.

Change to background the actual logon processing, rather than
blocking what should be a fairly trivial write operation.
2023-08-03 09:56:36 -04:00
Jarrod Johnson
89bd798f8b Increasing time again, outlet count didn't factor 2023-08-02 15:20:29 -04:00
Jarrod Johnson
bf10e58f00 Bump version
With recent collective changes, bump the version to block connection with
older collective members until upgraded.
2023-08-02 13:43:41 -04:00
Jarrod Johnson
cbf2cdcdc5 Scale timeout with number of outlets
Delta PDUs seem to serialize outlet operation.
2023-08-01 16:08:51 -04:00
Jarrod Johnson
987587aaf8 Allow custom auth file to define valid roles 2023-07-26 16:37:55 -04:00
Jarrod Johnson
ad25c31d3f Correct error in check_for_yaml function in auth 2023-07-26 16:15:36 -04:00
Jarrod Johnson
b1018d648e Hook loading of /etc/confluent/authorization.yaml
This should permit custom roles to be defined.
2023-07-26 16:05:29 -04:00
Jarrod Johnson
957b979dde Reorder imports in configmanager to mitigate circular import 2023-07-24 13:38:44 -04:00
Jarrod Johnson
48c4a2e062 Have reconnects use new TCP connections
Current code was trying to reuse connections that would be useless;
explicitly go to new TCP connections for reconnection.
2023-07-24 12:31:32 -04:00
Jarrod Johnson
285a159ba5 Implement a number of improvements for collective
For one, remove 'non-voting' members from being leaders.
A large number of leader candidates creates long delays for
converging on a valid organization.  Further, some treat 'non-voting'
more roughly, inducing the worst case convergence scenario of unclean
shutdown of leader.
Convergence now happens fairly quickly for collectives with a large
number of non-voting members.

During initial DB transfer, the leader would be tied up unreasonably
long handling the jsonification of a large configuration.  Offload to a worker
process to allow the leader to continue operation while this intensive, rare
operation occurs.

Reliably run a reassimilation procedure for the lifetime of the leader.
This allows orphaned members to be prompted to join the correct leader.

Serialize the onboarding of a connecting member, and have redundancy more gracefully
paused. This avoids excessive waiting in lock and gives more deterministic timing
with respect to timeout expectations by the connecting system.
2023-07-24 11:11:39 -04:00
Jarrod Johnson
8ea2ba046e Sort the IP addresses in nodediscover for consistent UI 2023-07-19 16:16:27 -04:00
Jarrod Johnson
f16daa44dd Handle older python with addrinfo
While newer python omits '%',
older python includes it.  Change to handle
either form.
2023-07-19 16:04:25 -04:00
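
One way to tolerate both forms is to drop any '%' suffix before using the address. This is a minimal sketch under that assumption; `strip_scope` is a hypothetical helper and not necessarily how the commit handles it.

```python
def strip_scope(address):
    """Remove a trailing '%scope' from an address string, if present."""
    return address.split("%", 1)[0]
```

An address without '%' passes through unchanged, so callers do not need to know which python version produced it.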