It is possible for activity under 'raw_command' to modify the custom
keepalive registry. Tolerate the structure changing during the loop.
Change-Id: I99c99b52718dff518c303819e7a24085cc6fb97a
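Here is a minimal sketch of a loop that tolerates changes to the registry. It assumes the registry is a plain dict that maps an id to a callback. `run_keepalives` and the dict layout are hypothetical and are not pyghmi's actual structures. The loop walks a snapshot of the keys and re-checks each entry, so it does not raise `RuntimeError: dictionary changed size during iteration` when a callback adds or removes keepalives.

```python
def run_keepalives(registry):
    """Hypothetical helper: invoke each registered keepalive callback once.

    A callback may add or remove entries in ``registry``. Iterating over a
    snapshot of the keys keeps the loop valid while the dict changes.
    """
    ran = []
    for keepaliveid in list(registry):
        callback = registry.get(keepaliveid)
        if callback is None:
            # Removed by an earlier callback during this pass; skip it.
            continue
        callback()
        ran.append(keepaliveid)
    return ran


registry = {}
registry[1] = lambda: registry.pop(2, None)                 # unregisters id 2
registry[2] = lambda: None
registry[3] = lambda: registry.setdefault(4, lambda: None)  # registers id 4
print(run_keepalives(registry))  # [1, 3]
```

Entries added during a pass, such as id 4 here, first run on the next pass.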
If close() is called and the remote BMC session no longer works,
do not pass a worrisome traceback up to the caller. The caller is
calling close() to make sure things are clean, and here some part
of that cleanup was simply already done.
Change-Id: Ib0c770b57eb0f204bcde6fc786e8f064f02ece1a
On June 10, 2014, one condition that caused infinite recursion was
addressed. In that case, it was an invalid timer that could fire in the
midst of a command. The case where this could validly occur was overlooked.
Address this by deferring invocation of keepalives until after the command
exits. If activity indicated by incommand advances the timeout in the
non-custom keepalive case, the keepalive timer will then be correctly advanced.
Change-Id: Iebe0241c1f928c4187f167f3ffa407f8c6f7fa84
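One way to picture the deferral is shown below, as a sketch only. `SessionSketch`, `keepalive_due` and `pending_keepalive` are hypothetical names. `raw_command` here just runs a callable and does not talk to a BMC.

```python
class SessionSketch:
    """Hypothetical session illustrating keepalives deferred past a command."""

    def __init__(self, keepalive):
        self.incommand = False
        self.pending_keepalive = False
        self._keepalive = keepalive

    def keepalive_due(self):
        # Called when the keepalive timer fires.
        if self.incommand:
            # Firing now would re-enter the session mid-command, so defer.
            self.pending_keepalive = True
            return
        self._keepalive()

    def raw_command(self, work):
        self.incommand = True
        try:
            return work()
        finally:
            self.incommand = False
            if self.pending_keepalive:
                # The command has exited; run the postponed keepalive now.
                self.pending_keepalive = False
                self._keepalive()


calls = []
session = SessionSketch(lambda: calls.append('keepalive'))
# The timer fires while the command is still running.
result = session.raw_command(lambda: session.keepalive_due() or 'response')
print(result, calls)  # response ['keepalive']
```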
If _io_apply encountered an exception, the entire IO worker would fall apart. Wrap the key entry
points in try clauses so that the thread keeps working and the dependent IPMI objects have their
waiters acknowledged. It is still considered a grave bug for this to ever occur, but at least
the application will carry on.
Change-Id: I61b0797025b25c6d9d3e86a5110603a6fc2d67fb
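The following simplified sketch assumes a queue-driven worker thread. `io_worker`, `submit` and the tuple layout are hypothetical and are not pyghmi's `_io_apply`. The `finally` clause guarantees that the waiter is released even when a work item raises.

```python
import queue
import threading
import traceback


def io_worker(work_queue):
    """Hypothetical worker: run (function, holder, done) items until None."""
    while True:
        item = work_queue.get()
        if item is None:
            break
        function, holder, done = item
        try:
            holder['result'] = function()
        except Exception:
            # A grave bug, but record it and keep the thread alive.
            holder['error'] = traceback.format_exc()
        finally:
            # Always acknowledge the waiter, even when function() raised.
            done.set()


def submit(work_queue, function, timeout=5):
    holder = {}
    done = threading.Event()
    work_queue.put((function, holder, done))
    done.wait(timeout)
    return holder


q = queue.Queue()
worker = threading.Thread(target=io_worker, args=(q,), daemon=True)
worker.start()
bad = submit(q, lambda: 1 / 0)
good = submit(q, lambda: 'still working')
q.put(None)
worker.join()
print('error' in bad, good['result'])  # True still working
```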
In the IPMI spec, the numeric format field of a compact sensor record is
reserved, and the spec mandates that an implementation set it to 3.
Some implementations seem to have ignored this mandate. Force the value
to 3 for all compact sensor records and assume the reserved bits are
never used in a compact sensor.
Change-Id: I88f5d7b533869809f213ab0c5379b276af50cd23
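The sketch below forces the value. It assumes the SDR record type and the Sensor Units 1 byte have already been extracted. `numeric_format` is a hypothetical helper. In full sensor records (type 0x01), bits 7:6 of that byte carry the analog data format. In compact records (type 0x02), those bits are reserved.

```python
NUMERIC_FORMAT_NONE = 3  # 0b11: "does not return an analog (numeric) reading"


def numeric_format(record_type, units_byte):
    """Hypothetical helper: analog data format from the Sensor Units 1 byte."""
    if record_type == 0x02:
        # Compact sensor record: the field is reserved and mandated to be
        # 0b11, so ignore whatever the BMC actually put there.
        return NUMERIC_FORMAT_NONE
    return (units_byte >> 6) & 0b11
```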
In IPMI 2.0, there are two ways the BMC can interpret the Role parameter:
as a 'max privilege' or as a 'match privilege'.
pyghmi defaulted to 'match privilege' in order to enhance compatibility
with some earlier BMC implementations that misinterpreted the
specification in a way that allowed 'match privilege' to work but
broke 'max privilege' without a specific workaround for that BMC.
That BMC family is pretty much out of service, and if the same issue
arises later, we can add auto-detection and a workaround cheaply.
With this in mind, change the mode to look up the account by name only,
since that is how 99% of ipmitool invocations are done and it is
also a more straightforward model.
Change-Id: Ibf82b70e1b85e4e05c93365a684e21c434b4d5b4
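In RAKP Message 1 of IPMI 2.0, the requested role byte carries the privilege level in bits 3:0 and a name-only lookup flag in bit 4. The sketch below builds that byte. `rakp1_role_byte` and `PRIVILEGE_LEVELS` are hypothetical names.

```python
# IPMI privilege level numbers (hypothetical lookup table for this sketch).
PRIVILEGE_LEVELS = {'callback': 1, 'user': 2, 'operator': 3, 'administrator': 4}
NAME_ONLY_LOOKUP = 0x10  # bit 4: 1 = name-only, 0 = username/privilege


def rakp1_role_byte(privilege, name_only=True):
    """Hypothetical helper: requested maximum privilege byte for RAKP1."""
    role = PRIVILEGE_LEVELS[privilege]
    if name_only:
        role |= NAME_ONLY_LOOKUP
    return role


print(hex(rakp1_role_byte('administrator')))                   # 0x14
print(hex(rakp1_role_byte('administrator', name_only=False)))  # 0x4
```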
Since we cannot hope to linearize a linearizable value without
understanding the formula (OEM or future spec), treat all unrecognized
linearizations as non-linearizable and rely upon Get Sensor Reading
Factors to determine the value. Add the capability to actually get
the sensor reading factors, and then pass the resultant data through
the same decode_formula that would have been used had the factors
been retrieved through the SDR record.
Change-Id: I4c3a6bbbd6c68f7a0d19c2a7a221eb5fb57c99de
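The sketch below implements the IPMI conversion formula y = L[(M*x + B*10^K1) * 10^K2]. It assumes M, B, K1 and K2 are already decoded to signed integers. `decode_formula` here is an illustrative stand-in and is not pyghmi's function of the same name. Following the approach above, any linearization code outside the known table, such as the non-linear range 0x70-0x7F, is treated as non-linearizable. Factors obtained from Get Sensor Reading Factors would then presumably go through the same arithmetic with linearization 0 (linear).

```python
import math

# Linearization functions indexed by the SDR "Linearization" field.
LINEARIZATIONS = {
    0: lambda v: v,               # linear
    1: math.log,                  # ln
    2: math.log10,
    3: math.log2,
    4: math.exp,                  # e
    5: lambda v: 10 ** v,         # exp10
    6: lambda v: 2 ** v,          # exp2
    7: lambda v: 1 / v,           # 1/x
    8: lambda v: v ** 2,          # sqr
    9: lambda v: v ** 3,          # cube
    10: math.sqrt,
    11: lambda v: v ** (1.0 / 3), # cube root
}


def decode_formula(raw, m, b, k1, k2, linearization=0):
    """Illustrative: y = L[(M*x + B*10^K1) * 10^K2]."""
    value = (m * raw + b * (10 ** k1)) * (10 ** k2)
    return LINEARIZATIONS[linearization](value)


def is_linearizable(linearization):
    # Unknown codes (OEM or future spec) fall back to per-reading factors.
    return linearization in LINEARIZATIONS
```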
The method get() returns the value for the given key. If the key is not
available, it returns the default value, None. I think that was the
intention of the initial code.
Change-Id: I974258822d54f7ac09bc4197eb4ec249784012e7
In practice, generic discrete sensors have not indicated good *or* bad
health. They have most commonly been used to indicate something like a
particular option being available or disabled by the user. This does mean
that something trying to use an utterly generic discrete sensor will
not trigger a health issue, but hopefully those cases leverage more
informative events that do have clear 'health' connotations. There
remains the chance that a sensor will rely upon the vocabulary of
the text in the SDR, and that just cannot be avoided.
Change-Id: I777b2f1300301291ca5a3aa7a6b18de1de6f9d1a
There is some inconsistency in the way BMCs may balk when asked for a privilege level beyond
what the user is allowed. Add code to cope with two scenarios:
- RAKP2 returning 0xd
- Set Session Privilege Level returning 0x80 or 0x81
Change-Id: I500e5bbdf88b569b1f1c3f8476033be080770871
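The sketch below recognizes both refusals. The code meanings follow the IPMI 2.0 tables. RMCP+ status 0x0d is "unauthorized name". Set Session Privilege Level codes 0x80 and 0x81 report that a level is unavailable to the user or exceeds the channel/user limit. Stepping down to a lower privilege level after a refusal is only an assumption about how one might cope. `privilege_refused`, `negotiate`, `try_level` and `fake_bmc` are hypothetical names.

```python
RAKP_UNAUTHORIZED_NAME = 0x0d
SET_PRIV_REFUSED = (0x80, 0x81)


def privilege_refused(stage, code):
    """Hypothetical: True if the reply means the privilege was refused."""
    if stage == 'rakp2':
        return code == RAKP_UNAUTHORIZED_NAME
    if stage == 'set_session_privilege':
        return code in SET_PRIV_REFUSED
    return False


def negotiate(try_level, start=4, lowest=2):
    """Hypothetical: try levels from `start` downward, return the first accepted.

    `try_level(level)` returns (stage, code). Levels use IPMI numbering:
    2 user, 3 operator, 4 administrator.
    """
    for level in range(start, lowest - 1, -1):
        stage, code = try_level(level)
        if not privilege_refused(stage, code):
            return level
    return None


def fake_bmc(level):
    # Pretend the account is limited to operator privilege.
    if level > 3:
        return ('set_session_privilege', 0x81)
    return ('set_session_privilege', 0x00)


print(negotiate(fake_bmc))  # 3
```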
If two contexts called raw_command concurrently, the result of the
first to transmit could be overwritten by the next to send, corrupting
the results of the first command. One scenario where this was
encountered was when a get health call was being serviced at the same
moment SOL attempted to open a console, causing one of the get sensor
readings to complain that 'SOL was already active'. Address it by
storing lastresponse in a more context-specific place before
deasserting 'incommand', and remove the instances that deasserted it
earlier.
Change-Id: I504da3f54562a4b65b8f4e9e20c19aed9d21a09f
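The sketch below illustrates the general idea, not pyghmi's mechanism. Each request gets its own response slot, so a reply meant for one caller can never overwrite another caller's reply. `ResponseRouter` and the sequence-number keying are hypothetical.

```python
import itertools
import threading


class ResponseRouter:
    """Hypothetical sketch: keep each response with the request that sent it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seq = itertools.count(1)
        self._pending = {}

    def send(self):
        # Reserve a private slot before transmitting the request.
        with self._lock:
            seq = next(self._seq)
            self._pending[seq] = {'event': threading.Event(), 'response': None}
        return seq

    def deliver(self, seq, response):
        # Called by the IO side when the reply for `seq` arrives.
        with self._lock:
            slot = self._pending.get(seq)
        if slot is not None:
            slot['response'] = response
            slot['event'].set()

    def wait(self, seq, timeout=5):
        with self._lock:
            slot = self._pending[seq]
        slot['event'].wait(timeout)
        with self._lock:
            del self._pending[seq]
        return slot['response']
```

Two overlapping requests, such as a sensor read and an SOL activation, then each read back only their own reply.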
pyghmi was using any activity to defer any keepalive. If
a caller has a custom keepalive, only keepalive activity
should advance the keepalive expiry. Modify the code to defer
the keepalive on general activity only if it is the generic keepalive.
Change-Id: I852ad7a5de65af60fb8e11580bd2ef32896b71f6
Custom keepalives are called regardless of whether a command is issued
or not. The rationale is that custom keepalives check for something
specific rather than just assuring session state. Notably, SOL uses a
custom keepalive to see if the payload is still active.
The resultant problem was that if the keepalive expired just as
something was in the midst of a command, the session would infinitely
recurse into its own keepalive. The issue was that the keepalive
expiry calculation incorrectly omitted _monotonic_time, causing the
expiry to always be far in the past. It normally did not break because,
when not incommand, send_payload set an appropriate value after
the incorrect one.
Change-Id: Ie86e49890a6ac96ddf07206fb1b8558161c00a20
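The sketch below shows the corrected expiry arithmetic. It uses `time.monotonic()` as a stand-in for pyghmi's `_monotonic_time`. `next_expiry` and `keepalive_overdue` are hypothetical helpers. If only the interval is stored, the timestamp sits near zero on the monotonic clock, so the keepalive always looks overdue.

```python
import time


def next_expiry(interval, now=None):
    """Hypothetical helper: absolute monotonic time the next keepalive is due."""
    if now is None:
        now = time.monotonic()
    # The bug described above amounted to returning just `interval`.
    return now + interval


def keepalive_overdue(expiry, now=None):
    if now is None:
        now = time.monotonic()
    return now >= expiry


now = 1000.0
print(keepalive_overdue(25, now))                    # True: buggy expiry
print(keepalive_overdue(next_expiry(25, now), now))  # False: fixed expiry
```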
Some configurations disable dual-stack sockets. For example,
if net.ipv6.bindv6only is 1 on Linux, pyghmi failed.
Address this by explicitly requesting the converse behavior
on the socket, since we explicitly do not want to care
whether a particular socket is engaged in IPv4 or IPv6
activity.
Change-Id: I17a16f0ebe4752ca743f115af39a367670691507
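On Linux, the per-socket converse of `net.ipv6.bindv6only=1` is clearing the `IPV6_V6ONLY` socket option. Here is a minimal sketch. `dual_stack_udp_socket` is a hypothetical helper.

```python
import socket


def dual_stack_udp_socket():
    """Hypothetical helper: a UDP socket serving both IPv4 and IPv6 peers.

    With net.ipv6.bindv6only=1, new AF_INET6 sockets default to IPv6-only;
    clearing IPV6_V6ONLY overrides that default for this one socket.
    """
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    return sock
```

IPv4 peers are then addressed through IPv4-mapped addresses such as `::ffff:192.0.2.10`.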
If an error occurred while trying to establish a session, pyghmi
was not decrementing the usage count on the pool, leading to
more file handles than expected being created over time. Fix
this by correctly decrementing the count in the case where
something is not yet broken, but is also not yet logged in.
Change-Id: I1b27f9b3b902a253d38293182305cc4dac26b765
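The sketch below gives a pool slot back when login fails. `SocketPool`, `establish_session` and the `login` callback are hypothetical and do not mirror pyghmi's classes.

```python
class SocketPool:
    """Hypothetical pool counting how many sessions use each socket."""

    def __init__(self):
        self.usage = {}

    def acquire(self, sock):
        self.usage[sock] = self.usage.get(sock, 0) + 1

    def release(self, sock):
        self.usage[sock] -= 1
        if self.usage[sock] <= 0:
            del self.usage[sock]


def establish_session(pool, sock, login):
    """Hypothetical: count the session against `sock`, then try to log in."""
    pool.acquire(sock)
    try:
        login()
    except Exception:
        # Not broken yet, but not logged in either: give the slot back so
        # failed attempts do not accumulate file handles over time.
        pool.release(sock)
        raise
```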
When opening a session fails, the console continued trying to use the session
even though it was a lost cause. Correct this by bailing out promptly.
Change-Id: Icc09514201c948edf21cf7e9e36f0cfe0520a2c9
When a session is broken for whatever reason, we do not want timeout handling
to continue trying to heal it. This was notably bad when login failed
and a _relog was triggered continually.
Change-Id: Id342a7fc1274fe95483f2e5392b04f86d23c2b1a
When fixing performance by declining to select() on a socket
until a recvfrom() explicitly occurs to clear the socket, a problem
was introduced where particular timing of incoming traffic could
leave a socket ignored. Correct this by having
the _poller function forcefully return True if any sockets
are ignored (which also implies they are ready, since they
should be discarded on read).
Change-Id: I6be39d39e4d2ed3b05af9a4c954fb64c993ffb50
If a session is partially ready to go, but not fully, drop packets.
This can occur if just the right packets are lost such that the remote
end gets going but the local end does not receive them.
Change-Id: I63ac506484a1792db673f6e90e13cc4b0132719c
If no keepalives are registered, return cleanly from an
attempt to unregister rather than raise an exception.
Change-Id: I0064714af4ba8f1b62f9061dc0dc481116c871fe
When simulating 80% packet loss, it was discovered that a console
being closed via close() due to an inability to complete session
establishment will not have a keepaliveid registered yet.
Change-Id: I839645b13cbe30ae71e104c44e63896a4802befe
Session establishment would fail to restart on loss because it decided to
append to the pending payload. Fix this by having the establishment phase
reset the payload state. On logout, passing retry=False caused
raw_command to wait forever. Fix this by having raw_command not bother
waiting in that case.
Change-Id: I26d2116bf78440b3ccfc319094283c5d7a58cc5e
When select() identified a readable socket, it could
call select() on the same socket again
before a recvfrom() happened. In Python 2.7,
this caused the IO thread to block other threads
while waiting on something those other threads needed
to do. Resolve this by explicitly ignoring a socket
with a pending recvfrom() until recvfrom()
is next called. This reduces one test case from
42,000-47,000 select() calls to just 86.
Change-Id: Ic8ebecfc61d048e537b5d76a6a3f0665fd340a3d
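The sketch below illustrates ignoring a socket until it is read, written for Python 3. `IgnoringReader` and its methods are hypothetical and are not pyghmi's IO loop. Once select() reports a socket, that socket stays out of later select() calls until recvfrom() drains it.

```python
import select
import socket


class IgnoringReader:
    """Hypothetical sketch: skip sockets already reported readable."""

    def __init__(self, sockets):
        self.sockets = list(sockets)
        self.ignored = set()  # readable, but recvfrom() not called yet

    def wait_readable(self, timeout):
        watch = [s for s in self.sockets if s not in self.ignored]
        if not watch:
            return []
        readable, _, _ = select.select(watch, [], [], timeout)
        # Do not hand these sockets to select() again until they are read.
        self.ignored.update(readable)
        return readable

    def recvfrom(self, sock, size=4096):
        # Reading clears the socket, so it may be watched again.
        self.ignored.discard(sock)
        return sock.recvfrom(size)


a, b = socket.socketpair()
reader = IgnoringReader([a])
b.send(b'ping')
first = reader.wait_readable(1)   # [a]
second = reader.wait_readable(0)  # [] while a is ignored
data, _ = reader.recvfrom(a)      # b'ping'
a.close()
b.close()
```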
Command would return a number rather than a string for set_bootdev.
Correct this by returning the string passed in on success rather
than the resulting number.
Change-Id: I8e76b1ac9d0222630abe6b160e6271b13ef4987d
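The sketch below returns the caller's string instead of the number. The selector values follow the boot device field of the IPMI boot flags parameter (bits 5:2 of data byte 2, shown already shifted). The dictionary keys, `set_bootdev` signature and `send` callback are hypothetical.

```python
# Boot device selectors, pre-shifted into bits 5:2 (illustrative key names).
BOOT_DEVICES = {
    'default': 0x00,
    'network': 0x04,
    'hd': 0x08,
    'safe': 0x0c,
    'optical': 0x14,
    'setup': 0x18,
    'floppy': 0x3c,
}


def set_bootdev(bootdev, send):
    """Hypothetical: send the selector for `bootdev`, report it by name."""
    send(BOOT_DEVICES[bootdev])  # `send` stands in for the IPMI request
    # Return the caller's string on success, not the numeric selector.
    return bootdev


sent = []
print(set_bootdev('network', sent.append), sent)  # network [4]
```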
This installs the samples as utilities that can be invoked from the PATH.
With this, some testing and exploration may be made easier.
Change-Id: I5b7ae5b6e30eea3070dfbcb93d23802b8308d281
Since they will not be used on a dead session, remove the references to
them to mitigate the risk of dead references keeping Python from recovering
used memory.
Change-Id: Ib33ea32c02d3cc89b0aa62532e51fc1351e26a79
Some BMCs cannot fetch whole SDRs in one chunk. When faced
with such a scenario, back off exponentially until things can work.
Change-Id: Ifd9df93af56e6fedfeb4d46b662937bf8db80b01
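The sketch below reads in chunks that shrink geometrically. It assumes that backing off means halving the requested read size each time the BMC rejects it, as it would with completion code 0xCA ("cannot return number of requested data bytes"). `read_record`, `read_chunk` and `ChunkTooLarge` are hypothetical. The sketch also leaves out SDR reservation handling.

```python
class ChunkTooLarge(Exception):
    """Hypothetical: stands in for a BMC completion code such as 0xCA."""


def read_record(read_chunk, length, chunk=None, minimum=1):
    """Hypothetical: read `length` bytes, halving the request on rejection.

    `read_chunk(offset, size)` returns bytes or raises ChunkTooLarge.
    The first attempt asks for the whole record.
    """
    if chunk is None:
        chunk = length
    data = b''
    while len(data) < length:
        size = min(chunk, length - len(data))
        try:
            piece = read_chunk(len(data), size)
        except ChunkTooLarge:
            if size <= minimum:
                raise  # cannot shrink any further
            chunk = max(minimum, size // 2)
            continue
        if not piece:
            raise EOFError('BMC returned no data')
        data += piece
    return data
```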
The IPMI layer keepalive is sufficient for most scenarios,
but SOL additionally cares about the SOL payload specifically.
During an SOL session, use an SOL-specific keepalive scheme.
Change-Id: I23c5b8da4598696aa936274b3e6b527c8204b4db
A logged-out session failed to deregister its keepalive. As a result, the zombie
keepalive executed and failed. The failure path then corrupted the bmc_handlers structure.
Correct this both by deregistering the erroneous keepalive and by making the mark_broken
function more careful about deleting a member of the class hash that it may have
nothing to do with.
Change-Id: I41251309dc27ffaca89cc7deef9bf16a61f1d07e
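The sketch below shows the more careful deletion. It removes the class-wide entry only when that entry still refers to the session being marked broken. `SessionSketch`, `bmc_handlers` as a plain dict and keying by address are all hypothetical simplifications.

```python
class SessionSketch:
    """Hypothetical session class with a class-wide handler table."""

    bmc_handlers = {}

    def __init__(self, address):
        self.address = address
        SessionSketch.bmc_handlers[address] = self

    def mark_broken(self):
        # Only remove the entry if it is ours; a newer session for the same
        # BMC may already have replaced it.
        if SessionSketch.bmc_handlers.get(self.address) is self:
            del SessionSketch.bmc_handlers[self.address]


old = SessionSketch('bmc1')
new = SessionSketch('bmc1')
old.mark_broken()  # stale session: leaves the newer entry alone
print(SessionSketch.bmc_handlers['bmc1'] is new)  # True
```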
If a session was not logged in, it would still be considered a candidate for
new session objects. Disqualify such sessions so that requests for new session
objects after a 'logout' or similar are fulfilled.
Change-Id: I7af11a8a300b7aedcadcec7673d6308e3b08f27d
If the keepalive fails, it was causing the library to spin
on expired keepalive attempts. Call mark_broken in order to
avoid that spinning.
Change-Id: I1c7a06ebf7609989ebd6e90d26ac69f3fe7b8699
Given the nature of SOL, it is impossible to control the flow of incoming
traffic. This means measures to mitigate the risk of exhausting buffer
memory on the socket cease to be effective. Modify the strategy to stop
throttling and instead allocate new sockets to acquire more network
buffer space. This means the footprint of a small-scale setup is
actually even lower, and a larger setup does get more file handles,
but still only 1/64th the footprint of the usual strategy.
Change-Id: I10698393d31b0c04d0242ff85815239078c076e2
There are a number of issues flagged by code analysis. None of them
affect functionality, but change the code to satisfy the analysis.
Change-Id: Id1c2fb9c32c1f7f45cc7cad77c09fb55fb40a8a3
In the attempt to clean up, one change broke how a command
object knows whether it has an SDR yet or not. Correct that
mistake.
Change-Id: I76faaccf15c2dbfa2b7d5a3a4e1665e0cefe4c6d
Command had a number of awkward style issues stemming from the original
design that was heavy in callbacks. Clean up those issues.
Change-Id: I756e41ac7f909813ce6241f0889a85dd06599b2a
A number of copy/paste structural errors existed. Correct them in
accordance with the specification.
Change-Id: I0984f85811744e3100f5990b0606dfbca98a69d4
There was a problem where the IO thread could exist, but not yet
be ready. Fix this by adding an iothreadready bool and a list of
events to fire when the IO thread is ready.
Change-Id: I4eb13e2210fa07bddbe717f56b12c736c99938dc
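The sketch below implements the readiness handshake with `threading.Event`. `IoThreadState`, `wait_until_ready` and `mark_ready` are hypothetical names. Only the `iothreadready` flag and the list of events come from the description above. The lock makes sure a waiter either sees the flag already set or gets its event fired.

```python
import threading


class IoThreadState:
    """Hypothetical sketch: let callers wait until the IO thread is ready."""

    def __init__(self):
        self._lock = threading.Lock()
        self.iothreadready = False
        self.waiters = []  # events to fire once the IO thread is ready

    def wait_until_ready(self, timeout=None):
        with self._lock:
            if self.iothreadready:
                return True
            event = threading.Event()
            self.waiters.append(event)
        return event.wait(timeout)

    def mark_ready(self):
        # Called by the IO thread once its setup is complete.
        with self._lock:
            self.iothreadready = True
            waiters, self.waiters = self.waiters, []
        for event in waiters:
            event.set()
```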