The data length of a log entry must not exceed 65k. If an attempt is
made to log more than that, break the data up across multiple records. It may
make sense to indicate a continuation explicitly, but for now just extend across records.
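A minimal sketch of the splitting idea; the names and the exact ceiling are
illustrative, not confluent's actual record layout:

    MAXRECORD = 65535  # assumed 65k ceiling for one record's data

    def split_data(data, maxlen=MAXRECORD):
        # yield chunks small enough to log as individual records
        for offset in range(0, len(data), maxlen):
            yield data[offset:offset + maxlen]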
If something went completely off the rails, it could easily fill a lot of memory with log entries
during the 2 seconds they would be buffered. For now, disable the buffering on key debug logs, as the
main purpose of buffering was reducing IOPs in the per-node console logs anyway. A future behavior may
be to also limit the size and/or number of outstanding log entries before committing to disk.
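A hypothetical sketch of that possible future behavior; the thresholds and
names are invented:

    class BufferedLog(object):
        MAXPENDING = 512           # invented cap on outstanding entries
        MAXPENDINGBYTES = 1 << 20  # invented cap on outstanding bytes

        def __init__(self, commit):
            self.commitfunc = commit  # callable that writes to disk
            self.pending = []
            self.pendingbytes = 0

        def log(self, entry):
            self.pending.append(entry)
            self.pendingbytes += len(entry)
            if (len(self.pending) >= self.MAXPENDING or
                    self.pendingbytes >= self.MAXPENDINGBYTES):
                self.commit()  # flush early rather than wait out the timer

        def commit(self):
            self.commitfunc(self.pending)
            self.pending = []
            self.pendingbytes = 0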
This brings things right to the level of xCAT in
terms of underlying capability. Each MAC address now has both
an all-inclusive list of the ports it is found on and any nodes
that it matches. It goes another step further by logging errors
when ambiguity is detected (either a verbatim config conflict or
an ambiguous result based on 'namesmatch' and the switch config).
For one, include a number of 'fellow' MAC addresses on the same port.
For another, allow a MAC to appear on multiple ports and have that
reflected in the data structure, as sketched below. Also, capture
errors to the trace log rather than hanging up on unexpected cases.
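A hedged sketch of the data shape described above; the field names are
illustrative, not confluent's actual structures:

    import logging

    macmap = {}  # mac -> {'ports': [(switch, port), ...], 'node': ...}

    def note_mac(mac, switch, port, node=None):
        entry = macmap.setdefault(mac, {'ports': [], 'node': None})
        if (switch, port) not in entry['ports']:
            entry['ports'].append((switch, port))  # a mac may span ports
        if node is not None:
            if entry['node'] not in (None, node):
                # ambiguity: two nodes claim one mac; log it, keep going
                logging.error('mac %s matches both %s and %s',
                              mac, entry['node'], node)
            else:
                entry['node'] = node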
As we implement internal processes with automation,
provide a hook for code to convey information about
situations encountered during background activity.
Ultimately, it is intended to hook up event forwarders
for things like syslog/email/etc.
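A minimal sketch of the hook concept, with invented names; background code
calls log_event(), and forwarders register callbacks:

    _event_handlers = []

    def register_event_handler(handler):
        # handler: a callable taking one event dict
        _event_handlers.append(handler)

    def log_event(event):
        # called from background activity; fans out to all forwarders
        for handler in _event_handlers:
            handler(event)

    # e.g. a future syslog/email forwarder would just register itself:
    # register_event_handler(lambda evt: print('EVENT:', evt))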
When read_recent_text ran off a cliff looking for buffer data,
it left the current text file handle in a bad state. This caused
the buffer rebuild to fail completely in the scenario where all the
current logs put together do not have enough data to satisfy the
buffer. Fix this by making the handle more obviously broken, and
repairing it while seeking out data.
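An illustrative sketch of the fix's shape, not confluent's actual code:
invalidate the handle outright so later reads cannot silently reuse it,
and reopen on demand:

    class TextReader(object):
        def __init__(self):
            self.textfile = None

        def invalidate(self):
            # make the handle obviously broken instead of merely stale
            if self.textfile is not None:
                self.textfile.close()
            self.textfile = None

        def ensure(self, filename):
            # repair the handle while seeking out data
            if self.textfile is None:
                self.textfile = open(filename, 'rb')
            return self.textfile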
Users have noted and complained that log data was lost and that old data was unavailable. This
changes the default behavior to indefinite retention. Users who notice logs using a lot of space have
a nice intuitive indication of old files to delete, and the option remains for them to request a log
expiration.
The rollback support and replay did not follow more than one log back. Do the work to recurse
into older and older files, until the buffer is big enough or the files run out.
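A hedged sketch of the recursion described; the names are invented and the
real code works against confluent's own log records:

    import os

    def read_recent(filenames, wanted):
        # filenames: newest-first list of rolled logs; wanted: bytes needed
        buffer = b''
        for name in filenames:
            if len(buffer) >= wanted:
                break  # big enough buffer
            if not os.path.exists(name):
                continue  # ran out of files is handled by loop exhaustion
            with open(name, 'rb') as logfile:
                buffer = logfile.read() + buffer  # older data goes in front
        return buffer[-wanted:]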
When a rollover event was detected, the offset
of the rollover event itself was erroneously being
read from the rolled file. Skip to the
next loop iteration so that the metadata about
the rollover event is properly ignored in building
the text data buffer.
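The shape of the corrected loop, with invented record fields:

    def build_buffer(records, textfile):
        buffer = b''
        for record in records:
            if record.get('type') == 'rollover':
                # metadata about the rename; there is no text at this
                # offset, so do not read from the rolled file
                continue
            textfile.seek(record['offset'])
            buffer += textfile.read(record['size'])
        return buffer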
The log format for other pieces of data is JSON.
Change the rollover event to be consistent. Also,
do not record the previous name of the log file,
as that isn't used, and the current filename is
likely to change when it too gets rolled over,
so there's no practical use in knowing the
no-longer-valid name for the transaction.
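A hypothetical example of the record after the change; the field names and
filename are illustrative only:

    import json

    # one JSON rollover record, consistent with the rest of the log
    rolloverevent = json.dumps({'type': 'rollover',
                                'newlogfile': 'n1.log.1'}) + '\n'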
Under Windows, we can't use flock. However, we can
get locking through msvcrt with different, but related,
semantics. Imitate whole-file locking by locking just the
first byte. We have to make sure we seek() to the same
place when locking and unlocking, as Windows requires
the offset to be the same for both operations.
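A sketch of the approach using the standard msvcrt module (Windows only);
the helper names are invented:

    import msvcrt

    def lock_file(fileobj):
        fileobj.seek(0)  # Windows ties the lock to the file position
        msvcrt.locking(fileobj.fileno(), msvcrt.LK_LOCK, 1)  # first byte

    def unlock_file(fileobj):
        fileobj.seek(0)  # must match the offset used when locking
        msvcrt.locking(fileobj.fileno(), msvcrt.LK_UNLCK, 1)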
Add TimedAndSizeRotatingFileHandler, which mixes together
the RotatingFileHandler and TimedRotatingFileHandler from
the Python logging module to process the log data.
Add a logrollover event to track the rename information, so
that a console session can read the log data from the current log
file and the last renamed file.
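A minimal sketch of the mixed-handler idea against the standard library;
the real handler differs in detail:

    import logging.handlers

    class TimedAndSizeRotatingFileHandler(
            logging.handlers.TimedRotatingFileHandler):
        def __init__(self, filename, when='h', interval=1,
                     backupCount=0, maxBytes=0, utc=False):
            logging.handlers.TimedRotatingFileHandler.__init__(
                self, filename, when=when, interval=interval,
                backupCount=backupCount, utc=utc)
            self.maxBytes = maxBytes

        def shouldRollover(self, record):
            # timed condition, from TimedRotatingFileHandler
            if logging.handlers.TimedRotatingFileHandler.shouldRollover(
                    self, record):
                return 1
            # size condition, as in RotatingFileHandler
            if self.maxBytes > 0 and self.stream is not None:
                msg = '%s\n' % self.format(record)
                self.stream.seek(0, 2)  # seek to end before measuring
                if self.stream.tell() + len(msg) >= self.maxBytes:
                    return 1
            return 0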
Global configuration is used by the log handler. The format
of the log section in '/etc/confluent/service.cfg' is like:
[log]
when = m
backup_count = 3
max_bytes = 8192
utc = False
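A sketch of reading that section with the standard library (Python 3
spelling); the fallbacks simply mirror the example values above:

    import configparser

    config = configparser.ConfigParser()
    config.read('/etc/confluent/service.cfg')
    when = config.get('log', 'when', fallback='m')
    backup_count = config.getint('log', 'backup_count', fallback=3)
    max_bytes = config.getint('log', 'max_bytes', fallback=8192)
    utc = config.getboolean('log', 'utc', fallback=False)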
The change to allow configmanager to log traces
erroneously broke due to the use of 'import .. as' in
circular imports. Skip the 'as' and the problem does not occur.
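A contrived two-module reproduction of the failure mode (module names
invented). On Python versions before 3.7, 'import pkg.b as b' needs the
pkg.b attribute to exist immediately, which it doesn't mid-cycle, while
plain 'import pkg.b' only binds 'pkg' and resolves the attribute later,
at use time:

    # pkg/a.py
    import pkg.b as b    # AttributeError/ImportError during the cycle
    # versus the working form:
    # import pkg.b       # safe: refer to pkg.b at call time instead

    # pkg/b.py
    import pkg.a         # completes the import cycle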
When a log object is used as a 'dumb' file target, show the origin of the
output. The motivation here is that 'print' statements are intended to
be an unusual event that should be easily tracked down and eliminated
once their specific use has concluded.
In xcatd, running '-f' means a lot of mysterious output that is hard
to manage, as things frequently print out variable contents without
searchable context. For example, if someone in xcatd randomly prints out
a variable with a nodename, we might see a stray:
n1
With this change (together with previous changes), the same statement
results in stdout log appearing like:
Jan 19 14:20:54 File "/opt/confluent/lib/python/confluent/plugins/hardwaremanagement/ipmi.py", line 364, in _dict_sensor
print nodename: n1
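A hedged sketch of the idea, not confluent's implementation: a file-like
object whose write() stamps each stray print with its call site:

    import sys
    import time
    import traceback

    class TracedOutput(object):
        def __init__(self, realfile):
            self.realfile = realfile

        def write(self, data):
            if not data.strip():
                return  # print() also emits bare newlines; skip them
            # the frame just above write() is where the print happened
            caller = traceback.extract_stack(limit=2)[0]
            stamp = time.strftime('%b %d %H:%M:%S')
            self.realfile.write(
                '%s File "%s", line %d, in %s\n    %s\n' % (
                    stamp, caller[0], caller[1], caller[2],
                    data.rstrip()))

        def flush(self):
            self.realfile.flush()

    # e.g. sys.stdout = TracedOutput(sys.__stderr__)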
The buffer age was not working as intended
The fix to exit on error exited overly eagerly
The log replay failed to report a third value if the file did not exist.