Make sure confluent has created /etc/confluent, and always initialize the
encryption key, as it will almost certainly
be needed and it is easiest to just always
generate it on first startup.
During a restart, a client may aggressively trigger
console reconnect before the consoleserver starts.
Make sure that the daemon is running and globals
are ready before the API could possibly ask for a console.
If something went completely off the rails, it could easily fill up lots of memory with log entries in the 2 seconds it
would buffer. For now, disable the buffering on key debug logs, as the main purpose was reducing IOPS in the per-node
console logs anyway. A future change may also limit the size and/or number of outstanding log entries before
committing to disk.
Most of the time, we don't need this pool. Create when needed,
and clean up after 30 seconds of inactivity. This avoids a slow
shutdown that was due to core python hanging in help_finish_stuff,
and as a bonus means most of the time, one only sees one confluent
process, which has been a source of questions already.
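As a rough sketch of the create-on-demand idea (not the actual confluent
code), a wrapper can build the pool on first use and tear it down after an
idle period. The LazyPool name, the factory argument and the use of
threading.Timer are assumptions made for illustration only:

```python
import threading
from multiprocessing.pool import ThreadPool


class LazyPool(object):
    """Create a worker pool on first use, discard it after an idle period.

    Hypothetical helper for illustration; 'factory' is any callable that
    returns an object with apply(), close() and join().
    """

    def __init__(self, factory, idle_seconds=30):
        self._factory = factory
        self._idle_seconds = idle_seconds
        self._lock = threading.Lock()
        self._pool = None
        self._timer = None
        self._active = 0

    def apply(self, func, *args):
        with self._lock:
            # Any new work cancels a pending idle cleanup.
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            if self._pool is None:
                self._pool = self._factory()
            self._active += 1
            pool = self._pool
        try:
            return pool.apply(func, args)
        finally:
            with self._lock:
                self._active -= 1
                if self._active == 0:
                    # Start the idle countdown once nothing is running.
                    self._timer = threading.Timer(self._idle_seconds,
                                                  self._cleanup)
                    self._timer.daemon = True
                    self._timer.start()

    def _cleanup(self):
        with self._lock:
            if self._active or self._pool is None:
                return
            pool, self._pool = self._pool, None
        pool.close()
        pool.join()
```

For example, LazyPool(lambda: ThreadPool(1)) uses worker threads; a factory
returning multiprocessing.Pool would use worker processes instead, which
then only exist while the pool is in use.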
Consoles starting up would potentially delay API availability. Change
this by giving the API ample time to start up, then commence the
busy work of starting console sessions.
Knowing ahead of time that confluent is the sort of app that, despite
best efforts, is filehandle heavy, automatically attempt to raise the soft
limit to equal the hard limit. A sufficiently large cluster (i.e. more than 2000
nodes) would still need to have the limit adjusted at the system level for now.
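A minimal sketch of raising the soft limit, using the standard resource
module. The function name is hypothetical and this is not necessarily how
confluent itself does it:

```python
import resource


def raise_nofile_soft_limit():
    """Try to raise the soft open-file limit to the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse the request; keep the existing limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

The hard limit itself can only be raised by a privileged process, which is
why very large clusters still need a system-level change.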
When receiving a USR1 signal, confluent already provided
'the' current stack, which is useful for diagnosing really hard
hangs. However, it is frequently informative to see the
stack traces of all threads, so add that data to the diagnostic
feature.
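One way to collect every thread's stack in Python is sys._current_frames()
together with the traceback module. The sketch below is illustrative only;
the function name is an assumption and the real handler may differ:

```python
import sys
import threading
import traceback


def format_all_thread_stacks():
    """Return a text dump of the current stack of every running thread."""
    # Map thread identifiers to names for a more readable dump.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append('Thread {0} ({1}):\n'.format(
            ident, names.get(ident, 'unknown')))
        chunks.extend(traceback.format_stack(frame))
    return ''.join(chunks)
```

Such a function could be called from a handler registered with
signal.signal(signal.SIGUSR1, handler) and its result appended to a file.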
When initializing the security key, a background thread may be started. Sometimes,
the system would go to daemonize while that thread was still running, and
the parent process could exit. This led to an incomplete write to globals, and
left the daemon looking at the data copied over from pre-fork and
seeing the last state of that thread forever frozen. Make sure the background
threads are fully done prior to exiting.
If confluent gets stuck, provide a debug facility
to sample where it is stuck. Sending confluent
SIGUSR1 will now cause /var/log/confluent/hangtraces
to get written to.
Add TimedAndSizeRotatingFileHandler, which combines
the RotatingFileHandler and TimedRotatingFileHandler from
the Python logging module to process the log data.
Add a logrollover event to track the renamed files, so
that a console session can read the log data from the current log
file and the last renamed file.
Global configuration is used by the log handler. The format
of the log section in '/etc/confluent/service.cfg' is like:
    [log]
    when = m
    backup_count = 3
    max_bytes = 8192
    utc = False
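For illustration, a section like the one above can be read with the
standard configparser module (Python 3 naming). The read_log_options
function is hypothetical, and how confluent passes these values to its
handler is not shown here:

```python
import configparser

# Same content as the example section above.
SAMPLE = """
[log]
when = m
backup_count = 3
max_bytes = 8192
utc = False
"""


def read_log_options(text):
    """Parse the [log] section into typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser['log']
    return {
        'when': section['when'],
        'backup_count': section.getint('backup_count'),
        'max_bytes': section.getint('max_bytes'),
        'utc': section.getboolean('utc'),
    }
```

The option names resemble the when, backupCount, maxBytes and utc
arguments of the standard library handlers, so they presumably carry
similar meanings.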
Establish a config file for certain configuration parameters that
control service startup, and for things that are best managed via an
out-of-band configuration file and are easiest to apply with a restart. For
now, implement control of http service binding.
From Lucio Seki
Clarify that the data is in UTF-8 where applicable. It is expected
that clients are capable of handling UTF-8 for now. Additionally,
the HTML API explorer's handling of numeric data is fixed.
To run as non-root, mkdir -p /var/run/confluent /var/log/confluent /etc/confluent
and chown those to be owned by the confluent user. That is probably the path to take for deb and rpm packaging.