
xCAT/xNBA

mirror of https://github.com/xcat2/xNBA.git synced 2025-11-08 15:10:54 +00:00

Author	SHA1	Message	Date
Michael Brown	4e4fc678c2	[intel] Increase receive ring fill level As of commit `d28bb51` ("[tcp] Defer sending ACKs until all received packets have been processed"), increasing the RX ring size will increase the number of received packets per transmitted ACK (since each poll will process up to one complete receive ring). Under KVM, this can make a substantial (up to ~200%) difference to the overall download speed, since transmissions are very expensive. Increase the ring fill level from four to eight packets: this increases the download speed by around 50% at a cost of around 8kB of heap space. Further speedups are possible by increasing the ring size further, but it would be preferable to find alternative methods which do not use noticeable amounts of heap space. Tested-by: Robin Smidsrød <robin@smidsrod.no> Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-16 13:15:40 +01:00
Marin Hannache	ca93505a78	[nfs] Fix an invalid free() when loading a regular (non-symlink) file An invalid free() was ironically introduced by fixing another invalid free in commit `7aa69c4` ("[nfs] Fix an invalid free() when loading a symlink"). Signed-off-by: Marin Hannache <git@mareo.fr> Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-16 11:01:39 +01:00
Michael Brown	f747a00c54	[lkrnprefix] Make real-mode setup code relocatable The bzImage boot protocol allows the real-mode code to be loaded at any segment within base memory. (The fact that both iPXE and recent versions of Syslinux will load the real-mode code at 1000:0000 is a coincidence; it is not guaranteed by the specification.) Fix by making the code relocatable. Reported-by: Andrew Stuart <andrew@shopcusa.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-15 13:04:47 +01:00
Christian Hesse	a8f037a275	[build] Merge util/geniso and util/genliso Rework geniso and genliso to provide a single merged utility for generating ISO images. Modified-by: Michael Brown <mcb30@ipxe.org> Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-14 16:00:58 +01:00
Michael Brown	d31cf2de30	[undi] Apply quota only to number of complete received packets Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-14 13:50:30 +01:00
Michael Brown	779d65222e	[build] Avoid errors when build directory is mounted via NFS Reported-by: Robin Smidsrød <robin@smidsrod.no> Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-13 16:45:57 +01:00
Michael Brown	a8d1b50d8b	[lkrnprefix] Function as a bzImage kernel The .lkrn prefix currently provides a zImage kernel with unused setup sectors and the whole iPXE binary placed within the "protected mode kernel" portion of the zImage. The work carried out years ago to create the .mrom format provides a mechanism allowing the iPXE binary to be split into a small real-mode header and a larger payload. This neatly matches the way that a bzImage is loaded: the "setup sectors" can contain the header and the "protected mode kernel" can contain the payload. This removes the size restrictions on an iPXE .lkrn image (and hence on derived image formats such as .iso). Also remove obsolete copyright information, since none of the original code or functionality now remains. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-12 23:49:14 +01:00
Michael Brown	d28bb51f44	[tcp] Defer sending ACKs until all received packets have been processed When running inside a virtual machine (or when using the UNDI driver), transmitting packets can be expensive. When we receive several packets in one poll (e.g. because a slow BIOS timer interrupt routine has caused us to fall behind in processing), we can safely send just a single ACK to cover all of the received packets. This reduces the time spent transmitting and allows us to clear the backlog much faster. Various RFCs (starting with RFC1122) state that there should be an ACK for at least every second segment. We choose not to enforce this rule. Under normal operation each poll should find at most one received packet, and we will then not delay any ACKs. We delay (i.e. omit) ACKs only when under sufficiently heavy load that we are finding multiple packets per poll; under these conditions it is important to clear the backlog quickly since any delay may lead to dropped packets. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-12 17:19:26 +01:00
Marin Hannache	7aa69c4d0d	[nfs] Fix an invalid free() when loading a symlink Signed-off-by: Marin Hannache <git@mareo.fr> Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-12 17:09:37 +01:00
Michael Brown	d42901c4ad	[build] Fix version.o dependency upon git index Commit `8540300` ("[build] Disable ccache for all relevant build targets") attempted to generalise the rule for $(BIN)/version.o to $(BIN)/version.% in order to apply the dependency to all relevant build targets (debug objects, assembly listings, etc). This generalisation appears to work for the ccache override directives, but seems to cause make (at least, GNU make 4.0) to simply ignore the dependency upon the git index. Since version.c contains only some string constants, there is unlikely to be a substantive need for its debug objects, assembly listings, etc. Restore the previous form of the dependency and accept that hypothetical builds with e.g. DEBUG=version will not be handled correctly. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-12 16:41:06 +01:00
Michael Brown	abf875a2e5	[intel] Exclude time spent in hypervisor from profiling When profiling, exclude any time spent inside the hypervisor responding to our MMIO accesses. This substantially reduces the variance accumulated on many other profilers. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-06 22:53:33 +01:00
Michael Brown	6f410a16d9	[profile] Allow interrupts to be excluded from profiling results Interrupt processing adds noise to profiling results. Allow interrupts (from within protected mode) to be profiled separately, with time spent within the interrupt handler being excluded from any other profiling currently in progress. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-04 13:39:42 +01:00
Michael Brown	69313edad8	[undi] Place an upper limit on the number of PXENV_UNDI_ISR calls per poll PXENV_UNDI_ISR calls may implicitly refill the underlying receive ring, and so could continue to retrieve packets indefinitely. Place an upper limit on the number of calls to PXENV_UNDI_ISR per call to undinet_poll(). Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-03 19:52:10 +01:00
Michael Brown	71ed061776	[undi] Do not switch to real mode to check for NIC interrupt Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-03 19:52:10 +01:00
Michael Brown	277f581ac3	[undi] Report any PXENV_UNDI_ISR errors via netdev_rx_err() Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-03 19:52:10 +01:00
Michael Brown	402ce65632	[undi] Profile transmit and receive datapaths Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-03 19:51:38 +01:00
Michael Brown	50689a8974	[undi] Profile all PXE API calls Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-03 19:51:29 +01:00
Michael Brown	206bd7bb64	[pxe] Work around missing PXENV_UNDI_OPEN only when necessary Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-03 18:52:15 +01:00
Michael Brown	90caf71051	[pxe] Profile UNDI transmit datapath Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-03 18:52:15 +01:00
Michael Brown	579337c368	[pxe] Profile all PXE API calls Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-03 18:52:15 +01:00
Michael Brown	be7f35d9c0	[librm] Add profiling self-tests for complete real_call and prot_call cycles Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-03 18:52:12 +01:00
Michael Brown	a0da06c306	[profile] Provide methods for profiling individual stages of operations Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-03 18:50:26 +01:00
Michael Brown	bcfaf119a7	[librm] Speed up protected-mode calls under KVM When making a call from real mode to protected mode, we save and restore the global and interrupt descriptor table registers. The restore currently takes place after returning to real mode, which generates two EXCEPTION_NMIs and corresponding VM exits when running under KVM on an Intel CPU. Avoid the VM exits by restoring the descriptor table registers inside prot_to_real, while still running in protected mode. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-02 21:00:53 +01:00
Michael Brown	c64747db50	[librm] Speed up real-to-protected mode transition under KVM Ensure that all segment registers have zero in the low two bits before transitioning to protected mode. This allows the CPU state to immediately be deemed to be "valid", and eliminates the need for any further emulated instructions. Load the protected-mode interrupt descriptor table after switching to protected mode, since this avoids triggering an EXCEPTION_NMI and corresponding VM exit. This reduces the time taken by real_to_prot under KVM by around 50%. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-02 15:23:21 +01:00
Michael Brown	5a08b63cb7	[librm] Speed up protected-to-real mode transition under KVM On an Intel CPU supporting VMX, KVM will emulate instructions while the CPU state remains "invalid". In real mode, the CPU state is defined to be "invalid" if any segment register has a base which is not equal to (sreg<<4) or a limit which is not equal to 64kB. We don't actually use the base stored in the REAL_DS descriptor for any significant purpose. Change the base stored in this descriptor to be equal to (REAL_DS<<4). A segment register loaded with REAL_DS is then automatically valid in both real and protected modes. This allows KVM to stop emulating instructions much sooner. The only use of REAL_DS for memory accesses currently occurs in the indirect ljmp within prot_to_real. Change this to a direct ljmp, storing rm_cs in .text16 as part of the ljmp instruction. This removes the only memory access via REAL_DS (thereby allowing for the above descriptor base address hack), and also simplifies the ljmp instruction (which will still have to be emulated). Load the real-mode interrupt descriptor table register before switching to real mode, since this avoids triggering an EXCEPTION_NMI and corresponding VM exit. This reduces the time taken by prot_to_real under KVM by around 65%. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-02 15:23:20 +01:00
Michael Brown	03e76c34d8	[librm] Add meaningful labels at section changes The mode-transition code involves paths which switch back and forth between the .text and .text16 sections. At present, only the start of each function is labelled, which makes it difficult to decode addresses within the parts of the function existing in a different section. Add explicit labels at the start of each section change, so that addresses can be meaningfully decoded to the nearest label. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-02 15:23:20 +01:00
Michael Brown	bd640bc364	[librm] Add a profiling self-test for measuring mode transition times Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-02 15:23:20 +01:00
Michael Brown	9c16548506	[test] Print out profiling statistics after a successful test run Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-05-02 15:23:20 +01:00
Michael Brown	34eaf69ddf	[pcbios] Do not switch to real mode to sleep the CPU Now that we can handle interrupts while in protected mode, there is no need to switch to real mode just to halt the CPU. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-29 18:24:10 +01:00
Michael Brown	e4593909a8	[pcbios] Do not switch to real mode to check for timer interrupt The currticks() function is called at least once per TCP packet, and so is performance-critical. Switching to real mode just to allow the timer interrupt to fire is expensive when running inside a virtual machine, and imposes a significant performance cost. Fix by enabling interrupts without switching to real mode. This results in an approximately 100% increase in download speed when running under KVM. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-29 18:24:10 +01:00
Michael Brown	aaf276ccd4	[comboot] Use built-in interrupt reflector We now have the ability to handle interrupts while in protected mode, and so no longer need to set up a dedicated interrupt descriptor table while running COM32 executables. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-29 18:24:10 +01:00
Michael Brown	23b671daf4	[librm] Allow interrupts in protected mode When running in a virtual machine, switching to real mode may be expensive. Allow interrupts to be enabled while in protected mode and reflected down to the real-mode interrupt handlers. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-29 18:24:04 +01:00
Michael Brown	4413ab4f5a	[build] Allow for a debug level of zero Allow for an explicit debug level of zero, which will enable assertions and profiling (i.e. anything controlled by NDEBUG) without generating any debug messages. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-28 14:45:47 +01:00
Michael Brown	4e78733094	[downloader] Profile receive datapath Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-28 12:31:39 +01:00
Michael Brown	e825a96a25	[http] Profile receive datapath Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-28 12:31:23 +01:00
Michael Brown	767f2acb98	[tcp] Profile transmit and receive datapaths Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-28 12:30:57 +01:00
Michael Brown	f65c81b1d0	[ipv4] Profile transmit and receive datapaths Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-28 12:30:30 +01:00
Michael Brown	6d4deeeb6c	[librm] Use genuine real mode to accelerate operation in virtual machines We currently use flat real mode wherever real mode is required. This guarantees that we will not surprise some unsuspecting external caller which has carefully set up flat real mode by suddenly reducing the segment limits to 64kB. However, operating in flat real mode imposes a severe performance penalty in some virtualisation environments, since some CPUs cannot fully virtualise flat real mode and so the hypervisor must fall back to emulation. In particular, operating under KVM on a pre-Westmere Intel CPU will be at least an order of magnitude slower, to the point that there is a visible teletype effect when printing anything to the BIOS console. (Older versions of KVM used to cheat and ignore the "flat" part of flat real mode, which masked the problem.) Switch (back) to using genuine real mode with 64kB segment limits instead of flat real mode. Hopefully this won't break anything. Add an explicit switch to flat real mode before returning to the BIOS from the ROM prefix, since we know that a PMM BIOS will call the ROM initialisation point (and potentially the BEV) in flat real mode. As noted in previous commit messages, it is not possible to restore the real-mode segment limits after a transition to protected mode, since there is no way to know which protected-mode segment descriptor was originally used to initialise the limit portion of the segment register. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-28 01:21:08 +01:00
Michael Brown	b2c7b6a85e	[intel] Push new RX descriptors in batches Inside a virtual machine, writing the RX ring tail pointer may incur a substantial overhead of processing inside the hypervisor. Minimise this overhead by writing the tail pointer once per batch of descriptors, rather than once per descriptor. Profiling under qemu-kvm (version 1.6.2) shows that this reduces the amount of time taken to refill the RX descriptor ring by around 90%. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-27 23:14:48 +01:00
Michael Brown	8a3dcefc0c	[intel] Profile common virtual machine operations Operations which are negligible on physical hardware (such as issuing a posted write to the transmit ring tail register) may involve substantial amounts of processing within the hypervisor if running in a virtual machine. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-27 23:14:48 +01:00
Michael Brown	2c820d684a	[netdevice] Profile common operations Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-27 23:14:47 +01:00
Michael Brown	7c44fd68f0	[cmdline] Add "profstat" command to display profiling statistics Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-27 23:14:47 +01:00
Michael Brown	e5f6a9be38	[profile] Add generic profiling infrastructure Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-27 23:14:43 +01:00
Michael Brown	d36e814b8a	[libc] Add flsll() Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-27 16:56:09 +01:00
Michael Brown	3ffd309375	[libc] Add isqrt() function to find integer square roots Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-26 18:19:49 +01:00
Michael Brown	9e8c48deea	[test] Check for correct -mrtd assumption on libgcc arithmetic functions As observed in commit `082cedb` ("[build] Fix __libgcc attribute for recent gcc versions"), recent versions of gcc have changed the semantics of -mrtd as applied to the implicit arithmetic functions. It is possible for tests to succeed even if our assumptions about gcc's interpretation of -mrtd are incorrect. In particular, if gcc chooses to utilise a frame pointer in the calling function, then it can tolerate a temporarily incorrect stack pointer (since the stack pointer will shortly afterwards be restored from the frame pointer anyway). Add tests designed specifically to check that our implementations of the implicit arithmetic functions manipulate the stack pointer as expected by gcc. The effect of these tests can be observed by temporarily reverting commit `082cedb` ("[build] Fix __libgcc attribute for recent gcc versions"): without this fix in place, the tests will fail on gcc 4.7 and later. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-26 16:00:26 +01:00
Michael Brown	082cedb3c3	[build] Fix __libgcc attribute for recent gcc versions We observed some time ago (in commit `4ce8d61` "Import various libgcc functions from syslinux") that gcc seems to treat calls to the implicit arithmetic functions (e.g. __udivdi3()) as being affected by -mregparm but unaffected by -mrtd. This seems to be no longer the case with current gcc versions, which treat calls to these functions as being affected by both -mregparm and -mrtd, as expected. There is nothing obvious in the gcc changelogs to indicate precisely when this happened. From experimentation with available gcc versions, the change occurred sometime between v4.6.3 and v4.7.2. We assume that only versions up to v4.6.x require the special treatment. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-25 16:06:37 +01:00
Michael Brown	ad7d5af5e1	[test] Add tests for 64-bit division On a 32-bit system, 64-bit division is implemented using the libgcc functions provided in __udivmoddi4.c etc. Calls to these functions are generated automatically by gcc, with a calling convention that is somewhat empirical in nature. Add these self-tests primarily as a check that we are using the correct calling convention. Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-25 01:45:13 +01:00
Michael Brown	dce7107fc0	[libc] Add inline assembly implementation of flsl() using BSR instruction Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-24 14:49:08 +01:00
Michael Brown	8f0e0e1356	[test] Add self-tests for flsl() Signed-off-by: Michael Brown <mcb30@ipxe.org>	2014-04-24 13:40:35 +01:00
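
Several commits above add small libc helpers: `flsl()`, `flsll()` and `isqrt()`. The following is a minimal portable sketch of what such helpers compute, not the code from these commits (the `flsl()` commit, for instance, uses inline assembly with the BSR instruction). The `sketch_` names are hypothetical. The sketch assumes the conventional "find last set" semantics: the result is the 1-based index of the most significant set bit, or zero if no bit is set.

```c
#include <stdint.h>

/* Find last set bit in a 64-bit value.
 * Assumed semantics: 1-based index of the most significant set bit,
 * or 0 if the value is zero. */
static int sketch_flsll(uint64_t value) {
    int bit = 0;

    /* Shift right until nothing is left, counting the shifts */
    while (value) {
        value >>= 1;
        bit++;
    }
    return bit;
}

/* Find last set bit in an unsigned long, via the 64-bit version */
static int sketch_flsl(unsigned long value) {
    return sketch_flsll(value);
}

/* Integer square root: the largest r such that r * r <= value.
 * Uses the classic bit-by-bit method, which needs only shifts,
 * additions and comparisons (no division or floating point). */
static unsigned long sketch_isqrt(unsigned long value) {
    unsigned long result = 0;
    unsigned long bit = 1UL << (8 * sizeof(unsigned long) - 2);

    /* Start from the highest power of four not exceeding value */
    while (bit > value)
        bit >>= 2;

    /* Decide each result bit in turn, from most to least significant */
    while (bit) {
        if (value >= result + bit) {
            value -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}
```

Under these assumptions, `sketch_flsl(0x80)` gives 8 and `sketch_isqrt(15)` gives 3. The bit-by-bit square root is chosen here because it avoids division entirely; that is a design choice of this sketch, not a statement about how the commit implements `isqrt()`.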


4753 Commits