User memory needs to be zeroed out before it is sent to the user.
To do this, the kernel maps the page, memsets it to zero and then
unmaps it. By virtue of mapping it, this forces us to flush the
dcache to ensure cache coherency between kernel and user mappings.
Originally, the page_alloc loop was using GFP_ZERO (which does a
map, memset, and unmap for each individual page) and then we were
additionally calling flush_dcache_page() for each page killing us
on performance. It is far more efficient, especially for large
allocations (> 1MB), to allocate the pages without GFP_ZERO and
then to vmap the entire allocation, memset it to zero, flush the
cache and then unmap. This process is slightly slower for very
small allocations, but only by a few microseconds, and is well
within the margin of acceptability. In all, the new scheme is
faster than the default for all sizes greater than 16k, and is
almost 4X faster for 2MB and 4MB allocations which are common for
textures and very large buffer objects.
The downside is that if there isn't enough vmalloc room for the
allocation that we are forced to fallback to a slow page by
page memset/flush, but this should happen rarely (if at all) and
is only included for completeness.
Add a guard page on the backside of page_alloc MMU mappings to protect
against an over zealous GPU pre-fetch engine that sometimes oversteps the
end of the mapped region. The same phsyical page can be re-used for each
mapping so we only need to allocate one phsyical page to rule them all
and in the darkness bind them.
Change the vmalloc allocation name to something more appropriate since
we do not allocate memory using vmalloc for userspace driver. We
directly allocate physical pages and map that to user address space. The
name is changed to page_alloc instead of vmalloc. Add sysfs files to
track memory usage via both vmalloc and page_alloc.
Ion carveout and content protect heap buffers do not
have a struct page associated with them. Thus
sg_phys() will not work reliably on these buffers.
Set the dma_address field on physically contiguous
buffers. When mapping a scatterlist to the gpummu
use sg_dma_address() first and if it returns 0
then use sg_phys().
msm: kgsl: Use kzalloc to allocate scatterlists of 1 page or less
The majority of the scatterlist allocations used in KGSL are under 1
page (1 page of struct scatterlist is approximately 1024 entries
equalling 4MB of allocated buffer). In these cases using vmalloc
for the sglist is undesirable and slow. Add functions to check the
size of the allocation and favor kzalloc for 1 page allocations and
vmalloc for larger lists.
For dma_alloc_coherent() you don't need writel/readl because
it's just a plain old void *. Linux tries very hard to make a
distinction between io memory (void __iomem *) and memory
(void *) so that drivers are portable to architectures that
don't have a way to access registers via pointer dereferences.
You can see http://lwn.net/Articles/102232/ and the Linus rant
http://lwn.net/Articles/102240/ here for more details behind
the motivation.
msm: kgsl: Allocate physical pages instead of using vmalloc
Replace vmalloc allocation with physical page allocation. For most
allocations we do not need a kernel virual address. vmalloc uses up
the kernel virtual address space. By replacing vmalloc with physical
page alloction and mapping that allocation to kernel space only
when it is required prevents the kgsl driver from using unnecessary
vmalloc virtual space.
kmalloc allocates physically contiguous memory and
may fail for larger allocations due to fragmentation.
The large allocations are caused by the fact that the
scatterlist structure is 24 bytes and the array size
is proportional to the number of pages being mapped.
The tools that process cff dumps expect a linear
memory region, but the start address of that region can
be configured. As long as there is only a single
pagetable (so that there aren't duplicate virtual
addresses in the dump), dumps captured with the
mmu on are easier to deal with than reconfiguring
to turn the mmu off.
Make the framework for reporting per-process memory statistics a little bit
more generic. This should make it easier to keep track of more external
memory sources as they are added.
Certain memory allocations are not properly tracked by kmemleak tool,
which makes it to incorrectly detect memory leak. Notify the tool by using
kmemleak_not_leak() to ignore the memory allocation so that incorrect leaks
report are avoided.