staging: logger: hold mutex while removing reader
staging: android: logger: clarify non-update of w_off in do_write_log_from_user
staging: android: logger: clarify code in clock_interval
staging: android: logger: reorder prepare_to_wait and mutex_lock
staging: android: logger: simplify and optimize get_entry_len
staging: android: logger: Change logger_offset() from macro to function
Staging: android: fixed whitespace coding style issues in logger.c
android: logger: bump up the logger buffer sizes
android, lowmemorykiller: remove task handoff notifier
staging: android: lowmemorykiller: Fix task_struct leak
staging: android/lowmemorykiller: Don't unregister notifier from atomic context
staging: android, lowmemorykiller: convert to use oom_score_adj
staging: android/lowmemorykiller: Do not kill kernel threads
staging: android/lowmemorykiller: No need for task->signal check
staging: android/lowmemorykiller: Better mm handling
staging: android/lowmemorykiller: Don't grab tasklist_lock
staging: android: lowmemorykiller: Don't wait more than one second for a process to die
Staging: android: fixed 80-character line warnings in lowmemorykiller.c
staging: android: lowmemorykiller: Ignore shmem pages in page-cache
staging: android: lowmemorykiller: Remove bitrotted codepath
staging: android: lowmemorykiller: Substantially reduce overhead during reclaim
staging: android: lowmemorykiller: Don't try to kill the same pid over and over
Staging: android: binder: Fix crashes when sharing a binder file between processes
drivers: staging: android: fix typos in some comments
fs: Remove missed ->fds_bits references after ceasing internal use of fd_set structs
Staging: android: Change type of binder_debug_no_lock switch to bool
Staging: android: binder: Fix use-after-free bug
Separate ib parse checking from cffdump as it is useful
in other situations. This is controlled by a new debugfs
file, ib_check. All ib checking is off (0) by default,
because parsing and mem_entry lookup can have a performance
impact on some benchmarks. Level 1 checking verifies the
IB1s. Level 2 checking also verifies the IB2s.
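A minimal sketch of how such a debugfs control could be wired up,
assuming an ib_check_level variable consulted by the parser (names
hypothetical):

    #include <linux/debugfs.h>

    /* 0 = off (default), 1 = verify IB1s, 2 = also verify IB2s */
    static u32 kgsl_ib_check_level;

    void kgsl_cffdump_debugfs_create(struct dentry *parent)
    {
        debugfs_create_u32("ib_check", 0644, parent,
                           &kgsl_ib_check_level);
    }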
Add nop packets in the ringbuffer at the start and end of IB buffers
submitted by the user space driver. These nop packets serve as markers
that can be used during replay, recovery, and snapshot to get valid
data for a GPU hang dump.
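A sketch of what the markers could look like using Adreno type-3 NOP
packets; the identifier values here are hypothetical:

    /* type-3 NOP packet carrying one payload dword used as a marker */
    #define CP_TYPE3_PKT       (3 << 30)
    #define CP_NOP             0x10
    #define cp_nop_packet(cnt) (CP_TYPE3_PKT | (((cnt) - 1) << 16) | \
                                (CP_NOP << 8))

    #define START_IB_IDENTIFIER 0x2EADBEEF  /* hypothetical values */
    #define END_IB_IDENTIFIER   0x2ABEDEAD

    static unsigned int *mark_ib(unsigned int *cmds, unsigned int ident)
    {
        *cmds++ = cp_nop_packet(1);
        *cmds++ = ident;   /* replay/snapshot tools key off this dword */
        return cmds;
    }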
User memory needs to be zeroed out before it is sent to the user.
To do this, the kernel maps the page, memsets it to zero and then
unmaps it. By virtue of mapping it, this forces us to flush the
dcache to ensure cache coherency between kernel and user mappings.
Originally, the page_alloc loop was using GFP_ZERO (which does a
map, memset, and unmap for each individual page) and then we were
additionally calling flush_dcache_page() for each page, killing us
on performance. It is far more efficient, especially for large
allocations (> 1MB), to allocate the pages without GFP_ZERO and
then to vmap the entire allocation, memset it to zero, flush the
cache and then unmap. This process is slightly slower for very
small allocations, but only by a few microseconds, and is well
within the margin of acceptability. In all, the new scheme is
faster than the default for all sizes greater than 16k, and is
almost 4X faster for 2MB and 4MB allocations which are common for
textures and very large buffer objects.
The downside is that if there isn't enough vmalloc room for the
allocation, we are forced to fall back to a slow page-by-page
memset/flush, but this should happen rarely (if at all) and is only
included for completeness.
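A minimal sketch of the fast path, assuming the pages[] array was
filled by alloc_page() without __GFP_ZERO (dmac_flush_range() is the
ARM cache-flush helper):

    #include <linux/vmalloc.h>
    #include <linux/string.h>
    #include <asm/cacheflush.h>

    static int kgsl_zero_pages(struct page **pages, int npages,
                               size_t size)
    {
        /* one mapping for the whole allocation instead of per page */
        void *ptr = vmap(pages, npages, VM_IOREMAP, PAGE_KERNEL);

        if (ptr == NULL)
            return -ENOMEM; /* caller falls back to page-by-page */

        memset(ptr, 0, size);
        dmac_flush_range(ptr, ptr + size);
        vunmap(ptr);
        return 0;
    }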
Add a guard page on the backside of page_alloc MMU mappings to protect
against an overzealous GPU prefetch engine that sometimes oversteps the
end of the mapped region. The same physical page can be re-used for
each mapping, so we only need to allocate one physical page to rule
them all and in the darkness bind them.
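A sketch of the scheme, with a hypothetical mapping helper; the one
shared page is allocated lazily and mapped read-only past the end of
each region:

    static struct page *kgsl_guard_page;

    static int kgsl_map_guard_page(struct kgsl_pagetable *pt,
                                   unsigned int gpuaddr)
    {
        if (kgsl_guard_page == NULL) {
            kgsl_guard_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
            if (kgsl_guard_page == NULL)
                return -ENOMEM;
        }
        /* kgsl_mmu_map_page() is a hypothetical helper; read-only so
         * the prefetcher can read but never corrupt anything */
        return kgsl_mmu_map_page(pt, gpuaddr,
                                 page_to_phys(kgsl_guard_page));
    }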
Change the vmalloc allocation name to something more appropriate, since
we do not allocate memory using vmalloc for the userspace driver; we
directly allocate physical pages and map them into the user address
space. The name is changed to page_alloc instead of vmalloc. Add sysfs
files to track memory usage via both vmalloc and page_alloc.
Memory mapped through kgsl_mmu_map_global() is supposed to have
the same gpu address in all pagetables. And the memdesc will
persist beyond the lifetime of any single pagetable.
Therefore, memdesc->gpuaddr should not be zeroed for these
memdescs.
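In sketch form, the unmap path guards the zeroing; treating
KGSL_MEMDESC_GLOBAL as the marker flag is an assumption here:

    /* global mappings keep one gpuaddr across every pagetable and
     * outlive any single pagetable, so never clear it on unmap */
    if (!(memdesc->priv & KGSL_MEMDESC_GLOBAL))
        memdesc->gpuaddr = 0;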
Given a pagetable base and a GPU address, find the struct kgsl_mem_entry
that matches the object. Move this functionality out from inside another
function and promote it to top level so it can be used by upcoming
functionality.
Previously, memory objects assumed that they remained attached to a
process until they were destroyed. In the past this was mostly true,
but it worked by luck because a process could technically map the
memory and then close the file descriptor, which would eventually
explode. Now we do the process-related cleanup (MMU unmap, statistics
fixup) when the object is released from the process, so the process can
go away without affecting the other holders of the mem object refcount.
Postmortem dump was not parsing CP_INDIRECT_BUFFER_PFE commands.
Snapshot was recently fixed to handle this, and this change
extends support to postmortem dump.
Set the correct GMEM and istore sizes for A320 on APQ8064.
The more GMEM we have the happier we are, so the code will
work with 256K, but it will be better with 512K. For the
instruction store the size is important during GPU snapshot
and postmortem dump. Also, the size of each instruction is
different on A3XX so remove the hard coded constants and
add a GPU specific size variable.
This GPU has a larger instruction store, so more memory
needs to be reserved for saving shader state when context
switching.
The initial vertex and pixel partitioning of the
instruction store also needs to be different.
Write the retired timestamp into the expected location. This fixes
userspace crashes after resume when the retired timestamp is read
as 0 instead of the expected last timestamp.
The timestamp memqueue was unsorted, which could cause
memory to not be freed soon enough. The kgsl_event
list is sorted and does almost exactly the same thing
as the memqueue did, so freememontimestamp is now
implemented using the kgsl_event list.
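A sketch of the sorted insertion this relies on; the field names on
kgsl_event are assumptions:

    #include <linux/list.h>

    /* keep the list ordered by expiry timestamp so the retire path
     * can stop at the first entry that has not expired yet */
    static void kgsl_event_add_sorted(struct list_head *events,
                                      struct kgsl_event *ev)
    {
        struct kgsl_event *cur;

        list_for_each_entry(cur, events, list) {
            if (timestamp_cmp(cur->timestamp, ev->timestamp) > 0) {
                /* insert ev just before the first later event */
                list_add_tail(&ev->list, &cur->list);
                return;
            }
        }
        list_add_tail(&ev->list, events);
    }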
Events need to be cancelled when an fd is released,
to avoid possible memory leaks or use after free.
When the event is cancelled, its callback is called.
Currently this is sufficient since events are used for
resource management and we have no option but to
release the lock or memory. If future uses need to
distinguish between the callback firing and
a cancel, they can look at the timestamp passed to
the callback, which will be before the timestamp they
expected. Otherwise a separate cancel callback can
be added.
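So a callback that needs to tell the two cases apart could do something
like this sketch (the context struct and field names are hypothetical):

    static void my_event_cb(struct kgsl_device *device, void *priv,
                            u32 timestamp)
    {
        struct my_ctx *ctx = priv;

        /* on cancel the callback fires with a timestamp earlier than
         * the one the event was registered for */
        if (timestamp_cmp(timestamp, ctx->expected_ts) < 0)
            pr_debug("event cancelled before retiring\n");
    }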
There are some workloads where interrupts do not
always get generated, and as a result the timestamp
work was not triggered often enough.
Queue timestamp expired work from adreno_waittimestamp(),
when the timestamp expires while we are not waiting.
It is possible in this case that no interrupt fired
because no processes were waiting.
Queue timestamp expired work when freememontimestamp
is called, which reduces the amount of memory
built up by applications that use this api often.
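In both paths the change reduces to a kick like the sketch below; the
work item and workqueue names are assumptions:

    /* no interrupt may fire when nobody is waiting, so queue the
     * expired-timestamp work ourselves */
    queue_work(device->work_queue, &device->ts_expired_ws);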
Ion carveout and content protect heap buffers do not
have a struct page associated with them. Thus
sg_phys() will not work reliably on these buffers.
Set the dma_address field on physically contiguous
buffers. When mapping a scatterlist to the gpummu
use sg_dma_address() first and if it returns 0
then use sg_phys().
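The lookup order reduces to a small helper; a sketch:

    #include <linux/scatterlist.h>

    /* carveout and content-protect buffers have no struct page, so
     * trust an explicitly set dma_address first and fall back to
     * sg_phys() only when it is unset */
    static phys_addr_t kgsl_sg_addr(struct scatterlist *s)
    {
        dma_addr_t addr = sg_dma_address(s);

        return addr ? (phys_addr_t)addr : sg_phys(s);
    }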
msm: kgsl: Use kzalloc to allocate scatterlists of 1 page or less
The majority of the scatterlist allocations used in KGSL are under 1
page (1 page of struct scatterlist is approximately 1024 entries
equalling 4MB of allocated buffer). In these cases using vmalloc
for the sglist is undesirable and slow. Add functions to check the
size of the allocation and favor kzalloc for 1 page allocations and
vmalloc for larger lists.
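A sketch of the helper pair (names assumed; vzalloc stands in for
vmalloc so both paths return zeroed memory):

    #include <linux/slab.h>
    #include <linux/vmalloc.h>
    #include <linux/scatterlist.h>

    static void *kgsl_sg_alloc(unsigned int sglen)
    {
        size_t size = sglen * sizeof(struct scatterlist);

        return (size < PAGE_SIZE) ? kzalloc(size, GFP_KERNEL)
                                  : vzalloc(size);
    }

    static void kgsl_sg_free(void *ptr, unsigned int sglen)
    {
        if ((sglen * sizeof(struct scatterlist)) < PAGE_SIZE)
            kfree(ptr);
        else
            vfree(ptr);
    }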
For dma_alloc_coherent() you don't need writel/readl because
it's just a plain old void *. Linux tries very hard to make a
distinction between io memory (void __iomem *) and memory
(void *) so that drivers are portable to architectures that
don't have a way to access registers via pointer dereferences.
See http://lwn.net/Articles/102232/ and the Linus rant at
http://lwn.net/Articles/102240/ for more details behind the
motivation.
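The distinction in sketch form:

    #include <linux/dma-mapping.h>
    #include <linux/io.h>

    static void example(struct device *dev, phys_addr_t reg_phys)
    {
        dma_addr_t handle;
        u32 *buf;
        void __iomem *regs;

        /* coherent DMA memory is plain kernel memory: dereference it */
        buf = dma_alloc_coherent(dev, PAGE_SIZE, &handle, GFP_KERNEL);
        if (buf)
            buf[0] = 0x1;           /* no writel() needed */

        /* device registers are __iomem and need the accessors */
        regs = ioremap(reg_phys, PAGE_SIZE);
        if (regs)
            writel(0x1, regs);
    }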
msm: kgsl: Allocate physical pages instead of using vmalloc
Replace vmalloc allocation with physical page allocation. For most
allocations we do not need a kernel virtual address, and vmalloc uses
up the kernel virtual address space. Replacing vmalloc with physical
page allocation, and mapping that allocation into kernel space only
when it is required, prevents the kgsl driver from using unnecessary
vmalloc virtual space.
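A sketch of the allocation loop, building a scatterlist of individually
allocated pages in place of one vmalloc region (helper name assumed):

    #include <linux/gfp.h>
    #include <linux/scatterlist.h>

    static int kgsl_page_alloc(struct scatterlist *sglist, int npages)
    {
        int i;

        sg_init_table(sglist, npages);
        for (i = 0; i < npages; i++) {
            struct page *page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);

            if (page == NULL)
                return -ENOMEM; /* caller unwinds pages 0..i-1 */
            sg_set_page(&sglist[i], page, PAGE_SIZE, 0);
        }
        return 0;
    }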
kmalloc allocates physically contiguous memory and
may fail for larger allocations due to fragmentation.
The large allocations are caused by the fact that the
scatterlist structure is 24 bytes and the array size
is proportional to the number of pages being mapped.
The tools that process cff dumps expect a linear
memory region, but the start address of that region can
be configured. As long as there is only a single
pagetable (so that there aren't duplicate virtual
addresses in the dump), dumps captured with the
mmu on are easier to deal with than reconfiguring
to turn the mmu off.
Make the framework for reporting per-process memory statistics a little bit
more generic. This should make it easier to keep track of more external
memory sources as they are added.
Certain memory allocations are not properly tracked by the kmemleak
tool, which causes it to incorrectly detect memory leaks. Notify the
tool using kmemleak_not_leak() to ignore these allocations so that
false leak reports are avoided.
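The annotation is one call at the allocation site; a sketch:

    #include <linux/slab.h>
    #include <linux/kmemleak.h>

    ptr = kzalloc(size, GFP_KERNEL);
    /* the pointer is reachable only through structures kmemleak does
     * not scan, so tell the tool this allocation is not a leak */
    kmemleak_not_leak(ptr);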
Poking once during adreno_idle is not enough; a GPU hang may still happen.
Seen on 7x27A. Write a few times during the wait timeout, to ensure that
the WPTR is updated properly.
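Sketch of the repeated poke while waiting for idle; the loop structure
and helper names are assumptions from the A2XX-era driver:

    /* a single WPTR write occasionally fails to latch, so re-post it
     * until the GPU idles or the timeout expires */
    unsigned long timeout = jiffies +
                            msecs_to_jiffies(adreno_dev->wait_timeout);

    do {
        adreno_regwrite(device, REG_CP_RB_WPTR, rb->wptr);
    } while (!adreno_isidle(device) && time_before(jiffies, timeout));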
The existing timestamp_cmp function returns a different
result depending on the order of the input parameters due to
having an asymmetric valid window. When no rollover is
detected the window is 2^31 but when a rollover is detected
the window is 25000. This change makes the rollover window
symmetric at 2^31.
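One way to get a symmetric 2^31 window is the usual wrap-safe signed
difference; a sketch of the idea (the in-tree version may be written
differently):

    /* <0 if a is before b, 0 if equal, >0 if after; the window is a
     * symmetric 2^31 on either side of a rollover */
    static inline int timestamp_cmp(unsigned int a, unsigned int b)
    {
        return (int)(a - b);
    }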
This function was incorrectly reporting hangs when an
error such as ERESTARTSYS was returned by
__wait_event_interruptible_timeout().
msm: kgsl: Make sure WPTR reg is updated properly
Sometimes writes to WPTR register do not take effect, causing a
3D core hang. Make sure the WPTR is updated properly when waiting.
msm: kgsl: Set default value of wait_timeout in the adreno_dev struct
Set the initialization value of wait_timeout at compile time in the
declaration of the adreno_device struct instead of at runtime in
adreno_probe.
This function is supposed to return the memdesc that
contains the range gpuaddr to gpuaddr + size. One of the
lookups was using sizeof(unsigned int) instead of size,
which could cause false positive results from this function
and possibly kernel panics in the snapshot or postmortem
code, which rely on it to do bounds checking for them.
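The containment test both callers rely on, in sketch form:

    /* true only when all of [gpuaddr, gpuaddr + size) falls inside
     * the memdesc; note the caller's size is used, not
     * sizeof(unsigned int) */
    static inline int kgsl_gpuaddr_in_memdesc(
            const struct kgsl_memdesc *memdesc,
            unsigned int gpuaddr, unsigned int size)
    {
        return gpuaddr >= memdesc->gpuaddr &&
               (gpuaddr + size) <= (memdesc->gpuaddr + memdesc->size);
    }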
Some hangs are fooling the postmortem dump code into
running off the end of a buffer. Fix this by tightening
its bounds checking, reusing the logic from
kgsl_find_region().