This GPU has a larger instruction store, so more memory
needs to be reserved for saving shader state when context
switching.
The initial vertex and pixel partitioning of the
instruction store also needs to be different.
Write the retired timestamp into the expected location. This fixes
userspace crashes after resume when the retired timestamp is read
as 0 instead of the expected last timestamp.
The timestamp memqueue was unsorted, which could cause
memory to not be freed soon enough. The kgsl_event
list is sorted and does almost exactly the same thing
as the memqueue did, so freememontimestamp is now
implemented using the kgsl_event list.
Events need to be cancelled when an fd is released,
to avoid possible memory leaks or use after free.
When the event is cancelled, its callback is called.
Currently this is sufficient since events are used for
resource management and we have no option but to
release the lock or memory. If future uses need to
distinguish between the callback firing and
a cancel, they can look at the timestamp passed to
the callback, which will be before the timestamp they
expected. Otherwise a separate cancel callback can
be added.
There are a some workloads where interrupts do not
always get generated, and as a result the timestamp
work was not triggered often enough.
Queue timestamp expired work from adreno_waittimestamp(),
when the timestamp expires while we are not waiting.
It is possible in this case that no interrupt fired
because no processes were waiting.
Queue timestamp expired work when freememontimestamp
is called, which reduces the amount of memory
built up by applications that use this api often.
Ion carveout and content protect heap buffers do not
have a struct page associated with them. Thus
sg_phys() will not work reliably on these buffers.
Set the dma_address field on physically contiguous
buffers. When mapping a scatterlist to the gpummu
use sg_dma_address() first and if it returns 0
then use sg_phys().
msm: kgsl: Use kzalloc to allocate scatterlists of 1 page or less
The majority of the scatterlist allocations used in KGSL are under 1
page (1 page of struct scatterlist is approximately 1024 entries
equalling 4MB of allocated buffer). In these cases using vmalloc
for the sglist is undesirable and slow. Add functions to check the
size of the allocation and favor kzalloc for 1 page allocations and
vmalloc for larger lists.
For dma_alloc_coherent() you don't need writel/readl because
it's just a plain old void *. Linux tries very hard to make a
distinction between io memory (void __iomem *) and memory
(void *) so that drivers are portable to architectures that
don't have a way to access registers via pointer dereferences.
You can see http://lwn.net/Articles/102232/ and the Linus rant
http://lwn.net/Articles/102240/ here for more details behind
the motivation.
msm: kgsl: Allocate physical pages instead of using vmalloc
Replace vmalloc allocation with physical page allocation. For most
allocations we do not need a kernel virual address. vmalloc uses up
the kernel virtual address space. By replacing vmalloc with physical
page alloction and mapping that allocation to kernel space only
when it is required prevents the kgsl driver from using unnecessary
vmalloc virtual space.
kmalloc allocates physically contiguous memory and
may fail for larger allocations due to fragmentation.
The large allocations are caused by the fact that the
scatterlist structure is 24 bytes and the array size
is proportional to the number of pages being mapped.
The tools that process cff dumps expect a linear
memory region, but the start address of that region can
be configured. As long as there is only a single
pagetable (so that there aren't duplicate virtual
addresses in the dump), dumps captured with the
mmu on are easier to deal with than reconfiguring
to turn the mmu off.
Make the framework for reporting per-process memory statistics a little bit
more generic. This should make it easier to keep track of more external
memory sources as they are added.
Certain memory allocations are not properly tracked by kmemleak tool,
which makes it to incorrectly detect memory leak. Notify the tool by using
kmemleak_not_leak() to ignore the memory allocation so that incorrect leaks
report are avoided.
Poking once during adreno_idle is not enough; a GPU hang may still happen.
Seen on 7x27A. Write a few times during the wait timeout, to ensure that
the WPTR is updated properly.
The existing timestamp_cmp function returns a different
result depending on the order of the input parameters due to
having an asymetric valid window. When no rollover is
detected the window is 2^31 but when a rollover is detected
the window is 25000. This change makes the rollover window
symmetric at 2^31.
This function was incorrectly reporting hangs when an
error such as ERESTARTSYS was returned by
__wait_event_interruptible_timeout().
msm: kgsl: Make sure WPTR reg is updated properly
Sometimes writes to WPTR register do not take effect, causing a
3D core hang. Make sure the WPTR is updated properly when waiting.
msm: kgsl: Set default value of wait_timeout in the adreno_dev struct
Set the initalization value of wait_timeout at compile time in the
declaration of the adreno_device struct instead of at runtime in
adreno_probe.
This function is supposed to return the memdesc that
contains the range gpuaddr to gpuaddr + size. One of the
lookups was using sizeof(unsigned int) instead of size,
which could cause false positive results from this function
and possibly kernel panics in the snapshot or postmortem
code, which rely on it to do bounds checking for them.
Some hangs are fooling the postmortem dump code into
running off the end of a buffer. Fix this by making
its bounds check logic work better by reusing the
logic from kgsl_find_region().
Userspace will set a flag in the context if preambles are in use. If
they are, we can safely skip save and restore commands for the
context. GMEM save/restore is still required. To improve performance,
preamble commands are skipped when the context hasn't changed since
the last issueibcmds.
from Code Aurora