gpu: nvgpu: cancel job clean up before aborting channel
Deepak Nibade [Fri, 1 Jul 2016 07:05:27 +0000 (12:05 +0530)]
When we abort a channel, the job clean up worker may still be
running. The worker can race with the abort and sometimes
results in the panic below:

[  245.483566] Unable to handle kernel paging request at virtual address 800000000
...
[  245.548991] PC is at gk20a_channel_abort_clean_up+0xb8/0x140
[  245.554683] LR is at gk20a_channel_abort_clean_up+0xac/0x140
...
[  247.301860] [<ffffffc000479390>] gk20a_channel_abort_clean_up+0xb8/0x140
[  247.312853] [<ffffffc0004794d4>] gk20a_channel_abort+0xbc/0xc8
[  247.322970] [<ffffffc0004794f8>] gk20a_disable_channel+0x18/0x30
[  247.333267] [<ffffffc000479628>] gk20a_free_channel+0x118/0x584
[  247.343473] [<ffffffc000479aa0>] gk20a_channel_close+0xc/0x14
[  247.353479] [<ffffffc000479b80>] gk20a_channel_release+0xd8/0x104

Fix this by cancelling the job clean up worker in
gk20a_channel_abort(), before the abort path starts cleaning up
the channel.
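
The ordering can be illustrated with a minimal userspace sketch. This is
not the nvgpu code: it uses pthreads instead of kernel work items, and
every name in it (struct channel, cleanup_worker, channel_cancel_cleanup,
channel_abort, run_demo) is hypothetical. It assumes the worker and the
abort path both release the same list of jobs. The commit message does not
say which state the two paths actually share.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these names exist in nvgpu. */
struct job {
	struct job *next;
};

struct channel {
	pthread_mutex_t lock;
	pthread_t worker;
	bool worker_started;
	bool stop;          /* set under lock to ask the worker to exit */
	struct job *jobs;   /* pending jobs, protected by lock */
	int freed;          /* number of jobs detached for release so far */
};

/* Detach the first pending job; NULL if none, or if a stop was requested. */
static struct job *pop_job(struct channel *ch, bool honor_stop)
{
	struct job *j;

	pthread_mutex_lock(&ch->lock);
	j = (honor_stop && ch->stop) ? NULL : ch->jobs;
	if (j) {
		ch->jobs = j->next;
		ch->freed++;
	}
	pthread_mutex_unlock(&ch->lock);
	return j;
}

/* Background clean up: release jobs until none remain or stop is set. */
static void *cleanup_worker(void *arg)
{
	struct channel *ch = arg;
	struct job *j;

	while ((j = pop_job(ch, true)) != NULL)
		free(j);
	return NULL;
}

/* Ask the worker to stop and wait until it has really finished. */
static void channel_cancel_cleanup(struct channel *ch)
{
	pthread_mutex_lock(&ch->lock);
	ch->stop = true;
	pthread_mutex_unlock(&ch->lock);
	if (ch->worker_started) {
		pthread_join(ch->worker, NULL);
		ch->worker_started = false;
	}
}

/* Abort: cancel the worker first, then tear down whatever it left behind. */
static void channel_abort(struct channel *ch)
{
	struct job *j;

	channel_cancel_cleanup(ch);
	assert(!ch->worker_started);   /* no other thread touches the jobs now */
	while ((j = pop_job(ch, false)) != NULL)
		free(j);
}

/* Queue njobs jobs, start the worker, abort; return how many were freed. */
static int run_demo(int njobs)
{
	struct channel ch = { .stop = false, .jobs = NULL, .freed = 0 };
	int i;

	pthread_mutex_init(&ch.lock, NULL);
	for (i = 0; i < njobs; i++) {
		struct job *j = malloc(sizeof(*j));

		if (!j)
			break;
		j->next = ch.jobs;
		ch.jobs = j;
	}
	ch.worker_started =
		pthread_create(&ch.worker, NULL, cleanup_worker, &ch) == 0;
	channel_abort(&ch);
	pthread_mutex_destroy(&ch.lock);
	return ch.freed;
}
```

Build with -pthread and call run_demo() from a test harness. Because
channel_abort() joins the worker before it walks the list, each job is
detached and freed exactly once, whichever thread gets to it first.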

Bug 1777281
Bug 200209467

Change-Id: Ic24c7c03b27cfb5cd164a52efdb1e2813a41a10a
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1174416
(cherry picked from commit 1002f40a3bb54db6e40be77b836437ccb2f3aa96)
Reviewed-on: http://git-master/r/1190946
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Tested-by: Bharat Nihalani <bnihalani@nvidia.com>

drivers/gpu/nvgpu/gk20a/channel_gk20a.c

index fad2063..daf5984 100644
@@ -1,7 +1,7 @@
 /*
  * GK20A Graphics channel
  *
- * Copyright (c) 2011-2015, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2011-2016, NVIDIA CORPORATION.  All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -407,6 +407,8 @@ void gk20a_channel_abort(struct channel_gk20a *ch, bool channel_preempt)
        if (channel_preempt)
                gk20a_fifo_preempt(ch->g, ch);
 
+       gk20a_channel_cancel_job_clean_up(ch, true);
+
        /* ensure no fences are pending */
        mutex_lock(&ch->sync_lock);
        if (ch->sync)