Zoom Icon

Debugging Deadlocks on Android

From UIC Archive

Debugging Deadlocks on Android

Contents


Debugging Deadlocks on Android
Author: Quequero
Email: Que addr.gif
Website: http://quequero.org
Date: 30/01/2012 (dd/mm/yyyy)
Level: Working brain required
Language: English Flag English.gif
Comments: This is just a personal memo, anyway feel free to send me suggestions and corrections



Introduction

Deadlocks are well known to be difficult to debug, especially when you're dealing with a large amount of concurrent code. Fortunately on Android we have some tools that are more than useful to detect where a deadlock is happening.



Debugging Deadlocks

Probably in your developer life you'll end up dealing with deadlocks at least a couple (hundred thousand) times, while a deadlock-situation might not be that hard to detect, sometimes it can be hard to detect where the deadlock happened. This usually lead to questions like "where? and how do I trace it?", so you start writing a new class, if you didn't do that already, to track resource allocation and hunt the bug, a very time consuming process, especially if you're in a rush. Let's see what to do if you're developing on Android platform.

A few days ago I've found that the release version of a program I was writing had a deadlock that wasn't present in the debug version. Quite strange if you consider that the only difference were some log.d(). What to do?


Deadlock Identification

A deadlock wasn't my first guess, so I started checking again the code to see if I was missing something obvious. Eventually, after ruling out any other possibility I had to tell myself that yep, it was a damn deadlock. How to identify it? It's pretty easy, the only thing you need is a rooted phone: first of all deploy your application on the device, run it, wait for the deadlock to happen and then connect to the device using ADB:

c:\> adb shell $ su

Get a list of all processes running on the system and take note of the PID that belongs to your process:

  1. ps

app_59 645 94 137584 18308 ffffffff afd0c52c S com.sec.android.app.samsungapps.una2 app_71 745 94 141712 21352 ffffffff afd0c52c S com.que.gpslogger app_41 772 94 136820 16276 ffffffff afd0c52c S com.sec.android.widgetapp.clockweather.clockwidget

In my case the PID was 745, kill it using the -3 as argument:

  1. kill -3 745

This command creates a trace file into /data/anr/, on some Android versions you'll have to create the directory by yourself otherwise the command won't produce any output file.

  1. cd /data/anr/
  2. ls -l

ls -l -rw-rw-rw- app_71 app_71 2237 2012-01-27 12:06 traces.txt -rw-rw-rw- root root 4364 2012-01-27 12:33 traces.txt.bugreport

At this point all you have to do is pull the files and examine them:

c:\> adb pull /data/anr/traces.txt

Opening the file you'll see something like this:

"Thread-23" daemon prio=5 tid=22 NATIVE

 | group="main" sCount=1 dsCount=0 obj=0x4056fbc0 self=0x1b5da0
 | sysTid=764 nice=0 sched=0/0 cgrp=bg_non_interactive handle=1793248
 at android.os.MessageQueue.nativePollOnce(Native Method)
 at android.os.MessageQueue.next(MessageQueue.java:119)
 at android.os.Looper.loop(Looper.java:117)
 at com.que.gpslogger.module.b.a.run((null):-1)

"Thread-22" prio=5 tid=21 TIMED_WAIT

 | group="main" sCount=1 dsCount=0 obj=0x40573d58 self=0x1b6768
 | sysTid=765 nice=0 sched=0/0 cgrp=bg_non_interactive handle=1796256
 at java.lang.Object.wait(Native Method)
 - waiting on <0x40573cf0> (a com.que.gpslogger.d.q)
 at java.lang.Object.wait(Object.java:395)
 at com.que.gpslogger.a.run((null):-1)
 at java.lang.Thread.run(Thread.java:1019)

"Thread-21" prio=5 tid=20 TIMED_WAIT

 | group="main" sCount=1 dsCount=0 obj=0x4056e678 self=0x1b4b18
 | sysTid=763 nice=0 sched=0/0 cgrp=bg_non_interactive handle=1789008
 at java.lang.Object.wait(Native Method)
 - waiting on <0x4056e5c0> (a com.que.gpslogger.d.n)
 at java.lang.Object.wait(Object.java:395)
 at com.que.gpslogger.a.run((null):-1)
 at java.lang.Thread.run(Thread.java:1019)
 

"Thread-19" prio=5 tid=18 WAIT

 | group="main" sCount=1 dsCount=0 obj=0x4056d860 self=0x1b3f70
 | sysTid=761 nice=0 sched=0/0 cgrp=bg_non_interactive handle=1786024
 at java.lang.Object.wait(Native Method)
 - waiting on <0x40573458> (a java.lang.VMThread)
 at java.lang.Object.wait(Object.java:358)
 at java.lang.Thread.join(Thread.java:914)
 at com.que.gpslogger.d.k.c((null):-1)
 at com.que.gpslogger.a.run((null):-1)
 at java.lang.Thread.run(Thread.java:1019)

This is just an extract of the original file containing the stack trace for every thread associated to your process. Did you notice anything? Methods related to your core will probably be obfuscated by ProGuard: com.que.gpslogger.d.k.c, so it is really hard to understand what a give thread is doing. Fortunately we have a nice tool into the Android SDK called retrace.bat on Windows and retrace.sh on Linux, this tool is able to convert the obfuscated trace into a normal one, the only think you'll have to provide is the mapping.txt file, you can find it into your project directory: <project_root>/bin/proguard/mapping.txt, this file contains the mapping between the normal and obfuscated code:

com.que.gpslogger.Data -> com.que.gpslogger.h:

   int type -> a
   int size -> b
   int accuracy -> c
   long time -> d
   long getType() -> a
   void setTime(int) -> d

We can easily proceed to code-deobfuscation simply running this script from its own directory:

c:\Android-sdk-windows\tools\proguard\bin>retrace.bat -verbose C:\Projects\QueGPS\bin\proguard\mapping.txt C:\Projects\QueGPS\bin\proguard\traces.txt > C:\Projects\QueGPS\bin\proguard\deobfuscated.txt

After this step we'll be able to examine our trace without any obfuscation, so open deobfuscated.txt and look for any thread that's in WAIT or MONITOR state:

"Thread-23" daemon prio=5 tid=22 NATIVE

 | group="main" sCount=1 dsCount=0 obj=0x4056fbc0 self=0x1b5da0
 | sysTid=764 nice=0 sched=0/0 cgrp=bg_non_interactive handle=1793248
 at android.os.MessageQueue.nativePollOnce(Native Method)
 at android.os.MessageQueue.next(MessageQueue.java:119)
 at android.os.Looper.loop(Looper.java:117)
 at com.que.gpslogger.module.position.GPS.void run()((null):-1)

"Thread-22" prio=5 tid=21 TIMED_WAIT

 | group="main" sCount=1 dsCount=0 obj=0x40573d58 self=0x1b6768
 | sysTid=765 nice=0 sched=0/0 cgrp=bg_non_interactive handle=1796256
 at java.lang.Object.wait(Native Method)
 - waiting on <0x40573cf0> (a com.que.gpslogger.d.q)
 at java.lang.Object.wait(Object.java:395)
 at com.que.gpslogger.Main.void run()((null):-1)
 at java.lang.Thread.run(Thread.java:1019)
 

"Thread-21" prio=5 tid=20 TIMED_WAIT

 | group="main" sCount=1 dsCount=0 obj=0x4056e678 self=0x1b4b18
 | sysTid=763 nice=0 sched=0/0 cgrp=bg_non_interactive handle=1789008
 at java.lang.Object.wait(Native Method)
 - waiting on <0x4056e5c0> (a com.que.gpslogger.d.n)
 at java.lang.Object.wait(Object.java:395)
 at com.que.gpslogger.Main.void run()((null):-1)
 at java.lang.Thread.run(Thread.java:1019)
 

"Thread-19" prio=5 tid=18 WAIT

 | group="main" sCount=1 dsCount=0 obj=0x4056d860 self=0x1b3f70
 | sysTid=761 nice=0 sched=0/0 cgrp=bg_non_interactive handle=1786024
 at java.lang.Object.wait(Native Method)
 - waiting on <0x40573458> (a java.lang.VMThread)
 at java.lang.Object.wait(Object.java:358)
 at java.lang.Thread.join(Thread.java:914)
 at com.que.gpslogger.alert.Notification.void stopNotification()((null):-1)
 at com.que.gpslogger.Main.void run()((null):-1)
 at java.lang.Thread.run(Thread.java:1019)

Look for any thread in WAIT state and check if it has to be that state or not, in my case the Thread-19 is blocked in a join() call, something that doesn't have to happen so there it is my problem.

"Thread-19" prio=5 tid=18 WAIT // It is waiting

 - waiting on <0x40573458> (a java.lang.VMThread) // Why threaddy are you here?
 at java.lang.Object.wait(Object.java:358)
 at java.lang.Thread.join(Thread.java:914)
 at com.que.gpslogger.alert.Notification.void stopNotification()((null):-1)
 at com.que.gpslogger.Main.void run()((null):-1)
 at java.lang.Thread.run(Thread.java:1019)

In the end I forgot to close a Looper, or better, the call to close the looper was into the debugging code that was stripped out when compiling in release, that's the reason the debug version was behaving correctly. I hope this short memo will help you save some time :).

Quequero


Disclaimer

I documenti qui pubblicati sono da considerarsi pubblici e liberamente distribuibili, a patto che se ne citi la fonte di provenienza. Tutti i documenti presenti su queste pagine sono stati scritti esclusivamente a scopo di ricerca, nessuna di queste analisi è stata fatta per fini commerciali, o dietro alcun tipo di compenso. I documenti pubblicati presentano delle analisi puramente teoriche della struttura di un programma, in nessun caso il software è stato realmente disassemblato o modificato; ogni corrispondenza presente tra i documenti pubblicati e le istruzioni del software oggetto dell'analisi, è da ritenersi puramente casuale. Tutti i documenti vengono inviati in forma anonima ed automaticamente pubblicati, i diritti di tali opere appartengono esclusivamente al firmatario del documento (se presente), in nessun caso il gestore di questo sito, o del server su cui risiede, può essere ritenuto responsabile dei contenuti qui presenti, oltretutto il gestore del sito non è in grado di risalire all'identità del mittente dei documenti. Tutti i documenti ed i file di questo sito non presentano alcun tipo di garanzia, pertanto ne è sconsigliata a tutti la lettura o l'esecuzione, lo staff non si assume alcuna responsabilità per quanto riguarda l'uso improprio di tali documenti e/o file, è doveroso aggiungere che ogni riferimento a fatti cose o persone è da considerarsi PURAMENTE casuale. Tutti coloro che potrebbero ritenersi moralmente offesi dai contenuti di queste pagine, sono tenuti ad uscire immediatamente da questo sito.

Vogliamo inoltre ricordare che il Reverse Engineering è uno strumento tecnologico di grande potenza ed importanza, senza di esso non sarebbe possibile creare antivirus, scoprire funzioni malevole e non dichiarate all'interno di un programma di pubblico utilizzo. Non sarebbe possibile scoprire, in assenza di un sistema sicuro per il controllo dell'integrità, se il "tal" programma è realmente quello che l'utente ha scelto di installare ed eseguire, né sarebbe possibile continuare lo sviluppo di quei programmi (o l'utilizzo di quelle periferiche) ritenuti obsoleti e non più supportati dalle fonti ufficiali.