Creating Accurate Waterfall Diagrams for Android Traces

Sophisticated Android apps are all but impossible to write without some level of multi-threading, and many threads frequently cooperate within a single process. Tools such as coroutines and thread pools make this threading easier to manage. BugSnag’s Real User Monitoring solution includes waterfall diagrams, which show the breakdown of an operation into its dependent operations. Multi-threading presents a challenge here because the dependencies can cross threads. We can’t simply track the current operation with a ThreadLocal: the program flow moves between threads, and a thread that has finished one piece of work will likely be reused for something else.

Spans, Contexts, and Threads

Spans are the primitives of performance measurements. Fundamentally, a span is a tagged start and end time. Each span belongs to a trace and can optionally have a “parent” span. The two fields identifying these relationships (parent span id and trace id) are a span’s “context”.

These two relationships allow the spans to be presented as a waterfall diagram. For example, here is a simple waterfall diagram of a trivial Activity being created:

This example trace includes 4 spans:

The Activity - MainActivity span is the root for this trace and encompasses the entire activity load.
ActivityCreate, ActivityStart and ActivityResume are sub-spans that measure the time taken for the onCreate, onStart, and onResume invocations respectively.

The “context” of the subspans was inherited from the parent: the trace id is the same for the subspans and the parent, and the parent span id of each subspan is the span id of MainActivity.
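The relationship between trace id, span id and parent span id can be sketched in a few lines of Kotlin. This is an illustrative model only, not BugSnag's data model: the Span class, childOf, renderWaterfall and all ids and timings below are assumptions made for the example.

```kotlin
// Hypothetical model for illustration; not BugSnag's actual classes.
data class Span(
    val name: String,
    val traceId: String,
    val spanId: String,
    val parentSpanId: String?, // null for the root span of a trace
    val startMs: Long,
    val endMs: Long,
)

// A child inherits the trace id of its parent and records the parent's span id.
fun childOf(parent: Span, name: String, spanId: String, startMs: Long, endMs: Long): Span =
    Span(name, parent.traceId, spanId, parent.spanId, startMs, endMs)

// Render the spans of one trace as an indented, text-only waterfall.
fun renderWaterfall(spans: List<Span>): List<String> {
    val byParent = spans.groupBy { it.parentSpanId }
    val lines = mutableListOf<String>()
    fun visit(span: Span, depth: Int) {
        lines += "  ".repeat(depth) + "${span.name} [${span.startMs}..${span.endMs} ms]"
        byParent[span.spanId].orEmpty().sortedBy { it.startMs }.forEach { visit(it, depth + 1) }
    }
    byParent[null].orEmpty().sortedBy { it.startMs }.forEach { visit(it, 0) }
    return lines
}

fun main() {
    // Made-up ids and timings, mirroring the four spans described above.
    val root = Span("Activity - MainActivity", "trace-1", "span-1", null, 0, 120)
    val spans = listOf(
        root,
        childOf(root, "ActivityCreate", "span-2", 0, 80),
        childOf(root, "ActivityStart", "span-3", 80, 100),
        childOf(root, "ActivityResume", "span-4", 100, 120),
    )
    renderWaterfall(spans).forEach(::println)
}
```

Because every span carries its trace id and parent span id, the hierarchy can be rebuilt from a flat list of spans, regardless of the order in which they were recorded.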

BugSnag’s Real User Monitoring solution automatically tracks the local context of a span, so as a developer you don’t need to think about it too much. It does this by keeping a stack of contexts in a ThreadLocal. This works perfectly for single-threaded operations (such as creating an Activity), but anything non-trivial in Android requires multiple threads and, typically, coroutines. This introduces complexity because a trace might measure a single coroutine sequence that hops between several different threads, any of which may generate its own subspans.
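The limitation is easy to reproduce with a small sketch. The ContextStack object below is a hypothetical stand-in for a per-thread stack of contexts, with plain strings in place of real span contexts; it is not the SDK's internal implementation.

```kotlin
import kotlin.concurrent.thread

// Hypothetical stand-in for a per-thread context stack, for illustration only.
object ContextStack {
    private val stack = ThreadLocal.withInitial { ArrayDeque<String>() }

    // The innermost open span on the calling thread, or null if there is none.
    val current: String? get() = stack.get().lastOrNull()

    // Pushes spanId for the duration of block, then pops it again.
    fun <T> withSpan(spanId: String, block: () -> T): T {
        stack.get().addLast(spanId)
        try {
            return block()
        } finally {
            stack.get().removeLast()
        }
    }
}

// Returns what a freshly started worker thread sees as the current context
// while "Login" is open on the calling thread.
fun contextSeenByWorker(): String? {
    var seen: String? = "unset"
    ContextStack.withSpan("Login") {
        thread { seen = ContextStack.current }.join()
    }
    return seen
}

fun main() {
    ContextStack.withSpan("Login") {
        println("same thread: ${ContextStack.current}")
    }
    println("worker thread: ${contextSeenByWorker()}")
}
```

On the thread that opened the span the lookup finds "Login", but the worker thread has its own empty stack and sees null, so any span it started would lose its parent.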

Manually Tracking Span Context

We decided to take a bottom-up approach. Instead of trying to automate every possible threading model and toolset (Flow, Rx, coroutines, thread pools, etc.), we started by building a way to control the context manually and then built utility libraries on top of it.

Our SpanContext represents the context of a span: the id of its trace and of its parent span (if any). The SpanContext.current static method returns the ThreadLocal SpanContext, which is the default for any new Span created using BugsnagPerformance.startSpan or measureSpan. The SpanContext object can be passed between threads safely, and a new Span can then be started “within” whatever context you like. A simple but contrived example:

val parentContext = SpanContext.current

// other things happen that may change SpanContext.current

// childSpan will be a child of the span that was open when parentContext was captured
val childSpan = BugsnagPerformance.startSpan("child operation", SpanOptions.DEFAULTS.within(parentContext)) {
  // code to be measured goes here
}

In theory, this toolbox lets you construct span hierarchies any way you like. What you really want, though, is for the hierarchy to mirror the flow of the program, so here’s how we help with that.

Coroutines

With coroutines, flow can hop around threads, through arbitrarily complex nested layers of launch, async and withContext. We provide a BugsnagPerformanceScope class to keep your coroutine spans organized. When a class implements CoroutineScope by BugsnagPerformanceScope(), the scope will marshal the SpanContext across threads, and the Spans created from within the scope will automatically nest as you expect:

class DashboardViewModel : ViewModel(), CoroutineScope by BugsnagPerformanceScope() {
    // ...
    
    override fun onCleared() {
        super.onCleared()
        cancel()
    }
    
    private fun attemptLogin(email: String, password: String) {
        val loginSpan = BugsnagPerformance.startSpan("Login")
        launch {
            loginSpan.use {
                val loginResult = withContext(Dispatchers.IO) {
                    LoginApi.login(email, password)
                }
                
                // ...

The above code will automatically nest any spans logged by the LoginApi (such as the HTTP requests) as children of the Login span:

To allow fork/join coroutine behavior with async / awaitAll, we made SpanContext compatible with being added to a coroutine context. Just pass the current SpanContext manually when you “fork” the new tasks using async:

val rows = data
    .map { row ->
        async(SpanContext.current + Dispatchers.Default) {
            renderRow(row)
        }
    }
    .awaitAll()

This will yield the following waterfall:

It’s worth marking the custom nested spans as not “first-class” so that they don’t clutter your project dashboard:

val NESTED_SPAN = SpanOptions.DEFAULTS
    .setFirstClass(false)
    
private fun renderRow(row: RowInput): RowRenderResult {
    BugsnagPerformance.startSpan("RenderRow", NESTED_SPAN).use {
        // ...

Thread Pools

Coroutines are the async / multithreaded tool of choice in Kotlin. But sometimes one must make use of a Thread Pool instead, especially when using Java libraries not focused on Android. Don’t worry – we’ve got you covered!

We have a couple of ThreadPoolExecutor implementations (ContextAwareThreadPoolExecutor and ContextAwareScheduledThreadPoolExecutor) which automatically carry the context onto the tasks that are submitted for execution.

val parentSpan = BugsnagPerformance.startSpan("parentSpan")

val executor = ContextAwareThreadPoolExecutor(2, 4, 10, TimeUnit.SECONDS, ArrayBlockingQueue(10))
executor.submit {
    val childSpanOnAnotherThread = BugsnagPerformance.startSpan("childSpanOnAnotherThread")
}
val childSpanOnOriginalThread = BugsnagPerformance.startSpan("childSpanOnOriginalThread")

In the above example, both childSpanOnAnotherThread and childSpanOnOriginalThread will be children of parentSpan.
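To make the idea concrete, here is a minimal sketch of how an executor could carry a context onto its tasks. It is not BugSnag's implementation: ContextCarryingExecutor, currentContext and contextSeenInPool are hypothetical names, and a plain String stands in for SpanContext. The sketch assumes that capturing the context at submission time and restoring it around the task is enough.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.TimeUnit

// Hypothetical per-thread "current context"; a String keeps the sketch small.
val currentContext = ThreadLocal<String?>()

// Captures the submitting thread's context and installs it around each task.
class ContextCarryingExecutor(threads: Int) : ThreadPoolExecutor(
    threads, threads, 0L, TimeUnit.SECONDS, LinkedBlockingQueue<Runnable>()
) {
    // submit() delegates to execute(), so wrapping here covers both entry points.
    override fun execute(command: Runnable) {
        val captured = currentContext.get() // read on the submitting thread
        super.execute {
            val previous = currentContext.get()
            currentContext.set(captured)
            try {
                command.run()
            } finally {
                // Pooled threads are reused, so never leave a stale context behind.
                if (previous == null) currentContext.remove() else currentContext.set(previous)
            }
        }
    }
}

// Sets a context on the calling thread and reports what a pooled task observes.
fun contextSeenInPool(context: String): String? {
    val executor = ContextCarryingExecutor(1)
    currentContext.set(context)
    try {
        return executor.submit(Callable { currentContext.get() }).get()
    } finally {
        currentContext.remove()
        executor.shutdown()
    }
}

fun main() {
    println(contextSeenInPool("parentSpan"))
}
```

Restoring the previous value in the finally block matters because pool threads outlive individual tasks; without it, a later unrelated task could inherit the wrong parent.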

Runnables and Callables

You can also wrap any Runnable and Callable objects in the SpanContext you want them to run in:

val parentSpan = BugsnagPerformance.startSpan("parentSpan")

mainHandler.post(SpanContext.current.wrap(Runnable {
  // some work to do on the main thread
  val childSpan = BugsnagPerformance.startSpan("childSpan")
}))

Here, childSpan will be a child of parentSpan, despite running on a different thread.

Conclusion

By starting with a lightweight notion of “context” we can easily and safely allow the propagation of span parenthood through traces, providing an intuitive representation of an often-complex, multi-threaded program flow in our waterfall diagrams.

This, coupled with helpers for the various standard ways operations can flow across threads in an Android app, gives a good balance of flexibility and ease of use. For the full documentation on maintaining context in Android traces, see our docs. Want to try it for yourself? Try BugSnag free for 14 days, no credit card required.
