(lldb) thread step-out

Amazon Kindle: iOS App Reverse Engineering for eBooks Leaking

Disclaimer: this was written back in February 2020.

I read a lot, and love non-fiction. But when it comes to ebooks, I prefer the native Books.app on iOS: I got used to its controls, animations, and the feeling of a "one-stop shop" for all the books I've read lately. So every time I happen to buy an ebook, the first thing I look for is how to import it into Books.app.

After an easy success with the Alpina.Books app, I decided to check out the state of the art: ebook protection in the Amazon Kindle application for iOS.

File System Artefacts

As before, I start by pulling app data from the iOS device onto my Mac. As you may remember from the previous blog post, apps have two readable and writable locations for data produced at runtime: the directory of the app itself, and AppGroups directories for sharing data between multiple apps (or app extensions) of one developer. Access to AppGroups is managed by the system's AppleMobileFileIntegrity.kext according to entitlements that are baked into the code signature. To find out which AppGroups the app is allowed to access, we can read the entitlements of the bundle ourselves.

Fetching the app bundle is not a big issue: run the Filza file manager on a jailbroken device, find the bundle under /var/containers/Bundle/Application/, and transfer it to the Mac over SSH using the scp command. The well-known ldid tool helps us dump the entitlements of the bundle:

Okay, here we see that the app has access to an AppGroup with the identifier group.com.amazon.Lassen and a similarly named shared Keychain. Unfortunately, in my case the directory of this group was empty :)
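If the entitlements are saved as an XML property list, the interesting keys can be pulled out with a few lines of Python. This is only a minimal sketch: the sample plist below is made up for illustration (only the group identifier comes from the dump described above, and the TEAMID prefix is a placeholder), while com.apple.security.application-groups and keychain-access-groups are the standard entitlement keys.

```python
import plistlib

# Hypothetical sample of an entitlements dump. Only the AppGroup identifier
# is taken from the article; the keychain group prefix is a placeholder.
SAMPLE_ENTITLEMENTS = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.application-groups</key>
    <array>
        <string>group.com.amazon.Lassen</string>
    </array>
    <key>keychain-access-groups</key>
    <array>
        <string>TEAMID.com.amazon.Lassen</string>
    </array>
</dict>
</plist>
"""


def shared_containers(entitlements_xml: bytes) -> dict:
    """Return the AppGroups and Keychain groups listed in an entitlements plist."""
    entitlements = plistlib.loads(entitlements_xml)
    return {
        "app_groups": entitlements.get("com.apple.security.application-groups", []),
        "keychain_groups": entitlements.get("keychain-access-groups", []),
    }


if __name__ == "__main__":
    print(shared_containers(SAMPLE_ENTITLEMENTS))
```

To use it on a real dump, read the saved file as bytes and pass it to shared_containers instead of the sample.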

On the other hand, the working directory of the app itself has a lot of stuff, including a folder called eBooks with a number of subfolders:

There's also an SQLite database at Library/Preferences/BookData.sqlite with a table named ZBOOK, where each book has a familiar title in ZSORTTITLE and a local URL in ZPATH:
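Listing those two columns does not need a GUI database browser. Here is a minimal sketch with Python's sqlite3 module; the in-memory table and its rows are invented stand-ins so the example runs anywhere, and only the table and column names come from the real database.

```python
import sqlite3


def list_books(conn: sqlite3.Connection) -> list:
    """Return (title, path) pairs from the ZBOOK table, sorted by title."""
    cursor = conn.execute(
        "SELECT ZSORTTITLE, ZPATH FROM ZBOOK ORDER BY ZSORTTITLE"
    )
    return cursor.fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory stand-in for BookData.sqlite with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ZBOOK (ZSORTTITLE TEXT, ZPATH TEXT)")
    conn.executemany(
        "INSERT INTO ZBOOK VALUES (?, ?)",
        [
            ("Example Title B", "eBooks/BBBB/book.kfx"),  # hypothetical path
            ("Example Title A", "eBooks/AAAA/book.kfx"),  # hypothetical path
        ],
    )
    return conn


if __name__ == "__main__":
    for title, path in list_books(demo_connection()):
        print(title, "->", path)
```

Against the copy pulled from the device, replace demo_connection() with sqlite3.connect("BookData.sqlite").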

Open one of these URLs and check the contents to make sure the ZMIMETYPE value application/x-kfx-ebook did not lie: the book contents are stored in a proprietary Amazon format with built-in DRM (which at the time of writing is not publicly broken):

Okay, this looks like a dead end for me: I'm not a cryptography or DRM expert. Maybe we can try to understand how the app itself decrypts these files?

Static Analysis

All executable binaries of App Store apps are encrypted and signed: part of the Mach-O binary is literally encrypted and is decrypted only in memory during app launch by the launchd system daemon. Hence the easiest way for us to obtain a decrypted binary (that we'll be able to disassemble to peek at its code) is to dump the decrypted part from device memory after launch and use it to replace the encrypted chunk in the binary that we've already downloaded to the Mac. Additionally, if we want to run this binary later, we need to modify the Mach-O load commands (which sit at the beginning of the binary and tell the system how the binary should be launched) to say that the size of the encrypted part is 0. This process is explained in more detail in one of my cards.
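To make the load command part more concrete, here is a rough sketch that locates the LC_ENCRYPTION_INFO_64 command in a thin 64-bit little-endian Mach-O and marks the binary as unencrypted. Note the assumption: it clears the cryptid field, which is what many public dumping tools do, rather than zeroing the size. The synthetic header exists only so the example is runnable; a real binary would be read from disk, and a fat binary would need its slices handled first.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF          # thin 64-bit Mach-O, little-endian
LC_UUID = 0x1B
LC_ENCRYPTION_INFO_64 = 0x2C
MACH_HEADER_64_SIZE = 32          # 8 x uint32


def find_encryption_info(macho: bytes):
    """Return (command_offset, cryptoff, cryptsize, cryptid) or None."""
    magic, _cpu, _sub, _type, ncmds, _size, _flags, _reserved = struct.unpack_from(
        "<8I", macho, 0
    )
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O")
    offset = MACH_HEADER_64_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", macho, offset)
        if cmd == LC_ENCRYPTION_INFO_64:
            cryptoff, cryptsize, cryptid = struct.unpack_from("<3I", macho, offset + 8)
            return offset, cryptoff, cryptsize, cryptid
        offset += cmdsize
    return None


def clear_cryptid(macho: bytearray) -> bool:
    """Set cryptid to 0 in place; return False if there is no such command."""
    info = find_encryption_info(bytes(macho))
    if info is None:
        return False
    # Layout: cmd, cmdsize, cryptoff, cryptsize, cryptid, pad (uint32 each).
    struct.pack_into("<I", macho, info[0] + 16, 0)
    return True


def synthetic_macho() -> bytes:
    """Made-up header with two load commands, just for demonstration."""
    uuid_cmd = struct.pack("<2I16s", LC_UUID, 24, bytes(16))
    crypt_cmd = struct.pack("<6I", LC_ENCRYPTION_INFO_64, 24, 0x4000, 0x8000, 1, 0)
    header = struct.pack(
        "<8I", MH_MAGIC_64, 0x0100000C, 0, 2, 2, len(uuid_cmd) + len(crypt_cmd), 0, 0
    )
    return header + uuid_cmd + crypt_cmd


if __name__ == "__main__":
    data = bytearray(synthetic_macho())
    print(find_encryption_info(bytes(data)))
    clear_cryptid(data)
    print(find_encryption_info(bytes(data)))
```

The cryptoff and cryptsize values it reports are also what tell you which byte range of the file to overwrite with the dumped memory.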

For this part we'll need a decrypted binary of the app. First of all, let's run the classic class-dump tool, which reads the appropriate section of the binary and re-creates headers for the Objective-C classes mentioned there:

abjurato@Macintosh Desktop % ./class-dump -H -o <OUTPUT_PATH> <PATH_TO_BINARY>

Okay, the good news is that the app is written in Objective-C, but that's a lot of files. Which ones should we pay attention to? Let's open the binary in Hopper Disassembler and try to find some object or method with a hint in its name, like this one:

Seemingly, the KfxBookBundle object represents a book at some level of abstraction. Let's check its header dump: it has a property called allPieces. If we run the app on a device with a debugger connected, put a symbolic breakpoint at -[KfxBook initWithBundle:], and try to open one of the books, we'll see that this property contains an array of KfxBookPiece objects:

Looking at the code of the KfxBookBundle constructor, we can see that it reads the BookManifest.kfx file as a CoreData database with a similarly named model, KfxBookBundle:

Let's try to change the .kfx extension to .sqlite in one of the files we've obtained from the application working directory and open it. Looks like it's just a manifest tying together all the files in a bundle:
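Before renaming anything, the guess can be confirmed by looking at the first bytes of the file: every SQLite 3 database starts with the 16-byte string "SQLite format 3" followed by a zero byte. A minimal sketch follows; the helper that creates a stand-in manifest is hypothetical and exists only so the example runs without a real book.

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 file


def looks_like_sqlite(path: str) -> bool:
    """Check the file header instead of trusting the extension."""
    with open(path, "rb") as f:
        return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


def make_demo_manifest(directory: str) -> str:
    """Create a stand-in 'BookManifest.kfx' that is really a SQLite file."""
    path = os.path.join(directory, "BookManifest.kfx")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE demo (id INTEGER)")  # made-up table
    conn.commit()
    conn.close()
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manifest = make_demo_manifest(tmp)
        print(manifest, looks_like_sqlite(manifest))
```

Running looks_like_sqlite over every file in a book's folder is a quick way to tell the manifest apart from the protected content files.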

After hours of debugging and reading assembly, I bumped into non-Objective-C and non-Swift code that seems responsible for reading the contents of files and setting up the environment to decrypt parts of the book file on demand. Probably, this code is a library shared across all the different Kindle readers, from iOS to Android to Amazon devices, and is linked statically into the iOS binary.

Dynamic Analysis

Let's go the opposite way: from the UI down to the business logic. At the end of the day, the text is somehow rendered on the screen! I'll omit the details of connecting a debugger to the application: the manual process is complicated, but Frida 'just works'.

After launching the app and opening a book, let's stop the process and ask lldb which UIViewController is currently presented:

(lldb) po [[[[UIApplication sharedApplication] keyWindow] rootViewController] presentedViewController]
<ReaderViewController: 0x1050ca600>

Looking at the endless header of this class, we notice a ReaderModel reference, and the ReaderModel object has a reference to BookMainData, which inherits from NSManagedObject and whose values we've seen in the ZBOOK table of the main database:

This binary is a “stripped” one, so we cannot use the names of methods and classes to set debugger breakpoints. Instead, we'll have to call the Objective-C runtime functions objc_getClass and class_getMethodImplementation to determine the addresses of the methods of interest in process memory. Setting a breakpoint in the openBookWithBookId method of ReaderViewController looks like this:
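As a rough illustration of that approach (not the exact commands I used), the small Python helper below just assembles the two lldb commands as strings: one expression that asks the runtime for the method's address, and one address breakpoint. The casts, the use of sel_registerName, and the trailing colon in the selector are my assumptions; the address has to be copied from what lldb prints for the first command.

```python
def resolve_imp_command(class_name: str, selector: str) -> str:
    """lldb command that prints the address of an Objective-C method."""
    return (
        "expression -l objc -- (void *)class_getMethodImplementation("
        f'(Class)objc_getClass("{class_name}"), '
        f'(SEL)sel_registerName("{selector}"))'
    )


def breakpoint_command(address: int) -> str:
    """lldb command that sets a breakpoint at a raw address."""
    return f"breakpoint set --address {address:#x}"


if __name__ == "__main__":
    # The selector spelling (with a trailing colon) is an assumption.
    print(resolve_imp_command("ReaderViewController", "openBookWithBookId:"))
    # Placeholder address: substitute the value printed by the command above.
    print(breakpoint_command(0x100000000))
```

Because of ASLR the resolved address changes between launches, so the first command has to be re-run in every debugging session.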

From the ReaderViewController header we can see that it has references to ReaderItem and KindleDocument objects:

Of these, _barePronter looks interesting. The disassembler shows that a property with this name is used in the -[BookTextExtractor textInfoForKindleDocument: forBookPositionRange] method:

And if we add a breakpoint in the constructor -[BookTextExtractorInfo initWithText:andWordCount:] and read the argument value, we'll see the chunk of text that is shown on the screen:

Can we trick the caller into passing a longer chunk there? After some time I found a sibling method, -[BookTextExtractor textInfoForBookPositionRange:usingIterator:], that is called to determine the language of the text on the screen and set up a built-in Google translator:

This basically means that we only need to create a BookPositionRange object with valid start and end BookPosition values and pass it to this method to get all the text between these positions (later I'll notice that this text lacks any formatting).

Digging deeper, we notice that ReaderViewController has a reference to KfxDocViewController, which in turn has a reference to KfxPagePreviewModel, which knows the positions where all the chapters begin:

This way, a sequence of lldb commands to dump the raw text of a chapter will look like this:

Done!

Summary

Hooray? Probably. Creating breakpoints and enumerating chapters could be automated using lldb's built-in Python scripting capability, but the lack of formatting is a problem that is here to stay: sentences of dialogue are all merged together, titles are stuck to the paragraphs, and paragraphs are not separated. I personally was not satisfied with this output and resorted to using the Kindle app for ebooks bought from Amazon.


Sources and Tools

[0] Inside Code Signing

[1] AppleMobileFileIntegrity

[2] About ldid tool for codesigning

[3] Building a class-dump in 2020

[4] Hopper Disassembler

[5] Frida for dynamic analysis

[6] Setting a breakpoint in a stripped binary with LLDB

[7] LLDB - Python Scripting


11/10/2020