Porcupine Programmer: db

Showing posts with label db. Show all posts

Saturday, January 25, 2014

Offline mode in Android apps, part 3 - old db schemas

The first post in this series explained the first consequence on implementing the offline mode - performing the data migrations. In second part I showed a workaround for the rudimentary SQLite's ALTER TABLE syntax. If you have checked the link to MigrationHelper class I mentioned, you migth have noticed that it's just a tiny part of a larger library, which allows you to define database schemas. Note the plural "schemas": the whole point of this library is defining both current schema and the schemas for the older versions of your app. This post explains why do you have to do this.

Let's say in the first version you have the following data structure:

public static class User {
  public long id;
  public String firstName;
  public String lastName;
  public String email;
}

And the table definition for this table in your SQLiteOpenHelper looks like this:

private static final String CREATE_TABLE_USERS = "CREATE TABLE " +
    TABLE_USERS +
    " ( " +
    ID + " INTEGER PRIMARY KEY AUTOINCREMENT " + ", " +
    FIRST_NAME + " TEXT " + ", " +
    LAST_NAME + " TEXT " + ", " +
    EMAIL + " TEXT " +
    " ) ";

In the next version you decide to keep only the first name in a single field, so you change your data structure accordingly and perform the data migration. In the snippet below I used the MigrationHelper, but you might have as well performed the migration by hand:

private static final String CREATE_TABLE_USERS = "CREATE TABLE " +
    TABLE_USERS +
    " ( " +
    ID + " INTEGER PRIMARY KEY AUTOINCREMENT " + ", " +
    NAME + " TEXT " + ", " +
    EMAIL + " TEXT " +
    " ) ";

@Override
public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
  MigrationsHelper helper = new MigrationsHelper();
  if (oldVersion < 2) {
    helper.performMigrations(db, 
        TableMigration.of(TABLE_USERS)
            .to(CREATE_TABLE_USERS)
            .withMapping(NAME, FIRST_NAME)
            .build()
    );
  }
}

Then you decide that the email field should be mandatory, so you change the schema and migrate the data again:

private static final String CREATE_TABLE_USERS = "CREATE TABLE " +
    TABLE_USERS +
    " ( " +
    ID + " INTEGER PRIMARY KEY AUTOINCREMENT " + ", " +
    NAME + " TEXT " + ", " +
    EMAIL + " TEXT NOT NULL" +
    " ) ";

@Override
public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
  MigrationsHelper helper = new MigrationsHelper();
  if (oldVersion < 2) {
    helper.performMigrations(db,
        TableMigration.of(TABLE_USERS)
            .to(CREATE_TABLE_USERS)
            .withMapping(NAME, FIRST_NAME)
            .build()
    );
  }
  if (oldVersion < 3) {
    db.execSQL("DELETE FROM " + TABLE_USERS + " WHERE " + EMAIL + " IS NULL");
    helper.performMigrations(db,
        TableMigration.of(TABLE_USERS)
            .to(CREATE_TABLE_USERS)
            .build()
    );
  }
}

The code looks fine, but you have just broken migrations from v1 to v3. If there is an user with a null email field, the app will crash in line 13 above. But why, shouldn't the email field in v2 schema be nullable? It should, but this migration uses the constant containing the latest schema definition with different column constraint.

The worst thing about this kind of bugs is that it might slip through your tests, because the crash happens only if you have a specific data before the application update.

You migth be tempted to define separate migrations from every old version to the latest one (in our case migrations from v1 to v3 and from v2 to v3) and always execute only single migration, but this workaround doesn't scale. For each new migration you'd have to check and potentially update every existing migration. When you publish the app twice a month, this quickly becomes a huge problem.

The other solution is to make every migration completely independent from the others, and execute them sequentially. This way, when you define a new migration, you don't have to worry about the previous ones. This means that when you upgrade from v1 to v3, you first upgrade from v1 to v2 and then from v2 to v2 and after the first step the database should be in the same state it were, when the v2 was the latest version. In other words, you have to keep an old database schemas.

As usual there are multiple ways to do this. You can copy the schema definition to another constant and append "ver#" suffix, but it means there will be a lot of duplicated code (although this code should never, ever change, so it's not as bad as the regular case of copypaste). The other way is to keep the initial database state and all the schema updates. The issue here is that you don't have a place in your code with current schema definition. The opposite solution is to keep the current schema and the list of downgrades. Sounds counterintuitive? Don't worry, that's because it *is* counterintuitive.

In android-schema-utils I've chosen the third approach, because in the long run it processes less data than the upgrades solution - in case of upgrade from vN-1 to vN it has to generate only 1 additional schema instead of N-1 schemas. I'm still not sure if the code wouldn't be clearer had I went with duplicated schema definitions approach, but the current approach, once you get used to it, works fine. The schema and migrations for our example would look like this:

private static final MigrationsHelper MIGRATIONS_HELPER = new MigrationsHelper();
private static final Schemas SCHEMAS = Schemas.Builder
    .currentSchema(3,
        new TableDefinition(TABLE_USERS,
            new AddColumn(ID, "INTEGER PRIMARY KEY AUTOINCREMENT"),
            new AddColumn(NAME, "TEXT"),
            new AddColumn(EMAIL, "TEXT NOT NULL")
        )
    )
    .upgradeTo(3,
        new SimpleMigration() {
          @Override
          public void apply(SQLiteDatabase db, Schema schema) {
            db.execSQL("DELETE FROM " + TABLE_USERS + " WHERE " + EMAIL + " IS NULL");
          }
        },
        auto()
    )
    .downgradeTo(2,
        new TableDowngrade(TABLE_USERS, new AddColumn(EMAIL, "TEXT"))
    )
    .upgradeTo(2,
        SimpleTableMigration.of(TABLE_USERS)
            .withMapping(NAME, FIRST_NAME)
            .using(MIGRATIONS_HELPER)
        )
    .downgradeTo(1,
        new TableDowngrade(TABLE_USERS,
            new AddColumn(FIRST_NAME, "TEXT"),
            new AddColumn(LAST_NAME, "TEXT"),
            new DropColumn(EMAIL)
        )
    )
    .build();

There are other benefits of keeping the old schemas in a more reasonable format than raw strings. Most of the schema migrations can be deducted from comparing subsequent schema versions, so you don't have to do it yourself. For example in migration from v2 to v3 I didn't have to specify that I want to migrate the Users table - the auto() migration automatically handles it. If the auto() is the only migration for a given upgrade, you can skip the whole upgradeTo() block. In our case that covered about 50% migrations, but YMMV.

If you go this way, your onUpgrade method, which usually is the most complex part of SQLiteOpenHelper, can be reduced to this:

@Override
public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
  SCHEMAS.upgrade(oldVersion, mContext, db);
}

This part concludes the "offline mode" series. Here's the short recap:

If you don't want to compromise on UX, your application should work regardless whether the user is connected to internet or not.
In this case the user may end up in a situation when the application is upgraded, but not all data is synced with the server yet. You *do not* want to lose your users' data. You'll have to migrate them.
If you migrate your data, you should keep the migrations separate from one another, because otherwise maintaining them becomes a nightmare.
The best way to do this that I know of, is keeping the old schemas and always performing all migrations sequentially. To make things simpler, I recommend the android-schema-utils library.

Friday, January 3, 2014

Offline mode in Android apps, part 2 - SQLite's ALTER TABLE

In first part of this series I showed that to implement offline mode in your Android app you have to implement data migrations. If you're using SQLite database, it means you'll have to use (or rather work around) it's ALTER TABLE syntax:

So all you can do with it is adding the column or renaming the table, but in reality you probably need to alter a single column, remove column or change the table constraints. You can achieve this by doing the following operation:

Rename the table T with old schema to old_T.
Create the table T with new schema.
Use "INSERT INTO T (new_columns) SELECT old_columns FROM old_T" query to populate the table T with the data from the renamed table old_T.
Drop old_T.

Doing it manually is quite error prone though: for every migration you have to specify the new_columns and old_columns list. What's worse, in 95% of cases you just want to list the columns common for old and new schema. Fortunately we can automate such trivial migrations by executing SELECT with LIMIT 0 (or PRAGMA TABLE_INFO) for both tables, getting the columns set using Cursor.getColumnNames(), and calculating these columns sets intersection.

You can write a nice wrapper for this yourself, but a) I already did it, so you don't have to and b) there is a very nasty gotcha which would probably cost you few hours of teeth grinding, so do yourself a favor and check this repository out, especially the MigrationsHelper class. It automates the trivial migrations and allows you to define a mappings for situations when you rename the column or add a non-nullable column in new schema.

In the next two posts I'll describe the gotcha I've mentioned in the previous paragraph and show some other non-obvious consequences of doing data migrations.

Thursday, December 26, 2013

Offline mode in Android apps, part 1 - data migrations

This year I gave a talk on Krakdroid conference about offline mode in Android apps. By offline mode I mean implementing the app such way that the network availability is completely transparent to the end users. The high level implementation idea is to decouple the operations changing the data from sending these changes through unreliable network by saving the changes in local database and sending them at convinient moment. We have encountered two major problems when we implemented this behavior in Base CRM: data migrations and identifying entities. This blog post describes the first issue.

It might not be obvious why do you need the data migrations, so let's clear this out. Let's say on your mobile you have some data synced with backend (green squares on left and right) and some unsynced data created locally on mobile (red squares on the left).

Now let's say that we introduce new functionality to our app, which changes the schema of our data models (the squares on the backend side are changed to circles).

The schema of the local database have to be changed as well. The naive way of handling this situation is dropping old database with old schema, creating new one with new schema and resyncing all the data from backend, but there are two issues with this approach: if there is a lot of data the resyncing might take a while, which negates the most important advantage of offline mode - that the app is fully functional all the time.

More serious issue is that dropping the old database means that the unsynced data will be dropped along with it.

The only way to provide the optimal user experience is to perform schema migrations locally for both synced and unsynced data:

Migrating the data doesn't sound like a challenging thing to code, but the combination of obscure SQLite and Android issues complicates the matter. Without proper tools it's quite easy to make your code unmaintainable in the long run. I'll describe this issues and our solutions in the further posts.

Wednesday, November 6, 2013

SQL injection through ContentProvider projection

The SQL injection through query parameters is the common security issue of any system using SQL database. Android is no different than any other system, so if you're using SQLite database in your Android app, you should always sanitize the database inputs.

Obligatory XKCD

If you are also using an exported ContentProvider, you need to take care of one more vector of attack: the projection parameter of the queries. Just like SQLiteDatabase, the ContentProvider allows the users to specify which columns they want to retrieve. It makes sense, because it reduces the amount of data fetched, which might improve performance and reduce the RAM footprint of your app. Unlike the SQLiteDatabase, the ContentProvider might be exported, which means that the external applications can query the data from it requesting an arbitrary projection, which are then turned into raw SQL queries. For example:

'Bobby Tables was here'; DROP TABLE Students; --
* FROM sqlite_master; --
* FROM non_public_table_I_found_out_about_using_previous_query; --

Basically it means that if you exposed a single uri without sanitizing the projection, you have exposed your entire db.

So how do you sanitize your projections? I've given it some thought and it seems that the only sensible thing to do is allowing only subsets of predefined set of columns.

You cannot allow any expression, because you'd allow any expressions, including SELECTs from other tables and allowing certain expressions is not a trivial task.

You shouldn't ignore the provided projection and return all columns, because one of the benefits of using projections is limiting the amount of data retrieved from database. Besides, certain widely used Google application ignores the existence of Cursor.getColumnIndex method and assumes that the columns will be returned in the same order they were specified in projection. The other app won't work correctly, and the users will probably blame you.

Sunday, November 3, 2013

Thneed, notes and db design

We're starting to find more and more interesting use cases for Thneed in Base CRM codebase. The first release using it, and few other libraries we recently developed, was released just before the Halloween and we haven't registered any critical issues related to it. All in all, the results look very promising. I won't recommend using Thneed in your production builds yet, but I urge you to star the project on Github and watch its progress.

The Thneed was created as an answer to some issues we faced when developing and maintaining Base CRM, and this fact is sometimes reflected by the API. The example of this is something we internally called PolyModels.

Let's start with a scenario, where we have a some objects and we'd like to add notes to. It's a classic one-to-many relationship, which I'd model with a foreign key in notes table:

CREATE TABLE some_entity (id INTEGER);
CREATE TABLE notes (
    id INTEGER, 
    some_entity_id INTEGER REFERENCES some_entity(id), 
    content TEXT
);

Now let's introduce another type of objects, which also can have notes attached to it. We have few options now. The simplest thing to do is to keep these notes in a completely separate table:

CREATE TABLE other_entity (id INTEGER);
CREATE TABLE other_enity_notes (
    id INTEGER, 
    other_entity_id INTEGER REFERENCES other_entity(id), 
    content TEXT
);

The issue with this solution is that we have two separate schemas that need to be updated in parallel, and in 95% of cases would be exactly the same. Another approach is making the objects which contain notes sort of inherit a base class:

CREATE TABLE notables (id INTEGER);
CREATE TABLE some_entity (id INTEGER, notable_id INTEGER REFERENCES notables(id));
CREATE TABLE other_entity (id INTEGER, notable_id INTEGER REFERENCES notables(id));
CREATE TABLE notes (
    id INTEGER, 
    notable_id INTEGER REFERENCES notables(id), 
    content TEXT
);

These two solutions work perfectly in the "give me all notes for object X" scenario, but it gets ugly if you want to display a single note with the simple "Associated with object X" info. In this case you have to query every model which can contain notes, to see if this particular association references the objects from this model. On top of that, the Noteable table approach requires some additional work to create the entry in

You can always have a several mutually exclusive foreing keys in your notes:

CREATE TABLE some_entity (id INTEGER);
CREATE TABLE other_entity (id INTEGER);
CREATE TABLE notes (
    id INTEGER, 
    some_entity_id INTEGER REFERENCES some_entity(id), 
    other_entity_id INTEGER REFERENCES other_entity(id), 
    content TEXT
);

But this solution doesn't really scale well as the number of the models which can contain notes increases. Also, your DBAs will love you if you go this way.

The solution to this problem we used in Base was to have two columns in Notes table: one holding the type of the "noteable" object, i.e. and the other for the id of this object:

CREATE TABLE some_entity (id INTEGER);
CREATE TABLE other_entity (id INTEGER);
CREATE TABLE notes (
    id INTEGER, 
    noteable_id INTEGER, 
    noteable_type TEXT, 
    content TEXT
);

The glaring issue with this approach is losing the consistency guarantee - no database I know of support this kind of foreign keys. But when you have SOA on the backend and the notes are stored in a separate database than the noteable objects, this is not your top concern. On mobile apps, even though we have a single database, we use the same structure, because all the other have some implementation issues and worse performance characteristics.

I'm not a db expert, and I haven't found any discussion of similar cases, which means that a) we're doing something very wrong or b) we have just very specific requirements. Let me know if it's a former case.

I needed to model this relationships in Thneed, which tured out to be quite tricky, but that's the topic for another blog post.