Porcupine Programmer: SQLite

Saturday, January 25, 2014

Offline mode in Android apps, part 3 - old db schemas

The first post in this series explained the first consequence of implementing the offline mode: performing data migrations. In the second part I showed a workaround for SQLite's rudimentary ALTER TABLE syntax. If you have checked the link to the MigrationsHelper class I mentioned, you might have noticed that it's just a tiny part of a larger library, which allows you to define database schemas. Note the plural "schemas": the whole point of this library is defining both the current schema and the schemas for the older versions of your app. This post explains why you have to do this.

Let's say in the first version you have the following data structure:

public static class User {
  public long id;
  public String firstName;
  public String lastName;
  public String email;
}

And the table definition for this structure in your SQLiteOpenHelper looks like this:

private static final String CREATE_TABLE_USERS = "CREATE TABLE " +
    TABLE_USERS +
    " ( " +
    ID + " INTEGER PRIMARY KEY AUTOINCREMENT " + ", " +
    FIRST_NAME + " TEXT " + ", " +
    LAST_NAME + " TEXT " + ", " +
    EMAIL + " TEXT " +
    " ) ";

In the next version you decide to keep only the first name in a single field, so you change your data structure accordingly and perform the data migration. In the snippet below I used the MigrationsHelper, but you might as well have performed the migration by hand:

private static final String CREATE_TABLE_USERS = "CREATE TABLE " +
    TABLE_USERS +
    " ( " +
    ID + " INTEGER PRIMARY KEY AUTOINCREMENT " + ", " +
    NAME + " TEXT " + ", " +
    EMAIL + " TEXT " +
    " ) ";

@Override
public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
  MigrationsHelper helper = new MigrationsHelper();
  if (oldVersion < 2) {
    helper.performMigrations(db, 
        TableMigration.of(TABLE_USERS)
            .to(CREATE_TABLE_USERS)
            .withMapping(NAME, FIRST_NAME)
            .build()
    );
  }
}

Then you decide that the email field should be mandatory, so you change the schema and migrate the data again:

private static final String CREATE_TABLE_USERS = "CREATE TABLE " +
    TABLE_USERS +
    " ( " +
    ID + " INTEGER PRIMARY KEY AUTOINCREMENT " + ", " +
    NAME + " TEXT " + ", " +
    EMAIL + " TEXT NOT NULL" +
    " ) ";

@Override
public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
  MigrationsHelper helper = new MigrationsHelper();
  if (oldVersion < 2) {
    helper.performMigrations(db,
        TableMigration.of(TABLE_USERS)
            .to(CREATE_TABLE_USERS)
            .withMapping(NAME, FIRST_NAME)
            .build()
    );
  }
  if (oldVersion < 3) {
    db.execSQL("DELETE FROM " + TABLE_USERS + " WHERE " + EMAIL + " IS NULL");
    helper.performMigrations(db,
        TableMigration.of(TABLE_USERS)
            .to(CREATE_TABLE_USERS)
            .build()
    );
  }
}

The code looks fine, but you have just broken migrations from v1 to v3. If there is a user with a null email field, the app will crash in the first performMigrations call above (the v1 to v2 step). But why? Shouldn't the email field in the v2 schema be nullable? It should, but this migration uses the CREATE_TABLE_USERS constant, which now contains the latest schema definition with a different column constraint.

The worst thing about this kind of bug is that it might slip through your tests, because the crash happens only if specific data is present before the application update.

You might be tempted to define separate migrations from every old version to the latest one (in our case migrations from v1 to v3 and from v2 to v3) and always execute only a single migration, but this workaround doesn't scale. For each new migration you'd have to check and potentially update every existing migration. When you publish the app twice a month, this quickly becomes a huge problem.

The other solution is to make every migration completely independent from the others, and execute them sequentially. This way, when you define a new migration, you don't have to worry about the previous ones. This means that when you upgrade from v1 to v3, you first upgrade from v1 to v2 and then from v2 to v3, and after the first step the database should be in the same state it was in when v2 was the latest version. In other words, you have to keep the old database schemas.
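To make the sequential idea concrete, here is a minimal sketch in plain Java. The `SequentialMigrations` class and its `Step` callback are hypothetical names invented for this illustration; they are not part of android-schema-utils, and a real step would receive the SQLiteDatabase instead of a log list.

```java
import java.util.List;
import java.util.TreeMap;

class SequentialMigrations {
  /** Hypothetical callback; a real step would get the SQLiteDatabase instead of a log. */
  interface Step {
    void apply(List<String> log);
  }

  // Steps keyed by the version they upgrade TO, kept sorted by version.
  private final TreeMap<Integer, Step> steps = new TreeMap<>();

  SequentialMigrations upgradeTo(int version, Step step) {
    steps.put(version, step);
    return this;
  }

  /** Runs every step with oldVersion < target <= newVersion, in ascending order. */
  void upgrade(int oldVersion, int newVersion, List<String> log) {
    for (Step step : steps.subMap(oldVersion, false, newVersion, true).values()) {
      step.apply(log);
    }
  }
}
```

With steps registered for versions 2 and 3, calling `upgrade(1, 3, log)` runs the v2 step and then the v3 step, while `upgrade(2, 3, log)` runs only the v3 step. Each step can then assume the database is exactly in the state of the previous version.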

As usual there are multiple ways to do this. You can copy the schema definition to another constant and append a "ver#" suffix, but it means there will be a lot of duplicated code (although this code should never, ever change, so it's not as bad as the regular case of copy-paste). The other way is to keep the initial database state and all the schema updates. The issue here is that you don't have a place in your code with the current schema definition. The opposite solution is to keep the current schema and the list of downgrades. Sounds counterintuitive? Don't worry, that's because it *is* counterintuitive.

In android-schema-utils I've chosen the third approach, because in the long run it processes less data than the upgrades solution - in case of an upgrade from vN-1 to vN it has to generate only 1 additional schema instead of N-1 schemas. I'm still not sure if the code wouldn't be clearer had I gone with the duplicated schema definitions approach, but the current approach, once you get used to it, works fine. The schema and migrations for our example would look like this:

private static final MigrationsHelper MIGRATIONS_HELPER = new MigrationsHelper();
private static final Schemas SCHEMAS = Schemas.Builder
    .currentSchema(3,
        new TableDefinition(TABLE_USERS,
            new AddColumn(ID, "INTEGER PRIMARY KEY AUTOINCREMENT"),
            new AddColumn(NAME, "TEXT"),
            new AddColumn(EMAIL, "TEXT NOT NULL")
        )
    )
    .upgradeTo(3,
        new SimpleMigration() {
          @Override
          public void apply(SQLiteDatabase db, Schema schema) {
            db.execSQL("DELETE FROM " + TABLE_USERS + " WHERE " + EMAIL + " IS NULL");
          }
        },
        auto()
    )
    .downgradeTo(2,
        new TableDowngrade(TABLE_USERS, new AddColumn(EMAIL, "TEXT"))
    )
    .upgradeTo(2,
        SimpleTableMigration.of(TABLE_USERS)
            .withMapping(NAME, FIRST_NAME)
            .using(MIGRATIONS_HELPER)
        )
    .downgradeTo(1,
        new TableDowngrade(TABLE_USERS,
            new AddColumn(FIRST_NAME, "TEXT"),
            new AddColumn(LAST_NAME, "TEXT"),
            new DropColumn(NAME)
        )
    )
    .build();
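The "current schema plus downgrades" idea can also be shown without the library. The sketch below is an assumption-laden illustration: `SchemaHistory`, `addColumn` and `dropColumn` are hypothetical helpers, and a schema is modelled simply as a map from column name to column definition. Each older schema is derived from the newer one by applying that version's downgrades to a copy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SchemaHistory {
  /** A downgrade edits a copy of the newer schema (column name -> column definition). */
  interface Downgrade {
    void apply(Map<String, String> columns);
  }

  /** Adds a column, or replaces the definition of an existing one. */
  static Downgrade addColumn(String name, String definition) {
    return columns -> columns.put(name, definition);
  }

  static Downgrade dropColumn(String name) {
    return columns -> columns.remove(name);
  }

  /** Returns the schema one version older than newer; the input map is not modified. */
  static Map<String, String> downgrade(Map<String, String> newer, Downgrade... downgrades) {
    Map<String, String> older = new LinkedHashMap<>(newer);
    for (Downgrade d : downgrades) {
      d.apply(older);
    }
    return older;
  }
}
```

Starting from the v3 columns (id, name, email NOT NULL), one downgrade that redefines email as plain TEXT yields v2. Adding first_name and last_name and dropping name then yields the v1 columns that match the first version of the User class.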

There are other benefits of keeping the old schemas in a more reasonable format than raw strings. Most of the schema migrations can be deduced by comparing subsequent schema versions, so you don't have to do it yourself. For example in the migration from v2 to v3 I didn't have to specify that I want to migrate the Users table - the auto() migration automatically handles it. If auto() is the only migration for a given upgrade, you can skip the whole upgradeTo() block. In our case that covered about 50% of migrations, but YMMV.

If you go this way, your onUpgrade method, which usually is the most complex part of SQLiteOpenHelper, can be reduced to this:

@Override
public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
  SCHEMAS.upgrade(oldVersion, mContext, db);
}

This part concludes the "offline mode" series. Here's the short recap:

If you don't want to compromise on UX, your application should work regardless of whether the user is connected to the internet or not.
In this case the user may end up in a situation where the application is upgraded, but not all data is synced with the server yet. You *do not* want to lose your users' data. You'll have to migrate it.
If you migrate your data, you should keep the migrations separate from one another, because otherwise maintaining them becomes a nightmare.
The best way to do this that I know of is keeping the old schemas and always performing all migrations sequentially. To make things simpler, I recommend the android-schema-utils library.

Sunday, January 12, 2014

Android SQLiteDatabase gotcha

In my previous post I mentioned a nasty SQLiteDatabase gotcha and recommended using the MigrationsHelper utility I wrote. If you have checked this class's sources, you might have noticed some weird code. Before getting the list of columns, the table is renamed to a temporary name and then renamed back:

final String tempTable = "tmp_" + tempTableIndex++;
db.execSQL("ALTER TABLE " + migration.tableName + " RENAME TO " + tempTable);
ImmutableSet<String> oldColumns = getColumns(db, tempTable);

db.execSQL(migration.createTableStatement);
final String tempNewTable = "tmp_" + tempTableIndex++;
db.execSQL("ALTER TABLE " + migration.tableName + " RENAME TO " + tempNewTable);
ImmutableSet<String> newColumns = getColumns(db, tempNewTable);

db.execSQL("ALTER TABLE " + tempNewTable + " RENAME TO " + migration.tableName);

private static ImmutableSet<String> getColumns(SQLiteDatabase db, String table) {
  Cursor cursor = db.query(table, null, null, null, null, null, null, "0");
  if (cursor != null) {
    try {
      return ImmutableSet.copyOf(cursor.getColumnNames());
    } finally {
      cursor.close();
    }
  }
  return ImmutableSet.of();
}

Initially the MigrationsHelper's code looked like this:

static final String TEMP_TABLE = "tmp";
db.execSQL("ALTER TABLE " + migration.tableName + " RENAME TO " + TEMP_TABLE);
ImmutableSet<String> oldColumns = getColumns(db, TEMP_TABLE);

db.execSQL(migration.createTableStatement);
ImmutableSet<String> newColumns = getColumns(db, migration.tableName);

It worked for a single migration, but didn't work for multiple migrations - the helper method for getting the column set always returned the columns of the first table. Since the query was always the same, I suspected the results were cached somewhere. To verify this hypothesis I added an index, incremented with every migration, to the temporary table name. It worked, but then I realized I needed to do the same for getting the columns of the new schema - otherwise the helper wouldn't work if the same table were migrated twice. This is how the weird code was born.

But the same thing could happen outside of MigrationsHelper operations, for example if you need to iterate through rows of the same table in two different migrations:

@Override
public void onUpgrade(final SQLiteDatabase db, int oldVersion, int newVersion) {
  if (oldVersion <= 1500) {
    Cursor c = db.query("some_table", null, null, null, null, null, null);
    // use Cursor c, then close it
    c.close();
  }

  // other migrations, including ones that change the some_table table's columns

  if (oldVersion <= 2900) {
    // same SQL string as above, so the statement compiled against the old columns is reused
    Cursor c = db.query("some_table", null, null, null, null, null, null);
    // try to use Cursor c and crash terribly
  }
}

So I checked the AOSP code for the suspected cache to see how the entries can be evicted or if the cache can be disabled. There are no methods for this, so you can't do it with a straightforward call, but maybe you can exploit the implementation details?

On ICS the cache is implemented as an LruCache, so theoretically you could evict old entries by filling the cache with new ones, but there is one hiccup - you don't know the cache size, so you'd always have to go with MAX_SQL_CACHE_SIZE.

Before ICS you couldn't even do that - the implementation of this "cache" is just a fixed size buffer for SQLiteStatements. Once that buffer is full, no more statements are cached. This also has one more consequence - your app might work much slower on Android 2.x after an upgrade from an old version than after a fresh install, because the db cache will be filled with queries used in migrations.

Fortunately the keys of this cache are raw SQL strings, so we can disable the cache for migration queries by adding a "WHERE n==n" clause with n incremented for every query (note that you mustn't pass n as a bound parameter - the whole point of adding this selection is to make the queries different and force SQLiteDatabase to compile another statement).
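A minimal sketch of such a clause generator follows. `UncachedSelection` is a hypothetical helper name, not an Android or library class; the only thing it does is produce a selection string whose text is different on every call.

```java
import java.util.concurrent.atomic.AtomicInteger;

class UncachedSelection {
  private static final AtomicInteger COUNTER = new AtomicInteger();

  /**
   * Returns an always-true clause whose text differs on every call, e.g. "1==1", "2==2".
   * The number is inlined into the SQL on purpose: a bound "?" would leave the SQL
   * string, and therefore the cache key, unchanged.
   */
  static String next() {
    int n = COUNTER.incrementAndGet();
    return n + "==" + n;
  }
}
```

In a migration the result would be passed as the selection with no selection arguments, for example `db.query("some_table", null, UncachedSelection.next(), null, null, null, null)`, so that every such query is compiled from a new SQL string.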

The question you should ask yourself is why you have to know and care about all this. Isn't SQLite smart enough to see that I'm trying to access the database using a prepared statement compiled against an old schema? It turns out that SQLite detects these issues and raises the SQLITE_SCHEMA error (commented with "The database schema changed"), but Android's SQLiteDatabase wrapper drops this error and happily uses the old, invalid statements. Bad Android.

Friday, January 3, 2014

Offline mode in Android apps, part 2 - SQLite's ALTER TABLE

In the first part of this series I showed that to implement offline mode in your Android app you have to implement data migrations. If you're using an SQLite database, it means you'll have to use (or rather work around) its ALTER TABLE syntax.

All you can do with it is add a column or rename the table, but in reality you probably need to alter a single column, remove a column or change the table constraints. You can achieve this by performing the following steps:

Rename the table T with old schema to old_T.
Create the table T with new schema.
Use "INSERT INTO T (new_columns) SELECT old_columns FROM old_T" query to populate the table T with the data from the renamed table old_T.
Drop old_T.
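The four steps above can be sketched as a helper that only builds the SQL strings. `TableRebuild` and its parameters are hypothetical names for this illustration; executing the statements (ideally inside one transaction, with db.execSQL()) is left to the caller.

```java
import java.util.Arrays;
import java.util.List;

class TableRebuild {
  /**
   * Builds the rename / create / copy / drop statements for one table.
   * newColumns and oldColumns must have the same length and matching order.
   */
  static List<String> statements(String table, String createTable,
      List<String> newColumns, List<String> oldColumns) {
    String oldTable = "old_" + table;
    return Arrays.asList(
        "ALTER TABLE " + table + " RENAME TO " + oldTable,
        createTable,
        "INSERT INTO " + table + " (" + String.join(", ", newColumns) + ") "
            + "SELECT " + String.join(", ", oldColumns) + " FROM " + oldTable,
        "DROP TABLE " + oldTable);
  }
}
```

The column lists are the error-prone part: they have to be written by hand for every migration unless they are computed from the two schemas.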

Doing it manually is quite error prone though: for every migration you have to specify the new_columns and old_columns lists. What's worse, in 95% of cases you just want to list the columns common to the old and new schemas. Fortunately we can automate such trivial migrations by executing a SELECT with LIMIT 0 (or PRAGMA TABLE_INFO) for both tables, getting the column sets using Cursor.getColumnNames(), and calculating the intersection of these sets.
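The intersection itself is a few lines of plain Java. In this sketch `CommonColumns` is a hypothetical helper, and the two arrays stand in for the results of Cursor.getColumnNames() on the old and the new table.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class CommonColumns {
  /** Returns the columns present in both tables, in the order of the new table. */
  static Set<String> of(String[] oldColumns, String[] newColumns) {
    Set<String> common = new LinkedHashSet<>(Arrays.asList(newColumns));
    common.retainAll(Arrays.asList(oldColumns));
    return common;
  }
}
```

For a trivial migration the resulting set can be used as both the INSERT column list and the SELECT column list. Renamed columns are not detected this way and still need an explicit mapping.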

You can write a nice wrapper for this yourself, but a) I already did it, so you don't have to and b) there is a very nasty gotcha which would probably cost you a few hours of teeth grinding, so do yourself a favor and check this repository out, especially the MigrationsHelper class. It automates the trivial migrations and allows you to define mappings for situations when you rename a column or add a non-nullable column in the new schema.

In the next two posts I'll describe the gotcha I've mentioned in the previous paragraph and show some other non-obvious consequences of doing data migrations.

Tuesday, December 3, 2013

SQLite views gotcha

tl;dr: don't left join on view, or you gonna have a bad time.

I investigated a performance issue with the db in an Android app today. The symptoms looked like a classic case of a missing index: the performance degraded as more data was added to certain tables. However, a quick check of the sqlite_master table and a look at some EXPLAIN QUERY PLAN output indicated that everything is properly indexed (which is not very surprising, given that we use android-autoindexer).

I started dumping the query plans for every query and it turned out that some queries perform multiple table scans instead of a single scan of the main table + indexed searches for joined tables. It means that the indices were in place, but they weren't used.

The common denominator of these queries was joining with a view. Here's the simplest schema which demonstrates the issue:

sqlite> create table x (id integer);
sqlite> create table y (id integer, x_id integer);

sqlite> explain query plan select * from x left join y on x.id = x_id;
selectid    order       from        detail
----------  ----------  ----------  ----------------------------------------------------------------
0           0           0           SCAN TABLE x (~1000000 rows)
0           1           1           SEARCH TABLE y USING AUTOMATIC COVERING INDEX (x_id=?) (~7 rows)

sqlite> create view yyy as select * from y;

sqlite> explain query plan select * from x left join yyy on x.id = x_id;
selectid    order       from        detail
----------  ----------  ----------  -------------------------------------------------------------------
1           0           0           SCAN TABLE y (~1000000 rows)
0           0           0           SCAN TABLE x (~1000000 rows)
0           1           1           SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX (x_id=?) (~7 rows)

Of course this behaviour is documented in the SQLite Query Planner overview (point 3 of the Subquery flattening paragraph), and I even remember reading these docs a few times, but I guess something like this has to bite me in the ass before I memorize it.

Everything works fine if you copy-paste the view's selection in place of the joined view, which makes me a sad panda, because I wish SQLite could do this for me. On the other hand it's a very simple workaround for this issue, and, with the right library, the code might even be manageable.

Wednesday, November 6, 2013

SQL injection through ContentProvider projection

SQL injection through query parameters is a common security issue in any system using an SQL database. Android is no different, so if you're using an SQLite database in your Android app, you should always sanitize the database inputs.

Obligatory XKCD

If you are also using an exported ContentProvider, you need to take care of one more vector of attack: the projection parameter of the queries. Just like SQLiteDatabase, the ContentProvider allows the users to specify which columns they want to retrieve. It makes sense, because it reduces the amount of data fetched, which might improve performance and reduce the RAM footprint of your app. Unlike the SQLiteDatabase, the ContentProvider might be exported, which means that external applications can query the data from it requesting arbitrary projections, which are then turned into raw SQL queries. For example:

'Bobby Tables was here'; DROP TABLE Students; --
* FROM sqlite_master; --
* FROM non_public_table_I_found_out_about_using_previous_query; --

Basically it means that if you exposed a single uri without sanitizing the projection, you have exposed your entire db.

So how do you sanitize your projections? I've given it some thought and it seems that the only sensible thing to do is allowing only subsets of a predefined set of columns.

You cannot allow arbitrary expressions, because that would include SELECTs from other tables, and allowing only certain expressions is not a trivial task.

You shouldn't ignore the provided projection and return all columns, because one of the benefits of using projections is limiting the amount of data retrieved from the database. Besides, a certain widely used Google application ignores the existence of the Cursor.getColumnIndex method and assumes that the columns will be returned in the same order they were specified in the projection. The other app won't work correctly, and the users will probably blame you.
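A whitelist check along these lines could look like the sketch below. `ProjectionValidator` is a hypothetical class written for this illustration, not an Android API. It would be called at the top of the provider's query() method, before the projection reaches any SQL-building code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class ProjectionValidator {
  private final Set<String> allowedColumns;

  ProjectionValidator(String... allowedColumns) {
    this.allowedColumns =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList(allowedColumns)));
  }

  /** Returns the projection unchanged, or throws if it names anything outside the whitelist. */
  String[] validate(String[] projection) {
    if (projection == null) {
      // A null projection means "all columns" and carries no caller-supplied SQL.
      return null;
    }
    for (String column : projection) {
      // Exact match only: expressions, aliases and "* FROM ..." tricks are all rejected.
      if (!allowedColumns.contains(column)) {
        throw new IllegalArgumentException("Unsupported column in projection: " + column);
      }
    }
    return projection;
  }
}
```

Because the projection is returned unchanged, the requested column order is preserved, which keeps clients that rely on positional column access working.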

Thursday, June 6, 2013

SQLite type affinity strikes back

About a year ago I wrote about a certain SQLite gotcha on Android. tl;dr: in some cases when you create a view with unions, SQLite cannot determine the type of a column, and since Android binds all selection arguments as strings, SQLite ends up comparing X with "X", concludes those are not the same thing and returns fewer rows than you'd expect.

Recently the same problem reared its ugly head. It turns out that it's very easy to create a view column with an undefined type. It might happen with joins, aggregate functions, subqueries, etc.: pretty much anything fancier than a simple select. Therefore I recommend checking the column types using the pragma table_info(table) command for every view:

sqlite> .head on
sqlite> .mode column
sqlite> pragma table_info (v);

cid         name        type        notnull     dflt_value  pk
----------  ----------  ----------  ----------  ----------  ----------
0           test                    0                       0

If the type of a column is undefined and you need to use this column in your selection arguments, you should add a UNION with an empty SELECT that has well-defined column types:

sqlite> CREATE TABLE types (i INTEGER, t TEXT);
sqlite> CREATE VIEW vfix AS SELECT i AS test FROM types WHERE 1=0 UNION SELECT * FROM v;
sqlite> pragma table_info (vfix);

cid         name        type        notnull     dflt_value  pk
----------  ----------  ----------  ----------  ----------  ----------
0           test        INTEGER     0                       0

Friday, June 22, 2012

SQLite unions gotcha

Recently I've been tracking down a problem with the SQLite database used in the Android application I'm working on. The starting point of the whole story is that I noticed that the cursor created with the SQLiteDatabase.query() method returned a smaller data set than the same query executed through the sqlite3 command line interface. The query in question looked like this:

SELECT * FROM some_view WHERE (column_a=1 OR column_b=1);

Inside the Android app I was getting rows for the second part of OR clause (i.e. column_b=1), but no rows for the first part.

A quick search through the Android sources yielded the clue - I wasn't executing exactly the same query on the command line. Selection arguments are always bound as strings, so to reproduce the query the values substituted for the question marks should be surrounded with quotes. So the Android app was effectively executing the following query:

SELECT * FROM some_view WHERE (column_a="1" OR column_b="1");

So now we have another puzzle: why do column_b=1 and column_b="1" give the same results, while the behavior is different for column_a?
Let's try to reproduce the problem:

sqlite> .mode column
sqlite> .headers on
sqlite> CREATE TABLE t (x INTEGER);
sqlite> INSERT INTO t VALUES(1);
sqlite> SELECT COUNT(*) FROM t WHERE x=1;
1
sqlite> SELECT COUNT(*) FROM t WHERE x="1";
1

So far so good, no surprises. Let's create a view similar to the one which causes problems.

sqlite> CREATE VIEW v AS SELECT NULL AS a, x AS b FROM t UNION SELECT x, NULL FROM t;
sqlite> SELECT * FROM v;
a           b
----------  ----------
            1
1

Now let's take a look at counts:

sqlite> SELECT COUNT(*) FROM v WHERE b=1;
COUNT(*)
----------
1
sqlite> SELECT COUNT(*) FROM v WHERE b="1";
COUNT(*)
----------
1
sqlite> SELECT COUNT(*) FROM v WHERE a=1;
COUNT(*)
----------
1
sqlite> SELECT COUNT(*) FROM v WHERE a="1";
COUNT(*)
----------
0

Yay, we reproduced our bug. But why is this happening?

sqlite> PRAGMA TABLE_INFO(v);
cid         name        type        notnull     dflt_value  pk
----------  ----------  ----------  ----------  ----------  ----------
0           a                       0                       0
1           b           integer     0                       0

It seems that the lack of an explicitly defined type for the first column prevents type conversion (please note that this is only my assumption based on the observations above; unfortunately the SQLite documentation doesn't cover such cases in detail). How can we work around this issue?

sqlite> CREATE VIEW vfix AS SELECT x AS a, x AS b FROM t WHERE 1=0 UNION SELECT * FROM v;
sqlite> PRAGMA TABLE_INFO(vfix);
cid         name        type        notnull     dflt_value  pk
----------  ----------  ----------  ----------  ----------  ----------
0           a           integer     0                       0
1           b           integer     0                       0

As you can see the column types are correctly copied from the underlying table. Let's check the counts:

sqlite> SELECT COUNT(*) FROM vfix WHERE b=1;
COUNT(*)
----------
1
sqlite> SELECT COUNT(*) FROM vfix WHERE b="1";
COUNT(*)
----------
1
sqlite> SELECT COUNT(*) FROM vfix WHERE a=1;
COUNT(*)
----------
1
sqlite> SELECT COUNT(*) FROM vfix WHERE a="1";
COUNT(*)
----------
1

Looks OK. Pretty? No, but it does the job and that's what matters at the end of the day.