| field | value | timestamp |
|---|---|---|
| author | Niklas Hambüchen <mail@nh2.me> | 2017-12-29 15:49:13 +0100 |
| committer | Niklas Hambüchen <mail@nh2.me> | 2017-12-30 21:37:57 +0100 |
| commit | 14dbd5da1cae64e6d4d2c69966e19844d090ce98 (patch) | |
| tree | de35778f31d19342ef0b4622d2bc9a66f8c68f12 | |
| parent | 9fd17c1c3c44944ea280c4c15bad0d49b298b8a9 (diff) | |
glusterfind: Speed up gfid lookup 100x by using an SQL index
Fixes #1529883.
This fixes some bits of `glusterfind`'s horrible performance,
making it 100x faster.
Until now, for each line in each CHANGELOG.* file, glusterfind
linearly read the entire contents of the sqlite database in
4096-byte pread64() syscalls when executing the
  SELECT COUNT(1) FROM %s WHERE 1=1 AND gfid = ?
query through the code path:
  get_changes()
    parse_changelog_to_db()
      when_data_meta()
        gfidpath_exists()
          _exists()
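Without an index on `gfid`, SQLite can only answer that query with a full scan of the `gfidpath` table. The following is a minimal sketch (not glusterfind code): the table name, column name and index statement come from the patch, while the extra columns are placeholders rather than the real `gfidpath` schema. It shows how the query plan changes once the index exists:

```python
# Minimal illustration of the query plan before and after the index.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gfidpath (gfid TEXT, pgfid1 TEXT, bn1 TEXT)")

query = "EXPLAIN QUERY PLAN SELECT COUNT(1) FROM gfidpath WHERE 1=1 AND gfid = ?"

# Without an index, SQLite has to scan the whole table for every lookup.
print(conn.execute(query, ("some-gfid",)).fetchall())
# e.g. [(2, 0, 0, 'SCAN gfidpath')]  (wording varies by SQLite version)

# With the index added by this patch, the same lookup is an index search.
conn.execute("CREATE INDEX gfid_index ON gfidpath(gfid)")
print(conn.execute(query, ("some-gfid",)).fetchall())
# e.g. [(3, 0, 0, 'SEARCH gfidpath USING COVERING INDEX gfid_index (gfid=?)')]
```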
In a quick benchmark on my laptop, one such `SELECT` query
took ~75 ms on a 10 MB sqlite DB, while the same query
with an index took < 1 ms.
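The effect is easy to reproduce outside of glusterfind. The sketch below (again using a placeholder schema; the row count and resulting timings are illustrative and depend on hardware and SQLite version) fills a throwaway table and times the same `COUNT(1)` lookup with and without the index:

```python
# Rough reproduction of the measurement above.
import sqlite3
import time
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gfidpath (gfid TEXT, path TEXT)")
conn.executemany("INSERT INTO gfidpath VALUES (?, ?)",
                 ((str(uuid.uuid4()), "dir/file%d" % i) for i in range(200000)))
target = conn.execute("SELECT gfid FROM gfidpath LIMIT 1").fetchone()[0]

def lookup_ms():
    t0 = time.perf_counter()
    conn.execute("SELECT COUNT(1) FROM gfidpath WHERE 1=1 AND gfid = ?",
                 (target,)).fetchone()
    return (time.perf_counter() - t0) * 1000

print("without index: %.2f ms" % lookup_ms())   # full table scan, tens of ms
conn.execute("CREATE INDEX gfid_index ON gfidpath(gfid)")
print("with index:    %.2f ms" % lookup_ms())   # index search, well under 1 ms
```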
Change-Id: I8e7fe60f1f45a06c102f56b54d2ead9e0377794e
BUG: 1529883
Signed-off-by: Niklas Hambüchen <mail@nh2.me>
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | tools/glusterfind/src/changelogdata.py | 5 |

1 file changed, 5 insertions, 0 deletions
```diff
diff --git a/tools/glusterfind/src/changelogdata.py b/tools/glusterfind/src/changelogdata.py
index 3140d945b49..641593cf4b1 100644
--- a/tools/glusterfind/src/changelogdata.py
+++ b/tools/glusterfind/src/changelogdata.py
@@ -112,6 +112,11 @@ class ChangelogData(object):
         """
         self.cursor.execute(create_table)
 
+        create_index = """
+        CREATE INDEX gfid_index ON gfidpath(gfid);
+        """
+        self.cursor.execute(create_index)
+
     def _create_table_inodegfid(self):
         drop_table = "DROP TABLE IF EXISTS inodegfid"
         self.cursor.execute(drop_table)
```
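With the index created right after the `gfidpath` table itself, every `WHERE gfid = ?` lookup during changelog parsing becomes a B-tree search on `gfid_index` instead of a scan over the full table. The usual trade-off is a slightly larger database file and marginally slower inserts, which is presumably negligible next to the 100x query speedup reported above.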
