Kindle_wordlist

It suddenly hit me one day that I use the wordlist in my Kindle ebook reader quite frequently, but don't necessarily remember what I actually have looked up.

Not knowing anything about the inner workings, I simply connected the Kindle via USB to my computer and manually searched the filesystem until I found system/vocabulary/vocab.db

The file was copied to my laptop for furter investigation

$ file vocab.db
vocab.db: SQLite 3.x database, last written using SQLite version 3026000, page size 1024, file counter 991, database pages 133, cookie 0x9, schema 4, UTF-8, version-valid-for 991

Ok, it's a SQLite3 database. That feels quite managable. Lets see what tables there is in there.

boran@borken> sqlite3 vocab.db
SQLite version 3.37.0 2021-12-09 01:34:53
Enter ".help" for usage hints.
sqlite> .tables
BOOK_INFO  DICT_INFO  LOOKUPS    METADATA   VERSION    WORDS

The next step is to investigate the table structures.

sqlite> .schema
CREATE TABLE WORDS (id TEXT PRIMARY KEY NOT NULL, word TEXT, stem TEXT, lang TEXT, category INTEGER DEFAULT 0, timestamp INTEGER DEFAULT 0, profileid TEXT);
CREATE INDEX wordprofileid ON WORDS (profileid);
CREATE TABLE LOOKUPS (id TEXT PRIMARY KEY NOT NULL, word_key TEXT, book_key TEXT, dict_key TEXT, pos TEXT, usage TEXT, timestamp INTEGER DEFAULT 0);
CREATE INDEX lookupwordkey ON LOOKUPS (word_key);
CREATE INDEX lookupbookkey ON LOOKUPS (book_key);
CREATE TABLE BOOK_INFO (id TEXT PRIMARY KEY NOT NULL, asin TEXT, guid TEXT, lang TEXT, title TEXT, authors TEXT);
CREATE TABLE DICT_INFO (id TEXT PRIMARY KEY NOT NULL, asin TEXT, langin TEXT, langout TEXT);
CREATE TABLE METADATA (id TEXT PRIMARY KEY NOT NULL, dsname TEXT, sscnt INTEGER, profileid TEXT);
CREATE TABLE VERSION (id TEXT PRIMARY KEY NOT NULL, dsname TEXT, value INTEGER);

The above, combined with a few "select * from <table>", revealed the following structure:

BOOK_INFO - Metadata for downloaded books
DICT_INFO - Installed wordlists
LOOKUPS - Looked up words plus context <-- Bingo!
METADATA - ?
VERSION - ?
WORDS - conjugations of the looked up words

I wrote the LOOKUPS table to a file for simpler processing:

sqlite> .output words.txt
sqlite> select * from LOOKUPS;

The file looked like this:

CR!7Z6RCJEMXN1KV7KJX1Y4N6TQJ5MA:AWwLAABZAAAA:314887:10|en:ignoble|CR!7Z6RCJEMXN1KV7KJX1Y4N6TQJ5MA|B0053VMNY2|AWwLAABZAAAA:314887|It is an ignoble end after we have come so far.” |1585009704195
CR!7Z6RCJEMXN1KV7KJX1Y4N6TQJ5MA:AY0LAAB5AAAA:321310:15|en:genuflecting|CR!7Z6RCJEMXN1KV7KJX1Y4N6TQJ5MA|B0053VMNY2|AY0LAAB5AAAA:321310|I bowed in what I hoped was an appropriate way, and since I was already kneeling it looked like I was genuflecting. |1585009998761
CR!Q5S8RKEDA95F50HSYJJ0TRV2RBM1:AWwnAAB+AAAA:1348373:13|en:immaculate|CR!Q5S8RKEDA95F50HSYJJ0TRV2RBM1|B0053VMNY2|AWwnAAB+AAAA:1348373|He sat in the high-backed chair behind the wide, immaculate desk. |1585129786043

...so it only takes a classical UNIX curse to extract the words and sort them on occurrence:

boran@borken> 
awk -F\| '{print $2 }' words.txt | sed -e 's/en://' | sort | uniq -c | sort

It turned out that besides a few mistakes (highlighting the word above etc), my most commonly looked up words are: contrite, derelict and torpor. That seems reasonable as I don't have a clue on what any of them means. The next step might be to write a word practice program or perhaps add them and their translations to Anki?

2022-05-12: What words have I looked up in my Kindle?