It suddenly hit me one day that I use the wordlist in my Kindle ebook reader quite frequently, but don't necessarily remember what I actually have looked up.
Not knowing anything about the inner workings, I simply connected the Kindle via USB to my computer and manually searched the filesystem until I found system/vocabulary/vocab.db
The file was copied to my laptop for furter investigation
$ file vocab.db vocab.db: SQLite 3.x database, last written using SQLite version 3026000, page size 1024, file counter 991, database pages 133, cookie 0x9, schema 4, UTF-8, version-valid-for 991
Ok, it's a SQLite3 database. That feels quite managable. Lets see what tables there is in there.
boran@borken> sqlite3 vocab.db SQLite version 3.37.0 2021-12-09 01:34:53 Enter ".help" for usage hints. sqlite> .tables BOOK_INFO DICT_INFO LOOKUPS METADATA VERSION WORDS
The next step is to investigate the table structures.
sqlite> .schema CREATE TABLE WORDS (id TEXT PRIMARY KEY NOT NULL, word TEXT, stem TEXT, lang TEXT, category INTEGER DEFAULT 0, timestamp INTEGER DEFAULT 0, profileid TEXT); CREATE INDEX wordprofileid ON WORDS (profileid); CREATE TABLE LOOKUPS (id TEXT PRIMARY KEY NOT NULL, word_key TEXT, book_key TEXT, dict_key TEXT, pos TEXT, usage TEXT, timestamp INTEGER DEFAULT 0); CREATE INDEX lookupwordkey ON LOOKUPS (word_key); CREATE INDEX lookupbookkey ON LOOKUPS (book_key); CREATE TABLE BOOK_INFO (id TEXT PRIMARY KEY NOT NULL, asin TEXT, guid TEXT, lang TEXT, title TEXT, authors TEXT); CREATE TABLE DICT_INFO (id TEXT PRIMARY KEY NOT NULL, asin TEXT, langin TEXT, langout TEXT); CREATE TABLE METADATA (id TEXT PRIMARY KEY NOT NULL, dsname TEXT, sscnt INTEGER, profileid TEXT); CREATE TABLE VERSION (id TEXT PRIMARY KEY NOT NULL, dsname TEXT, value INTEGER);
The above, combined with a few "select * from <table>", revealed the following structure:
- BOOK_INFO - Metadata for downloaded books
- DICT_INFO - Installed wordlists
- LOOKUPS - Looked up words plus context <-- Bingo!
- METADATA - ?
- VERSION - ?
- WORDS - conjugations of the looked up words
I wrote the LOOKUPS table to a file for simpler processing:
sqlite> .output words.txt sqlite> select * from LOOKUPS;
The file looked like this:
CR!7Z6RCJEMXN1KV7KJX1Y4N6TQJ5MA:AWwLAABZAAAA:314887:10|en:ignoble|CR!7Z6RCJEMXN1KV7KJX1Y4N6TQJ5MA|B0053VMNY2|AWwLAABZAAAA:314887|It is an ignoble end after we have come so far.” |1585009704195 CR!7Z6RCJEMXN1KV7KJX1Y4N6TQJ5MA:AY0LAAB5AAAA:321310:15|en:genuflecting|CR!7Z6RCJEMXN1KV7KJX1Y4N6TQJ5MA|B0053VMNY2|AY0LAAB5AAAA:321310|I bowed in what I hoped was an appropriate way, and since I was already kneeling it looked like I was genuflecting. |1585009998761 CR!Q5S8RKEDA95F50HSYJJ0TRV2RBM1:AWwnAAB+AAAA:1348373:13|en:immaculate|CR!Q5S8RKEDA95F50HSYJJ0TRV2RBM1|B0053VMNY2|AWwnAAB+AAAA:1348373|He sat in the high-backed chair behind the wide, immaculate desk. |1585129786043
...so it only takes a classical UNIX curse to extract the words and sort them on occurrence:
boran@borken> awk -F\| '{print $2 }' words.txt | sed -e 's/en://' | sort | uniq -c | sort
It turned out that besides a few mistakes (highlighting the word above etc), my most commonly looked up words are: contrite, derelict and torpor. That seems reasonable as I don't have a clue on what any of them means. The next step might be to write a word practice program or perhaps add them and their translations to Anki?