python-cdb compatibility module

cdblib.compat is designed to be used as a drop-in replacement for python-cdb, a Python 2-only module for interacting with constant databases.

To use it in your Python 3 application:

import cdblib.compat as cdb  # replaces import cdb

Reading existing databases

The init() function accepts a path to an existing database file. It returns a cdb object that can be used to retrieve records from it.

>>> db = cdb.init('info.cdb')

The .each() method returns successive (key, value) pairs from the database. After the last record is returned, the next call returns None, and the call after that returns the first record again.

>>> db.each()
('a', 'value_a1')
>>> db.each()
('a', 'value_a2')
>>> db.each()
('b', 'value_b1')
>>> db.each()  # No more records
>>> db.each()  # Loop around to the first record
('a', 'value_a1')
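
Because .each() signals the end of a pass by returning None, a full pass can be wrapped in an iterator. The following is a minimal sketch: iter_records is a hypothetical helper, not part of cdblib.compat, and _StubDb is a stand-in that only mimics the .each() behaviour described above so the example runs without a database file.

```python
def iter_records(db):
    """Return an iterator over db.each() results that stops at None."""
    # iter(callable, sentinel) calls db.each() repeatedly and stops
    # as soon as it returns the sentinel value, here None.
    return iter(db.each, None)


class _StubDb:
    """Hypothetical stand-in that mimics the documented .each() behaviour."""

    def __init__(self, records):
        self._records = list(records)
        self._pos = 0

    def each(self):
        if self._pos == len(self._records):
            self._pos = 0  # the call after None starts from the first record
            return None
        record = self._records[self._pos]
        self._pos += 1
        return record


stub = _StubDb([('a', 'value_a1'), ('a', 'value_a2'), ('b', 'value_b1')])
records = list(iter_records(stub))
```

With a real cdb object, the same helper would be used as `for key, value in iter_records(db): ...`. Note that a pass begins wherever the previous .each() call left off, so start it on a freshly opened object or right after a None result.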

The .keys() method returns a list of distinct keys from the database.

>>> db.keys()
['a', 'b']

The cdb object keeps an iterator over the distinct keys of the database. The .firstkey() method resets the iterator and returns the first stored key. .nextkey() advances the iterator and returns the next key. After exhausting the iterator, None will be returned until .firstkey() is called again.

>>> db.firstkey()
'a'
>>> db.nextkey()
'b'
>>> db.nextkey()  # No more keys
>>> db.firstkey()  # Reset the iterator
'a'

Call the .get() method with a key k and an optional index i (counting from 0) to retrieve the i-th value stored under k. If there is no such value, .get() returns None.

>>> db.get('a')
'value_a1'
>>> db.get('a', 1)
'value_a2'
>>> db.get('a', 3)  # Returns None

The cdb object can be accessed like a dict to retrieve the first value stored under a key. If there is no such key in the database, KeyError is raised.

>>> db['a']
'value_a1'
>>> db['b']
'value_b1'

Call the .getall() method with a key k to retrieve a list of all the values stored under k.

>>> db.getall('a')
['value_a1', 'value_a2']
>>> db.getall('b')
['value_b1']
>>> db.getall('c')  # No such key, returns empty list
[]
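
Since a key may hold several values, a plain dict built with db[key] would keep only the first value per key. One way to load every value is to combine .keys() with .getall(). The sketch below assumes a hypothetical helper name (to_multidict) and uses a stand-in object that offers only those two methods.

```python
def to_multidict(db):
    """Map each distinct key to the list of all values stored under it."""
    return {key: db.getall(key) for key in db.keys()}


class _StubLookup:
    """Hypothetical stand-in offering keys() and getall() only."""

    def __init__(self, records):
        self._records = list(records)

    def keys(self):
        seen = []
        for key, _ in self._records:
            if key not in seen:
                seen.append(key)
        return seen

    def getall(self, key):
        return [value for k, value in self._records if k == key]


stub_lookup = _StubLookup(
    [('a', 'value_a1'), ('a', 'value_a2'), ('b', 'value_b1')]
)
grouped = to_multidict(stub_lookup)
```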

The cdb object has a size property, which returns the total size of the database (in bytes). It also has a name property, which returns the path to the database file.

Writing new databases

The cdbmake() class is used to create a new database. Call it with two file paths: (1) the ultimate location of the database, (2) a temporary location to use when creating the database.

>>> cdb_path = '/tmp/info.cdb'
>>> tmp_path = cdb_path + '.tmp'
>>> db = cdb.cdbmake(cdb_path, tmp_path)

Add records to the database with the .add() or .addmany() methods.

>>> db.add('b', 'value_b1')
>>> db.addmany([('a', 'value_a1'), ('a', 'value_a2')])

Call the .finish() method to write the database structure to disk and rename the temporary file to its final location.
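
The two-path design follows a common write-then-rename pattern: readers of the final path never see a half-written file. The sketch below illustrates that pattern with the standard library only. It is not the module's implementation; write_then_rename and the payload bytes are made up for the example, and the payload is not a valid constant database.

```python
import os
import tempfile


def write_then_rename(final_path, tmp_path, payload):
    """Write payload to tmp_path, then move it into place at final_path."""
    with open(tmp_path, 'wb') as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before renaming
    # os.replace overwrites final_path if it already exists.
    os.replace(tmp_path, final_path)


workdir = tempfile.mkdtemp()
final_path = os.path.join(workdir, 'info.cdb')
tmp_path = final_path + '.tmp'
write_then_rename(final_path, tmp_path, b'placeholder bytes')
```

For the rename to be a single step, both paths should live on the same filesystem, which is why the temporary path above sits next to the final one.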

Notes on encoding

Since python-cdb is a Python 2-only module, it does not distinguish between text and binary keys or values.

In order to handle str keys and values, cdblib.compat encodes text data on the way into the database:

>>> new_db.add('text_key', b'\x80 binary data')  # Key is encoded to binary
>>> new_db.add(b'\x80 binary key', 'text_data')  # Value is encoded to binary

It also decodes text data when reading:

>>> existing_db.get(b'\x80 binary key')  # Text value is decoded
'text_data'
>>> existing_db.get('text_key')  # Binary value is left alone
b'\x80 binary data'

UTF-8 is the default encoding in cdblib.compat.init() and cdblib.compat.cdbmake(). Pass a different encoding with the encoding keyword argument.

Turn off automatic encoding or decoding by supplying encoding=None. All keys and values will be assumed to be bytes objects.

>>> existing_db = cdblib.compat.init(cdb_path, encoding=None)
>>> new_db = cdblib.compat.cdbmake(cdb_path, tmp_path, encoding=None)
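
The behaviour described in this section can be modelled with two small functions. This is a sketch under assumptions, not the module's code: encode_in and decode_out are hypothetical names, and the fallback of returning the raw bytes when decoding fails is inferred from the "Binary value is left alone" example above.

```python
def encode_in(value, encoding='utf-8'):
    """Encode str to bytes on the way in; pass bytes through unchanged."""
    if encoding is not None and isinstance(value, str):
        return value.encode(encoding)
    return value


def decode_out(raw, encoding='utf-8'):
    """Decode bytes on the way out, keeping them as bytes if that fails."""
    if encoding is None:
        return raw
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Assumed fallback: data that is not valid text stays binary.
        return raw


stored_key = encode_in('text_key')
text_value = decode_out(b'text_data')
binary_value = decode_out(b'\x80 binary data')
untouched = decode_out(b'text_data', encoding=None)
```

Under this model, binary data that happens to be valid in the chosen encoding would come back as str, which is one reason to pass encoding=None when a database holds only binary data.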

Other notes

The python-cdb package accepts integer file descriptors as well as file paths in init() and cdbmake(). This module does not.

The cdb objects (returned by the init() function) and the cdbmake objects close their open file objects when they are garbage collected. You may call the ._cleanup() method on either one to close the files yourself (this method is not available when using the python-cdb package).
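
To close the files at a predictable point rather than waiting for garbage collection, the ._cleanup() call can be wrapped in a context manager. The sketch below uses a hypothetical helper name (cleaning_up) and a stand-in object with a ._cleanup() method, so it runs without a database file.

```python
from contextlib import contextmanager


@contextmanager
def cleaning_up(obj):
    """Call obj._cleanup() when the with-block exits, even on error."""
    try:
        yield obj
    finally:
        obj._cleanup()


class _StubHandle:
    """Hypothetical stand-in exposing only a _cleanup() method."""

    def __init__(self):
        self.closed = False

    def _cleanup(self):
        self.closed = True


handle = _StubHandle()
with cleaning_up(handle) as h:
    was_open_inside = not h.closed
```

With this module, usage would look like `with cleaning_up(cdb.init('info.cdb')) as db: ...`. Code written this way will not run against python-cdb, which lacks ._cleanup().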

The cdb object returned by the init() function uses mmap.mmap to avoid reading the whole database file into memory. This may be inappropriate when reading database files from certain locations, such as network drives. See the Python docs for more information on mmap.
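
As a brief illustration of what memory mapping provides, the sketch below maps a throwaway file read-only and slices a few bytes out of the middle without reading the rest. It uses only the standard mmap module; the file contents are made up and are not a constant database.

```python
import mmap
import os
import tempfile

# Create a throwaway 10,000-byte file to map.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'0123456789' * 1000)

with open(path, 'rb') as f:
    # Length 0 maps the whole file; ACCESS_READ makes the map read-only.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        total = len(view)
        chunk = view[5000:5010]  # only the touched pages need loading

os.remove(path)
```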