Library reference
=================

The `Reader` classes
--------------------

`cdblib.Reader` reads standard "32-bit" cdb files, such as those produced by the
`cdbmake` CLI tool. `cdblib.Reader64` reads "64-bit" cdb files, which can be
produced by this package.

The `Reader` classes can be instantiated by passing one positional argument,
a `bytes`-like object with a database's content:

    >>> import cdblib
    >>> with open('info.cdb', 'rb') as f:
    ...     data = f.read()
    >>> reader = cdblib.Reader(data)

Alternatively, you can use the ``Reader`` classes as a context manager and
give either a file path or a file-like object.

    >>> with cdblib.Reader.from_file_path('info.cdb') as reader:
    ...    print(reader.items())

    >>> with open('info.cdb', 'rb') as f:
    ...     with cdblib.Reader.from_file_obj(f) as reader:
    ...         print(reader.items())

When using the `.from_file_path()` or `.from_file_obj()` constructors, a
memory-mapped file object is created. This keeps the whole database from
being read into memory. See the
`Python docs <https://docs.python.org/3/library/mmap.html>`_ for more
information on `mmap`.

Retrieving data
^^^^^^^^^^^^^^^

The `.items()` method returns a list of `(key, value)` tuples representing
all of the records stored in the database (in insertion order).
Note that a single key can have multiple values associated with it.

    >>> reader.items()
    [(b'k1', b'v1'), (b'k2', b'v2a'), (b'k2', b'v2b')]

The `.iteritems()` method is like `.items()`, but it returns an iterator over
the items rather than a list.

The `.keys()` method returns a list of the keys stored in the database
(in insertion order). The `.iterkeys()` method returns an iterator over the
keys. Note that keys will be repeated if a single key has multiple values
associated with it.

The `.values()` method returns a list of the values stored in the database
(in insertion order). The `.itervalues()` method returns an iterator over the
values.

----

Calling `len()` on a `Reader` instance returns the number of records (key-value
pairs) stored in the database.

    >>> len(reader)
    3

The `in` operator can be used to test whether a key is present in the database.

    >>> b'k1' in reader
    True
    >>> b'k3' in reader
    False

----

The `.get()` method returns the first value in the database for `key`.
If the key isn't in the database, `None` will be returned. To use a different
default value, use the `default` keyword:

    >>> reader.get(b'k2')
    b'v2a'
    >>> reader.get(b'missing')
    None
    >>> reader.get(b'missing', default=b'fallback')
    b'fallback'

The `.gets()` method returns an iterator over all the values associated
with `key`.

    >>> list(reader.gets(b'k2'))
    [b'v2a', b'v2b']

`Reader` instances also support dict-like retrieval of the first value
associated with `key`. `KeyError` will be raised if the requested key isn't in
the database.

    >>> reader[b'k2']
    b'v2a'
    >>> reader[b'missing2']
    KeyError: b'missing'

----

Note that the values retrieved by the `.get()` and `.gets()` methods are
`bytes` objects.

If the values in the database represent integers, you can retrieve them as
Python `int` objects with the `.getint()` and `.getints()` methods.

    >>> reader.get(b'key_with_int_value')
    b'1'
    >>> reader.getint(b'key_with_int_value')
    1

Similarly, the `.getstring()` and `.getstrings()` methods will retrieve
the values as `str` objects.

    >>> reader.get(b'key_with_str_value')
    b'text data'
    >>> reader.getstring(b'key_with_str_value')
    'text data'

You may specify an encoding with the `encoding` keyword argument.

    >>> reader.get(b'fancy_a_or_f')
    b'\xc4'
    >>> reader.getstring(b'fancy_a_or_f', encoding='cp1252')
    'Ä'
    >>> reader.getstring(b'fancy_a_or_f', encoding='mac-roman')
    'ƒ'


Encoding and strict mode
^^^^^^^^^^^^^^^^^^^^^^^^

Database keys are stored as `bytes` objects. By default, `Reader` instances
will attempt to convert `str` keys and `int` keys automatically.

    >>> reader.get(b'1')  # Binary key
    b'value_for_1'
    >>> reader.get('1')  # Text key
    b'value_for_1'
    >>> reader.get(1)  # Integer key
    b'value_for_1'

To disable this behavior, pass `strict=True` when creating the `Reader`
instance. This will increase read performance, and is useful when you want to
deal with `bytes` keys only.

    >>> import cdblib
    >>> with open('info.cdb', 'rb') as f:
    ...     data = f.read()
    >>> reader = cdblib.Reader(data, strict=True)
    >>> reader.get(b'1')  # Binary key
    b'value_for_1'
    >>> reader.get(1)
    ...
    TypeError: key must be of type 'bytes'

The `Writer` classes
--------------------

`cdblib.Writer` produces standard "32-bit" cdb files, which should be readable
by other `cdb` tools like `cdbget` and `cdbdump`. `cdblib.Writer64` produces
"64-bit" cdb files, which can be read by this package.

The `Writer` classes take one positional argument, a file-like object opened in
binary mode.

    >>> import cdblib
    ...
    ... with open('info.cdb', 'wb') as f:
    ...     writer = cdblib.Writer(f):
    ...     writer.put(b'k1', b'v1a')
    ...     writer.finalize()

`Writer` instances don't create readable databases until their  `.finalize()`
method is called. You should use them as a context manager wherever possible -
this ensures that `.finalize()` is called.

    >>> with open('info.cdb', 'wb') as f:
    ...     with cdblib.Writer(f) as writer:
    ...         writer.put(b'k1', b'v1a')


Storing data
^^^^^^^^^^^^

The `.put()` method is used to create a database record for a binary key
and a binary value.

    >>> import io
    >>> import cdblib
    >>> f = io.BytesIO()  # Use an in-memory database
    >>> writer = cdblib.writer(f)
    >>> writer.put(b'k1', b'v1a')

The `.puts()` method adds multiple binary values at the same key.

    >>> writer.puts(b'k2', [b'v2a', b'v2b'])

To store integer values, use `.putint()` or `.putints()`.

    >>> writer.putint(b'key_with_int_values', 1)
    >>> writer.putints(b'key_with_int_values', [2, 3])

To store text data, use `.putstring()` or `.putstrings()`, with an optional
`encoding` keyword argument. The default encoding is `'utf-8'`.

    >>> writer.putstring(b'fancy_a', 'Ä')  # stores b'\xc3\x84'
    >>> writer.putstring(b'fancy_a', 'Ä', encoding='cp1252')  # stores b'\xc4'
    >>> writer.putstrings(b'boring_a', ['a', 'A'])

As above, don't forget to call `.finalize()` to write the database to disk if
you're not using a context manager.

    >>> writer.finalize()

Encoding and strict mode
^^^^^^^^^^^^^^^^^^^^^^^^

Database keys are stored as `bytes` objects. As with `Reader` instances,
`Writer` instances will attempt to convert text keys and integer keys
automatically.

To disable this behavior, pass `strict=True` when creating the `Writer`
instance. This will increase write performance, and is useful when you want to
deal with `bytes` keys only.


Advanced usage
--------------

Alternate hash functions
^^^^^^^^^^^^^^^^^^^^^^^^

By default `python-pure-cdb` will use the standard cdb hash function
described on `djb's page <https://cr.yp.to/cdb/cdb.txt>`_.

You can substitute in your own hash function when using a `Writer` instance,
if you're so inclined. This will of course require you to use the same hash
function when reading the database.

    >>> import io
    ... import zlib
    ...
    ... import cdblib
    ...
    ...
    ... def custom_hash(x):
    ...     return zlib.adler32(x) & 0xffffffff
    ...
    ...
    ... with io.BytesIO() as f:
    ...     with cdblib.Writer(f, hashfn=custom_hash) as writer:
    ...         writer.put(b'k1', b'v1a')
    ...         writer.puts(b'k2', [b'v2a', b'v2b'])
    ...
    ...     reader = cdblib.Reader(f.getvalue(), hashfn=custom_hash)
    ...     reader.items()
    [(b'k1', b'v1a'), (b'k2', b'v2a'), (b'k2', b'v2b')]

C extension hash function
^^^^^^^^^^^^^^^^^^^^^^^^^

When using CPython, you can build a C Extension that speeds up using the
cdb hash function.

Set the `ENABLE_DJB_HASH_CEXT` environment variable when executing `setup.py`
to enable the extension:

.. code-block:: none

    $ ENABLE_DJB_HASH_CEXT=1 python setup.py install