BioC XML

bioc’s interfaces for processing BioC XML are grouped in the biocxml package.

Encoding the BioC object

Encoding the BioC collection object collection:

from bioc import biocxml
# Serialize ``collection`` to a BioC formatted ``str``.
biocxml.dumps(collection)
# Serialize ``collection`` as a BioC formatted stream to ``fp``.
with open(filename, 'w') as fp:
    biocxml.dump(collection, fp)

Compact encoding:

from bioc import biocxml
biocxml.dumps(collection, pretty_print=False)

Incremental BioC serialisation:

from bioc import biocxml
with biocxml.iterwrite(filename) as writer:
    writer.write_collection_info(collection)
    for document in collection.documents:
        writer.write_document(document)

Decoding the BioC XML file

Decoding the BioC XML file:

from bioc import biocxml
# Deserialize ``s`` to a BioC collection object.
collection = biocxml.loads(s)
# Deserialize ``fp`` to a BioC collection object.
with open(filename, 'r') as fp:
    collection = biocxml.load(fp)

Incrementally decoding the BioC XML file:

from bioc import biocxml
# read from a file
with biocxml.iterparse(filename) as reader:
    collection_info = reader.get_collection_info()
    for document in reader:
        # process document
        ...
# read from a ByteIO
with biocxml.iterparse(open(filename, 'rb')) as reader:
    collection_info = reader.get_collection_info()
    for document in reader:
        # process document
        ...

get_collection_info can be called after the with statement.

Together with Python coroutines, this can be used to generate BioC XML in an asynchronous, non-blocking fashion.

from bioc import biocxml

with biocxml.iterparse(source) as reader, biocxml.iterwrite(dest) as writer:
    collection_info = reader.get_collection_info()
    writer.write_collection_info(collection_info)
    for document in reader:
        # modify the document
        ...
        writer.write_document(document)