Brat

The brat standoff format is used by the brat annotation tool to store annotations on disk. Annotations are stored separately from the annotated document text, which the tool never modifies.
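To make "standoff" concrete, here is a minimal sketch, independent of bioc, of how a text-bound annotation line in an ``.ann`` file points back into the untouched text by character offsets. The names ``TextBound`` and ``parse_text_bound`` are hypothetical and used only for illustration, and the sketch assumes a single contiguous span (brat also allows discontinuous spans separated by ``;``, which are not handled here).

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    """One text-bound annotation (a ``T`` line) from a brat .ann file."""
    ann_id: str
    ann_type: str
    start: int
    end: int
    text: str


def parse_text_bound(line: str) -> TextBound:
    """Parse a line such as ``T1<TAB>Organization 0 4<TAB>Sony``.

    Hypothetical helper: handles contiguous spans only.
    """
    ann_id, type_and_span, covered = line.rstrip("\n").split("\t")
    ann_type, start, end = type_and_span.split(" ")
    return TextBound(ann_id, ann_type, int(start), int(end), covered)


text = "Sony bought the studio."
ann = parse_text_bound("T1\tOrganization 0 4\tSony")

# The offsets index into the unmodified document text.
assert text[ann.start:ann.end] == ann.text
```

Because the annotation carries only offsets and a copy of the covered string, the text file can be kept byte-for-byte as it was before annotation.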

The interfaces for processing brat files are grouped in the bioc.brat package.

Encoding the Brat object

Encoding the Brat document object doc:

from bioc import brat
# Serialize the annotations of ``doc`` to a brat-formatted ``str``.
brat.dumps_ann(doc)
# Serialize ``doc`` as a brat formatted stream to ``text_fp`` and ``ann_fp``.
with open(annpath, 'w') as ann_fp, open(txtpath, 'w') as text_fp:
    brat.dump(doc, text_fp, ann_fp)

Decoding the Brat file

from bioc import brat
# Deserialize ``text`` and ``ann`` to a Brat document object.
doc = brat.loads(text, ann)
# Deserialize ``text_fp`` and ``ann_fp`` to a Brat document object.
with open(annpath) as ann_fp, open(txtpath) as text_fp:
    doc = brat.load(text_fp, ann_fp)

Decoding the files in a folder:

from bioc import brat

# Read each Brat document found in the folder.
for doc in brat.scandir(dirname):
    # process document
    ...
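brat keeps each document as a ``.txt`` file and an ``.ann`` file that share a base name. If you need the file paths themselves, for example to pass them to ``open`` as in the decoding example, the pairing can be sketched with the standard library alone. ``iter_brat_pairs`` is a hypothetical helper, not part of bioc, and this is not a description of how ``brat.scandir`` is implemented.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_brat_pairs(dirname: str) -> Iterator[Tuple[Path, Path]]:
    """Yield (txt_path, ann_path) for each document that has both files.

    Hypothetical helper: only looks at the top level of ``dirname``
    and skips any .ann file without a matching .txt file.
    """
    for ann_path in sorted(Path(dirname).glob("*.ann")):
        txt_path = ann_path.with_suffix(".txt")
        if txt_path.is_file():
            yield txt_path, ann_path
```

Each yielded pair can then be opened and handed to ``brat.load(text_fp, ann_fp)``.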

Converting from Brat to BioC

from bioc import brat
from bioc.tools.brat2bioc import brat2bioc

docs = brat.listdir(dirname)

# Convert a list of Brat docs to a BioC collection object.
collection = brat2bioc(docs)