PubTator
pubtator’s interfaces for processing PubTator file are grouped in the pubtator
package.
Encoding the PubTator object
Encoding the PubTator document object doc
:
from bioc import pubtator
# Serialize ``doc`` to a PubTator formatted ``str``.
pubtator.dumps([doc])
# Serialize ``collection`` as a BioC formatted stream to ``fp``.
with open(filename, 'w') as fp:
pubtator.dump([doc], fp)
Decoding the PubTator file
from bioc import pubtator
# Deserialize ``s`` to a PubTator object.
docs = pubtator.loads(s)
# Deserialize ``fp`` to a PubTator object.
with open(filename, 'r') as fp:
docs = pubtator.load(fp)
Incrementally decoding the PubTator file:
from bioc import pubtator
# read from a file
with open(filename) as fp:
for doc in pubtator.iterparse(fp):
# process document
...
Converting from PubTator to a BioC
from bioc import pubtator
from bioc.tools.pubtator2bioc import pubtator2bioc
docs = pubtator.loads(text)
# Convert a list of PubTator docs to a BioC collection object.
collection = pubtator2bioc(docs)