Data Models

These model classes are used to represent objects on the Blackfynn platform. Briefly, there are three major classes of entities: “collection” classes, “data” classes, and “detail” or “helper” classes.

Base

BaseDataNode Base class to serve all “data” node-types on platform, e.g. Packages and Collections.

Data Catalog Basics

Dataset
Collection
DataPackage DataPackage is the core data object representation on the platform.

Time Series

TimeSeries Represents a timeseries package on the platform.
TimeSeriesChannel TimeSeriesChannel represents a single source of time series data.
TimeSeriesAnnotationLayer Annotation layer containing one or more annotations.
TimeSeriesAnnotation Annotation is an event on one or more channels in a dataset

Tabular

Tabular Represents a Tabular package on the platform.

Base Class

The BaseDataNode class provides the basic methods available on all models listed below.

class blackfynn.models.BaseDataNode(name, type, parent=None, owner_id=None, dataset_id=None, id=None, provenance_id=None, **kwargs)[source]

Base class to serve all “data” node-types on platform, e.g. Packages and Collections.

delete()[source]

Delete object from platform.

get_property(key, category='Blackfynn')[source]

Returns a single property for the provided key, if available

Parameters:
  • key (str) – key of the desired property
  • category (str, optional) – category of property
Returns:

object of type Property

Example:

pkg.set_property('quality', 85.0)
pkg.get_property('quality')
remove_property(key, category='Blackfynn')[source]

Removes the property with the given key and category from the object.

Parameters:
  • key (str) – key of property to remove
  • category (str, optional) – category of property to remove
set_error()[source]

Sets the package’s state to ERROR

set_property(key, value, fixed=False, hidden=False, category='Blackfynn', data_type=None)[source]

Add property to object using simplified interface.

Parameters:
  • key (str) – the key of the property
  • value (str,number) – the value of the property
  • fixed (bool) – if true, the value cannot be changed after the property is created
  • hidden (bool) – if true, the value is hidden on the platform
  • category (str) – the category of the property, default: “Blackfynn”
  • data_type (str) – one of ‘string’, ‘integer’, ‘double’, ‘date’, ‘user’
set_ready(**kwargs)[source]

Sets the package’s state to READY

set_unavailable()[source]

Sets the package’s state to UNAVAILABLE

update(**kwargs)[source]

Updates object on the platform (with any local changes) and syncs local instance with API response object.

Example:

pkg = bf.get('N:package:1234-1234-1234-1234')
pkg.name = "New name"
pkg.update()
exists

Whether or not the instance of this object exists on the platform.

properties

Returns a list of properties attached to object.

Data Catalog Basics

Note:

A useful special method for the following classes is __contains__, which enables you to do:

if my_pkg in my_collection:
   print("the package {} is in the collection".format(my_pkg))
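
As a rough illustration of what the membership test above relies on, here is a minimal sketch using a stand-in class. FakeCollection is hypothetical and is not part of the blackfynn client; how the real classes compare items (by object, by ID, or both) is not specified here, so plain equality is assumed.

```python
class FakeCollection(object):
    """Hypothetical stand-in for a Collection/Dataset; not the blackfynn class."""

    def __init__(self, *items):
        self._items = list(items)

    def __contains__(self, item):
        # Called by Python when evaluating `item in collection`.
        return item in self._items

    def __iter__(self):
        # Called by Python when evaluating `for item in collection`.
        return iter(self._items)


my_collection = FakeCollection("pkg-a", "pkg-b")
found = "pkg-a" in my_collection      # membership via __contains__
missing = "pkg-z" in my_collection
names = [item for item in my_collection]  # iteration via __iter__
```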

Dataset

Datasets are core entities on the Blackfynn platform. All data must be placed in a Dataset, whether directly or nested. Datasets can be thought of as similar to “repositories” in GitHub; they exist directly underneath a user/organization, and all sharing is controlled from their level.

class blackfynn.models.Dataset(name, description=None, **kwargs)[source]
add(*items)

Add items to the Collection/Dataset.

create_collection(name)

Create a new collection within the current object. Collections can be created within datasets and within other collections.

Parameters: name (str) – The name of the to-be-created collection
Returns: The created Collection object.

Example:

from blackfynn import Blackfynn

bf = Blackfynn()
ds = bf.get_dataset('my_dataset')

# create collection in dataset
col1 = ds.create_collection('my_collection')

# create collection in collection
col2 = col1.create_collection('another_collection')
create_model(name, display_name=None, description=None, schema=None, **kwargs)[source]

Defines a Model on the platform.

Parameters:
  • name (str) – Name of the model
  • description (str, optional) – Description of the model
  • schema (list, optional) – Definition of the model’s schema as list of ModelProperty objects.
Returns:

The newly created Model

Note

A model must include at least one property that serves as the “title”.

Example

Create a participant model, including schema:

from blackfynn import ModelProperty

ds.create_model('participant',
    description = 'a human participant in a research study',
    schema = [
        ModelProperty('name', data_type=str, title=True),
        ModelProperty('age',  data_type=int)
    ]
)

Or define schema using dictionary:

ds.create_model('participant',
    schema = [
        {
            'name': 'full_name',
            'type': str,
            'title': True
        },
        {
            'name': 'age',
            'type': int,
        }
])

You can also create a model and define schema later:

# create model
pt = ds.create_model('participant')

# define schema
pt.add_property('name', str, title=True)
pt.add_property('age', int)
create_relationship_type(name, description, schema=None, **kwargs)[source]

Defines a RelationshipType on the platform.

Parameters:
  • name (str) – name of the relationship
  • description (str) – description of the relationship
  • schema (dict, optional) – definition of the relationship’s schema
Returns:

The newly created RelationshipType

Example:

ds.create_relationship_type('belongs-to', 'this belongs to that')
delete()

Delete object from platform.

get_items_by_name(name)

Get an item inside of object by name (if match is found).

Parameters: name (str) – the name of the item
Returns: list of matches

Note

This only works for first-level items, meaning it must exist directly inside the current object; nested items will not be returned.
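
If nested items are needed, one possible approach is to walk the tree yourself. The sketch below is an illustration under stated assumptions, not part of the client: find_items_by_name and the Node stand-in are hypothetical, and only the name and items attributes documented on this page are used. Against the real platform each access to items may trigger an API request, so a deep search could be slow.

```python
def find_items_by_name(container, name):
    """Depth-first search through `container.items` and any nested containers."""
    matches = []
    for item in container.items:
        if item.name == name:
            matches.append(item)
        # Assumption: anything exposing `.items` can itself hold items.
        if hasattr(item, "items"):
            matches.extend(find_items_by_name(item, name))
    return matches


class Node(object):
    """Hypothetical stand-in exposing the `name` and `items` attributes used above."""

    def __init__(self, name, items=None):
        self.name = name
        if items is not None:
            self.items = items


leaf = Node("scan.nii.gz")
root = Node("my_dataset", [Node("session_1", [leaf]), Node("notes.pdf")])
found = find_items_by_name(root, "scan.nii.gz")
```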

get_model(name_or_id)[source]

Retrieve a Model by name or id

Parameters: name_or_id (str or int) – name or id of the model
Returns: The requested Model in Dataset

Example:

mouse = ds.get_model('mouse')
get_property(key, category='Blackfynn')

Returns a single property for the provided key, if available

Parameters:
  • key (str) – key of the desired property
  • category (str, optional) – category of property
Returns:

object of type Property

Example:

pkg.set_property('quality', 85.0)
pkg.get_property('quality')
get_relationship(name_or_id)[source]

Retrieve a RelationshipType by name or id

Parameters: name_or_id (str or int) – name or id of the relationship
Returns: The requested RelationshipType

Example:

belongsTo = ds.get_relationship('belongs-to')
get_topology()[source]

Returns the set of Models and Relationships defined for the dataset

Returns: Keys are either models or relationships. Values are the list of objects of that type
Return type: dict
import_model(template)[source]

Imports a model based on the given template into the dataset

Parameters: template (ModelTemplate) – the ModelTemplate to import
Returns: A list of ModelProperty objects that have been imported into the dataset
models()[source]
Returns: List of models defined in Dataset
print_tree(indent=0)

Prints a tree of all items inside object.

relationships()[source]
Returns: List of relationships defined in Dataset
remove(*items)

Removes items, where items can be an object or the object’s ID (string).

remove_collaborators(*collaborator_ids)[source]

Remove collaborator(s) from Dataset.

Parameters: collaborator_ids – List of collaborator IDs to remove (Users)
remove_property(key, category='Blackfynn')

Removes the property with the given key and category from the object.

Parameters:
  • key (str) – key of property to remove
  • category (str, optional) – category of property to remove
set_property(key, value, fixed=False, hidden=False, category='Blackfynn', data_type=None)

Add property to object using simplified interface.

Parameters:
  • key (str) – the key of the property
  • value (str,number) – the value of the property
  • fixed (bool) – if true, the value cannot be changed after the property is created
  • hidden (bool) – if true, the value is hidden on the platform
  • category (str) – the category of the property, default: “Blackfynn”
  • data_type (str) – one of ‘string’, ‘integer’, ‘double’, ‘date’, ‘user’
update(**kwargs)

Updates object on the platform (with any local changes) and syncs local instance with API response object.

Example:

pkg = bf.get('N:package:1234-1234-1234-1234')
pkg.name = "New name"
pkg.update()
upload(*files, **kwargs)

Upload files into current object.

Parameters: files – list of local files to upload.

Example:

my_collection.upload('/path/to/file1.nii.gz', '/path/to/file2.pdf')
collaborators

List of collaborators on Dataset.

exists

Whether or not the instance of this object exists on the platform.

items

Get all items inside Dataset/Collection (i.e. non-nested items).

Note

You can also iterate over items inside a Dataset/Collection without using .items:

for item in my_dataset:
    print("item name = {}".format(item.name))
properties

Returns a list of properties attached to object.

Collection

Collections are groupings of data that exist inside of a Dataset. These can be thought of as similar to a folder or directory.

class blackfynn.models.Collection(name, **kwargs)[source]
add(*items)

Add items to the Collection/Dataset.

create_collection(name)

Create a new collection within the current object. Collections can be created within datasets and within other collections.

Parameters: name (str) – The name of the to-be-created collection
Returns: The created Collection object.

Example:

from blackfynn import Blackfynn

bf = Blackfynn()
ds = bf.get_dataset('my_dataset')

# create collection in dataset
col1 = ds.create_collection('my_collection')

# create collection in collection
col2 = col1.create_collection('another_collection')
delete()

Delete object from platform.

get_items_by_name(name)

Get an item inside of object by name (if match is found).

Parameters: name (str) – the name of the item
Returns: list of matches

Note

This only works for first-level items, meaning it must exist directly inside the current object; nested items will not be returned.

get_property(key, category='Blackfynn')

Returns a single property for the provided key, if available

Parameters:
  • key (str) – key of the desired property
  • category (str, optional) – category of property
Returns:

object of type Property

Example:

pkg.set_property('quality', 85.0)
pkg.get_property('quality')
print_tree(indent=0)

Prints a tree of all items inside object.

remove(*items)

Removes items, where items can be an object or the object’s ID (string).

remove_property(key, category='Blackfynn')

Removes the property with the given key and category from the object.

Parameters:
  • key (str) – key of property to remove
  • category (str, optional) – category of property to remove
set_property(key, value, fixed=False, hidden=False, category='Blackfynn', data_type=None)

Add property to object using simplified interface.

Parameters:
  • key (str) – the key of the property
  • value (str,number) – the value of the property
  • fixed (bool) – if true, the value cannot be changed after the property is created
  • hidden (bool) – if true, the value is hidden on the platform
  • category (str) – the category of the property, default: “Blackfynn”
  • data_type (str) – one of ‘string’, ‘integer’, ‘double’, ‘date’, ‘user’
update(**kwargs)

Updates object on the platform (with any local changes) and syncs local instance with API response object.

Example:

pkg = bf.get('N:package:1234-1234-1234-1234')
pkg.name = "New name"
pkg.update()
upload(*files, **kwargs)

Upload files into current object.

Parameters: files – list of local files to upload.

Example:

my_collection.upload('/path/to/file1.nii.gz', '/path/to/file2.pdf')
exists

Whether or not the instance of this object exists on the platform.

items

Get all items inside Dataset/Collection (i.e. non-nested items).

Note

You can also iterate over items inside a Dataset/Collection without using .items:

for item in my_dataset:
    print("item name = {}".format(item.name))
properties

Returns a list of properties attached to object.

Data Package

The DataPackage class is used for all non-specific data classes (i.e. classes that do not need specialized methods).

class blackfynn.models.DataPackage(name, package_type, **kwargs)[source]

DataPackage is the core data object representation on the platform.

Parameters:
  • name (str) – The name of the data package
  • package_type (str) – The package type, e.g. ‘TimeSeries’, ‘MRI’, etc.

Note

package_type must be a supported package type. See our data type registry for supported values.

relate_to(*records)[source]

Relate current DataPackage to one or more Record objects.

Parameters: records (list of Records) – Records to relate to data package
Returns: Relationship that defines the link

Example

Relate package to a single record:

eeg.relate_to(participant_123)

Relate package to multiple records:

# relate to explicit list of records
eeg.relate_to(
    participant_001,
    participant_002,
    participant_003,
)

# relate to all participants
eeg.relate_to(participants.get_all())

Note

The created relationship will be of the form DataPackage –(belongs_to)–> Record.

files

Returns the files of a DataPackage. Files are the possibly modified source files (e.g. converted to a different format), but they could also be the source files themselves.

sources

Returns the sources of a DataPackage. Sources are the raw, unmodified files (if they exist) that contain the package’s data.

view

Returns the object(s) used to view the package. This is typically a set of file objects, that may be the DataPackage’s sources or files, but could also be a unique object specific for the viewer.

Data-Specific Classes

Timeseries

class blackfynn.models.TimeSeries(name, **kwargs)[source]

Bases: blackfynn.models.DataPackage

Represents a timeseries package on the platform. TimeSeries packages contain channels, which contain time-dependent data sampled at some frequency.

Parameters: name – The name of the timeseries package
add_annotations(layer, annotations)[source]
Parameters:
  • layer – either TimeSeriesAnnotationLayer object or name of annotation layer. Layers that do not exist will be created.
  • annotations – TimeSeriesAnnotation object(s)
Returns:

list of TimeSeriesAnnotation objects

add_channels(*channels)[source]

Add channels to TimeSeries package.

Parameters: channels – list of Channel objects.
add_layer(layer, description=None)[source]
Parameters:
  • layer – TimeSeriesAnnotationLayer object or name of annotation layer
  • description (str, optional) – description of layer
annotation_counts(start, end, layers, period, channels=None)[source]

Get annotation counts between start and end.

Parameters:
  • start (datetime or microseconds) – The starting time of the range to query
  • end (datetime or microseconds) – The ending time of the range to query
  • layers ([TimeSeriesLayer]) – List of layers for which to count annotations
  • period (string) – The length of time to group the counts. Formatted as a string - e.g. ‘1s’, ‘5m’, ‘3h’
  • channels ([TimeSeriesChannel]) – List of channels (if omitted, all channels will be used)
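
The period strings can be read as a count followed by a unit suffix. The following sketch shows one way such a string could be converted to microseconds; period_to_usecs is a hypothetical helper, and it assumes only the suffixes that appear in the examples above (s, m, h). The client's own parsing may accept other forms.

```python
import re

# Assumption: only the suffixes shown in the examples ('s', 'm', 'h') are handled.
_UNIT_TO_USECS = {"s": 1000000, "m": 60 * 1000000, "h": 3600 * 1000000}


def period_to_usecs(period):
    """Convert a string such as '1s', '5m' or '3h' to microseconds."""
    match = re.match(r"^(\d+)([smh])$", period.strip())
    if match is None:
        raise ValueError("unsupported period: {!r}".format(period))
    count, unit = match.groups()
    return int(count) * _UNIT_TO_USECS[unit]
```

For instance, period_to_usecs('5m') gives the number of microseconds in five minutes, which could then be passed wherever a length in usecs is accepted.
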
append_annotation_file(file)[source]

Processes .bfannot file and adds to timeseries package.

Parameters: file – path to .bfannot file
delete_layer(layer)[source]

Delete annotation layer.

Parameters: layer – annotation layer object
get_channel(channel)[source]

Get channel by ID.

Parameters: channel (str) – ID of channel
get_data(start=None, end=None, length=None, channels=None, use_cache=True)[source]

Get timeseries data between start and end or start and start + length on specified channels (default all channels).

Parameters:
  • start (optional) – start time of data (usecs or datetime object)
  • end (optional) – end time of data (usecs or datetime object)
  • length (optional) – length of data to retrieve, e.g. ‘1s’, ‘5s’, ‘10m’, ‘1h’
  • channels (optional) – list of channel objects or IDs, default all channels.

Note

Data requests will be automatically chunked and combined into a single Pandas DataFrame. However, you must be sure you request only a span of data that will properly fit in memory.

See get_data_iter for an iterator approach to timeseries data retrieval.

Example

Get 5 seconds of data from start over all channels:

data = ts.get_data(length='5s')

Get data between 12345 and 56789 (representing usecs since Epoch):

data = ts.get_data(start=12345, end=56789)

Get first 10 seconds for the first two channels:

data = ts.get_data(length='10s', channels=ts.channels[:2])
get_data_iter(channels=None, start=None, end=None, length=None, chunk_size=None, use_cache=True)[source]

Returns iterator over the data. Must specify either end OR length, not both.

Parameters:
  • channels (optional) – channels to retrieve data for (default: all)
  • start – start time of data (default: earliest time available).
  • end – end time of data (default: latest time available).
  • length – some time length, e.g. ‘1s’, ‘5m’, ‘1h’ or number of usecs
  • chunk_size – some time length, e.g. ‘1s’, ‘5m’, ‘1h’ or number of usecs
Returns:

iterator of Pandas Series, each the size of chunk_size.
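
To make the chunking idea concrete, the sketch below splits a time range into consecutive windows of at most chunk_size microseconds. iter_chunks is a hypothetical helper written for illustration; it assumes half-open windows and a shorter final window, neither of which is confirmed by this page.

```python
def iter_chunks(start, end, chunk_size):
    """Yield (chunk_start, chunk_end) pairs covering [start, end) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk_start = start
    while chunk_start < end:
        # The last window is clipped so it never runs past `end`.
        chunk_end = min(chunk_start + chunk_size, end)
        yield (chunk_start, chunk_end)
        chunk_start = chunk_end


# 2.5 seconds of data in 1-second windows (all values in usecs).
chunks = list(iter_chunks(0, 2500000, 1000000))
```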

get_layer(id_or_name)[source]

Get annotation layer by ID or name.

Parameters: id_or_name – layer ID or name
insert_annotation(layer, annotation, start=None, end=None, channel_ids=None, annotation_description=None)[source]

Insert annotations using a more direct interface, without the need for layer/annotation objects.

Parameters:
  • layer – str of new/existing layer or annotation layer object
  • annotation – str of annotation event
  • start (optional) – start of annotation
  • end (optional) – end of annotation
  • channel_ids (optional) – list of channel IDs to apply annotation
  • annotation_description (optional) – description of annotation

Example

To add annotation on layer “my-events” across all channels:

ts.insert_annotation('my-events', 'my annotation event')

To add annotation to first channel:

ts.insert_annotation('my-events', 'first channel event', channel_ids=ts.channels[0])
limits()[source]

Returns time limit tuple (start, end) of package.

remove_channels(*channels)[source]

Remove channels from TimeSeries package.

Parameters: channels – list of Channel objects or IDs
segments(start=None, stop=None, gap_factor=2)[source]

Returns list of contiguous data segments available for package. Segments are assessed for all channels, and the union of segments is returned.

Parameters:
  • start (int, datetime, optional) – Return segments starting after this time (default earliest start of any channel)
  • stop (int, datetime, optional) – Return segments starting before this time (default latest end time of any channel)
  • gap_factor (int, optional) – Gaps are computed by sampling_period * gap_factor (default 2)
Returns:

List of tuples, where each tuple represents the (start, stop) of contiguous data.

write_annotation_file(file, layer_names=None)[source]

Writes all layers to a csv .bfannot file

Parameters:
  • file – path to .bfannot output file. Appends extension if necessary
  • layer_names (optional) – List of layer names to write
channels

Returns list of Channel objects associated with package.

Note

This is a dynamically generated property, so every call will make an API request.

Suggested usage:

channels = ts.channels
for ch in channels:
    print(ch)

Accessing ts.channels repeatedly will be much slower, as an API request is made on every access:

for i in range(len(ts.channels)):
    print(ts.channels[i])
end

The end time (in usecs) of time series data (over all channels)

layers

List of annotation layers attached to TimeSeries package.

start

The start time (in usecs) of time series data (over all channels)

class blackfynn.models.TimeSeriesChannel(name, rate, start=0, end=0, unit='V', channel_type='continuous', source_type='unspecified', group='default', last_annot=0, spike_duration=None, **kwargs)[source]

Bases: blackfynn.models.BaseDataNode

TimeSeriesChannel represents a single source of time series data (e.g. an electrode).

Parameters:
  • name (str) – Name of channel
  • rate (float) – Rate of the channel (Hz)
  • start (optional) – Absolute start time of all data (datetime obj)
  • end (optional) – Absolute end time of all data (datetime obj)
  • unit (str, optional) – Unit of measurement
  • channel_type (str, optional) – One of ‘continuous’ or ‘event’
  • source_type (str, optional) – The source of data, e.g. “EEG”
  • group (str, optional) – The channel group, default: “default”
get_data(start=None, end=None, length=None, use_cache=True)[source]

Get channel data between start and end or start and start + length

Parameters:
  • start (optional) – start time of data (usecs or datetime object)
  • end (optional) – end time of data (usecs or datetime object)
  • length (optional) – length of data to retrieve, e.g. ‘1s’, ‘5s’, ‘10m’, ‘1h’
  • use_cache (optional) – whether to use locally cached data
Returns:

Pandas Series containing requested data for channel.

Note

Data requests will be automatically chunked and combined into a single Pandas Series. However, you must be sure you request only a span of data that will properly fit in memory.

See get_data_iter for an iterator approach to timeseries data retrieval.

Example

Get 5 seconds of data from the start of the channel:

data = channel.get_data(length='5s')

Get data between 12345 and 56789 (representing usecs since Epoch):

data = channel.get_data(start=12345, end=56789)
get_data_iter(start=None, end=None, length=None, chunk_size=None, use_cache=True)[source]

Returns iterator over the data. Must specify either end OR length, not both.

Parameters:
  • start (optional) – start time of data (default: earliest time available).
  • end (optional) – end time of data (default: latest time available).
  • length (optional) – some time length, e.g. ‘1s’, ‘5m’, ‘1h’ or number of usecs
  • chunk_size (optional) – some time length, e.g. ‘1s’, ‘5m’, ‘1h’ or number of usecs
  • use_cache (optional) – whether to use locally cached data
Returns:

Iterator of Pandas Series, each the size of chunk_size.

segments(start=None, stop=None, gap_factor=2)[source]

Return list of contiguous segments of valid data for channel.

Parameters:
  • start (long, datetime, optional) – Return segments starting after this time (default start of channel)
  • stop (long, datetime, optional) – Return segments starting before this time (default end of channel)
  • gap_factor (int, optional) – Gaps are computed by sampling_period * gap_factor (default 2)
Returns:

List of tuples, where each tuple represents the (start, stop) of contiguous data.
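
The gap rule can be illustrated with plain timestamps. The sketch below is not the client's implementation: contiguous_segments is a hypothetical helper that assumes timestamps in microseconds, a rate in Hz, and that a new segment starts whenever two consecutive samples are more than sampling_period * gap_factor apart.

```python
def contiguous_segments(timestamps, rate, gap_factor=2):
    """Group sorted timestamps (usecs) into (start, stop) runs without large gaps."""
    if not timestamps:
        return []
    sampling_period = 1e6 / rate            # usecs between consecutive samples
    max_gap = sampling_period * gap_factor  # anything larger counts as a gap
    segments = []
    seg_start = prev = timestamps[0]
    for t in timestamps[1:]:
        if t - prev > max_gap:
            # Close the current run and start a new one at this sample.
            segments.append((seg_start, prev))
            seg_start = t
        prev = t
    segments.append((seg_start, prev))
    return segments


# 100 Hz channel: samples every 10,000 usecs, with one long gap after 20,000.
segs = contiguous_segments([0, 10000, 20000, 500000, 510000], rate=100.0)
```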

update()[source]

Updates object on the platform (with any local changes) and syncs local instance with API response object.

Example:

pkg = bf.get('N:package:1234-1234-1234-1234')
pkg.name = "New name"
pkg.update()
end

The end time of channel data (microseconds since Epoch)

start

The start time of channel data (microseconds since Epoch)
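
Since times are expressed as microseconds since Epoch, it can help to convert to and from datetime objects. The helpers below (to_usecs, from_usecs) are hypothetical and assume naive datetimes interpreted as UTC; whether the client treats naive datetimes this way is not stated on this page.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_usecs(dt):
    """Microseconds between the Unix Epoch and a naive (assumed UTC) datetime."""
    delta = dt - EPOCH
    # Integer arithmetic avoids floating-point rounding on large timestamps.
    return (delta.days * 86400 + delta.seconds) * 1000000 + delta.microseconds


def from_usecs(usecs):
    """Inverse of to_usecs: naive datetime for a microsecond timestamp."""
    return EPOCH + timedelta(microseconds=usecs)


stamp = to_usecs(datetime(2018, 1, 1, 0, 0, 1))
```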

class blackfynn.models.TimeSeriesAnnotation(label, channel_ids, start, end, name='', layer_id=None, time_series_id=None, description=None, **kwargs)[source]

Bases: blackfynn.models.BaseNode

Annotation is an event on one or more channels in a dataset

Parameters:
  • label (str) – The label for the annotation
  • channel_ids – List of channel IDs to which the annotation applies
  • start – Start time
  • end – End time
  • name – Name of annotation
  • layer_id – Layer ID for annotation (all annotations exist on a layer)
  • time_series_id – TimeSeries package ID
  • description – Description of annotation
class blackfynn.models.TimeSeriesAnnotationLayer(name, time_series_id, description=None, **kwargs)[source]

Bases: blackfynn.models.BaseNode

Annotation layer containing one or more annotations. Layers are used to separate annotations into logically distinct groups when applied to the same data package.

Parameters:
  • name – Name of the layer
  • time_series_id – ID of the TimeSeries package to which the layer applies
  • description – Description of the layer
add_annotations(annotations)[source]

Add annotations to layer.

Parameters: annotations (list) – List of annotation objects to add.
annotation_counts(start, end, period, channels=None)[source]

The number of annotations between start and end over selected channels (all by default).

Parameters:
  • start (datetime or microseconds) – The starting time of the range to query
  • end (datetime or microseconds) – The ending time of the range to query
  • period (string) – The length of time to group the counts. Formatted as a string - e.g. ‘1s’, ‘5m’, ‘3h’
  • channels ([TimeSeriesChannel]) – List of channels (if omitted, all channels will be used)
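
As an illustration of grouping counts by period, the sketch below buckets event start times into fixed-width windows. count_by_period is hypothetical; it assumes times in microseconds and counts an event only in the bucket containing its start, which may differ from how the platform attributes annotations that span several periods.

```python
from collections import Counter


def count_by_period(starts, start, end, period_usecs):
    """Count event start times falling in each period-wide bucket of [start, end)."""
    counts = Counter()
    for t in starts:
        if start <= t < end:
            # Key each bucket by the time at which it begins.
            bucket = start + ((t - start) // period_usecs) * period_usecs
            counts[bucket] += 1
    return dict(counts)


# Five events, a 3-second query range, 1-second buckets (all values in usecs).
counts = count_by_period([0, 100, 1000000, 2500000, 9000000], 0, 3000000, 1000000)
```
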
annotations(start=None, end=None, channels=None)[source]

Get annotations between start and end over channels (all channels by default).

Parameters:
  • start – Start time
  • end – End time
  • channels – List of channel objects or IDs
delete()[source]

Delete annotation layer.

insert_annotation(annotation, start=None, end=None, channel_ids=None, description=None)[source]

Add annotations; proxy for add_annotations.

Parameters:
  • annotation (str) – Annotation string
  • start – Start time (usecs or datetime)
  • end – End time (usecs or datetime)
  • channel_ids – list of channel IDs
Returns:

The created annotation object.

iter_annotations(window_size=10, channels=None)[source]

Iterate over annotations according to some window size (seconds).

Parameters:
  • window_size (float) – Number of seconds in window
  • channels – List of channel objects or IDs
Yields:

List of annotations found in current window.

Tabular

class blackfynn.models.Tabular(name, **kwargs)[source]

Bases: blackfynn.models.DataPackage

Represents a Tabular package on the platform.

Parameters: name – The name of the package
get_data(limit=1000, offset=0, order_by=None, order_direction='ASC')[source]

Get data from tabular package as DataFrame

Parameters:
  • limit – Max number of rows to return (1000 default)
  • offset – Offset when retrieving rows
  • order_by – Column to order data
  • order_direction – Ascending (‘ASC’) or descending (‘DESC’)
Returns:

Pandas DataFrame

get_data_iter(chunk_size=10000, offset=0, order_by=None, order_direction='ASC')[source]

Iterate over tabular data, each data chunk will be of size chunk_size.
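
Chunked iteration of this kind is commonly built on limit/offset paging. The sketch below shows that pattern in isolation; iter_pages and fake_fetch are hypothetical, with fake_fetch standing in for a call such as get_data(limit=..., offset=...). It assumes a page shorter than chunk_size marks the end of the data.

```python
def iter_pages(fetch, chunk_size=10000, offset=0):
    """Yield successive pages from fetch(limit, offset) until one comes back short."""
    while True:
        page = fetch(limit=chunk_size, offset=offset)
        if len(page) == 0:
            break
        yield page
        if len(page) < chunk_size:
            # Assumption: a short page means there is nothing left to read.
            break
        offset += len(page)


rows = list(range(25))  # stand-in for the rows of a tabular package


def fake_fetch(limit, offset):
    # Hypothetical stand-in for Tabular.get_data(limit=..., offset=...).
    return rows[offset:offset + limit]


pages = list(iter_pages(fake_fetch, chunk_size=10))
```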