API

This part of the documentation covers all the interfaces of Mara Schema. For parts where the package depends on external libraries, we document the most important aspects right here and provide links to the canonical documentation.

Entities

class mara_schema.entity.Entity(name: str, description: str, schema_name: str, table_name: Optional[str] = None, pk_column_name: Optional[str] = None)
__init__(name: str, description: str, schema_name: str, table_name: Optional[str] = None, pk_column_name: Optional[str] = None)

A business object with attributes and links to other entities. Corresponds to a table in the dimensional schema.

Args:

name: A short noun phrase that captures the nature of the entity, e.g. “Customer”, “Order item”.
description: A short text that helps to understand the underlying business process, e.g. “People who registered through the web site or installed the app”.
schema_name: The database schema of the underlying table in the dimensional schema, e.g. “xy_dim”.
table_name: The name of the underlying table in the dimensional schema, e.g. “order_item”. Defaults to the lower-cased entity name with spaces replaced by underscores.
pk_column_name: The primary key column in the underlying table. Defaults to table_name + ‘_id’.
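The two defaults described above can be illustrated without the library. The following is a minimal sketch, not Mara Schema's own code: `derive_defaults` is a hypothetical helper that only mirrors the documented rules for `table_name` and `pk_column_name`.

```python
from typing import Optional, Tuple


def derive_defaults(name: str,
                    table_name: Optional[str] = None,
                    pk_column_name: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper that mirrors the documented defaults of Entity."""
    # Lower-cased entity name with spaces replaced by underscores
    table_name = table_name or name.lower().replace(' ', '_')
    # table_name + '_id'
    pk_column_name = pk_column_name or table_name + '_id'
    return table_name, pk_column_name


print(derive_defaults('Order item'))                      # ('order_item', 'order_item_id')
print(derive_defaults('Customer', table_name='customers'))  # ('customers', 'customers_id')
```

Passing an explicit `table_name` also changes the derived primary key column, because the second default builds on the first.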

add_attribute(name: str, description: str, column_name: Optional[str] = None, type: Optional[mara_schema.attribute.Type] = None, high_cardinality: bool = False, personal_data: bool = False, important_field: bool = False, accessible_via_entity_link: bool = True, more_url: Optional[str] = None) -> None

Adds a property based on a column in the underlying dimensional table to the entity

Args:

name: How the attribute is displayed in front-ends, e.g. “Order date”.
description: A meaningful business definition of the attribute, e.g. “The date when the order was placed”.
column_name: The name of the column in the underlying database table. Defaults to the lower-cased name with whitespace replaced by underscores.
type: The type of the attribute, see the definition of the Type enum.
high_cardinality: Marks columns whose values are very uncommon or unique. Defaults to False.
personal_data: Marks person-related data, e.g. “Email address”, “Name”.
important_field: A field that highlights the data set. Shown by default in overviews.
accessible_via_entity_link: If False, then this attribute is excluded from data sets that are not based on this entity.
more_url: URL (as string) which should be appended as a more… link in the UI.

connected_entities() -> ['Entity']

Find all recursively linked entities.
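To make "recursively linked" concrete, here is a small sketch of such a traversal. It is an illustration under assumptions, not the library's implementation: `Node` is a hypothetical stand-in for an entity, and whether the real method includes the starting entity in its result is not stated here (this sketch leaves it out).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for an entity: a name and its link targets."""
    name: str
    links: List['Node'] = field(default_factory=list)


def connected(start: Node) -> List[Node]:
    """Collect every entity reachable from `start`, each one exactly once."""
    found: List[Node] = []
    seen = {id(start)}  # track object identity so that cycles terminate
    queue = list(start.links)
    while queue:
        node = queue.pop(0)
        if id(node) in seen:
            continue
        seen.add(id(node))
        found.append(node)
        queue.extend(node.links)
    return found


customer = Node('Customer')
order = Node('Order', [customer])
order_item = Node('Order item', [order])
customer.links.append(order)  # cycle: Customer -> Order -> Customer

print([node.name for node in connected(order_item)])  # ['Order', 'Customer']
```

The `seen` set is what keeps the walk finite when two entities link to each other, as "Customer" and "Order" do in this example.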

find_attribute(attribute_name: str) -> mara_schema.attribute.Attribute

Find an attribute by its name

Find an EntityLink by its target entity name or prefix.

Adds a link from the entity to another entity. Corresponds to a foreign key relationship.

Args:

target_entity: The referenced entity, e.g. an “Order” entity.
fk_column: The foreign key column in the source entity, e.g. “first_order_fk” in the “customer” table.
prefix: Attributes from the linked entity will be prefixed with this, e.g. “First order”. Defaults to the name of the linked entity.
description: A short explanation for the relation between the entity and target entity.

remove_attribute(name: str) -> None

Removes a property based on a column in the underlying dimensional table from the entity

Args:

name: How the attribute is displayed in front-ends, e.g. “Order date”

__init__(target_entity: mara_schema.entity.Entity, prefix: str, description: Optional[str] = None, fk_column: Optional[str] = None) -> None

A link from an entity to another entity. Corresponds to a foreign key relationship.

Args:

target_entity: The referenced entity, e.g. an “Order” entity.
prefix: Attributes from the linked entity will be prefixed with this, e.g. “First order”.
description: A short explanation for the relation between the entity and target entity.
fk_column: The foreign key column in the source entity, e.g. “first_order_fk” in the “customer” table.

Attributes

class mara_schema.attribute.Attribute(name: str, description: str, column_name: str, accessible_via_entity_link: bool, type: Optional[mara_schema.attribute.Type] = None, high_cardinality: bool = False, personal_data: bool = False, important_field: bool = False, more_url: Optional[str] = None)

A property of an entity. Corresponds to a column in the underlying dimensional table.

__init__(name: str, description: str, column_name: str, accessible_via_entity_link: bool, type: Optional[mara_schema.attribute.Type] = None, high_cardinality: bool = False, personal_data: bool = False, important_field: bool = False, more_url: Optional[str] = None) -> None

See documentation of function Entity.add_attribute

prefixed_name(path: Tuple[EntityLink] = None) -> str

Generates a meaningful business name by concatenating the prefixes of the entity links in the path with the original name of the attribute.
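As a rough illustration of the concatenation, the sketch below joins plain prefix strings with an attribute name. It assumes a space as separator and works on strings rather than on EntityLink objects; `prefixed_name_sketch` is a hypothetical name, and any casing or clean-up the real method applies is left out.

```python
from typing import Sequence


def prefixed_name_sketch(attribute_name: str, prefixes: Sequence[str] = ()) -> str:
    """Join the prefixes along the path (outermost link first) with the attribute name."""
    parts = [prefix for prefix in prefixes if prefix]  # skip empty prefixes
    return ' '.join(parts + [attribute_name])


print(prefixed_name_sketch('date', ['First order']))  # First order date
print(prefixed_name_sketch('Email'))                   # Email
```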

class mara_schema.attribute.Type(cls, bases, classdict, **kwds)
Attribute types that need special treatment in artifact creation

Type.ID: A numeric ID that is converted to text in a flattened table so that it can be filtered.
Type.DATE: Date attribute as a foreign key to a date dimension.
Type.DURATION: Duration attribute as a foreign key to a duration dimension.
Type.ENUM: Attribute that is converted to text in a flattened table.
Type.ARRAY: Attribute of type array.

mara_schema.attribute.normalize_name(name: str, max_length: int = 63) -> str

Makes “Foo bar baz” out of “foo bar bar baz”.

Args:

name: The name to normalize.
max_length: Optionally limits the length by replacing the part that is too long with a hash of the name.
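The one-line description leaves the exact rules open, so the following sketch states its assumptions explicitly: it drops a word that directly repeats the previous one, capitalizes the first letter, and, when the result exceeds `max_length`, replaces the overflowing tail with an underscore and the first 8 hex characters of a SHA-256 digest. The function name, the choice of hash and the separator are all hypothetical; the library may do this differently.

```python
import hashlib


def normalize_name_sketch(name: str, max_length: int = 63) -> str:
    """Drop directly repeated words, capitalize the first letter, cap the length."""
    words = []
    for word in name.split():
        # Skip a word that repeats the previous one, ignoring case
        if not words or words[-1].lower() != word.lower():
            words.append(word)
    result = ' '.join(words)
    result = result[:1].upper() + result[1:]
    if len(result) > max_length:
        # Replace the overflowing tail with a short hash of the full name
        digest = hashlib.sha256(name.encode('utf-8')).hexdigest()[:8]
        result = result[:max_length - len(digest) - 1] + '_' + digest
    return result


print(normalize_name_sketch('foo bar bar baz'))  # Foo bar baz
```

Hashing the full original name keeps two long names that share a common beginning distinguishable after truncation.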

Data sets

class mara_schema.data_set.DataSet(entity: mara_schema.entity.Entity, name: str)
__init__(entity: mara_schema.entity.Entity, name: str)

An entity with its metrics and recursively linked entities.

Args:

entity: The underlying entity with its attributes and linked other entities.
name: The name of the data set.

add_composed_metric(name: str, description: str, formula: str, important_field: bool = False, number_format: mara_schema.metric.NumberFormat = 'Standard', more_url: Optional[str] = None)

Add a metric that is based on a list of simple metrics.

Args:

name: How the metric is displayed in front-ends, e.g. “Revenue after cancellations”.
description: A meaningful business definition of the metric.
formula: How to compute the metric. Examples: [Metric A] + [Metric B], [Metric A] / ([Metric B] + [Metric C])
important_field: Whether this is a key business metric.
number_format: The way to format a string. Defaults to NumberFormat.STANDARD.
more_url: URL (as string) which should be appended as a more… link in the UI.
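The bracketed `formula` syntax here and the ‘{}’-based `formula_template` of ComposedMetric look like two views of the same information. How the library converts between them is not shown in this documentation; the sketch below is one plausible way to do it with a regular expression, and `split_formula` is a hypothetical name.

```python
import re
from typing import List, Tuple


def split_formula(formula: str) -> Tuple[str, List[str]]:
    """Split '[A] + [B]' into the template '{} + {}' and the names ['A', 'B']."""
    # Capture the text between each pair of square brackets
    names = re.findall(r'\[([^\]]+)\]', formula)
    # Replace each bracketed name with a positional placeholder
    template = re.sub(r'\[[^\]]+\]', '{}', formula)
    return template, names


print(split_formula('[Metric A] / ([Metric B] + [Metric C])'))
# ('{} / ({} + {})', ['Metric A', 'Metric B', 'Metric C'])
```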

add_simple_metric(name: str, description: str, column_name: str, aggregation: mara_schema.metric.Aggregation, important_field: bool = False, number_format: mara_schema.metric.NumberFormat = 'Standard', more_url: Optional[str] = None)

Add a metric that is computed as a direct aggregation on an entity table column.

Args:

name: How the metric is displayed in front-ends, e.g. “Revenue after cancellations”.
description: A meaningful business definition of the metric.
column_name: The column that the aggregation is based on.
aggregation: The aggregation method to use.
important_field: Whether this is a key business metric.
number_format: The way to format a string. Defaults to NumberFormat.STANDARD.
more_url: URL (as string) which should be appended as a more… link in the UI.

connected_attributes(include_personal_data: bool = True) -> {(<class 'mara_schema.entity.EntityLink'>,): {<class 'str'>: <class 'mara_schema.attribute.Attribute'>}}

Returns all attributes with their prefixed name from all connected entities.

Args:

include_personal_data: If False, then exclude fields that are marked as personal data

Returns:

A dictionary with the paths as keys and dictionaries of prefixed attribute names and attributes as values. Example:

    {(<EntityLink 1>, <EntityLink 2>): {‘Prefixed attribute 1 name’: <Attribute 1>,
                                        ‘Prefixed attribute 2 name’: <Attribute 2>},
     ..}
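A nested dictionary keyed by tuples can be awkward to read at first. The sketch below walks a structure of the documented shape and lists each prefixed attribute name with the length of its path. The data is made up: plain strings stand in for the real EntityLink and Attribute objects, and `prefixed_names_by_depth` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Stand-in data with the documented shape; strings replace the real objects
connected = {
    ('<EntityLink Customer>',): {
        'Customer email': '<Attribute email>'},
    ('<EntityLink Customer>', '<EntityLink First order>'): {
        'Customer first order date': '<Attribute order_date>'},
}


def prefixed_names_by_depth(
        attributes_by_path: Dict[Tuple[str, ...], Dict[str, str]]) -> List[Tuple[int, str]]:
    """Return (path length, prefixed attribute name) pairs, shortest paths first."""
    rows = []
    for path, attributes in attributes_by_path.items():
        for prefixed_name in attributes:
            rows.append((len(path), prefixed_name))
    return sorted(rows)


print(prefixed_names_by_depth(connected))
# [(1, 'Customer email'), (2, 'Customer first order date')]
```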

exclude_attributes(path: mara_schema.data_set._PathSpec, attribute_names: [<class 'str'>] = None)

Exclude attributes of a connected entity in generated data set tables.

Args:
path: How to get to the entity from the data set entity. A list of either strings (target entity names) or tuples of strings (target entity name + prefix). Example: [‘Entity 1’, (‘Entity 2’, ‘Prefix’), ‘Entity 3’]
attribute_names: A list of names of attributes to be excluded. If not provided, then all attributes are excluded.
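Because a path may mix bare strings with (name, prefix) tuples, code that consumes it usually benefits from bringing every step into one shape first. The following is a minimal sketch of such a normalization; `normalize_path` and `PathStep` are hypothetical names and not part of the documented API.

```python
from typing import List, Optional, Sequence, Tuple, Union

PathStep = Union[str, Tuple[str, str]]


def normalize_path(path: Sequence[PathStep]) -> List[Tuple[str, Optional[str]]]:
    """Turn every step into a (target entity name, prefix) pair; prefix may be None."""
    steps = []
    for step in path:
        if isinstance(step, str):
            # A bare string names the target entity and gives no prefix
            steps.append((step, None))
        else:
            entity_name, prefix = step
            steps.append((entity_name, prefix))
    return steps


print(normalize_path(['Entity 1', ('Entity 2', 'Prefix'), 'Entity 3']))
# [('Entity 1', None), ('Entity 2', 'Prefix'), ('Entity 3', None)]
```

The tuple form presumably matters when an entity links to the same target more than once, since the prefix is then the only thing that tells the links apart.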

exclude_path(path: mara_schema.data_set._PathSpec)

Exclude a connected entity from generated data set tables by specifying the entity links to that entity

Args:
path: How to get to the entity from the data set entity. A list of either strings (target entity names) or tuples of strings (target entity name + prefix). Example: [‘Entity 1’, (‘Entity 2’, ‘Prefix’), ‘Entity 3’]

id()

Returns a representation that can be used in urls

include_attributes(path: mara_schema.data_set._PathSpec, attribute_names: [<class 'str'>])

Exclude all attributes of a connected entity from generated data set tables, except the explicitly included ones.

Args:
path: How to get to the entity from the data set entity. A list of either strings (target entity names) or tuples of strings (target entity name + prefix). Example: [‘Entity 1’, (‘Entity 2’, ‘Prefix’), ‘Entity 3’]
attribute_names: A list of names of attributes to be included.

paths_to_connected_entities() -> [(<class 'mara_schema.entity.EntityLink'>,)]

Get all possible paths to connected entities (tuples of entity links)
- that are not explicitly excluded
- that are not beyond the max link depth or that are explicitly included
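The interplay of a depth limit and exclusions can be sketched on a toy link graph. Everything below is an assumption made for illustration: entity names stand in for EntityLink objects, `max_depth` and `excluded` are hypothetical parameters, an excluded path also hides everything behind it, and the "explicitly included" override mentioned above is not modelled.

```python
from typing import Dict, FrozenSet, List, Tuple

Path = Tuple[str, ...]


def enumerate_paths(start: str, links: Dict[str, List[str]], max_depth: int,
                    excluded: FrozenSet[Path] = frozenset()) -> List[Path]:
    """List all link paths from `start` up to `max_depth` links, skipping excluded ones."""
    result: List[Path] = []

    def walk(entity: str, path: Path) -> None:
        if len(path) >= max_depth:
            return  # the depth limit also stops cycles from looping forever
        for target in links.get(entity, []):
            new_path = path + (target,)
            if new_path in excluded:
                continue  # an excluded path also hides everything behind it
            result.append(new_path)
            walk(target, new_path)

    walk(start, ())
    return result


links = {'Order item': ['Order', 'Product'], 'Order': ['Customer'], 'Customer': ['Order']}
print(enumerate_paths('Order item', links, max_depth=2))
# [('Order',), ('Order', 'Customer'), ('Product',)]
```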

Metrics

class mara_schema.metric.Aggregation(cls, bases, classdict, **kwds)

Aggregation methods for metrics

class mara_schema.metric.NumberFormat(cls, bases, classdict, **kwds)

How to format values

class mara_schema.metric.SimpleMetric(name: str, description: str, data_set: DataSet, column_name: str, aggregation: mara_schema.metric.Aggregation, important_field: bool = False, number_format: mara_schema.metric.NumberFormat = 'Standard', more_url: Optional[str] = None)
__init__(name: str, description: str, data_set: DataSet, column_name: str, aggregation: mara_schema.metric.Aggregation, important_field: bool = False, number_format: mara_schema.metric.NumberFormat = 'Standard', more_url: Optional[str] = None)

A metric that is computed as a direct aggregation on an entity table column.

Args:

name: How the metric is displayed in front-ends, e.g. “Revenue after cancellations”.
description: A meaningful business definition of the metric.
data_set: The data set that contains the metric.
column_name: The column that the aggregation is based on.
aggregation: The aggregation method to use.
important_field: Whether this is a key business metric.
number_format: The way to format a string. Defaults to NumberFormat.STANDARD.

display_formula() -> str

Returns a documentation string for displaying the formula in the frontend

class mara_schema.metric.ComposedMetric(name: str, description: str, data_set: DataSet, parent_metrics: [<class 'mara_schema.metric.Metric'>], formula_template: str, important_field: bool = False, number_format: mara_schema.metric.NumberFormat = 'Standard', more_url: typing.Optional[str] = None)
__init__(name: str, description: str, data_set: DataSet, parent_metrics: [<class 'mara_schema.metric.Metric'>], formula_template: str, important_field: bool = False, number_format: mara_schema.metric.NumberFormat = 'Standard', more_url: typing.Optional[str] = None) -> None

A metric that is based on a list of simple metrics.

Args:

name: How the metric is displayed in front-ends, e.g. “Revenue after cancellations”.
description: A meaningful business definition of the metric.
data_set: The data set that contains the metric.
parent_metrics: The parent metrics that this metric is composed of.
formula_template: How to compose the parent metrics, with ‘{}’ as placeholders. Examples: ‘{} + {}’, ‘{} / ({} + {})’
important_field: Whether this is a key business metric.
number_format: The way to format a string. Defaults to NumberFormat.STANDARD.
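The ‘{}’ placeholders match the positional syntax of Python's `str.format`, so a template and an ordered list of parent metric names are enough to produce a readable formula. The sketch below shows that idea only. `render_formula` is a hypothetical helper, and the exact string that `display_formula()` returns is not specified in this documentation.

```python
from typing import Sequence


def render_formula(formula_template: str, parent_names: Sequence[str]) -> str:
    """Fill the '{}' placeholders, in order, with the bracketed parent metric names."""
    return formula_template.format(*[f'[{name}]' for name in parent_names])


print(render_formula('{} / ({} + {})', ['Revenue', 'Orders', 'Returns']))
# [Revenue] / ([Orders] + [Returns])
```

The order of the parent metrics matters here: the n-th placeholder is filled with the n-th metric.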

display_formula() -> str

Returns a documentation string for displaying the formula in the frontend

SQL Generation

mara_schema.sql_generation.data_set_sql_query(data_set: mara_schema.data_set.DataSet, human_readable_columns=True, pre_computed_metrics=True, star_schema: bool = False, star_schema_transitive_fks: bool = True, personal_data=True, high_cardinality_attributes=True, engine: Optional[sqlalchemy.engine.base.Engine] = None) -> str

Returns a SQL select statement that flattens all linked entities of a data set into a wide table

Args:

data_set: The data set to flatten.
human_readable_columns: Whether to use “Customer name” rather than “customer_name” as column name.
pre_computed_metrics: Whether to pre-compute composed metrics, counts and distinct counts on row level.
star_schema: Whether to add foreign keys to the tables of linked entities rather than including their attributes.
star_schema_transitive_fks: Whether to include all attributes of all transitively linked entities. When False, only their respective foreign keys are included. Defaults to True. Example for star_schema_transitive_fks = False:

    SELECT order.id,
           order.date,
           order.price,
           customer.customer_fk,
           store.store_fk
    FROM order
    LEFT JOIN customer
    LEFT JOIN store

personal_data: Whether to include attributes that are marked as personal data.
high_cardinality_attributes: Whether to include attributes that are marked to have a high cardinality.
engine: A sqlalchemy engine that is used to quote database identifiers. Defaults to a PostgreSQL engine.

Returns:

A string containing the select statement
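The effect of `human_readable_columns` on a single output column can be sketched in isolation. This is an illustration under assumptions, not the library's SQL generator: `column_alias` is a hypothetical helper, the machine-friendly form is assumed to be the lower-cased name with spaces replaced by underscores, and identifier quoting is hard-coded to PostgreSQL-style double quotes instead of going through a SQLAlchemy engine.

```python
def column_alias(prefixed_name: str, human_readable_columns: bool = True) -> str:
    """Return a quoted column alias in either human-readable or snake_case form."""
    if human_readable_columns:
        alias = prefixed_name
    else:
        alias = prefixed_name.lower().replace(' ', '_')
    # Double-quote the identifier and escape embedded double quotes
    return '"' + alias.replace('"', '""') + '"'


print(column_alias('Customer name'))         # "Customer name"
print(column_alias('Customer name', False))  # "customer_name"
```

Quoting is needed in the human-readable case because such names contain spaces and upper-case letters.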