API
This part of the documentation covers all the interfaces of Mara Schema. For parts where the package depends on external libraries, we document the most important ones right here and provide links to the canonical documentation.
Entities
- class mara_schema.entity.Entity(name: str, description: str, schema_name: str, table_name: Optional[str] = None, pk_column_name: Optional[str] = None)
- __init__(name: str, description: str, schema_name: str, table_name: Optional[str] = None, pk_column_name: Optional[str] = None)
A business object with attributes and links to other entities. Corresponds to a table in the dimensional schema.
- Args:
name: A short noun phrase that captures the nature of the entity, e.g. “Customer”, “Order item”.
description: A short text that helps to understand the underlying business process, e.g. “People who registered through the web site or installed the app”.
schema_name: The database schema of the underlying table in the dimensional schema, e.g. “xy_dim”.
table_name: The name of the underlying table in the dimensional schema, e.g. “order_item”. Defaults to the lower-cased entity name with spaces replaced by underscores.
pk_column_name: The primary key column in the underlying table. Defaults to table_name + ‘_id’.
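The two defaults described above can be illustrated without the library. The following is a minimal sketch of the documented rules, not Mara Schema's own implementation; the helper names `default_table_name` and `default_pk_column_name` are hypothetical.

```python
def default_table_name(entity_name: str) -> str:
    """Lower-case the entity name and replace spaces with underscores."""
    return entity_name.lower().replace(' ', '_')


def default_pk_column_name(table_name: str) -> str:
    """Append '_id' to the table name."""
    return table_name + '_id'


table_name = default_table_name('Order item')
print(table_name)                          # order_item
print(default_pk_column_name(table_name))  # order_item_id
```

Passing table_name or pk_column_name explicitly is only needed when the underlying table does not follow this naming scheme.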
- add_attribute(name: str, description: str, column_name: Optional[str] = None, type: Optional[mara_schema.attribute.Type] = None, high_cardinality: bool = False, personal_data: bool = False, important_field: bool = False, accessible_via_entity_link: bool = True, more_url: Optional[str] = None) → None
Adds a property based on a column in the underlying dimensional table to the entity
- Args:
name: How the attribute is displayed in front-ends, e.g. “Order date”.
description: A meaningful business definition of the attribute, e.g. “The date when the order was placed”.
column_name: The name of the column in the underlying database table. Defaults to the lower-cased name with whitespace replaced by underscores.
type: The type of the attribute, see the definition of the Type enum.
high_cardinality: Whether the column holds values that are very uncommon or unique. Defaults to False.
personal_data: Whether the attribute holds person-related data, e.g. “Email address”, “Name”.
important_field: A field that highlights the data set. Shown by default in overviews.
accessible_via_entity_link: If False, then this attribute is excluded from data sets that are not based on this entity.
more_url: URL (as string) which should be appended as a “more…” link in the UI.
- connected_entities() → ['Entity']
Find all recursively linked entities.
- find_attribute(attribute_name: str) → mara_schema.attribute.Attribute
Find an attribute by its name.
- find_entity_link(target_entity_name: str, prefix: Optional[str] = None) → mara_schema.entity.EntityLink
Find an EntityLink by its target entity name or prefix.
- link_entity(target_entity: mara_schema.entity.Entity, fk_column: Optional[str] = None, prefix: Optional[str] = None, description=None) → None
Adds a link from the entity to another entity. Corresponds to a foreign key relationship.
- Args:
target_entity: The referenced entity, e.g. an “Order” entity.
fk_column: The foreign key column in the source entity, e.g. “first_order_fk” in the “customer” table.
prefix: Attributes from the linked entity will be prefixed with this, e.g. “First order”. Defaults to the name of the linked entity.
description: A short explanation of the relation between the entity and the target entity.
- remove_attribute(name: str) → None
Removes a property based on a column in the underlying dimensional table from the entity
- Args:
name: How the attribute is displayed in front-ends, e.g. “Order date”
- class mara_schema.entity.EntityLink(target_entity: mara_schema.entity.Entity, prefix: str, description: Optional[str] = None, fk_column: Optional[str] = None)
- __init__(target_entity: mara_schema.entity.Entity, prefix: str, description: Optional[str] = None, fk_column: Optional[str] = None) → None
A link from an entity to another entity. Corresponds to a foreign key relationship.
- Args:
target_entity: The referenced entity, e.g. an “Order” entity.
prefix: Attributes from the linked entity will be prefixed with this, e.g. “First order”.
description: A short explanation of the relation between the entity and the target entity.
fk_column: The foreign key column in the source entity, e.g. “first_order_fk” in the “customer” table.
Attributes
- class mara_schema.attribute.Attribute(name: str, description: str, column_name: str, accessible_via_entity_link: bool, type: Optional[mara_schema.attribute.Type] = None, high_cardinality: bool = False, personal_data: bool = False, important_field: bool = False, more_url: Optional[str] = None)
A property of an entity. Corresponds to a column in the underlying dimensional table.
- __init__(name: str, description: str, column_name: str, accessible_via_entity_link: bool, type: Optional[mara_schema.attribute.Type] = None, high_cardinality: bool = False, personal_data: bool = False, important_field: bool = False, more_url: Optional[str] = None) → None
See documentation of function Entity.add_attribute
- prefixed_name(path: Tuple[EntityLink] = None) → str
Generate a meaningful business name by concatenating the prefix of entity link instances and original name of attribute.
- class mara_schema.attribute.Type(cls, bases, classdict, **kwds)
- Attribute types that need special treatment in artifact creation
Type.ID: A numeric ID that is converted to text in a flattened table so that it can be filtered.
Type.DATE: Date attribute as a foreign key to a date dimension.
Type.DURATION: Duration attribute as a foreign key to a duration dimension.
Type.ENUM: An attribute that is converted to text in a flattened table.
Type.ARRAY: An attribute of type array.
- mara_schema.attribute.normalize_name(name: str, max_length: int = 63) → str
Makes “Foo bar baz” out of “foo bar bar baz”
- Args:
name: The name to normalize.
max_length: Optionally limits the length by replacing the part that is too long with a hash of the name.
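The single example given for normalize_name suggests that consecutive repeated words are collapsed and the first letter is capitalized. The sketch below reproduces only that one documented example under this assumption. It is not the library's implementation, the function name `collapse_repeated_words` is hypothetical, and the max_length hashing is deliberately left out because its details are not documented here.

```python
def collapse_repeated_words(name: str) -> str:
    """Drop a word when it repeats the word directly before it, then capitalize the first letter."""
    words = []
    for word in name.split():
        # Keep the word unless it equals the previously kept word (case-insensitive)
        if not words or word.lower() != words[-1].lower():
            words.append(word)
    result = ' '.join(words)
    return result[:1].upper() + result[1:]


print(collapse_repeated_words('foo bar bar baz'))  # Foo bar baz
```

Such repetitions typically arise when the prefix of an entity link and the attribute name share a word.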
Data sets
- class mara_schema.data_set.DataSet(entity: mara_schema.entity.Entity, name: str)
- __init__(entity: mara_schema.entity.Entity, name: str)
An entity with its metrics and recursively linked entities.
- Args:
entity: The underlying entity with its attributes and linked other entities.
name: The name of the data set.
- add_composed_metric(name: str, description: str, formula: str, important_field: bool = False, number_format: mara_schema.metric.NumberFormat = 'Standard', more_url: Optional[str] = None)
Add a metric that is based on a list of simple metrics.
- Args:
name: How the metric is displayed in front-ends, e.g. “Revenue after cancellations”.
description: A meaningful business definition of the metric.
formula: How to compute the metric. Examples: [Metric A] + [Metric B], [Metric A] / ([Metric B] + [Metric C])
important_field: Marks key business metrics.
number_format: How to format the values of the metric. Defaults to NumberFormat.STANDARD.
more_url: URL (as string) which should be appended as a “more…” link in the UI.
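The bracket notation of the formula argument relates directly to the ‘{}’ placeholders of ComposedMetric.formula_template. A minimal sketch of how such a formula string could be split into a template and a list of referenced metric names is shown below. The function `split_formula` is hypothetical and only illustrates the notation; it is not taken from the library.

```python
import re
from typing import List, Tuple


def split_formula(formula: str) -> Tuple[str, List[str]]:
    """Split '[Metric A] + [Metric B]' into a '{}' template and the referenced metric names."""
    # Names are whatever stands between square brackets, in order of appearance
    metric_names = re.findall(r'\[(.*?)\]', formula)
    # Each bracketed name becomes one positional placeholder
    template = re.sub(r'\[.*?\]', '{}', formula)
    return template, metric_names


template, names = split_formula('[Metric A] / ([Metric B] + [Metric C])')
print(template)  # {} / ({} + {})
print(names)     # ['Metric A', 'Metric B', 'Metric C']
```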
- add_simple_metric(name: str, description: str, column_name: str, aggregation: mara_schema.metric.Aggregation, important_field: bool = False, number_format: mara_schema.metric.NumberFormat = 'Standard', more_url: Optional[str] = None)
Add a metric that is computed as a direct aggregation on an entity table column
- Args:
name: How the metric is displayed in front-ends, e.g. “Revenue after cancellations”.
description: A meaningful business definition of the metric.
column_name: The column that the aggregation is based on.
aggregation: The aggregation method to use.
important_field: Marks key business metrics.
number_format: How to format the values of the metric. Defaults to NumberFormat.STANDARD.
more_url: URL (as string) which should be appended as a “more…” link in the UI.
- connected_attributes(include_personal_data: bool = True) → {(<class 'mara_schema.entity.EntityLink'>,): {<class 'str'>: <class 'mara_schema.attribute.Attribute'>}}
Returns all attributes with their prefixed name from all connected entities.
- Args:
include_personal_data: If False, then exclude fields that are marked as personal data
- Returns:
A dictionary with the paths as keys and dictionaries of prefixed attribute names and attributes as values. Example:
{(<EntityLink 1>, <EntityLink 2>): {‘Prefixed attribute 1 name’: <Attribute 1>,
                                    ‘Prefixed attribute 2 name’: <Attribute 2>},
 ..}
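To make the shape of this return value concrete, the sketch below walks a dictionary of the same structure. `Link` and `Attr` are hypothetical stand-ins for EntityLink and Attribute so that the example runs without the library, and the entity and attribute names are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for EntityLink and Attribute
Link = namedtuple('Link', ['prefix'])
Attr = namedtuple('Attr', ['name'])

order, customer = Link('Order'), Link('Customer')

# Same shape as documented: tuple of links -> {prefixed name -> attribute}
connected = {
    (order,): {'Order date': Attr('Order date')},
    (order, customer): {'Order customer email': Attr('Email')},
}


def describe(connected_attributes):
    """Return one line per attribute, showing the path that leads to it."""
    lines = []
    for path, attributes in connected_attributes.items():
        route = ' -> '.join(link.prefix for link in path)
        for prefixed_name, attribute in attributes.items():
            lines.append(f'{route}: {prefixed_name} (attribute "{attribute.name}")')
    return lines


for line in describe(connected):
    print(line)
```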
- exclude_attributes(path: mara_schema.data_set._PathSpec, attribute_names: [<class 'str'>] = None)
Exclude attributes of a connected entity in generated data set tables.
- Args:
path: How to get to the entity from the data set entity. A list of either strings (target entity names) or tuples of strings (target entity name + prefix). Example: [‘Entity 1’, (‘Entity 2’, ‘Prefix’), ‘Entity 3’]
attribute_names: A list of names of attributes to be excluded. If not provided, then all attributes are excluded.
- exclude_path(path: mara_schema.data_set._PathSpec)
Exclude a connected entity from generated data set tables by specifying the entity links to that entity
- Args:
path: How to get to the entity from the data set entity. A list of either strings (target entity names) or tuples of strings (target entity name + prefix). Example: [‘Entity 1’, (‘Entity 2’, ‘Prefix’), ‘Entity 3’]
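The path format accepted by exclude_path, exclude_attributes and include_attributes mixes plain strings and tuples. A minimal sketch of how such a list could be brought into a uniform shape follows. The helper `normalize_path` and the `PathSpec` alias are hypothetical and only illustrate the documented format; they do not mirror the library's internal _PathSpec handling.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical alias for the documented format
PathSpec = List[Union[str, Tuple[str, str]]]


def normalize_path(path: PathSpec) -> List[Tuple[str, Optional[str]]]:
    """Turn every step into a (target entity name, prefix) pair, with prefix None when omitted."""
    normalized = []
    for step in path:
        if isinstance(step, str):
            # A bare string names the target entity only
            normalized.append((step, None))
        else:
            target_entity_name, prefix = step
            normalized.append((target_entity_name, prefix))
    return normalized


print(normalize_path(['Entity 1', ('Entity 2', 'Prefix'), 'Entity 3']))
# [('Entity 1', None), ('Entity 2', 'Prefix'), ('Entity 3', None)]
```

The tuple form is what distinguishes several links to the same target entity, for example a “First order” and a “Last order” link that both point to “Order”.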
- id()
Returns a representation that can be used in URLs
- include_attributes(path: mara_schema.data_set._PathSpec, attribute_names: [<class 'str'>])
Exclude all attributes except the explicitly included ones of a connected entity in generated data set tables.
- Args:
path: How to get to the entity from the data set entity. A list of either strings (target entity names) or tuples of strings (target entity name + prefix). Example: [‘Entity 1’, (‘Entity 2’, ‘Prefix’), ‘Entity 3’]
attribute_names: A list of names of attributes to be included.
- paths_to_connected_entities() → [(<class 'mara_schema.entity.EntityLink'>,)]
Get all possible paths to connected entities (tuples of entity links)
- that are not explicitly excluded
- that are not beyond the max link depth or that are explicitly included
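The idea of enumerating paths up to a maximum depth while skipping excluded ones can be sketched with a plain depth-first traversal. Everything in the example is an assumption made for illustration: the `LINKS` graph, the function name `paths_to_connected` and the use of entity names instead of EntityLink objects. The override for explicitly included paths is not modelled.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical link graph: entity name -> names of directly linked entities
LINKS: Dict[str, List[str]] = {
    'Order item': ['Order', 'Product'],
    'Order': ['Customer'],
    'Customer': [],
    'Product': [],
}


def paths_to_connected(start: str, max_depth: int,
                       excluded: FrozenSet[Tuple[str, ...]] = frozenset()) -> List[Tuple[str, ...]]:
    """Collect all paths from start that are neither excluded nor longer than max_depth."""
    paths: List[Tuple[str, ...]] = []

    def walk(entity: str, path: Tuple[str, ...]) -> None:
        for target in LINKS.get(entity, []):
            new_path = path + (target,)
            # An excluded or too deep path also cuts off everything behind it
            if new_path in excluded or len(new_path) > max_depth:
                continue
            paths.append(new_path)
            walk(target, new_path)

    walk(start, ())
    return paths


print(paths_to_connected('Order item', max_depth=2))
# [('Order',), ('Order', 'Customer'), ('Product',)]
```

The depth limit also guarantees termination when entities link to each other in a cycle.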
Metrics
- class mara_schema.metric.Aggregation(cls, bases, classdict, **kwds)
Aggregation methods for metrics
- class mara_schema.metric.NumberFormat(cls, bases, classdict, **kwds)
How to format values
- class mara_schema.metric.SimpleMetric(name: str, description: str, data_set: DataSet, column_name: str, aggregation: mara_schema.metric.Aggregation, important_field: bool = False, number_format: mara_schema.metric.NumberFormat = 'Standard', more_url: Optional[str] = None)
- __init__(name: str, description: str, data_set: DataSet, column_name: str, aggregation: mara_schema.metric.Aggregation, important_field: bool = False, number_format: mara_schema.metric.NumberFormat = 'Standard', more_url: Optional[str] = None)
A metric that is computed as a direct aggregation on an entity table column
- Args:
name: How the metric is displayed in front-ends, e.g. “Revenue after cancellations”.
description: A meaningful business definition of the metric.
data_set: The data set that contains the metric.
column_name: The column that the aggregation is based on.
aggregation: The aggregation method to use.
important_field: Marks key business metrics.
number_format: How to format the values of the metric. Defaults to NumberFormat.STANDARD.
- display_formula() → str
Returns a documentation string for displaying the formula in the frontend
- class mara_schema.metric.ComposedMetric(name: str, description: str, data_set: DataSet, parent_metrics: [<class 'mara_schema.metric.Metric'>], formula_template: str, important_field: bool = False, number_format: mara_schema.metric.NumberFormat = 'Standard', more_url: typing.Optional[str] = None)
- __init__(name: str, description: str, data_set: DataSet, parent_metrics: [<class 'mara_schema.metric.Metric'>], formula_template: str, important_field: bool = False, number_format: mara_schema.metric.NumberFormat = 'Standard', more_url: typing.Optional[str] = None) → None
A metric that is based on a list of simple metrics.
- Args:
name: How the metric is displayed in front-ends, e.g. “Revenue after cancellations”.
description: A meaningful business definition of the metric.
data_set: The data set that contains the metric.
parent_metrics: The parent metrics that this metric is composed of.
formula_template: How to compose the parent metrics, with ‘{}’ as placeholders. Examples: ‘{} + {}’, ‘{} / ({} + {})’
important_field: Marks key business metrics.
number_format: How to format the values of the metric. Defaults to NumberFormat.STANDARD.
- display_formula() → str
Returns a documentation string for displaying the formula in the frontend
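How a formula_template with ‘{}’ placeholders and a list of parent metrics can be combined into a readable formula is sketched below. This only illustrates the placeholder convention. The function `render_formula` is hypothetical, works on metric names rather than Metric objects, and makes no claim about the exact string that display_formula returns.

```python
from typing import Sequence


def render_formula(formula_template: str, parent_metric_names: Sequence[str]) -> str:
    """Fill the '{}' placeholders, in order, with the bracketed parent metric names."""
    return formula_template.format(*(f'[{name}]' for name in parent_metric_names))


print(render_formula('{} / ({} + {})', ['Metric A', 'Metric B', 'Metric C']))
# [Metric A] / ([Metric B] + [Metric C])
```

The placeholders are positional, so the order of the parent metrics must match the order in which they appear in the template.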
SQL Generation
- mara_schema.sql_generation.data_set_sql_query(data_set: mara_schema.data_set.DataSet, human_readable_columns=True, pre_computed_metrics=True, star_schema: bool = False, star_schema_transitive_fks: bool = True, personal_data=True, high_cardinality_attributes=True, engine: Optional[sqlalchemy.engine.base.Engine] = None) → str
Returns a SQL select statement that flattens all linked entities of a data set into a wide table
- Args:
data_set: The data set to flatten.
human_readable_columns: Whether to use “Customer name” rather than “customer_name” as column name.
pre_computed_metrics: Whether to pre-compute composed metrics, counts and distinct counts on row level.
star_schema: Whether to add foreign keys to the tables of linked entities rather than including their attributes.
star_schema_transitive_fks: Whether to include all attributes of all transitively linked entities. When False, only their respective foreign keys are included. Defaults to True. Example for star_schema_transitive_fks = False:
    SELECT order.id,
           order.date,
           order.price,
           customer.customer_fk,
           store.store_fk
    FROM order
        LEFT JOIN customer
        LEFT JOIN store
personal_data: Whether to include attributes that are marked as personal data.
high_cardinality_attributes: Whether to include attributes that are marked to have a high cardinality.
engine: A sqlalchemy engine that is used to quote database identifiers. Defaults to a PostgreSQL engine.
- Returns:
A string containing the select statement