Version 12 (modified by Dominic König, 10 years ago)

S3Hierarchy

Purpose

The S3Hierarchy toolkit can be used to perform lookups in hierarchical taxonomies. It analyses the parent-relationships of the records, and provides methods to access and search through the parent-, child- and sibling-axes of each record.

Example

Facility types as an example of a hierarchical taxonomy:

  • Arts and Recreation
    • Recreation Centers
  • Community Groups
    • Volunteer Opportunities
  • Education
    • Adult Education
    • Guidance and Tutoring Programs
  • Health and Mental Health
    • Dental Care
    • Health Centers
    • Health Clinics
    • Health Screening and Testing
    • Hospitals and Medical Centers
    • Mental Health Counseling
    • Mental Health Programs
    • Public Health Programs
    • Substance Abuse Programs
  • Social Services
    • Children and Family Services
    • Public Information Services
    • Senior Services
    • Support Groups

Data Model

To store a hierarchical taxonomy, the database table must include a parent reference (self-reference). This can either be a foreign key to the table, or - if the table is a super-entity instance - a (second) reference to the super-entity.

The diagrams explain the two models:

A typical data model for a hierarchical taxonomy (with simple self-reference) could look like:

        tablename = "org_facility_type"
        define_table(tablename,
                     Field("name",
                           ),
                     # The parent-field for the hierarchy:
                     Field("parent", "reference org_facility_type",
                           ),
                     s3_comments(),
                     *s3_meta_fields())

        configure(tablename, hierarchy = "parent")
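To make the self-reference concrete, the following plain-Python sketch shows how part of the facility type taxonomy above might be stored as rows with a parent reference. The record IDs are invented for illustration, and the rows are ordinary dicts rather than database records:

```python
# Hypothetical rows of org_facility_type (IDs invented for illustration);
# "parent" holds the ID of the parent record, or None for a root record.
rows = [
    {"id": 1, "name": "Education", "parent": None},
    {"id": 2, "name": "Adult Education", "parent": 1},
    {"id": 3, "name": "Guidance and Tutoring Programs", "parent": 1},
    {"id": 4, "name": "Social Services", "parent": None},
    {"id": 5, "name": "Senior Services", "parent": 4},
]

# Group the child names under the name of their parent
names = {row["id"]: row["name"] for row in rows}
tree = {}
for row in rows:
    if row["parent"] is not None:
        tree.setdefault(names[row["parent"]], []).append(row["name"])
```

Records without a parent (here "Education" and "Social Services") form the top level of the taxonomy.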

For a super-entity self-reference, the parent-field is defined as a reference to the super-entity:

        tablename = "vulnerability_indicator"
        define_table(tablename,
                     # Instance table of stats_parameter
                     super_link("parameter_id", "stats_parameter"),
                     # The parent-field for the hierarchy:
                     Field("parent", "reference stats_parameter",
                           ),
                     Field("name",
                           ),
                     s3_comments(),
                     *s3_meta_fields())

        configure(tablename, hierarchy = "parent")

Configuration

The hierarchy is configured in the model as the field name of the parent reference:

    self.configure(tablename, hierarchy="parent")

If categories (e.g. level) are to be used, the hierarchy is configured as a tuple of parent reference and category field:

    self.configure(tablename, hierarchy=("parent", "level"))

Subset Definition

A subset is an S3Hierarchy instance. If the table name is the only constructor parameter, the subset includes all records in the hierarchical table that are accessible to the user:

subset = S3Hierarchy("hierarchical_type_table")

To filter the records, a filter query can be specified as keyword parameter:

query = (FS("filter_field") == 5)
subset = S3Hierarchy("hierarchical_type_table", filter=query)

Note that the hierarchy is only loaded from the database when a lookup is performed (lazy instantiation). Also, the hierarchy will not be loaded again until the end of the request (unless it is marked as "dirty" during the request), regardless of how many subsets are created or lookups performed.
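The load-on-first-lookup behaviour can be illustrated with a small stand-in class. This is only a sketch of the general pattern, not the S3Hierarchy code: LazyHierarchy and load_from_database are hypothetical names, and the sketch caches per instance, whereas the toolkit is described above as reusing the loaded hierarchy across subsets within a request:

```python
class LazyHierarchy:
    """Stand-in illustrating lazy loading; not the S3Hierarchy code."""

    def __init__(self, loader):
        self._loader = loader   # callable returning {node_id: parent_id}
        self._nodes = None      # nothing is loaded yet
        self.dirty = False

    @property
    def nodes(self):
        # Load on first access, or again after being marked dirty
        if self._nodes is None or self.dirty:
            self._nodes = self._loader()
            self.dirty = False
        return self._nodes

    def parent(self, node_id):
        return self.nodes.get(node_id)


load_count = 0

def load_from_database():
    """Hypothetical loader that counts how often it is called."""
    global load_count
    load_count += 1
    return {1: None, 2: 1, 3: 1}


subset = LazyHierarchy(load_from_database)
loads_before_lookup = load_count      # no lookup yet, so nothing loaded
subset.parent(2)
subset.parent(3)
loads_after_lookups = load_count      # loaded once, then reused
subset.dirty = True
subset.parent(2)
loads_after_dirty = load_count        # loaded again after marking dirty
```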

Performing Lookups

To perform lookups, you first have to define a subset.

All lookup attributes or methods of the subset use node IDs. The node IDs are either the record IDs (for simple self-reference) or the super-IDs (for super-entity self-reference) of the records in the subset.

All lookup attributes and methods return either a single node ID (long), or a set of node IDs. The only exception is path() which returns an ordered list of node IDs.

Root Nodes

To get all root nodes of the subset, use:

# Returns a set of node IDs
root_nodes = subset.roots

To get the root node for a particular node, use:

# Returns the root node ID for node_id (or node_id if it is a root node itself)
root = subset.root(node_id)
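As a rough illustration of what these two lookups compute, here is a plain-Python sketch over a parent map. The functions roots and root are stand-ins named after the attribute and method above, the node IDs are invented, and the code is not the toolkit's implementation:

```python
# Parent map {node_id: parent_id}; None marks a root node (IDs invented)
parents = {1: None, 2: 1, 3: 2, 4: None, 5: 4}

def roots(parents):
    """Return the set of all nodes without a parent."""
    return {node for node, parent in parents.items() if parent is None}

def root(parents, node_id):
    """Follow the parent references upwards until a root is reached."""
    while parents[node_id] is not None:
        node_id = parents[node_id]
    return node_id
```

With this map, roots(parents) contains nodes 1 and 4, and root(parents, 3) walks from 3 via 2 up to 1.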

Child Nodes

To get all child nodes of a node, use:

# Returns the first generation of child nodes for node_id
children = subset.children(node_id)

To get all descendants of a node, use:

# Returns all descendant nodes (any generation) for node_id
descendants = subset.findall(node_id)

It is possible to use findall to get a union set of descendants for multiple parent nodes:

# Returns all descendants of all specified nodes
descendants = subset.findall((node_id_1, node_id_2, node_id_3))
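The difference between the first generation and all generations can be sketched in plain Python. The functions below are stand-ins that only mirror the method names above; the parent map and its IDs are invented, and the traversal shown is one possible approach rather than the toolkit's own:

```python
# Parent map {node_id: parent_id}; None marks a root node (IDs invented)
parents = {1: None, 2: 1, 3: 1, 4: 2, 5: None, 6: 5}

def children(parents, node_id):
    """Return the first generation of child nodes."""
    return {n for n, p in parents.items() if p == node_id}

def findall(parents, node_ids):
    """Return all descendants of one node ID or of several node IDs."""
    if isinstance(node_ids, int):
        node_ids = (node_ids,)
    found = set()
    stack = list(node_ids)
    while stack:
        for child in children(parents, stack.pop()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found
```

Here children(parents, 1) yields only nodes 2 and 3, while findall(parents, 1) also includes node 4, the child of node 2.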

Parent Nodes

To get the parent node ID for a node, use:

# Returns the parent node ID for node_id (or None if node_id is a root node)
parent = subset.parent(node_id)

Sibling Nodes

To get all sibling node IDs for a node, use:

# Returns all sibling node IDs for node_id
siblings = subset.siblings(node_id)

This does not normally return node_id itself, unless you specify inclusive=True:

# Returns all sibling node IDs for node_id - including node_id itself
siblings = subset.siblings(node_id, inclusive=True)
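The effect of the inclusive flag can be sketched as follows. The siblings function here is a plain-Python stand-in with invented node IDs; it simply collects the nodes that share a parent and is not the toolkit's implementation:

```python
# Parent map {node_id: parent_id}; None marks a root node (IDs invented)
parents = {1: None, 2: 1, 3: 1, 4: 1, 5: 2}

def siblings(parents, node_id, inclusive=False):
    """Return the nodes that share the parent of node_id."""
    parent = parents[node_id]
    result = {n for n, p in parents.items() if p == parent}
    if not inclusive:
        # By default, the node itself is not part of the result
        result.discard(node_id)
    return result
```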

Path

The path of a node is an ordered list of all generations of parent node IDs from the root node down to the node itself. It can be requested by:

# Returns the path of a node (root node first) as ordered list
path = subset.path(node_id)
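A path in this sense can be built by collecting the node and its ancestors, then reversing the order. The following is a minimal plain-Python sketch with an invented parent map; the path function is a stand-in, not the toolkit's code:

```python
# Parent map {node_id: parent_id}; None marks a root node (IDs invented)
parents = {1: None, 2: 1, 3: 2, 4: 3}

def path(parents, node_id):
    """Return the node IDs from the root node down to node_id."""
    result = []
    while node_id is not None:
        result.append(node_id)
        node_id = parents[node_id]
    result.reverse()    # collected bottom-up, returned root first
    return result
```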

Using categories

Categories can be used to classify nodes "horizontally", e.g. to indicate a hierarchy "level". To use them with the hierarchy toolkit, an additional category-field must be defined in the hierarchy configuration:

    self.configure(tablename, hierarchy=(parent_field, category_field))

Categories are neither managed nor inferred by the hierarchy toolkit, but they can be used to filter the lookup axis.

Filtering the Lookup Axis

This is useful e.g. to find all descendants of a node of a specific category:

# Define a hierarchy of locations with "parent" as parent-reference and "level" as category
subset = S3Hierarchy("gis_location", hierarchy=("parent", "level"))

# Look up all descendants of location #3 with category "L3"
communes = subset.findall(3, category="L3", inclusive=True)

When performing a root lookup, we may be interested in the closest parent of a particular category rather than the absolute root:

# Look up the closest "L1" parent of location #454
state = subset.root(454, category="L1")

This also works with path lookups:

# Look up the path of location #378 down from the closest "L1" parent
path = subset.path(378, category="L1")

The category parameter can be used analogously with the children() and siblings() methods.
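How a category can narrow a lookup is sketched below in plain Python. The node data is invented, and the functions are stand-ins rather than the toolkit's code. Two behaviours are assumptions of this sketch only: root returns the node itself if it already has the requested category, and it returns None if no matching ancestor exists:

```python
# {node_id: (parent_id, category)}; IDs and levels invented for illustration
nodes = {
    1: (None, "L0"),
    2: (1, "L1"),
    3: (2, "L2"),
    4: (3, "L3"),
    5: (3, "L3"),
}

def root(nodes, node_id, category=None):
    """Walk upwards to the absolute root, or to the closest node of category."""
    while node_id is not None:
        parent_id, node_category = nodes[node_id]
        if category is None:
            if parent_id is None:
                return node_id
        elif node_category == category:
            return node_id
        node_id = parent_id
    return None     # assumption: no match is reported as None

def findall(nodes, node_id, category=None):
    """Return all descendants, optionally only those of one category."""
    found = set()
    stack = [node_id]
    while stack:
        current = stack.pop()
        for child, (parent_id, child_category) in nodes.items():
            if parent_id == current:
                stack.append(child)     # always descend further
                if category is None or child_category == category:
                    found.add(child)
    return found
```

Note that the category only filters the result in findall: intermediate nodes of other categories are still traversed, so "L3" descendants are found below "L1" and "L2" nodes.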

Looking up the Category of a Node

To look up the category of a node, use:

# Returns the category for node_id (e.g. "L1")
category = subset.category(node_id)

To get the category of each node in the result of the parent(), root(), path(), children(), findall(), or siblings() methods, set the classify flag:

# Returns the children of location #328 as set of tuples like: set([(367, "L3"), (368, "L3")])
children = subset.children(328, classify=True)
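The shape of a classified result can be illustrated with a plain-Python stand-in. The node data reuses the IDs from the comment above but is otherwise invented, and the functions only mimic the described return values; they are not the toolkit's implementation:

```python
# {node_id: (parent_id, category)}; data invented for illustration
nodes = {328: (None, "L2"), 367: (328, "L3"), 368: (328, "L3")}

def category(nodes, node_id):
    """Return the category stored for a node."""
    return nodes[node_id][1]

def children(nodes, node_id, classify=False):
    """Return child node IDs, or (node_id, category) tuples if classify."""
    result = {n for n, (parent_id, _) in nodes.items() if parent_id == node_id}
    if classify:
        return {(n, category(nodes, n)) for n in result}
    return result
```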
