Version 1 (modified by Dominic König, 12 years ago)

BluePrint: Lazy Representation

Introduction

For many output formats, data need to be represented as strings (or even HTML elements). In the Table definition, you can define a representation method for every Field:

    Field("xy", "reference my_table",
          ...
          represent = my_representation_function,
          ...
          ),

These representation functions receive the field value as a parameter, and can perform additional lookups to render this value as a string (or HTML).

The problem is that, where many records are to be rendered, these functions are called many times and can therefore become a major performance bottleneck - especially if they perform additional database lookups. For the export of 50 records from a table that contains 5 fields with additional representation lookups, you would need 1 query to retrieve the 50 records, and then 250 additional queries to render them in the output format.

In an XML export, where you have 1000 records, this would mean 1 query to retrieve the records - and 5000 additional queries to render the field representations.

Description

To overcome the bottleneck arising from representation lookup, output formatters need to be able to perform representation lookups in bulk.

To achieve this, the output formatter would collect all values for a field from all records in the output, and then call a special bulk-representation function which performs optimized DB lookups to render all values in as few queries as possible (ideally, a single query). Bulk representation functions thus reduce the number of DB queries in output formatting from one query per field and record to one query per field.
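As a rough check of these numbers, the following sketch counts the queries for both approaches. The function name is hypothetical, and it assumes that each field with a representation lookup needs exactly one query per call.

```python
def query_counts(records, lookup_fields):
    """
    Return (per_value, bulk) query totals for one export, each
    including the initial query that retrieves the records.
    """
    # One lookup per field and record
    per_value = 1 + records * lookup_fields
    # One lookup per field, regardless of the number of records
    bulk = 1 + lookup_fields
    return per_value, bulk


page = query_counts(50, 5)      # one data table page
export = query_counts(1000, 5)  # an unpaginated XML export
```

Under these assumptions, the bulk total no longer depends on the number of records.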

Additionally, the bulk representation method should be lazy, i.e. only perform DB lookups when absolutely necessary and strictly avoid repeated DB lookups for the same value (within the same request).
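A minimal sketch of such a lazy lookup, assuming a plain dict as a stand-in for the referenced table. The class name, the cache attribute and the "-" fallback are hypothetical and only serve the illustration.

```python
class LazyNames(object):
    """Hypothetical lazy lookup that never resolves the same value twice."""

    def __init__(self, table):
        self.table = table  # stand-in for a DB table: {id: name}
        self.cache = {}     # value -> representation
        self.queries = 0    # number of simulated DB queries

    def bulk(self, values):
        # Only values that have not been resolved yet need a lookup
        missing = [v for v in set(values) if v not in self.cache]
        if missing:
            # One simulated query for all missing values together
            self.queries += 1
            for v in missing:
                self.cache[v] = self.table.get(v, "-")
        return dict((v, self.cache[v]) for v in values)

    def __call__(self, value):
        # Single-value representation shares the same cache
        return self.bulk([value])[value]


rep = LazyNames({1: "Alpha", 2: "Beta"})
first = rep.bulk([1, 2, 1])
again = rep(1)
```

After the first bulk() call, rep(1) is served from the cache and rep.queries stays at 1.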

Use-Cases

The two most prominent use-cases are:

  • data tables
  • XML exports

Data tables typically render only a limited number of records (with server-side pagination). However, even with only 50 records per page, the field representation can turn into a major bottleneck.

XML exports are an even bigger problem as they are usually not paginated and thus can contain thousands of records (that is, tens of thousands of field representations).

Requirements

1) Bulk representation functions must be available (configurable) per Field.
2) They should not be separate from single-value representations but use the same lazy lookup mechanism.
3) Ideally, bulk representations do not introduce a new hook, but utilize the existing Field.represent hook.
4) Ideally, we need only a few individual representation functions - most representations follow the same pattern anyway.
5) Standard representation of foreign keys would fall back to the name field in the referenced record.
6) Bulk representation functions should create only minimum overheads during model loading.

Design

The Field.represent hook can be set to a callable class instance:

class MyRepresentation(object):

    def __call__(self, value, row=None):
        # represent-code goes here
        ...
        return represent_str

...

    Field("xy", "reference my_table",
          ...
          represent = MyRepresentation(),
          ...
         )

Besides the __call__() method, this class would define a bulk() method like:

class MyRepresentation(object):

    def __call__(self, value, row=None):
        # represent-code goes here
        ...
        return represent_str

    def bulk(self, values, rows=None):
        # represent-code goes here
        ...
        return {values[0]:represent_str[0],
                values[1]:represent_str[1],
                ...
               }

The bulk()-method would perform optimized DB lookups for the list of values it receives, and return a dict of {value:representation}.

Output formatters (such as S3Resource.extract and S3Resource.export_tree) would then check whether the bulk()-method is available and use it instead of the single-value representation.
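The check in the output formatter could look roughly like the sketch below. The helper represent_all() and the UpperRepresent class are hypothetical and not part of S3Resource; they only show the fallback from bulk() to the single-value call.

```python
def represent_all(represent, values):
    """Return {value: representation}, preferring bulk() if available."""
    bulk = getattr(represent, "bulk", None)
    if callable(bulk):
        return bulk(values)
    # Fallback: plain represent function, called once per value
    return dict((v, represent(v)) for v in values)


class UpperRepresent(object):
    """Hypothetical representation class that offers both methods."""

    def __init__(self):
        self.bulk_calls = 0

    def __call__(self, value, row=None):
        return str(value).upper()

    def bulk(self, values, rows=None):
        self.bulk_calls += 1
        return dict((v, str(v).upper()) for v in values)


upper = UpperRepresent()
single = represent_all(lambda v: "#%s" % v, [1, 2])
bulked = represent_all(upper, ["a", "b"])
```

Because the formatter only tests for the presence of bulk(), existing represent functions keep working unchanged.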

Implementation

Defining an individual bulk representation class for each and every Field would be too much effort, though, so this calls for a base class that already covers the standard case:

    Field("xy", "reference my_table",
          ...
          represent = S3Represent(lookup="my_table"),
          ...
         )

S3Represent is defined in s3fields.py, and is therefore available in every model (controllers need to use s3base.S3Represent).

The base class takes the following configuration parameters:

Parameter | Default | Description | Comments
lookup | None | Name of the referenced table | for foreign keys
key | "id" | Name of the primary key in the referenced table | for foreign keys
fields | name | Fields to lookup from the referenced table | for foreign keys
labels | "%(name)s" | String template to render the representation | can also be a callable receiving the Row
options | None | a dict with field options | for option lists, overrides lookup
translate | False | translate each label using T() | for foreign keys
linkto | None | URL to link the label to, with [id] as placeholder for the foreign key | for foreign keys, renders each label as A()
multiple | False | indicate that this is a list: type | values are expected to always be lists
default | current.messages.UNKNOWN_OPT | the default for unresolvable keys |
none | current.messages.NONE | the default for None-values (or empty lists for list: types) |
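How the labels, default and none parameters might interact can be sketched as follows. This is not the S3Represent code: render() is a hypothetical helper, rows are plain dicts standing in for Row objects, and the two constants are placeholders for the current.messages values.

```python
UNKNOWN_OPT = "Unknown"  # placeholder for current.messages.UNKNOWN_OPT
NONE = "-"               # placeholder for current.messages.NONE


def render(value, rows, labels="%(name)s", default=UNKNOWN_OPT, none=NONE):
    """Render one key from already looked-up rows ({key: row})."""
    if value is None:
        return none
    row = rows.get(value)
    if row is None:
        # Key could not be resolved
        return default
    if callable(labels):
        return labels(row)
    # String template, filled from the row's fields
    return labels % row


rows = {1: {"name": "Alpha", "code": "A"}}
by_template = render(1, rows)
by_callable = render(1, rows, labels=lambda r: "%s (%s)" % (r["name"], r["code"]))
```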

S3Represent can be subclassed to meet specific requirements. Usually, the subclass would override some of these methods:

    def lookup_rows(self, key, values, fields=[]):
        """
            Lookup all rows referenced by values (in foreign key representations).

            @param key: the key Field
            @param values: the values
            @param fields: the fields to retrieve
        """

This method should be overridden in case additional fields and/or joins are required for the represent_row function.

  • For testing/benchmarking, lookup_rows() should increment self.queries for each query performed.

    def represent_row(self, row):
        """
            Represent the referenced row (in foreign key representations).

            @param row: the row
        """

This function receives each row retrieved by lookup_rows() and should return the string representation. It should not perform any additional DB lookups. It should return a lazyT if self.translate is True.
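The division of labour between lookup_rows() and represent_row() can be sketched with a stand-in base class. SketchRepresent is hypothetical and much simpler than S3Represent (no caching, no translation, a list of dicts instead of a DB table); it only shows that a subclass can change the rendering without adding queries.

```python
class SketchRepresent(object):
    """Hypothetical stand-in for the base class; not the real S3Represent."""

    def __init__(self, table):
        self.table = table  # stand-in for a DB table: list of dict rows
        self.queries = 0

    def lookup_rows(self, key, values, fields=()):
        # One simulated query for all values; counted for benchmarking
        self.queries += 1
        wanted = set(values)
        return [row for row in self.table if row[key] in wanted]

    def represent_row(self, row):
        return row["name"]

    def bulk(self, values, key="id"):
        rows = self.lookup_rows(key, values)
        return dict((row[key], self.represent_row(row)) for row in rows)


class PersonRepresent(SketchRepresent):
    """Overrides only the row rendering; performs no further lookups."""

    def represent_row(self, row):
        return "%s %s" % (row["first_name"], row["last_name"])


persons = [
    {"id": 1, "first_name": "Ada", "last_name": "Lovelace"},
    {"id": 2, "first_name": "Alan", "last_name": "Turing"},
]
rep = PersonRepresent(persons)
labels = rep.bulk([1, 2])
```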

    def link(self, k, v):
        """
            Represent a (key, value) as hypertext link.

                - Typically, k is a foreign key value, and v the representation of the
                  referenced record, and the link shall open a read view of the referenced
                  record.

                - In the base class, the linkto-parameter expects a URL (as string) with "[id]"
                  as placeholder for the key.

            @param k: the key
            @param v: the representation of the key
        """

This function can be overridden to implement specific link construction mechanisms. It should not perform any additional DB lookups.
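The "[id]" placeholder substitution can be sketched as below. This is an assumption about the mechanism rather than the actual method: the base class renders the label as A(), whereas this stand-in returns a plain HTML string, and the example URL is made up.

```python
from html import escape


def link(linkto, k, v):
    """Render (key, representation) as a link; no DB lookups involved."""
    # Substitute the key for the "[id]" placeholder in the URL
    url = linkto.replace("[id]", str(k))
    return '<a href="%s">%s</a>' % (escape(url, quote=True), escape(str(v)))


html_link = link("/eden/org/organisation/[id]", 4, "Red Cross")
```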

