Objects, types and function overloading

From sygnm
Jump to: navigation, search

Data types and functions

Data types and functions in the sygnm framework represent mathematical objects and the computations done on them. They are implemented as C++ classes which are loaded at runtime. These classes are used to build expression trees to represent mathematical expressions (see Expression trees and evaluation). Every instance of a data type class represents a value of the data type, while all instances of a function class represent the same function (and it is not necessary to instantiate function classes to call the encapsulated function, but it is useful when building expression trees).

Besides function and data type objects there is a third type of sygnm object called a dynamic node. A dynamic node is used to represent objects of which only partial information is available (e.g. we want to represent a function knowing its name and parameter list but not its full ID or return value). A typical use of them is when creating expression trees from user input, since most of the time the user only specifies the name of the function to be called and its arguments but not the full ID; then at evaluation time this dynamic node is resolved into a function which best matches the known information (see Expression trees and evaluation). Another use of dynamic nodes is to represent other sygnm objects without instantiating them. This happens when specifying type parameters: we are not interested in instantiating a concrete value of the parameter type, only its type signature matters, so it is sufficient to pass a dynamic node (containing the signature) as type parameter.

All sygnm nodes possess the following attributes:

  • Identifier (unique, has three parts: package name, name, variant name)
  • Aliases (alternative names)
  • Signature (contains the unique ID, parameter list, type parameter list and return value)
  • Memory size function (estimates memory consumption)
  • Cloning function (creates an identical copy of the current node, including all child nodes)
  • Hash function
  • Equality function
  • Parent node (refers to the parent of the current node in an expression tree, or null if there is no parent)
  • Child nodes (children of the current node in an expression tree)

Additionally data types have the following attributes:

  • Type parameters (type parameters are also sygnm nodes)

Additionally functions have the following attributes:

  • Difficulty score function (how difficult is the evaluation of this function with the current parameters?)
  • Memoization table (stores inputs and results of previous function evaluations for fast lookup, only if enabled)

It is possible to have multiple functions with the same name (overloading) and this feature is used heavily in all packages. If only the name of a function is known, the sygnm framework selects the best matching implementation for the current parameter list at runtime (the signatures of the candidate functions are compared to the available information and the best match is returned). See the following sections for details.

Signatures and overload resolution

All functions and data types have a signature which can be thought of as the type of that object in the sygnm type system. The parts of a signature are the "base", a "return value" (only functions), a "parameter list" (only functions) and a "type parameter list" (only data types). All of these are typeclass set lists. A typeclass set list is a list of typeclass set objects. A typeclass set is a set of typeclasses with some additional attributes. A typeclass is an abstract interface specification.

In the following sections all these objects will be described starting with typeclasses and moving "upwards".

Typeclasses and typeclass implementations

Typeclasses are used to implement parametric polymorphism in the sygnm framework. A typeclass is essentially a set of incomplete function signatures with a name. The signatures can contain typeclass parameters. Typeclass implementations are concrete implementations of a typeclass interface for a type or a set of types.

The following is the declaration of a typeclass (Equality) and a typeclass implementation for the string type in pseudocode.

/* Declare the typeclass */
typeclass: Equality

/* The typeclass interface, now a single function called equal,
   with two function parameters ('a' and 'b').
   The type of both 'a' and 'b' is the first (only) typeclass parameter,
   here represented by '[0]'. */
base: equal
parameters: a: [0], b: [0] 

/* Declare an implementation of the Equality typeclass, where the first (and only) typeclass parameter is string. */
/* Now the string type is said to belong to the Equality typeclass. */
typeclass implementation: Equality
parameters: string

/* Declare an implementation of the 'equal' function for strings. The typeclass parameter ([0]) is replaced
   by string, just like it was written in the previous line. */
function: equal.string
parameters: a: string, b: string

Typeclasses and typeclass implementations make it possible to extend the functionality of existing packages and data types without any modification of the original declarations and implementation.

By default, all objects automatically belong to all typeclasses for which they have suitable implementations and the sygnm framework does some consisteny checks at startup to catch faulty typeclass implementations.

Typeclasses can appear in parameter lists, which means that it is possible to write functions which take any type belonging to a given typeclass (or a set of them). This is used to write generic functions.

Typeclass can also form a hierarchy by typeclass extension. This is similar (but not the same) as inheritance in object oriented programming. If typeclass "A" extends typeclass "B", then all types that implement the interface for "A" must also implement the interface for "B". Cycles are not allowed in the typeclass extension hierarchy.

There are some advanced features which are not described here (for example, typeclass parameters can be arbitrary typeclass sets instead of a concrete type like string), but this is enough to understand how typeclasses and typeclass implementations work in the sygnm system.

A note on multi-parameter typeclasses

It is possible to use multiple parameters in typeclass interfaces:

/* This is a modification of the previous example so that the Equality typeclass applies to two possibly different types. */

/* Declare the typeclass */
typeclass: Equality

/* The typeclass interface, now two functions, both called equal,
   with two function parameters ('a' and 'b').
   Two functions are necessary because of the symmetry of equality,
   so that both A == B and B == A works when A and B are of different types. */
base: equal
parameters: a: [0], b: [1]

base: equal
parameters: a: [1], b: [0]

typeclass implementation: Equality
parameters: string, char /* In this case the order of the parameters does not matter because of the symmetry of the function signatures. */

/* Declare an implementation for the string and char types. */
function: equal.sc1
parameters: a: string, b: char

function: equal.sc2
parameters: a: char, b: string

However, currently multi-parameter typeclasses have the severe limitation that they can't be effectively used in parameter lists; there is no way to declare a function which only accepts types that belong to the same multi-parameter typeclass. Consider the following example:

data type: A

data type: B

data type: C

data type: D


typeclass: MyTypeclass


typeclass implementation: MyTypeclass
parameters: A, B

typeclass implementation: MyTypeclass
parameters: C, D


/* We want to declare a function with two parameters which
   only accepts types X and Y as inputs if there is a MyTypeclass(X,Y)
   typeclass implementation.
   This is currently not possible.
   The problem with the following function declaration is that it
   accepts if 'a' is of type A, B, C or D and 'b' is of type A, B, C or D when it
   should only accept if 'a' is of type A and 'b' is of type B or 'a' is of type C and 'b' is of type D.
   It does not check whether the types of 'a' and 'b' belong to the same typeclass implementation or in what
   order do they appear as typeclass parameters. It only checks whether an individual
   type belongs to a typeclass (disregarding any other typeclass parameters). */
function: my_function
parameters: a: MyTypeclass, b: MyTypeclass

Because of this limitation, currently multi-parameter typeclasses are of questionable value. In the future, depending on their possible usefulness, they will either be removed or the implementation will be completed.

Typeclass sets

A typeclass set is a set of typeclasses (can be empty), with some additional attributes. A typeclass set roughly corresponds to a single parameter in a parameter list.

The additional attributes are the following:

  • ID (this can be incomplete)
  • Typeclass-inheritance information (in some circumstances types can inherit typeclasses from their type parameters, e.g. a parametric polynomial type can inherit the IntegralDomain typeclass if its value type is in the IntegralDomain typeclass).
  • Label (typeclass sets with the same label must match to the exact same type, see later)
  • Name (to name parameters in parameter lists, not used in the type system)
  • Starred (if a typeclass set is starred it means that it can match any number of (even zero) parameters of the same type (but only as the last parameter in a parameter list))

Usually a typeclass set does not represent a single type (except when the ID is full, since IDs are unique). A type is said to match another type set if the ID of the typeclass set is a subset of the ID of the type, and the typeclasses in the typeclass set form a subset of the typeclasses which the type belongs to (i.e. the typeclass set is in some sense a "subset" of the object type). In the general case, matching is an operation between two typeclass sets which returns a match score (how much do these typeclass sets match?).

Function overload resolution is based on matching of typeclass sets. A function can be called with an argument list if all typeclass sets in the formal parameter list of the function are compatible with the base typeclass sets of the corresponding actual parameters.

The compatibility checking algorithm is the following:

  1. Inputs: typeclass set "A" and typeclass set "B". We want to check whether A matches B (and to what degree), i.e. whether an object described by A is an acceptable actual parameter if B is the corresponding formal parameter, i.e. whether B is a subset of A in some sense.
  2. First the IDs of the typeclass sets are matched. If any part of an object ID is defined in both typeclass sets then those parts must be equal. If the IDs match, a match score is computed based on how many parts of the IDs are defined. If the IDs are not compatible, the matching fails and the algorithm ends.
  3. If the IDs match, we compute the closure of A under the typeclass extension relation, that is, we recursively find all typeclasses which are extended by some typeclass in A. For all typeclasses in A we also store the length of the longest path from the original A to the typeclass in the extension hierarch (for example, if the original A contains typeclasses "a" and "d", and "a" extends "b" which extends "c" and "d" also extends "c", then the longest distance to "c" will be 2, to "b" will be 1 and for "a" and "d" will be 0).
  4. After this is done, we check whether B is a subset of the extended A. If B is not a subset, the matching fails and the algorithm ends.

If B is a subset, then the typeclass sets match, and we need to compute the match score. The match score consist of multiple subscores. First is the ID match subscore, computed when the IDs were checked. The typeclass set match subscore can be computed by finding the minimum among the distances computed in the previous step, but only in the intersection of B and the extended A (which is B). We also need to compute how many typeclasses have this minimal distance in the intersection. The smaller this distance is and the more typeclasses have this distance, the better is the match. This makes sense since we want to prefer more "specialized" functions when deciding between multiple suitable functions when resolving overloaded functions.

The following is a simple example of using typeclass sets in parameter lists:

/* A typeclass for types representing rational numbers and their subsets (interface omitted).
typeclass: Rational


/* This function takes two parameters, both of which must belong to the Rational typeclass.
   No other restrictions are specified. */
function: add
parameters: a: {Rational}, b: {Rational}


/* The rational type belongs to the Rational typeclass.
   This means that the add function can be called with rational parameters. */
typeclass implementation: Rational
parameters: rational


/* The integer type belongs to the Rational typeclass.
   This means that the add function can be called with integer parameters.
   It is also possible to call the add function with a rational and an integer parameter. */
typeclass implementation: Rational
parameters: integer


/* This function takes two parameters, both of which must belong to the Rational typeclass
   and must be of the same exact type (since they both have the same label, '0').
   No other restrictions are specified.
   This function can be called with two integers or two rationals but can't be called with a rational and an integer. */
function: div
parameters: a[label=0]: {Rational}, b[label=0]: {Rational}

Typeclass set lists

Typeclass set lists are simply lists of typeclasses, representing parameter lists. Signatures contain the following typeclass set lists:

  • Base typeclass set: Stores the ID of the object and the typeclasses it belongs to, it represent the type of the object in some sense. This is always a single typeclass set (or a list of length 1). The base typeclass sets of the function parameters are matched against the formal parameter lists of candidate functions in function overload resolution.
  • Type parameter list (only in data type signatures): Determines the number and possible types of type parameters.
  • Parameter list (only in function signatures): Determines the number and possible types of parameters.
  • Return value (only in function signatures): Describes the return value of a function, this is also a single typeclass set (or a list of length 1).

With the exception of the base typeclass set, none of the above typeclass sets have to specify concrete types (have complete IDs). They can contain only typeclasses without specifying any ID, or they can even accept all types.

Matching variable number of parameters is possible by marking the last typeclass set as starred.

Before a function is called, the sygnm framework checks whether the formal parameter list matches the actual parameters. This is done by matching typeclass sets: every formal parameter typeclass set is matched with the base typeclass set of its corresponding actual parameter, as described above.

Automatic type conversion