Tutorial: creating sygnm packages


This tutorial will show you how to create and use a simple sygnm package. The goal is to implement the Euclidean algorithm for calculating the greatest common divisor of two integers. The resulting gcd package will contain a single function (called my_gcd) which takes two integers and returns their greatest common divisor.

Creating the package

Preliminaries

This tutorial assumes:

We will work in sygnm's user_packages directory, which is user-specific and automatically created when sygnm is started for the first time. It can be found in the user configuration directory (typically /home/<username>/.config/sygnm/user_packages on modern Linux systems). At the start of the tutorial this directory should be empty (which is the case if no packages have been created by the user yet). It is also assumed that automatic package discovery is enabled in the sygnm configuration (which is true for the default configuration). If sygnm has never been started on your computer you must either start it at least once or create the user_packages directory by hand.

While it is possible to work from any directory or turn off automatic package discovery, in that case the sygnm configuration must be modified accordingly. This means adding the directory to the package paths if working from another directory and explicitly listing the packages to be loaded if automatic package discovery is disabled.

Writing the sygnmpkg file

The first step is to create the sygnm package descriptor file, which contains basic information about the package and also declares all functions, data types, and other objects. In the user_packages directory, there should be a packages subdirectory. This directory will contain our sygnm package descriptor files (one file for every package). It is important that you place package descriptor files in the packages subdirectory and not directly in user_packages, since this layout is expected by the code generator.

The name of the package will be gcd, so create a text file called gcd.sygnmpkg with the following content:

/*
 * Calculate the greatest common divisor of two numbers.
 * @categories
 * Number theory
 * @tags
 * gcd, greatest common divisor, euclidean algorithm
 */
package: gcd
copyright: Copyright (C) 2018  my name  <example@example.com>
version: 1.0.0
author: my name example@example.com
depends on: numbers, boolean

/*
 * Calculates the greatest common divisor of two integers.
 */
function: my_gcd(a: numbers.integer, b: numbers.integer) -> :numbers.integer
test: my_gcd(5, 15) == 5
test: my_gcd(6, 8) == 2
test: my_gcd(0, 10) == 10
test: my_gcd(-4, -6) == 2
test: my_gcd(-2, 4) == 2
test: my_gcd(3, 7) == 1
test: my_gcd(437347347348984343, 262356326932623986237) == my_gcd(262356326932623986237, 437347347348984343)

sygnm package files are simple text files with key-value pairs, comments and raw code blocks. Our file starts with a comment (sygnm package files use C-style comments) which describes the package. Comments can have multiple sections; here they are the main content (the default, unmarked section), categories and tags. Dividing comments into sections makes it possible to generate high-quality documentation automatically. The comment is followed by a package declaration, which states the name of the package. Comments always apply to the declaration that directly follows them, so here the first comment applies to the package declaration.

Next is the copyright, version and author information. The copyright information should always follow the format used in this example. The version number is three numbers separated by dots, and optionally a string tag (e.g. 1.0.0alpha1). It is recommended to follow the semantic versioning standard. The author information is the name of the package author followed by an e-mail address.
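To make the version format concrete, here is a minimal Python sketch that splits a version string into its three numbers and the optional tag. It is not part of sygnm: the names PackageVersion and parse_version are made up for this illustration, and the accepted tag characters are an assumption based only on the 1.0.0alpha1 example above.

```python
import re
from typing import NamedTuple


class PackageVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    tag: str  # empty string when the version has no tag


# Assumed shape: three dot-separated numbers, then an optional tag that
# starts with a letter (as in "1.0.0alpha1").
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)([A-Za-z][0-9A-Za-z.+-]*)?$")


def parse_version(text: str) -> PackageVersion:
    """Split a version string into (major, minor, patch, tag)."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid version string: {text!r}")
    major, minor, patch, tag = match.groups()
    return PackageVersion(int(major), int(minor), int(patch), tag or "")


print(parse_version("1.0.0"))
print(parse_version("1.0.0alpha1"))
```

Because the result is a tuple, two versions without tags can be compared directly with < and >, which matches the ordering of the numeric part in semantic versioning.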

The depends on tag is a comma-separated list of the sygnm packages our package directly depends on (these are the packages which we will #include in the C++ implementation).

Now that we're done with the general package information tags it is time to declare our function. We will call the function my_gcd instead of gcd to avoid any potential name collision. A function declaration usually consists of the name of the function, then the parameter list in parentheses and finally the return value after the -> ("returns") symbol. A parameter list consists of parameters (here two integers named a and b). The return value is also a parameter list (of length 1) and uses the same syntax as the input parameters. In a parameter declaration, the name and the type are separated by a : character. Parameter names are optional, but the : must always be written (you can see this in the return value which has no name). Parameters are separated by commas.

Note that this is only the declaration of our function without any implementation. Package descriptor files usually only contain declarations.
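The structure of a declaration can be illustrated with a small Python sketch that takes apart the text following the function: key. This is only a toy model of the simple form used in this tutorial, not the real sygnm parser; the function names are invented here, and the full syntax described in the reference may allow constructs this sketch does not handle.

```python
import re

# name, then "(inputs)", then "->", then the return parameter list
_DECL_RE = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*->\s*(.*)$")


def parse_parameter(text):
    """Split 'name: type' into (name, type); the name may be empty."""
    name, sep, type_name = text.partition(":")
    if not sep:
        # The ':' is mandatory even when the parameter has no name.
        raise ValueError(f"missing ':' in parameter: {text!r}")
    return name.strip(), type_name.strip()


def parse_parameter_list(text):
    """Parse a comma-separated parameter list into (name, type) pairs."""
    return [parse_parameter(part) for part in text.split(",") if part.strip()]


def parse_declaration(text):
    """Return (function name, input parameters, return parameters)."""
    match = _DECL_RE.match(text)
    if match is None:
        raise ValueError(f"not a function declaration: {text!r}")
    name, inputs, outputs = match.groups()
    return name, parse_parameter_list(inputs), parse_parameter_list(outputs)


decl = "my_gcd(a: numbers.integer, b: numbers.integer) -> :numbers.integer"
print(parse_declaration(decl))
```

Note how the unnamed return value comes out as a pair with an empty name, which is why the : has to be written even when the name is omitted.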

Following the function declaration are some simple test cases. Here the simplest form of test is used, where each test is a logical expression. If the expression evaluates to true the test is considered passed, otherwise it is considered failed. The test tags here belong to the function since they directly follow the function declaration. Generally, tags are always interpreted in the context of the preceding higher-level tag, so here the tests belong to the function declaration and the function declaration belongs to the package declaration (the package tag).
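Before implementing anything in sygnm, it can help to check the expected values of these tests against an independent reference. The following plain Python sketch (reference_gcd is a name chosen for this example) runs the same seven cases. It normalises the result with abs() because Python's % operator takes the sign of the divisor; how sygnm's own modulo behaves for negative inputs is not assumed here.

```python
def reference_gcd(a: int, b: int) -> int:
    """Euclidean algorithm on Python integers, always returning a non-negative value."""
    while b != 0:
        a, b = b, a % b
    return abs(a)


# The same cases as the test: tags in gcd.sygnmpkg
assert reference_gcd(5, 15) == 5
assert reference_gcd(6, 8) == 2
assert reference_gcd(0, 10) == 10
assert reference_gcd(-4, -6) == 2
assert reference_gcd(-2, 4) == 2
assert reference_gcd(3, 7) == 1
assert reference_gcd(437347347348984343, 262356326932623986237) == \
    reference_gcd(262356326932623986237, 437347347348984343)
print("all reference checks passed")
```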

A detailed description of the package file format, the parameter list syntax, a list of all accepted tags and comment section names can be found in the reference.

Running the code generator

The purpose of the sygnm code generator is to automate the creation of boilerplate and wrapper code so that the user only has to focus on the implementation of the package's functionality.

Now that the package descriptor file is ready, run the code generator from the user_packages directory (not from the packages subdirectory!):

sygnm-codegen packages . --generate-python --generate-ruby \
  --generate-java --generate-implementations --generate-docs

Explanation of the argument list:

  • packages tells the code generator that we want to generate code from package descriptor files.
  • . means to look in the current directory.
  • --generate-python means to also generate code for the Python wrapper (this is optional and requires SWIG and Python headers).
  • --generate-ruby means to also generate code for the Ruby wrapper (this is optional and requires SWIG and Ruby headers).
  • --generate-java means to also generate code for the Java wrapper (this is optional and requires SWIG and Java JNI headers).
  • --generate-implementations means to also generate empty implementation files, further reducing the work needed to be done by the user.
  • --generate-docs means to automatically generate documentation for our package.

File system layout before running the code generator:

user_packages
|- packages
   |- gcd.sygnmpkg

If everything went well, you should see something like this in your terminal:

Using automatically detected template directory: /usr/share/sygnm/templates/
sygnm-codegen is running in user mode
Read docmacro:                          arb.txt (from /usr/share/sygnm/sygnmpkg/)
Read docmacro:                          cpp11.txt (from /usr/share/sygnm/sygnmpkg/)
...
Read docmacro:                          pcg.txt (from /usr/share/sygnm/sygnmpkg/)
Successfully parsed package file:       gcd.sygnmpkg
Generating code for package:            gcd
Implementations generated for:          gcd
C code generated for:                   gcd
C++ code generated for:                 gcd
Warning: no dependency information about package: numbers
Warning: no dependency information about package: boolean
Python code generated for:              gcd
Java code generated for:                gcd
Ruby code generated for:                gcd
Formatting generated sources:           gcd
Code generation done:                   gcd
Generated HTML documentation for:       gcd
Generated HTML documentation index
Copying HTML assets...
Generating SQL documentation

Notice that the code generator created some new directories in user_packages, so that the layout looks like this:

user_packages
|- htmldoc
   |- index.html
   |- ...
|- inc
   |- sygnm
      |- packages
         |- gcd.h
|- packages
   |- gcd.sygnmpkg
|- python
   |- sygnm_packages
      |- CMakeLists.txt
      |- gcd.i
|- ruby
   |- packages
      |- CMakeLists.txt
      |- gcd.i
      |- sygnm_gcd_generic.rb
|- java
   |- packages
      |- CMakeLists.txt
      |- gcd.i
|- src
   |- packages
      |- gcd
         |- gcd.cpp
         |- ...
      |- CMakeLists.txt
|- CMakeLists.txt

The good news is that all of this, except src/packages/gcd/gcd.cpp, is generated and managed completely automatically, and we don't have to do anything with these files (indeed, even if we modify them they will be overwritten the next time the code generator is run).

The htmldoc directory contains the HTML documentation of our package. You can open index.html to view it. The inc directory contains the public header file of our package. This is the header you need to include when using the package from C++ code. The python, ruby and java directories contain the files related to the module generated for the respective language. The src directory contains the source code of the package. There may be many files here, but the only one we need to concern ourselves with is gcd.cpp, where we will write the implementation of our package. The CMakeLists.txt files belong to the build system, which makes it easy to compile our package. They are also managed completely automatically by the code generator.

Now that we have run the code generator we can start the implementation of our package. Please note that every time you change the package descriptor file, you must re-run the code generator to propagate your changes to the generated sources.
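Since it is easy to forget to re-run the code generator, one option is a small helper that compares modification times. The Python sketch below is an illustration only: needs_regeneration is a hypothetical helper, not something sygnm provides, and the demonstration uses temporary files rather than the real user_packages directory.

```python
import os
import tempfile
from pathlib import Path


def needs_regeneration(descriptor: Path, generated: Path) -> bool:
    """Return True if the generated file is missing or older than the descriptor."""
    if not generated.exists():
        return True
    return descriptor.stat().st_mtime > generated.stat().st_mtime


# Demonstration with temporary stand-ins for the descriptor and a generated file.
with tempfile.TemporaryDirectory() as tmp:
    descriptor = Path(tmp) / "gcd.sygnmpkg"
    generated = Path(tmp) / "gcd.h"
    descriptor.write_text("package: gcd\n")
    print(needs_regeneration(descriptor, generated))  # generated file missing

    generated.write_text("// generated\n")
    os.utime(descriptor, (1000, 1000))  # pretend the descriptor is older
    os.utime(generated, (2000, 2000))
    print(needs_regeneration(descriptor, generated))
```

In practice you would pass the path of the descriptor in the packages subdirectory and one of the generated files, and invoke sygnm-codegen when the function returns True.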

Implementation

Since we ran the code generator with the --generate-implementations option, even gcd.cpp was created for us and we only need to fill in the empty function bodies. It looks like this (license header comment omitted):

#include <sygnm/sygnm.h>
#include <sygnm/packages/gcd/gcd.h>
#include "pkg.h"
#include <sygnm/packages/numbers/numbers.h>
#include <sygnm/packages/boolean/boolean.h>


bool sygnmpkg::gcd::package_gcd::init()
{
    return true;
}

bool sygnmpkg::gcd::package_gcd::deinit()
{
    return true;
}

sygnm::hash_t sygnmpkg::gcd::package_gcd::hash() const
{

}

std::optional<sygnm::node_ptr> sygnmpkg::gcd::my_gcd_class::exec([[maybe_unused]] const sygnm::evaluation_context* ctx, const sygnm::const_node_ptr& a, const sygnm::const_node_ptr& b, [[maybe_unused]] const sygnm::properties& props)
{
    /*
    TODO: implementation of gcd.my_gcd.(a: numbers.integer, b: numbers.integer) -> :numbers.integer
    */
}

There are four functions here. The init and deinit functions run when the package is loaded and when sygnm is shutting down, respectively. They can be used for package-wide initialization and cleanup. A true return value means that the (de)initialization was successful. Since we have nothing to initialize, we can leave them as-is.

Next is the hash function. This is part of the sygnm functionality for reproducible computations. If the implementation of our package depends on any external factors (like third-party libraries, the presence of some system configuration option, etc.), then we have to generate a hash from the state of these external factors (library versions, configuration values, etc.) so that if they change in the future, the hash will also change (at least with high probability) and the user is informed that the package may not behave exactly as expected. Internal factors (like the version of sygnm and the version of our package) are already accounted for, so we don't have to hash them. Since in this case we don't depend on any external factors (only on other sygnm packages), we don't have to hash anything, and a plain return 0; suffices here.
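The idea behind hashing external factors can be sketched independently of sygnm. The Python example below derives a 64-bit value from a dictionary of factor names and values; everything in it is an assumption made for illustration (the function name, the use of SHA-256, the 64-bit width, and the hypothetical library name libfoo), and it says nothing about how sygnm::hash_t is actually defined.

```python
import hashlib


def external_state_hash(factors: dict) -> int:
    """Derive a 64-bit hash from external factors such as library versions.

    Returns 0 when there are no external factors, mirroring the
    'return 0;' used in this tutorial's package.
    """
    if not factors:
        return 0
    digest = hashlib.sha256()
    # Sort the keys so that the result does not depend on insertion order.
    for key in sorted(factors):
        digest.update(key.encode("utf-8"))
        digest.update(b"\0")
        digest.update(str(factors[key]).encode("utf-8"))
        digest.update(b"\0")
    return int.from_bytes(digest.digest()[:8], "big")


# "libfoo" is a hypothetical third-party dependency.
before = external_state_hash({"libfoo": "1.2.3", "use_gpu": True})
after = external_state_hash({"libfoo": "1.2.4", "use_gpu": True})
print(before != after)
```

The important property is the one described above: when any external factor changes, the hash changes too (with high probability), while unchanged factors always give the same value.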

Now there is only one remaining function, which is the function we declared in the .sygnmpkg file. This function takes four parameters: an evaluation context, the inputs of our function (a and b) and an object storing properties. We will only need a and b now. While the types of the input parameters do not appear in the C++ function signature, we can assume that by this point the sygnm system made sure that our function is only called with inputs which match the function signature in the package descriptor file (i.e. a and b are integers). This means that we do not have to further check the input.

There are two ways to implement our function: we can use lower level, low overhead functions, or we can use runtime dynamic sygnm calls. The low level approach is faster, but more verbose and not generic (will only work for integers). The runtime, dynamic approach is slower but produces generic code which will work for any input types where the operations make sense (of course we would have to modify the declaration in the .sygnmpkg file since now it only accepts integers).

Here is the pseudocode of the Euclidean algorithm (Wikipedia):

function gcd(a, b)
    while b ≠ 0
       t := b; 
       b := a mod b; 
       a := t; 
    return a;

Let's first look at the generic implementation:

    auto zero = sygnm::mk<numbers::integer>(0);
    sygnm::node_ptr aa = a->clone();
    sygnm::node_ptr bb = b->clone();
    while ($(!=, bb, zero)->get<boolean::boolean>())
    {
        auto t = bb->clone();
        bb = $(%, aa, bb);
        aa = t;
    }
    return aa;

The first line is straightforward: we create a zero value to be used in the comparison in the loop condition (the is_zero function could have been used instead). In the following two lines we copy the inputs, since by default we can't overwrite the objects the user passes in to our function. This is followed by the loop, where the condition contains the first runtime dynamic sygnm call. The call looks like this: $(!=, bb, zero). The $ is a special macro which initiates sygnm calls (CALL or SYGNM_CALL can also be used instead of it). The first argument of the macro is the name of the function (here it is the != operator). This is followed by the inputs to the function. What this call does is search at runtime for the best overload of != which can be used with the given inputs (it may even do type conversion if necessary). Which implementation is selected depends not only on the types of the arguments but also on what packages were loaded when the sygnm system was started (and therefore on what functions are available). The result of such a call is always a generic sygnm object, which we have to cast, in this case, into a boolean. This is what the ->get<boolean::boolean>() part does. The remaining part of the function is a direct implementation of the pseudocode. Notice that the modulo operation is also a runtime dynamic call, and that we never used the fact that the inputs are integers.

Now the low level implementation:

    auto zero = sygnm::mk<numbers::integer>(0);
    sygnm::node_ptr aa = a->clone();
    sygnm::node_ptr bb = b->clone();
    while (!numbers::equal_integer(ctx, bb, zero, props)->get<boolean::boolean>())
    {
        auto t = bb->clone();
        bb = numbers::mod_integer(ctx, aa, bb, props);
        aa = t;
    }
    return aa;

As you can see, the only difference is in the function calls. While previously we did not know which implementation to use for the comparison and the modulo operation, now we specify that we want to use the ones located in the numbers package that are for integer inputs. We also have to pass the evaluation context and properties explicitly. This is faster than the previous implementation since we are using ordinary C++ function calls so there is no runtime overhead. However, this code will only work for integers and we have to know which exact implementation we need (and if any type conversion is needed we also have to do that ourselves).
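The difference between the two styles can be illustrated with a toy model in Python. This is only an analogy and does not use sygnm: the registry, the register decorator and the call function are invented for this sketch, and the lookup is by exact argument types, whereas sygnm's real overload resolution is more elaborate (for example, it may perform type conversion).

```python
from typing import Callable, Dict, Tuple

# Toy overload table: (function name, left type, right type) -> implementation
_OVERLOADS: Dict[Tuple[str, type, type], Callable] = {}


def register(name: str, left: type, right: type):
    """Register an implementation for a given name and pair of argument types."""
    def decorator(func):
        _OVERLOADS[(name, left, right)] = func
        return func
    return decorator


@register("%", int, int)
def mod_integer(a, b):
    return a % b


@register("!=", int, int)
def not_equal_integer(a, b):
    return a != b


def call(name, a, b):
    """Dynamic call: look up the implementation at runtime from the argument types."""
    try:
        func = _OVERLOADS[(name, type(a), type(b))]
    except KeyError:
        raise TypeError(f"no overload of {name!r} for {type(a).__name__}, {type(b).__name__}") from None
    return func(a, b)


def gcd_dynamic(a, b):
    # Generic style: every operation goes through the runtime lookup.
    while call("!=", b, 0):
        a, b = b, call("%", a, b)
    return a


def gcd_direct(a, b):
    # Low level style: the integer implementations are named explicitly.
    while not_equal_integer(b, 0):
        a, b = b, mod_integer(a, b)
    return a


print(gcd_dynamic(6, 8), gcd_direct(6, 8))
```

In this model, gcd_dynamic would start working for a new type as soon as suitable overloads are registered, while gcd_direct skips the lookup but is tied to the integer implementations, which mirrors the trade-off described above.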

Copy one of these implementations into the function body. Now we are done with the implementation of our package and are ready to compile it.

Compiling the package

To compile the package, open a terminal in the user_packages directory, then run

cmake .

and then

make

If you encounter any errors, then probably some dependency is missing from your computer or sygnm was not installed correctly (see the Preliminaries section). To remove all files generated by cmake and make (including the compiled binaries), you can run

sygnm-codegen build_cleanup .

Testing the package

Assuming everything went well so far, our package is ready to use. First, we should run the tests from the package descriptor file (while still in the user_packages directory):

sygnm-codegen run_tests . sygnm /home/<username>/.config/sygnm/sygnm.cfg gcd

The second argument is the package directory (here .), the third is the name of the sygnm binary we want to use (just sygnm here, since we want to use the installed version), the fourth is the path to a sygnm configuration file (you can use the one that was created in the sygnm configuration directory when sygnm was first started), and the last argument is the name of the package to be tested. The output should look similar to this:

Read docmacro:                          arb.txt (from /usr/share/sygnm/sygnmpkg/)
Read docmacro:                          cpp11.txt (from /usr/share/sygnm/sygnmpkg/)
...
Read docmacro:                          pcg.txt (from /usr/share/sygnm/sygnmpkg/)
Successfully parsed package file:       gcd.sygnmpkg
================================================================================
Running tests for package:              gcd
[PASS]
[PASS]
[PASS]
[PASS]
[PASS]
[PASS]
[PASS]
Testing finished for package:           gcd
PASS: 7/7 (100.00%)
================================================================================
Testing finished
PASS: 7/7 (100.00%)

Looks like our package is working.

Using the package

Now that our package is ready, let's see how to access its functionality.

In an interactive sygnm session

Start an interactive sygnm session, for example with the sygnm-cli command which gives you a command line interface. sygnm will automatically load the package at startup. Type

my_gcd(5, 15);

and hit enter.

In C++

If you've read the previous section on implementing the package, then this code will not be much of a surprise. Here it is:

#include <iostream>
#include <sygnm/sygnm.h>
#include <sygnm/packages/gcd/gcd.h>
#include <sygnm/packages/numbers/numbers.h>

using namespace sygnmpkg;

int main()
{
    //Initialize sygnm
    sygnm::initialize(sygnm::settings::default_profile());

    //Inputs
    auto a = sygnm::mk<numbers::integer>(2356346);
    auto b = sygnm::mk<numbers::integer>(-99467348);

    //Default (global) evaluation context and property objects
    auto ctx = &sygnm::object_registry::registry().get_global_evaluation_context();
    sygnm::properties props;

    //Call my_gcd: runtime dynamic sygnm function call (slower)
    auto gcd_result_0 = $(my_gcd, a, b);

    //Call my_gcd: low level, low overhead call (faster)
    auto gcd_result_1 = gcd::my_gcd(ctx, a, b);

    //Print the results
    auto renderer = sygnm::object_registry::registry().create_renderer("default_renderer");
    std::cout << renderer->render_to_string(gcd_result_0)
              << std::endl
              << renderer->render_to_string(gcd_result_1)
              << std::endl;

    //Shut down sygnm
    sygnm::deinitialize();

    return 0;
}

We include the necessary headers, initialize sygnm, create some variables, call our function (for the sake of example with both the low level and the generic method) then print the results with the default renderer and finally shut down sygnm. This is the general pattern when using sygnm from a C++ application. Save this code to a file called main.cpp (it doesn't matter where this file is saved to).

The real question is how to compile this code. We have to make sure that the compiler finds the included files and the linker finds all needed libraries. For this, we will use the CMake build system. Create a CMakeLists.txt file next to main.cpp with the following content:

cmake_minimum_required(VERSION 3.3.0)
project(sygnm-example-gcd-cpp)

find_package(sygnm COMPONENTS numbers)

include(${sygnm_CMAKE_COMMON_INCLUDE})

include_directories(SYSTEM ${sygnm_INCLUDE_DIRS})

include_directories(SYSTEM $ENV{XDG_CONFIG_HOME}/sygnm/user_packages/inc)
link_directories($ENV{XDG_CONFIG_HOME}/sygnm/user_packages/src/packages/gcd)

add_executable(sygnm-gcd-cpp main.cpp)
target_link_libraries(sygnm-gcd-cpp ${sygnm_LIBRARIES} sygnm-gcd)

The first two lines are pretty self-explanatory. We use the find_package command to configure sygnm for our project. Here in the components section we have to mention every package which we #included and which comes with sygnm (so we do not mention our gcd package here since it doesn't come with sygnm). The following line (include(...)) does some general configuration to make the build environment right for building sygnm software (do not omit this line; leaving it out may cause subtle errors). Then the include_directories and link_directories commands tell the compiler where to look for the sygnm include files and for the include files and binaries of our package (assuming our package is in the user_packages directory). Then in the final two lines we tell CMake which source files to use and what libraries to link with. Notice that we have to mention our package separately (sygnm-gcd) in the last line; since our package does not come with sygnm it is not included in the sygnm_LIBRARIES variable.

When we are done with the CMakeLists.txt there is nothing left to do but run

cmake .

and then

make

to compile our program and finally execute it with

./sygnm-gcd-cpp

In Python

By default, Python can't load modules from anywhere but only from predefined paths, so we either need to copy the contents of the python directory to one of Python's module paths, or we can tell Python to also look for packages in the directory where our generated Python module resides. The latter can be done by setting the PYTHONPATH environment variable or by starting the Python interpreter and typing

import sys
sys.path.append('/home/<username>/.config/sygnm/user_packages/python')

Note that you must do this every time you start the interpreter.
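To avoid typing the path by hand, a script can compute it. The following sketch assumes the usual layout described in this tutorial: the user configuration directory is taken from the XDG_CONFIG_HOME environment variable when it is set, and ~/.config is used otherwise. The two helper functions are hypothetical names for this example, not part of sygnm.

```python
import os
import sys
from pathlib import Path


def user_packages_python_dir(environ=os.environ) -> Path:
    """Return the assumed location of the generated Python modules."""
    config_home = environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(config_home) / "sygnm" / "user_packages" / "python"


def add_to_module_path(directory: Path) -> None:
    """Append the directory to sys.path unless it is already there."""
    entry = str(directory)
    if entry not in sys.path:
        sys.path.append(entry)


add_to_module_path(user_packages_python_dir())
```

Placing these lines at the top of a script has the same effect as the sys.path.append call above, without hard-coding the user name.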

After Python knows where to look for our package, using it is pretty easy:

#Import the main sygnm module and our package
import sygnm
from sygnm_packages.gcd import generic as gcd

#Initialize sygnm
sygnm.initialize(sygnm.settings.default_profile())

#Compute GCD (the default, global evaluation context is used)
ctx = sygnm.object_registry.registry().get_global_evaluation_context()
result = gcd.my_gcd(ctx, 3252363632632, -6236237888)

#Print the result using the default renderer
renderer = sygnm.object_registry.registry().create_renderer("default_renderer")
print(renderer.render_to_string(result))

#Shut down sygnm
sygnm.deinitialize()

In Ruby

By default, Ruby can't load packages from anywhere but only from predefined paths, so we either need to copy the contents of the ruby directory to one of Ruby's package paths, or we can tell Ruby to also look for packages in the directory where our generated Ruby package resides. The latter can be done by setting the RUBYLIB environment variable when starting the Ruby interpreter:

RUBYLIB=/home/<username>/.config/sygnm/user_packages/ruby/packages ruby my_script.rb

After Ruby knows where to look for our package, using it is pretty easy:

#Load the gcd package
require 'sygnm_gcd_generic'
GCD = Sygnm_gcd_generic

#Initialize sygnm
Sygnm::initialize(Sygnm::Settings::default_profile())

#Compute GCD
ctx = Sygnm::Object_registry::registry().get_global_evaluation_context()
result = GCD.my_gcd(ctx, 3252363632632, -6236237888)

#Print the result
renderer = Sygnm::Object_registry::registry().create_renderer("default_renderer")
puts renderer.render_to_string(result)

#Shut down sygnm
Sygnm::deinitialize()

In Java

The Java interface is somewhat similar to the C++ one, however there are some differences:

  • There is an external dependency on Java Native Access (jna.jar).
  • When compiling and running the application we have to tell Java where to find the sygnm classes.
  • We have to place the sygnm native libraries in a directory where Java will find them.

When you build the package, a native library and some Java classes are generated. The native library is in user_packages/java/packages (on Linux it's called libsygnm_java_gcd.so). For the following example to work, you either have to move the compiled native library from user_packages/java/packages to your system's library path (e.g. /usr/lib on Linux) or add this directory to the list of system library paths.

The Java code to calculate the GCD of two integers using our package is the following:

import sygjava.*;
import sygjava.gcd.gcd;
import sygjava.numbers.numbers;
import sygjava.utils.utils;

public class gcd_java {
  public static void main(String argv[]) {
    //Need to load the native libraries we use first
    String[] pkgs = {"sygnm_java_numbers", "sygnm_java_gcd"};
    utils.load_native_sygnm_libs(pkgs);

    //Initialize sygnm
    sygnm.initialize(settings.global_settings().default_profile());

    //Compute the GCD
    evaluation_context ctx = object_registry.registry().get_global_evaluation_context();
    node_ptr a = numbers.integer(50);
    node_ptr b = numbers.integer("326326326525");
    node_ptr result = gcd.my_gcd(ctx, a.cptr(), b.cptr());

    //Print the result
    renderer rdr = object_registry.registry().create_renderer("default_renderer");
    System.out.println(sygnm.unicode_string_to_local_8bit(rdr.render_to_string(result)));

    //Shut down sygnm
    sygnm.deinitialize();
  }
}

The only notable difference from the other languages is that we manually have to load the native libraries of our package and its dependencies before initializing sygnm.

Assuming that Java can find the native libraries, you can compile this code with the following command:

javac -cp .:/path/to/sygnm/java/classes:/home/<username>/.config/sygnm/user_packages/java:/path/to/jna.jar gcd_java.java

We had to add the current directory, the path to the sygnm Java interface classes, the path to the Java classes generated for our package and the path to jna.jar (Java Native Access) to the classpath.

Running it is similar to the compilation:

java -cp .:/path/to/sygnm/java/classes:/home/<username>/.config/sygnm/user_packages/java:/path/to/jna.jar gcd_java