Module

What is a Theano Module?

A Theano ‘Module’ is a structure that implements what could be called a “theano class”. A Module can contain Members, which act like instance variables (“state”). It can also contain an arbitrary number of Methods, which are functions that share the same Members in addition to their own inputs. Last but not least, Modules can be nested (explanations and examples follow). Module is meant to:

  1. ease the sharing of variables between several Theano functions,
  2. streamline automatic naming, and
  3. allow a hierarchy of “modules” whose states can interact.

Imports

All examples assume that you have done the following imports:

#!/usr/bin/python
import theano
import numpy as N
from theano import tensor as T
from theano.tensor import nnet as NN
from theano.compile import module as M

Module

A Module can contain Members, Methods and inner Modules. Each type has a special meaning.

module = M.Module()

Member

Usage:

# module.state = variable   (general form: assign a Variable to an attribute of the Module)
module.state = T.scalar()   # for example, a scalar state

A Member represents a state variable, i.e., a variable whose value persists from one Method call to the next. It is named automatically after the field it is assigned to, and it is an implicit input of all Methods of the Module. Its storage (i.e., where the value is stored) is shared by all Methods of the Module.

A Variable that is the result of a previous computation (as opposed to a state that gets updated) is not a Member. Internally such a variable is called an External; you should not need to care about this.

For sharing state between modules, see the Inner Module section below.
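
Because a Member is an implicit input, a Method can use it without listing it among its inputs. A minimal sketch (the names twice and negate are only for illustration; the same pattern appears in the basic example below):

m = M.Module()
m.state = T.scalar()                 # a Member: implicit input to every Method
m.twice = M.Method([], m.state * 2)  # m.state is not listed as an input
m.negate = M.Method([], -m.state)    # both Methods read the same storage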

Method

Usage:

module.method = M.Method(inputs, outputs, **updates)

Each key in the updates dictionary must refer to an existing Member of the Module (either the Member Variable itself, as in the examples below, or its name), and the value associated with that key is the update expression for that state. When called on a ModuleInstance produced by the Module, the method computes the outputs from the inputs and updates all the states as specified by the update expressions. See the basic example below.
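
A minimal sketch of a Method with an update (the same pattern appears in the basic example below):

m = M.Module()
n = T.scalar('n')
m.c = T.scalar()                                  # a Member to be updated
m.inc = M.Method(n, [], updates = {m.c: m.c + n}) # after inc(v), c has been increased by v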

Inner Module

To share a Member between modules, the modules must be linked through the inner module mechanism.

Usage:

module2.submodule = module
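
Once a module is attached as an inner module, its Members can be used by the Methods of the outer module. A small sketch (the Method name double is only for illustration; the nesting example below shows a complete version):

module2 = M.Module()
module2.submodule = module                                  # 'module' has a Member named 'state'
module2.double = M.Method([], module2.submodule.state * 2)  # an outer Method using the inner Member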

ModuleInstance

A Module can produce a ModuleInstance with its make method. Think of the Module as a class and the ModuleInstance as an object, as in C++/Java. Each attribute that was a Member becomes a read/write attribute of the instance giving access to the actual value of that state. Each attribute that was a M.Method becomes a function compiled with the proper signature and semantics.
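
Continuing the sketch from the Method section above (m has a Member c and a Method inc):

inst = m.make(c = 0)  # allocate storage for the Members and compile the Methods
print inst.c          # read the current value of the state c
inst.c = 5            # write it directly
inst.inc(2)           # call the compiled Method; inst.c is now 7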

Module Interface

def make(self, mode = {'FAST_COMPILE', 'FAST_RUN', ... }, **init)

make compiles all Methods and allocates storage for all Members into a ModuleInstance object, which is returned. The init dictionary can be used to provide initial values for the Members.
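
For example, reusing the sketch module m from above (the mode names are the standard compilation modes listed in the signature):

inst = m.make(c = 0)                    # initialize the Member c to 0
inst = m.make(c = 0, mode = 'FAST_RUN') # the same, with an explicit compilation mode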

def resolve(self, symbol, filter = None)

Resolves a symbol in this module. The symbol can be a string or a Variable. If the string contains dots (e.g. "x.y"), the module resolves the symbol hierarchically through its inner modules. The filter argument is None or a class; it can be used to restrict the search, for example to Member or Method instances.
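
A hedged sketch, using the nested module m built in the nesting example below (what resolve returns depends on what the name refers to):

m.resolve('incdec1.c')  # resolves the Member c of the inner module incdec1
m.resolve('sum')        # resolves the Method sum of the outer module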

def _instance_initialize(self, inst, **init)

The inst argument is a ModuleInstance. For each key, value pair in init, setattr(inst, key, value) is called. Module subclasses can override this to initialize an instance in a different way. If you do not define this method, a default version is used. To invoke the default behaviour from your own version, call: M.default_initialize(inst, **init)
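
A small sketch of such an override (the Counter class and its start argument are purely illustrative; the advanced example below overrides _instance_initialize in the same way for a regression model):

class Counter(M.Module):
    def __init__(self):
        super(Counter, self).__init__()
        n = T.scalar('n')
        self.c = T.scalar()
        self.inc = M.Method(n, [], updates = {self.c: self.c + n})
    def _instance_initialize(self, inst, start = 0, **init):
        inst.c = start                      # custom initialization of the state c
        M.default_initialize(inst, **init)  # let the default handle everything else

inst = Counter().make(start = 5)
assert inst.c == 5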

Basic example

The problem here is to create two functions, inc and dec, and a shared state c such that inc(n) increases c by n and dec(n) decreases c by n. We also want a third function, plus10, which returns 10 plus the current state without changing it. Using the function interface, this can be implemented as follows:

n, c = T.scalars('nc')
inc = theano.function([n, ((c, c + n), 0)], [])                # ((c, c + n), 0): input c, update expression c + n, initial value 0
dec = theano.function([n, ((c, c - n), inc.container[c])], []) # we need to pass inc's container in order to share the state
plus10 = theano.function([(c, inc.container[c])], c + 10)      # reads the shared state without updating it
assert inc[c] == 0
inc(2)
assert inc[c] == 2 and dec[c] == inc[c]
dec(3)
assert inc[c] == -1 and dec[c] == inc[c]
assert plus10() == 9

Now, using Module:

m = M.Module()
n = T.scalar('n')
m.c = T.scalar() # the state variable
m.inc = M.Method(n, [], updates = {m.c: m.c + n}) # m.c <= m.c + n
m.dec = M.Method(n, [], updates = {m.c: m.c - n}) # m.c <= m.c - n
#m.dec = M.Method(n, [], updates = {c: m.c - n})  # wrong: there is no global variable c
#m.plus10 does not update the state
m.plus10 = M.Method([], m.c + 10) # m.c is always accessible since it is a member of this class

inst = m.make(c = 0) # here, we make an "instance" of the module with c initialized to 0
assert inst.c == 0
inst.inc(2)
assert inst.c == 2
inst.dec(3)
assert inst.c == -1
assert inst.plus10() == 9

Benefits of Module over the function interface in this example:
  • There is no need to manipulate the containers directly.
  • The fact that inc and dec share a state is syntactically obvious.
  • Method does not require the states to appear anywhere in the input list.
  • The interface of the instance produced by m.make() is simple and coherent, extremely similar to that of a normal Python object, and directly usable by any user.

Nesting example

The problem now is to create two pairs of inc/dec functions and a function sum that adds the shared states of the first and second pairs.

Using function:

def make_incdec_function():
    n, c = T.scalars('nc')
    inc = theano.function([n, ((c, c + n), 0)], [])
    dec = theano.function([n, ((c, c - n), inc.container[c])], [])  # inc and dec share the same state
    return inc, dec


inc1, dec1 = make_incdec_function()
inc2, dec2 = make_incdec_function()
a, b = T.scalars('ab')
sum = theano.function([(a, inc1.container['c']), (b, inc2.container['c'])], a + b)
inc1(2)
dec1(4)
inc2(6)
assert inc1['c'] == -2 and inc2['c'] == 6
assert sum() == 4 # -2 + 6

Using Module:

def make_incdec_module():
    m = M.Module()
    n = T.scalar('n')
    m.c = T.scalar() # state variables
    m.inc = M.Method(n, [], updates = {m.c: m.c + n}) # m.c <= m.c + n
    m.dec = M.Method(n, [], updates = {m.c: m.c - n}) # m.c <= m.c - n
    return m

m = M.Module()
m.incdec1 = make_incdec_module()
m.incdec2 = make_incdec_module()
m.sum = M.Method([], m.incdec1.c + m.incdec2.c)
inst = m.make(incdec1 = dict(c=0), incdec2 = dict(c=0))
inst.incdec1.inc(2)
inst.incdec1.dec(4)
inst.incdec2.inc(6)
assert inst.incdec1.c == -2 and inst.incdec2.c == 6
assert inst.sum() == 4 # -2 + 6

Here, we make a new Module and give it two inner Modules like the one defined in the basic example. Each inner module has methods inc and dec as well as a state c, and that state is directly accessible from the outer module, which can therefore define methods that use it. The instance (inst) we make from the Module (m) reflects the hierarchy that we created. Unlike the version using function, there is no need to manipulate any containers directly.

Advanced example

Complex models can be implemented by subclassing Module (though that is not mandatory). Here is a complete, extensible (and working) regression model implemented using this system:

import theano
import numpy as N
from theano import tensor as T
from theano.tensor import nnet as NN
from theano.compile import module as M

class RegressionLayer(M.Module):
    def __init__(self, input = None, target = None, regularize = True):
        super(RegressionLayer, self).__init__() #boilerplate
        # MODEL CONFIGURATION
        self.regularize = regularize
        # ACQUIRE/MAKE INPUT AND TARGET
        if not input:
            input = T.matrix('input')
        if not target:
            target = T.matrix('target')
        # HYPER-PARAMETERS
        self.stepsize = T.scalar()  # a stepsize for gradient descent
        # PARAMETERS
        self.w = T.matrix()  #the linear transform to apply to our input points
        self.b = T.vector()  #a vector of biases, which make our transform affine instead of linear
        # REGRESSION MODEL
        self.activation = T.dot(input, self.w) + self.b
        self.prediction = self.build_prediction()
        # CLASSIFICATION COST
        self.classification_cost = self.build_classification_cost(target)
        # REGULARIZATION COST
        self.regularization = self.build_regularization()
        # TOTAL COST
        self.cost = self.classification_cost
        if self.regularize:
            self.cost = self.cost + self.regularization
        # GET THE GRADIENTS NECESSARY TO FIT OUR PARAMETERS
        self.grad_w, self.grad_b, grad_act = T.grad(self.cost, [self.w, self.b, self.prediction])
        print 'grads', self.grad_w, self.grad_b
        # INTERFACE METHODS
        self.update = M.Method([input, target],
                               [self.cost, self.grad_w, self.grad_b, grad_act],
                               updates={self.w: self.w - self.stepsize * self.grad_w,
                                        self.b: self.b - self.stepsize * self.grad_b})
        self.apply = M.Method(input, self.prediction)
    def params(self):
        return self.w, self.b
    def _instance_initialize(self, obj, input_size = None, target_size = None,
                             seed = 1827, **init):
        # obj is an "instance" of this module holding values for each member and
        # functions for each method
        if input_size and target_size:
            # initialize w and b in a special way using input_size and target_size
            sz = (input_size, target_size)
            rng = N.random.RandomState(seed)
            obj.w = rng.uniform(size = sz, low = -0.5, high = 0.5)
            obj.b = N.zeros(target_size)
            obj.stepsize = 0.01
        # here we call the default_initialize method, which takes all the name: value
        # pairs in init and sets the property with that name to the provided value
        # this covers setting stepsize, l2_coef; w and b can be set that way too
        # we call it afterwards because we want the passed-in parameters to supersede the defaults.
        M.default_initialize(obj,**init)
    def build_regularization(self):
        return T.zero() # no regularization!


class SpecifiedRegressionLayer(RegressionLayer):
    """ XE mean cross entropy"""
    def build_prediction(self):
        # return NN.softmax(self.activation)  # use this line instead to expose a slow subtensor implementation
        return NN.sigmoid(self.activation)
    def build_classification_cost(self, target):
        self.classification_cost_matrix = (target - self.prediction)**2
        #print self.classification_cost_matrix.type
        self.classification_costs = T.sum(self.classification_cost_matrix, axis=1)
        return T.sum(self.classification_costs)
    def build_regularization(self):
        self.l2_coef = T.scalar() # we can add a hyper parameter if we need to
        return self.l2_coef * T.sum(self.w * self.w)


class PrintEverythingMode(theano.Mode):
    def __init__(self, linker, optimizer=None):
        def print_eval(i, node, fn):
            print i, node, [input[0] for input in fn.inputs],
            fn()
            print [output[0] for output in fn.outputs]
        wrap_linker = theano.gof.WrapLinkerMany([linker], [print_eval])
        super(PrintEverythingMode, self).__init__(wrap_linker, optimizer)


def test_module_advanced_example():

    profmode = theano.ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())
    # profmode = PrintEverythingMode(theano.gof.OpWiseCLinker(), 'fast_run')  # alternative mode: print every op as it is evaluated

    data_x = N.random.randn(4, 10)
    data_y = [ [int(x)] for x in (N.random.randn(4) > 0)]


    model = SpecifiedRegressionLayer(regularize = False).make(input_size = 10,
                       target_size = 1,
                       stepsize = 0.1,
                       mode=profmode)

    for i in xrange(1000):
       xe, gw, gb, ga = model.update(data_x, data_y)
       if i % 100 == 0:
           print i, xe
           pass
       #for inputs, targets in my_training_set():
           #print "cost:", model.update(inputs, targets)

    print "final weights:", model.w
    print "final biases:", model.b

    profmode.print_summary()

Here is how we use the model:

data_x = N.random.randn(4, 10)
data_y = [ [int(x)] for x in N.random.randn(4) > 0]


model = SpecifiedRegressionLayer(regularize = False).make(input_size = 10,
                   target_size = 1,
                   stepsize = 0.1)

for i in xrange(1000):
   xe, gw, gb, ga = model.update(data_x, data_y)
   if i % 100 == 0:
       print i, xe
       pass
   #for inputs, targets in my_training_set():
       #print "cost:", model.update(inputs, targets)

print "final weights:", model.w
print "final biases:", model.b

Extending Methods

Methods can be extended to update more parameters. For example, if we wanted to add to SpecifiedRegressionLayer a variable holding the sum of all costs encountered so far, we could proceed like this:

model_module = SpecifiedRegressionLayer(regularize = False)
model_module.sum = T.scalar() # we add a module member to hold the sum
model_module.update.updates.update(sum = model_module.sum + model_module.cost) # now update will also update sum!

model = model_module.make(input_size = 4,
                         target_size = 2,
                         stepsize = 0.1,
                         sum = 0) # we mustn't forget to initialize the sum

test = model.update([[0,0,1,0]], [[0,1]])[0] + model.update([[0,1,0,0]], [[1,0]])[0]  # [0] selects the cost output
assert model.sum == test

The inputs and outputs list of a Method can be doctored as well, but it is trickier, arguably less useful and not fully supported at the moment.