Symbolic gradients are usually computed with tensor.grad(), which offers a convenient syntax for the common case of needing the gradient of some expression with respect to a scalar cost. The grad_sources_inputs() function does the underlying work and is more flexible, but it is also more awkward to use when tensor.grad() can do the job.
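For example, the common case might look like the following sketch (the names x, cost, g and f are illustrative, not part of the API):

>>> import theano
>>> import theano.tensor as T
>>> x = T.vector('x')
>>> cost = T.sum(x ** 2)           # scalar cost
>>> g = T.grad(cost, x)            # symbolic gradient of cost with respect to x
>>> f = theano.function([x], g)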
Driver for gradient calculations.
Raised when grad is asked to compute the gradient with respect to a disconnected input and disconnected_inputs='raise'.
A type indicating that a variable is the result of taking the gradient of c with respect to x when c is not a function of x. It is a symbolic placeholder for 0, but it conveys the extra information that this gradient is 0 because it is disconnected.
This error is raised when a gradient is calculated but is incorrect.
Computes the L-operation on f with respect to wrt, evaluated at the points given in eval_points. Mathematically, this stands for the Jacobian of f with respect to wrt, left-multiplied by the eval points.
Returns: a symbolic expression such that L_op[j] = sum_i (d f[i] / d wrt[j]) eval_point[i], where the indices in that expression are magic multidimensional indices that specify both the position within a list and all coordinates of the tensor element. If f is a list/tuple, a list/tuple with the results is returned.
Return type: Variable, or list/tuple of Variables depending on the type of f.
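As an illustrative sketch (the names x, f, v and fn are assumptions, not part of the API), the L-operation of an elementwise expression along a vector v can be built as follows:

>>> import theano
>>> import theano.tensor as T
>>> from theano.gradient import Lop
>>> x = T.vector('x')
>>> f = 2 * T.sin(x)               # elementwise, so the Jacobian is diagonal
>>> v = T.vector('v')              # vector multiplying the Jacobian from the left
>>> vJ = Lop(f, x, v)              # here equivalent to theano.grad(T.sum(f * v), x)
>>> fn = theano.function([x, v], vJ)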
Raised when grad encounters a NullType.
Computes the R-operation on f with respect to wrt, evaluated at the points given in eval_points. Mathematically, this stands for the Jacobian of f with respect to wrt, right-multiplied by the eval points.
Returns: a symbolic expression such that R_op[i] = sum_j (d f[i] / d wrt[j]) eval_point[j], where the indices in that expression are magic multidimensional indices that specify both the position within a list and all coordinates of the tensor element. If wrt is a list/tuple, a list/tuple with the results is returned.
Return type: Variable, or list/tuple of Variables depending on the type of f.
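As an illustrative sketch (the names W, x, f, v and fn are assumptions), for a linear map the R-operation reduces to applying the matrix to the evaluation point:

>>> import theano
>>> import theano.tensor as T
>>> from theano.gradient import Rop
>>> W = T.matrix('W')
>>> x = T.vector('x')
>>> f = T.dot(W, x)                # the Jacobian of f with respect to x is W
>>> v = T.vector('v')              # direction in the space of x
>>> Jv = Rop(f, x, v)              # here equal to T.dot(W, v)
>>> fn = theano.function([W, x, v], Jv)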
Formats the outputs according to the flags use_list and use_tuple. If use_list is True, outputs is returned as a list (if outputs is not a list or a tuple, it is converted into a one-element list). If use_tuple is True, outputs is returned as a tuple (if outputs is not a list or a tuple, it is converted into a one-element tuple). Otherwise (if both flags are False), outputs is returned unchanged.
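The behaviour described above can be restated as a small sketch (the argument order use_list, use_tuple, outputs is an assumption, and format_as_sketch is not the library function):

>>> def format_as_sketch(use_list, use_tuple, outputs):
...     # At most one of the two flags may be set.
...     assert not (use_list and use_tuple)
...     if use_list:
...         return list(outputs) if isinstance(outputs, (list, tuple)) else [outputs]
...     if use_tuple:
...         return tuple(outputs) if isinstance(outputs, (list, tuple)) else (outputs,)
...     return outputs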
Returns: a symbolic expression of the gradient of cost with respect to wrt. If an element of wrt is not differentiable with respect to the output, then a zero variable is returned. The return value has the same type as wrt: a list/tuple or a Variable in all cases.
Return type: Variable, or list/tuple of Variables (depending on wrt).
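When wrt is a list, the result is a list of the same length, as in this sketch (names are illustrative):

>>> import theano
>>> import theano.tensor as T
>>> x = T.vector('x')
>>> w = T.vector('w')
>>> cost = T.sum(x * w)                        # scalar cost
>>> g_x, g_w = theano.grad(cost, wrt=[x, w])   # list in, list out
>>> f = theano.function([x, w], [g_x, g_w])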
Return an un-computable symbolic variable of type x.type.
If any call to tensor.grad results in an expression containing this un-computable variable, an exception (NotImplementedError) will be raised indicating that the gradient on the x_pos'th input of op has not been implemented. Likewise if any call to theano.function involves this variable.
Optionally adds a comment to the exception explaining why this gradient is not implemented.
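A typical use is inside an Op's grad() method, as in this hypothetical sketch (MyOp and its inputs are illustrative; make_node and perform are omitted):

>>> import theano
>>> from theano.gradient import grad_not_implemented
>>> class MyOp(theano.Op):
...     # make_node / perform omitted for brevity
...     def grad(self, inputs, output_grads):
...         x, y = inputs
...         g_out, = output_grads
...         # The gradient wrt x is known; the gradient wrt y is not written yet.
...         return [g_out, grad_not_implemented(self, 1, y, 'gradient wrt y not implemented')]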
Return an un-computable symbolic variable of type x.type.
If any call to tensor.grad results in an expression containing this un-computable variable, an exception (GradUndefinedError) will be raised indicating that the gradient on the x_pos'th input of op is mathematically undefined. Likewise if any call to theano.function involves this variable.
Optionally adds a comment to the exception explaining why this gradient is not defined.
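Analogously, a hypothetical Op whose output is not differentiable in its input can declare that gradient mathematically undefined:

>>> import theano
>>> from theano.gradient import grad_undefined
>>> class SignLike(theano.Op):
...     # make_node / perform omitted for brevity
...     def grad(self, inputs, output_grads):
...         x, = inputs
...         return [grad_undefined(self, 0, x, 'gradient undefined at x = 0')]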
Returns: either an instance of Variable or a list/tuple of Variables (depending on wrt) representing the Hessian of the cost with respect to (elements of) wrt. If an element of wrt is not differentiable with respect to the output, then a zero variable is returned. The return value is of the same type as wrt: a list/tuple or TensorVariable in all cases.
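A sketch of a typical call, assuming a scalar cost and a vector wrt (names are illustrative):

>>> import theano
>>> import theano.tensor as T
>>> from theano.gradient import hessian
>>> x = T.vector('x')
>>> cost = T.sum(x ** 2) + x[0] * x[1]   # scalar cost
>>> H = hessian(cost, wrt=x)             # matrix of second derivatives
>>> f = theano.function([x], H)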
Returns: either an instance of Variable or a list/tuple of Variables (depending on wrt) representing the Jacobian of expression with respect to (elements of) wrt. If an element of wrt is not differentiable with respect to the output, then a zero variable is returned. The return value is of the same type as wrt: a list/tuple or TensorVariable in all cases.
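A sketch of a typical call (names are illustrative):

>>> import theano
>>> import theano.tensor as T
>>> from theano.gradient import jacobian
>>> x = T.vector('x')
>>> y = x ** 2                 # elementwise, so the Jacobian is diagonal
>>> J = jacobian(y, wrt=x)
>>> f = theano.function([x], J)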
Compute the numeric derivative of a scalar-valued function at a particular point.
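The idea can be sketched with a generic one-sided finite difference (an illustration only, not the numeric_grad implementation; f, x and eps are hypothetical names):

>>> import numpy
>>> def finite_difference_grad(f, x, eps=1e-7):
...     # Estimate d f / d x[i] by perturbing one coordinate at a time.
...     x = numpy.asarray(x, dtype='float64')
...     g = numpy.empty_like(x)
...     f0 = f(x)
...     for i in range(x.size):
...         x_plus = x.copy()
...         x_plus.flat[i] += eps
...         g.flat[i] = (f(x_plus) - f0) / eps
...     return g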
Return absolute and relative error between a and b.
The relative error is a small number when a and b are close, relative to how big they are.
The denominator is clipped at 1e-8 to avoid dividing by 0 when a and b are both close to 0.
The tuple (abs_err, rel_err) is returned
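Restated in NumPy as a sketch (the denominator |a| + |b| is an assumption; the clipping at 1e-8 is as described above):

>>> import numpy
>>> def abs_rel_err_sketch(a, b):
...     abs_err = numpy.abs(a - b)
...     rel_err = abs_err / numpy.maximum(numpy.abs(a) + numpy.abs(b), 1e-8)
...     return abs_err, rel_err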
Return the absolute and relative errors of the gradient estimate g_pt.
g_pt must be a list of ndarrays of the same length as self.gf, otherwise a ValueError is raised.
Corresponding ndarrays in g_pt and self.gf must have the same shape, or a ValueError is raised.
Find the biggest error between g_pt and self.gf.
What is measured is the violation of the relative and absolute errors with respect to the provided tolerances (abs_tol, rel_tol). A value > 1 means both tolerances are exceeded.
Return the argmax of min(abs_err / abs_tol, rel_err / rel_tol) over g_pt, as well as abs_err and rel_err at this point.
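The score described above can be sketched as follows (a standalone illustration, not the method itself):

>>> import numpy
>>> def max_violation(abs_err, rel_err, abs_tol, rel_tol):
...     # At a coordinate where the score exceeds 1, both tolerances are exceeded.
...     scores = numpy.minimum(abs_err / abs_tol, rel_err / rel_tol)
...     pos = int(scores.argmax())
...     return pos, abs_err.flat[pos], rel_err.flat[pos]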
Test a gradient by the Finite Difference Method. Raise an error on failure.
>>> verify_grad(theano.tensor.tanh,
...             (numpy.asarray([[2, 3, 4], [-1, 3.3, 9.9]]),),
...             rng=numpy.random)
Raises an Exception if the difference between the analytic gradient and the numerical gradient (computed through the Finite Difference Method) of a random projection of fun's output to a scalar exceeds the given tolerance.
Note: WARNING to unit-test writers: if op is a function that builds a graph, try to make it a SMALL graph. Often verify_grad is run in debug mode, which can be very slow if it has to verify a lot of intermediate computations.
Note: This function does not support multiple outputs. In tests/test_scan.py there is an experimental verify_grad that covers that case as well by using random projections.