Commit 0b13265a authored by Stefan Krah's avatar Stefan Krah

Issue #7652: Integrate the decimal floating point libmpdec library to speed

up the decimal module. Performance gains of the new C implementation are
between 12x and 80x, depending on the application.
parent 592d76f0
This diff is collapsed.
......@@ -8,9 +8,9 @@ Numeric and Mathematical Modules
The modules described in this chapter provide numeric and math-related functions
and data types. The :mod:`numbers` module defines an abstract hierarchy of
numeric types. The :mod:`math` and :mod:`cmath` modules contain various
mathematical functions for floating-point and complex numbers. For users more
interested in decimal accuracy than in speed, the :mod:`decimal` module supports
exact representations of decimal numbers.
mathematical functions for floating-point and complex numbers. The :mod:`decimal`
module supports exact representations of decimal numbers, using arbitrary precision
arithmetic.
The following modules are documented in this chapter:
......
......@@ -596,6 +596,93 @@ curses
(Contributed by Iñigo Serna in :issue:`6755`)
decimal
-------
:issue:`7652` - integrate fast native decimal arithmetic.
C-module and libmpdec written by Stefan Krah.
The new C version of the decimal module integrates the high speed libmpdec
library for arbitrary precision correctly-rounded decimal arithmetic.
libmpdec conforms to IBM's General Decimal Arithmetic Specification.
Performance gains range from 12x for database applications to 80x for
numerically intensive applications:
+---------+-------------+--------------+-------------+
| | decimal.py | _decimal | speedup |
+=========+=============+==============+=============+
| pi | 42.75s | 0.58s | 74x |
+---------+-------------+--------------+-------------+
| telco | 172.19s | 5.68s | 30x |
+---------+-------------+--------------+-------------+
| psycopg | 3.57s | 0.29s | 12x |
+---------+-------------+--------------+-------------+
Features
~~~~~~~~
* The :exc:`~decimal.FloatOperation` signal optionally enables stricter
semantics for mixing floats and Decimals.
* If Python is compiled without threads, the C version automatically
disables the expensive thread local context machinery. In this case,
the variable :data:`~decimal.HAVE_THREADS` is set to False.
API changes
~~~~~~~~~~~
* The C module has the following context limits, depending on the machine
architecture:
+-------------------+---------------------+------------------------------+
| | 32-bit | 64-bit |
+===================+=====================+==============================+
| :const:`MAX_PREC` | :const:`425000000` | :const:`999999999999999999` |
+-------------------+---------------------+------------------------------+
| :const:`MAX_EMAX` | :const:`425000000` | :const:`999999999999999999` |
+-------------------+---------------------+------------------------------+
| :const:`MIN_EMIN` | :const:`-425000000` | :const:`-999999999999999999` |
+-------------------+---------------------+------------------------------+
* In the context templates (:class:`~decimal.DefaultContext`,
:class:`~decimal.BasicContext` and :class:`~decimal.ExtendedContext`)
the magnitude of :attr:`~decimal.Context.Emax` and
:attr:`~decimal.Context.Emin` has changed to :const:`999999`.
* The :class:`~decimal.Decimal` constructor in decimal.py does not observe
the context limits and converts values with arbitrary exponents or precision
exactly. Since the C version has internal limits, the following scheme is
used: If possible, values are converted exactly, otherwise
:exc:`~decimal.InvalidOperation` is raised and the result is NaN. In the
latter case it is always possible to use :meth:`~decimal.Context.create_decimal`
in order to obtain a rounded or inexact value.
* The power function in decimal.py is always correctly-rounded. In the
C version, it is defined in terms of the correctly-rounded
:meth:`~decimal.Decimal.exp` and :meth:`~decimal.Decimal.ln` functions,
but the final result is only "almost always correctly rounded".
* In the C version, the context dictionary containing the signals is a
:class:`~collections.abc.MutableMapping`. For speed reasons,
:attr:`~decimal.Context.flags` and :attr:`~decimal.Context.traps` always
refer to the same :class:`~collections.abc.MutableMapping` that the context
was initialized with. If a new signal dictionary is assigned,
:attr:`~decimal.Context.flags` and :attr:`~decimal.Context.traps`
are updated with the new values, but they do not reference the RHS
dictionary.
* Pickling a :class:`~decimal.Context` produces a different output in order
to have a common interchange format for the Python and C versions.
* The order of arguments in the :class:`~decimal.Context` constructor has been
changed to match the order displayed by :func:`repr`.
faulthandler
------------
......
......@@ -6,7 +6,7 @@ extern "C" {
#endif
/* This is published for the benefit of "friend" marshal.c only. */
/* This is published for the benefit of "friends" marshal.c and _decimal.c. */
/* Parameters of the long integer representation. There are two different
sets of parameters: one set for 30-bit digits, stored in an unsigned 32-bit
......
This diff is collapsed.
......@@ -1416,7 +1416,7 @@ def run_unittest(*classes):
#=======================================================================
# doctest driver.
def run_doctest(module, verbosity=None):
def run_doctest(module, verbosity=None, optionflags=0):
"""Run doctest on the given module. Return (#failures, #tests).
If optional argument verbosity is not specified (or is None), pass
......@@ -1431,7 +1431,7 @@ def run_doctest(module, verbosity=None):
else:
verbosity = None
f, t = doctest.testmod(module, verbose=verbosity)
f, t = doctest.testmod(module, verbose=verbosity, optionflags=optionflags)
if f:
raise TestFailed("%d of %d doctests failed" % (f, t))
if verbose:
......
This diff is collapsed.
......@@ -401,14 +401,10 @@ class FractionTest(unittest.TestCase):
def testMixingWithDecimal(self):
# Decimal refuses mixed arithmetic (but not mixed comparisons)
self.assertRaisesMessage(
TypeError,
"unsupported operand type(s) for +: 'Fraction' and 'Decimal'",
operator.add, F(3,11), Decimal('3.1415926'))
self.assertRaisesMessage(
TypeError,
"unsupported operand type(s) for +: 'Decimal' and 'Fraction'",
operator.add, Decimal('3.1415926'), F(3,11))
self.assertRaises(TypeError, operator.add,
F(3,11), Decimal('3.1415926'))
self.assertRaises(TypeError, operator.add,
Decimal('3.1415926'), F(3,11))
def testComparisons(self):
self.assertTrue(F(1, 2) < F(2, 3))
......
......@@ -150,7 +150,7 @@ class ComparisonTest(unittest.TestCase):
# int, float, Fraction, Decimal
test_values = [
float('-inf'),
D('-1e999999999'),
D('-1e425000000'),
-1e308,
F(-22, 7),
-3.14,
......
......@@ -30,6 +30,10 @@ Core and Builtins
Library
-------
- Issue #7652: Integrate the decimal floating point libmpdec library to speed
up the decimal module. Performance gains of the new C implementation are
between 12x and 80x, depending on the application.
- Issue #3573: IDLE hangs when passing invalid command line args
(directory(ies) instead of file(s)) (Patch by Guilherme Polo)
......
......@@ -456,3 +456,16 @@
fun:PyUnicode_FSConverter
}
# Additional suppressions for the unified decimal tests:
{
test_decimal
Memcheck:Addr4
fun:PyUnicodeUCS2_FSConverter
}
{
test_decimal2
Memcheck:Addr4
fun:PyUnicode_FSConverter
}
Normal priority:
----------------
1) Add C-API importable as capsule.
2) Add --with-system-libmpdec to ./configure.
3) Use same default emin/emax on 32-bit (MAX_EMAX=425000000) and 64-bit
(MAX_EMAX=10**18-1).
4) Order of arguments in Context().
5) Documentation.
6) quantize()
- exp argument is misleading:
Decimal('0.321000e+2').quantize(exp=9) -> user might expect
that the result will have exp=9.
- watchexp
7) Match the exception hierarchy of decimal.py:
exceptions.ArithmeticError(exceptions.Exception)
DecimalException
Clamped
DivisionByZero(DecimalException, exceptions.ZeroDivisionError)
Inexact
Overflow(Inexact, Rounded)
Underflow(Inexact, Rounded, Subnormal)
InvalidOperation
Rounded
Subnormal
FloatOperation
Low priority:
-------------
1) Convert tabs (wait until commit).
2) Pre-ANSI compilers require '#' in the first column (should be done
for the whole Python source tree if we support such compilers). (?)
3) FETCH_CURRENT_CONTEXT vs. CURRENT_CONTEXT?
4) Justify remaining uses of exit on overflow in bignum arith. Short
answer: with correct context values the coefficients never get big
enough for that to happen.
5) Justify remaining uses of abort() in mpdecimal: These uses are
for debug purposes and can't be reached when the library is used
correctly.
About
=====
_decimal.c is a wrapper for the libmpdec library. libmpdec is a fast C
library for correctly-rounded arbitrary precision decimal floating point
arithmetic. It is a complete implementation of Mike Cowlishaw/IBM's
General Decimal Arithmetic Specification.
Build process for the module
============================
As usual, the build process for _decimal.so is driven by setup.py in the top
level directory. setup.py autodetects the following build configurations:
1) x64 - 64-bit Python, x86_64 processor (AMD, Intel)
2) uint128 - 64-bit Python, compiler provides __uint128_t (gcc)
3) ansi64 - 64-bit Python, ANSI C
4) ppro - 32-bit Python, x86 CPU, PentiumPro or later
5) ansi32 - 32-bit Python, ANSI C
6) ansi-legacy - 32-bit Python, compiler without uint64_t
7) universal - Mac OS only (multi-arch)
It is possible to override autodetection by exporting:
PYTHON_DECIMAL_WITH_MACHINE=value, where value is one of the above options.
NOTE
====
decimal.so is not built from a static libmpdec.a since doing so led to
failures on AIX (user report) and Windows (mixing static and dynamic CRTs
causes locale problems and more).
This diff is collapsed.
This diff is collapsed.
libmpdec
========
libmpdec is a fast C/C++ library for correctly-rounded arbitrary precision
decimal floating point arithmetic. It is a complete implementation of
Mike Cowlishaw/IBM's General Decimal Arithmetic Specification.
Files required for the Python _decimal module
=============================================
Core files for small and medium precision arithmetic
----------------------------------------------------
basearith.{c,h} -> Core arithmetic in base 10**9 or 10**19.
bits.h -> Portable detection of least/most significant one-bit.
constants.{c,h} -> Constants that are used in multiple files.
context.c -> Context functions.
io.{c,h} -> Conversions between mpd_t and ASCII strings,
mpd_t formatting (allows UTF-8 fill character).
memory.{c,h} -> Allocation handlers with overflow detection
and functions for switching between static
and dynamic mpd_t.
mpdecimal.{c,h} -> All (quiet) functions of the specification.
typearith.h -> Fast primitives for double word multiplication,
division etc.
Visual Studio only:
~~~~~~~~~~~~~~~~~~~
vccompat.h -> snprintf <==> sprintf_s and similar things.
vcstdint.h -> stdint.h (included in VS 2010 but not in VS 2008).
vcdiv64.asm -> Double word division used in typearith.h. VS 2008 does
not allow inline asm for x64. Also, it does not provide
an intrinsic for double word division.
Files for bignum arithmetic:
----------------------------
The following files implement the Fast Number Theoretic Transform
used for multiplying coefficients with more than 1024 words (see
mpdecimal.c: _mpd_fntmul()).
umodarith.h -> Fast low level routines for unsigned modular arithmetic.
numbertheory.{c,h} -> Routines for setting up the Number Theoretic Transform.
difradix2.{c,h} -> Decimation in frequency transform, used as the
"base case" by the following three files:
fnt.{c,h} -> Transform arrays up to 4096 words.
sixstep.{c,h} -> Transform larger arrays of length 2**n.
fourstep.{c,h} -> Transform larger arrays of length 3 * 2**n.
convolute.{c,h} -> Fast convolution using one of the three transform
functions.
transpose.{c,h} -> Transpositions needed for the sixstep algorithm.
crt.{c,h} -> Chinese Remainder Theorem: use information from three
transforms modulo three different primes to get the
final result.
Pointers to literature, proofs and more
=======================================
literature/
-----------
REFERENCES.txt -> List of relevant papers.
bignum.txt -> Explanation of the Fast Number Theoretic Transform (FNT).
fnt.py -> Verify constants used in the FNT; Python demo for the
O(N**2) discrete transform.
matrix-transform.txt -> Proof for the Matrix Fourier Transform used in
fourstep.c.
six-step.txt -> Show that the algorithm used in sixstep.c is
a variant of the Matrix Fourier Transform.
mulmod-64.txt -> Proof for the mulmod64 algorithm from
umodarith.h.
mulmod-ppro.txt -> Proof for the x87 FPU modular multiplication
from umodarith.h.
umodarith.lisp -> ACL2 proofs for many functions from umodarith.h.
Library Author
==============
Stefan Krah <skrah@bytereef.org>
This diff is collapsed.
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef BASEARITH_H
#define BASEARITH_H
#include "mpdecimal.h"
#include <stdio.h>
#include "typearith.h"
mpd_uint_t _mpd_baseadd(mpd_uint_t *w, const mpd_uint_t *u, const mpd_uint_t *v,
mpd_size_t m, mpd_size_t n);
void _mpd_baseaddto(mpd_uint_t *w, const mpd_uint_t *u, mpd_size_t n);
mpd_uint_t _mpd_shortadd(mpd_uint_t *w, mpd_size_t m, mpd_uint_t v);
mpd_uint_t _mpd_shortadd_b(mpd_uint_t *w, mpd_size_t m, mpd_uint_t v,
mpd_uint_t b);
mpd_uint_t _mpd_baseincr(mpd_uint_t *u, mpd_size_t n);
void _mpd_basesub(mpd_uint_t *w, const mpd_uint_t *u, const mpd_uint_t *v,
mpd_size_t m, mpd_size_t n);
void _mpd_basesubfrom(mpd_uint_t *w, const mpd_uint_t *u, mpd_size_t n);
void _mpd_basemul(mpd_uint_t *w, const mpd_uint_t *u, const mpd_uint_t *v,
mpd_size_t m, mpd_size_t n);
void _mpd_shortmul(mpd_uint_t *w, const mpd_uint_t *u, mpd_size_t n,
mpd_uint_t v);
void _mpd_shortmul_b(mpd_uint_t *w, const mpd_uint_t *u, mpd_size_t n,
mpd_uint_t v, mpd_uint_t b);
mpd_uint_t _mpd_shortdiv(mpd_uint_t *w, const mpd_uint_t *u, mpd_size_t n,
mpd_uint_t v);
mpd_uint_t _mpd_shortdiv_b(mpd_uint_t *w, const mpd_uint_t *u, mpd_size_t n,
mpd_uint_t v, mpd_uint_t b);
int _mpd_basedivmod(mpd_uint_t *q, mpd_uint_t *r, const mpd_uint_t *uconst,
const mpd_uint_t *vconst, mpd_size_t nplusm, mpd_size_t n);
void _mpd_baseshiftl(mpd_uint_t *dest, mpd_uint_t *src, mpd_size_t n,
mpd_size_t m, mpd_size_t shift);
mpd_uint_t _mpd_baseshiftr(mpd_uint_t *dest, mpd_uint_t *src, mpd_size_t slen,
mpd_size_t shift);
#ifdef CONFIG_64
extern const mpd_uint_t mprime_rdx;
/*
* Algorithm from: Division by Invariant Integers using Multiplication,
* T. Granlund and P. L. Montgomery, Proceedings of the SIGPLAN '94
* Conference on Programming Language Design and Implementation.
*
* http://gmplib.org/~tege/divcnst-pldi94.pdf
*
* Variables from the paper and their translations (See section 8):
*
* N := 64
* d := MPD_RADIX
* l := 64
* m' := floor((2**(64+64) - 1)/MPD_RADIX) - 2**64
*
* Since N-l == 0:
*
* dnorm := d
* n2 := hi
* n10 := lo
*
* ACL2 proof: mpd-div-words-r-correct
*/
static inline void
_mpd_div_words_r(mpd_uint_t *q, mpd_uint_t *r, mpd_uint_t hi, mpd_uint_t lo)
{
mpd_uint_t n_adj, h, l, t;
mpd_uint_t n1_neg;
/* n1_neg = if lo >= 2**63 then MPD_UINT_MAX else 0 */
n1_neg = (lo & (1ULL<<63)) ? MPD_UINT_MAX : 0;
/* n_adj = if lo >= 2**63 then lo+MPD_RADIX else lo */
n_adj = lo + (n1_neg & MPD_RADIX);
/* (h, l) = if lo >= 2**63 then m'*(hi+1) else m'*hi */
_mpd_mul_words(&h, &l, mprime_rdx, hi-n1_neg);
l = l + n_adj;
if (l < n_adj) h++;
t = h + hi;
/* At this point t == qest, with q == qest or q == qest+1:
* 1) 0 <= 2**64*hi + lo - qest*MPD_RADIX < 2*MPD_RADIX
*/
/* t = 2**64-1 - qest = 2**64 - (qest+1) */
t = MPD_UINT_MAX - t;
/* (h, l) = 2**64*MPD_RADIX - (qest+1)*MPD_RADIX */
_mpd_mul_words(&h, &l, t, MPD_RADIX);
l = l + lo;
if (l < lo) h++;
h += hi;
h -= MPD_RADIX;
/* (h, l) = 2**64*hi + lo - (qest+1)*MPD_RADIX (mod 2**128)
* Case q == qest+1:
* a) h == 0, l == r
* b) q := h - t == qest+1
* c) r := l
* Case q == qest:
* a) h == MPD_UINT_MAX, l == 2**64-(MPD_RADIX-r)
* b) q := h - t == qest
* c) r := l + MPD_RADIX = r
*/
*q = (h - t);
*r = l + (MPD_RADIX & h);
}
#else
static inline void
_mpd_div_words_r(mpd_uint_t *q, mpd_uint_t *r, mpd_uint_t hi, mpd_uint_t lo)
{
_mpd_div_words(q, r, hi, lo, MPD_RADIX);
}
#endif
/* Multiply two single base MPD_RADIX words, store result in array w[2]. */
static inline void
_mpd_singlemul(mpd_uint_t w[2], mpd_uint_t u, mpd_uint_t v)
{
mpd_uint_t hi, lo;
_mpd_mul_words(&hi, &lo, u, v);
_mpd_div_words_r(&w[1], &w[0], hi, lo);
}
/* Multiply u (len 2) and v (len m, 1 <= m <= 2). */
static inline void
_mpd_mul_2_le2(mpd_uint_t w[4], mpd_uint_t u[2], mpd_uint_t v[2], mpd_ssize_t m)
{
mpd_uint_t hi, lo;
_mpd_mul_words(&hi, &lo, u[0], v[0]);
_mpd_div_words_r(&w[1], &w[0], hi, lo);
_mpd_mul_words(&hi, &lo, u[1], v[0]);
lo = w[1] + lo;
if (lo < w[1]) hi++;
_mpd_div_words_r(&w[2], &w[1], hi, lo);
if (m == 1) return;
_mpd_mul_words(&hi, &lo, u[0], v[1]);
lo = w[1] + lo;
if (lo < w[1]) hi++;
_mpd_div_words_r(&w[3], &w[1], hi, lo);
_mpd_mul_words(&hi, &lo, u[1], v[1]);
lo = w[2] + lo;
if (lo < w[2]) hi++;
lo = w[3] + lo;
if (lo < w[3]) hi++;
_mpd_div_words_r(&w[3], &w[2], hi, lo);
}
/*
* Test if all words from data[len-1] to data[0] are zero. If len is 0, nothing
* is tested and the coefficient is regarded as "all zero".
*/
static inline int
_mpd_isallzero(const mpd_uint_t *data, mpd_ssize_t len)
{
while (--len >= 0) {
if (data[len] != 0) return 0;
}
return 1;
}
/*
* Test if all full words from data[len-1] to data[0] are MPD_RADIX-1
* (all nines). Return true if len == 0.
*/
static inline int
_mpd_isallnine(const mpd_uint_t *data, mpd_ssize_t len)
{
while (--len >= 0) {
if (data[len] != MPD_RADIX-1) return 0;
}
return 1;
}
#endif /* BASEARITH_H */
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef BITS_H
#define BITS_H
#include "mpdecimal.h"
#include <stdio.h>
/* Check if n is a power of 2. */
static inline int
ispower2(mpd_size_t n)
{
return n != 0 && (n & (n-1)) == 0;
}
#if defined(ANSI)
/*
* Return the most significant bit position of n from 0 to 31 (63).
* Assumptions: n != 0.
*/
static inline int
mpd_bsr(mpd_size_t n)
{
int pos = 0;
mpd_size_t tmp;
#ifdef CONFIG_64
tmp = n >> 32;
if (tmp != 0) { n = tmp; pos += 32; }
#endif
tmp = n >> 16;
if (tmp != 0) { n = tmp; pos += 16; }
tmp = n >> 8;
if (tmp != 0) { n = tmp; pos += 8; }
tmp = n >> 4;
if (tmp != 0) { n = tmp; pos += 4; }
tmp = n >> 2;
if (tmp != 0) { n = tmp; pos += 2; }
tmp = n >> 1;
if (tmp != 0) { n = tmp; pos += 1; }
return pos + (int)n - 1;
}
/*
* Return the least significant bit position of n from 0 to 31 (63).
* Assumptions: n != 0.
*/
static inline int
mpd_bsf(mpd_size_t n)
{
int pos;
#ifdef CONFIG_64
pos = 63;
if (n & 0x00000000FFFFFFFFULL) { pos -= 32; } else { n >>= 32; }
if (n & 0x000000000000FFFFULL) { pos -= 16; } else { n >>= 16; }
if (n & 0x00000000000000FFULL) { pos -= 8; } else { n >>= 8; }
if (n & 0x000000000000000FULL) { pos -= 4; } else { n >>= 4; }
if (n & 0x0000000000000003ULL) { pos -= 2; } else { n >>= 2; }
if (n & 0x0000000000000001ULL) { pos -= 1; }
#else
pos = 31;
if (n & 0x000000000000FFFFUL) { pos -= 16; } else { n >>= 16; }
if (n & 0x00000000000000FFUL) { pos -= 8; } else { n >>= 8; }
if (n & 0x000000000000000FUL) { pos -= 4; } else { n >>= 4; }
if (n & 0x0000000000000003UL) { pos -= 2; } else { n >>= 2; }
if (n & 0x0000000000000001UL) { pos -= 1; }
#endif
return pos;
}
/* END ANSI */
#elif defined(ASM)
/*
* Bit scan reverse. Assumptions: a != 0.
*/
static inline int
mpd_bsr(mpd_size_t a)
{
mpd_size_t retval;
__asm__ (
#ifdef CONFIG_64
"bsrq %1, %0\n\t"
#else
"bsr %1, %0\n\t"
#endif
:"=r" (retval)
:"r" (a)
:"cc"
);
return (int)retval;
}
/*
* Bit scan forward. Assumptions: a != 0.
*/
static inline int
mpd_bsf(mpd_size_t a)
{
mpd_size_t retval;
__asm__ (
#ifdef CONFIG_64
"bsfq %1, %0\n\t"
#else
"bsf %1, %0\n\t"
#endif
:"=r" (retval)
:"r" (a)
:"cc"
);
return (int)retval;
}
/* END ASM */
#elif defined(MASM)
#include <intrin.h>
/*
* Bit scan reverse. Assumptions: a != 0.
*/
static inline int __cdecl
mpd_bsr(mpd_size_t a)
{
unsigned long retval;
#ifdef CONFIG_64
_BitScanReverse64(&retval, a);
#else
_BitScanReverse(&retval, a);
#endif
return (int)retval;
}
/*
* Bit scan forward. Assumptions: a != 0.
*/
static inline int __cdecl
mpd_bsf(mpd_size_t a)
{
unsigned long retval;
#ifdef CONFIG_64
_BitScanForward64(&retval, a);
#else
_BitScanForward(&retval, a);
#endif
return (int)retval;
}
/* END MASM (_MSC_VER) */
#else
#error "missing preprocessor definitions"
#endif /* BSR/BSF */
#endif /* BITS_H */
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include "mpdecimal.h"
#include <stdio.h>
#include "constants.h"
#if defined(CONFIG_64)
/* number-theory.c */
const mpd_uint_t mpd_moduli[3] = {
18446744069414584321ULL, 18446744056529682433ULL, 18446742974197923841ULL
};
const mpd_uint_t mpd_roots[3] = {7ULL, 10ULL, 19ULL};
/* crt.c */
const mpd_uint_t INV_P1_MOD_P2 = 18446744055098026669ULL;
const mpd_uint_t INV_P1P2_MOD_P3 = 287064143708160ULL;
const mpd_uint_t LH_P1P2 = 18446744052234715137ULL; /* (P1*P2) % 2^64 */
const mpd_uint_t UH_P1P2 = 18446744052234715141ULL; /* (P1*P2) / 2^64 */
/* transpose.c */
const mpd_size_t mpd_bits[64] = {
1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384,
32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608,
16777216, 33554432, 67108864, 134217728, 268435456, 536870912, 1073741824,
2147483648ULL, 4294967296ULL, 8589934592ULL, 17179869184ULL, 34359738368ULL,
68719476736ULL, 137438953472ULL, 274877906944ULL, 549755813888ULL,
1099511627776ULL, 2199023255552ULL, 4398046511104, 8796093022208ULL,
17592186044416ULL, 35184372088832ULL, 70368744177664ULL, 140737488355328ULL,
281474976710656ULL, 562949953421312ULL, 1125899906842624ULL,
2251799813685248ULL, 4503599627370496ULL, 9007199254740992ULL,
18014398509481984ULL, 36028797018963968ULL, 72057594037927936ULL,
144115188075855872ULL, 288230376151711744ULL, 576460752303423488ULL,
1152921504606846976ULL, 2305843009213693952ULL, 4611686018427387904ULL,
9223372036854775808ULL
};
/* mpdecimal.c */
const mpd_uint_t mpd_pow10[MPD_RDIGITS+1] = {
1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000,
10000000000ULL,100000000000ULL,1000000000000ULL,10000000000000ULL,
100000000000000ULL,1000000000000000ULL,10000000000000000ULL,
100000000000000000ULL,1000000000000000000ULL,10000000000000000000ULL
};
/* magic number for constant division by MPD_RADIX */
const mpd_uint_t mprime_rdx = 15581492618384294730ULL;
#elif defined(CONFIG_32)
/* number-theory.c */
const mpd_uint_t mpd_moduli[3] = {2113929217UL, 2013265921UL, 1811939329UL};
const mpd_uint_t mpd_roots[3] = {5UL, 31UL, 13UL};
/* PentiumPro modular multiplication: These constants have to be loaded as
* 80 bit long doubles, which are not supported by certain compilers. */
const uint32_t mpd_invmoduli[3][3] = {
{4293885170U, 2181570688U, 16352U}, /* ((long double) 1 / 2113929217UL) */
{1698898177U, 2290649223U, 16352U}, /* ((long double) 1 / 2013265921UL) */
{2716021846U, 2545165803U, 16352U} /* ((long double) 1 / 1811939329UL) */
};
const float MPD_TWO63 = 9223372036854775808.0; /* 2^63 */
/* crt.c */
const mpd_uint_t INV_P1_MOD_P2 = 2013265901UL;
const mpd_uint_t INV_P1P2_MOD_P3 = 54UL;
const mpd_uint_t LH_P1P2 = 4127195137UL; /* (P1*P2) % 2^32 */
const mpd_uint_t UH_P1P2 = 990904320UL; /* (P1*P2) / 2^32 */
/* transpose.c */
const mpd_size_t mpd_bits[32] = {
1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384,
32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608,
16777216, 33554432, 67108864, 134217728, 268435456, 536870912, 1073741824,
2147483648UL
};
/* mpdecimal.c */
const mpd_uint_t mpd_pow10[MPD_RDIGITS+1] = {
1,10,100,1000,10000,100000,1000000,10000000,100000000,1000000000
};
#else
#error "CONFIG_64 or CONFIG_32 must be defined."
#endif
const char *mpd_round_string[MPD_ROUND_GUARD] = {
"ROUND_UP", /* round away from 0 */
"ROUND_DOWN", /* round toward 0 (truncate) */
"ROUND_CEILING", /* round toward +infinity */
"ROUND_FLOOR", /* round toward -infinity */
"ROUND_HALF_UP", /* 0.5 is rounded up */
"ROUND_HALF_DOWN", /* 0.5 is rounded down */
"ROUND_HALF_EVEN", /* 0.5 is rounded to even */
"ROUND_05UP", /* round zero or five away from 0 */
"ROUND_TRUNC", /* truncate, but set infinity */
};
const char *mpd_clamp_string[MPD_CLAMP_GUARD] = {
"CLAMP_DEFAULT",
"CLAMP_IEEE_754"
};
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef CONSTANTS_H
#define CONSTANTS_H
#include "mpdecimal.h"
/* choice of optimized functions */
#if defined(CONFIG_64)
/* x64 */
#define MULMOD(a, b) x64_mulmod(a, b, umod)
#define MULMOD2C(a0, a1, w) x64_mulmod2c(a0, a1, w, umod)
#define MULMOD2(a0, b0, a1, b1) x64_mulmod2(a0, b0, a1, b1, umod)
#define POWMOD(base, exp) x64_powmod(base, exp, umod)
#define SETMODULUS(modnum) std_setmodulus(modnum, &umod)
#define SIZE3_NTT(x0, x1, x2, w3table) std_size3_ntt(x0, x1, x2, w3table, umod)
#elif defined(PPRO)
/* PentiumPro (or later) gcc inline asm */
#define MULMOD(a, b) ppro_mulmod(a, b, &dmod, dinvmod)
#define MULMOD2C(a0, a1, w) ppro_mulmod2c(a0, a1, w, &dmod, dinvmod)
#define MULMOD2(a0, b0, a1, b1) ppro_mulmod2(a0, b0, a1, b1, &dmod, dinvmod)
#define POWMOD(base, exp) ppro_powmod(base, exp, &dmod, dinvmod)
#define SETMODULUS(modnum) ppro_setmodulus(modnum, &umod, &dmod, dinvmod)
#define SIZE3_NTT(x0, x1, x2, w3table) ppro_size3_ntt(x0, x1, x2, w3table, umod, &dmod, dinvmod)
#else
/* ANSI C99 */
#define MULMOD(a, b) std_mulmod(a, b, umod)
#define MULMOD2C(a0, a1, w) std_mulmod2c(a0, a1, w, umod)
#define MULMOD2(a0, b0, a1, b1) std_mulmod2(a0, b0, a1, b1, umod)
#define POWMOD(base, exp) std_powmod(base, exp, umod)
#define SETMODULUS(modnum) std_setmodulus(modnum, &umod)
#define SIZE3_NTT(x0, x1, x2, w3table) std_size3_ntt(x0, x1, x2, w3table, umod)
#endif
/* PentiumPro (or later) gcc inline asm */
extern const float MPD_TWO63;
extern const uint32_t mpd_invmoduli[3][3];
enum {P1, P2, P3};
extern const mpd_uint_t mpd_moduli[];
extern const mpd_uint_t mpd_roots[];
extern const mpd_size_t mpd_bits[];
extern const mpd_uint_t mpd_pow10[];
extern const mpd_uint_t INV_P1_MOD_P2;
extern const mpd_uint_t INV_P1P2_MOD_P3;
extern const mpd_uint_t LH_P1P2;
extern const mpd_uint_t UH_P1P2;
#endif /* CONSTANTS_H */
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include "mpdecimal.h"
#include <stdio.h>
#include <string.h>
#include <signal.h>
void
mpd_dflt_traphandler(mpd_context_t *ctx UNUSED)
{
raise(SIGFPE);
}
void (* mpd_traphandler)(mpd_context_t *) = mpd_dflt_traphandler;
/* Set guaranteed minimum number of coefficient words. The function may
be used once at program start. Setting MPD_MINALLOC to out-of-bounds
values is a catastrophic error, so in that case the function exits rather
than relying on the user to check a return value. */
void
mpd_setminalloc(mpd_ssize_t n)
{
static int minalloc_is_set = 0;
if (minalloc_is_set) {
mpd_err_warn("mpd_setminalloc: ignoring request to set "
"MPD_MINALLOC a second time\n");
return;
}
if (n < MPD_MINALLOC_MIN || n > MPD_MINALLOC_MAX) {
mpd_err_fatal("illegal value for MPD_MINALLOC"); /* GCOV_NOT_REACHED */
}
MPD_MINALLOC = n;
minalloc_is_set = 1;
}
void
mpd_init(mpd_context_t *ctx, mpd_ssize_t prec)
{
mpd_ssize_t ideal_minalloc;
mpd_defaultcontext(ctx);
if (!mpd_qsetprec(ctx, prec)) {
mpd_addstatus_raise(ctx, MPD_Invalid_context);
return;
}
ideal_minalloc = 2 * ((prec+MPD_RDIGITS-1) / MPD_RDIGITS);
if (ideal_minalloc < MPD_MINALLOC_MIN) ideal_minalloc = MPD_MINALLOC_MIN;
if (ideal_minalloc > MPD_MINALLOC_MAX) ideal_minalloc = MPD_MINALLOC_MAX;
mpd_setminalloc(ideal_minalloc);
}
void
mpd_maxcontext(mpd_context_t *ctx)
{
ctx->prec=MPD_MAX_PREC;
ctx->emax=MPD_MAX_EMAX;
ctx->emin=MPD_MIN_EMIN;
ctx->round=MPD_ROUND_HALF_EVEN;
ctx->traps=MPD_Traps;
ctx->status=0;
ctx->newtrap=0;
ctx->clamp=0;
ctx->allcr=1;
}
void
mpd_defaultcontext(mpd_context_t *ctx)
{
ctx->prec=2*MPD_RDIGITS;
ctx->emax=MPD_MAX_EMAX;
ctx->emin=MPD_MIN_EMIN;
ctx->round=MPD_ROUND_HALF_UP;
ctx->traps=MPD_Traps;
ctx->status=0;
ctx->newtrap=0;
ctx->clamp=0;
ctx->allcr=1;
}
void
mpd_basiccontext(mpd_context_t *ctx)
{
ctx->prec=9;
ctx->emax=MPD_MAX_EMAX;
ctx->emin=MPD_MIN_EMIN;
ctx->round=MPD_ROUND_HALF_UP;
ctx->traps=MPD_Traps|MPD_Clamped;
ctx->status=0;
ctx->newtrap=0;
ctx->clamp=0;
ctx->allcr=1;
}
int
mpd_ieee_context(mpd_context_t *ctx, int bits)
{
if (bits <= 0 || bits > MPD_IEEE_CONTEXT_MAX_BITS || bits % 32) {
return -1;
}
ctx->prec = 9 * (bits/32) - 2;
ctx->emax = 3 * ((mpd_ssize_t)1<<(bits/16+3));
ctx->emin = 1 - ctx->emax;
ctx->round=MPD_ROUND_HALF_EVEN;
ctx->traps=0;
ctx->status=0;
ctx->newtrap=0;
ctx->clamp=1;
ctx->allcr=1;
return 0;
}
mpd_ssize_t
mpd_getprec(const mpd_context_t *ctx)
{
return ctx->prec;
}
mpd_ssize_t
mpd_getemax(const mpd_context_t *ctx)
{
return ctx->emax;
}
mpd_ssize_t
mpd_getemin(const mpd_context_t *ctx)
{
return ctx->emin;
}
int
mpd_getround(const mpd_context_t *ctx)
{
return ctx->round;
}
uint32_t
mpd_gettraps(const mpd_context_t *ctx)
{
return ctx->traps;
}
uint32_t
mpd_getstatus(const mpd_context_t *ctx)
{
return ctx->status;
}
int
mpd_getclamp(const mpd_context_t *ctx)
{
return ctx->clamp;
}
int
mpd_getcr(const mpd_context_t *ctx)
{
return ctx->allcr;
}
int
mpd_qsetprec(mpd_context_t *ctx, mpd_ssize_t prec)
{
if (prec <= 0 || prec > MPD_MAX_PREC) {
return 0;
}
ctx->prec = prec;
return 1;
}
int
mpd_qsetemax(mpd_context_t *ctx, mpd_ssize_t emax)
{
if (emax < 0 || emax > MPD_MAX_EMAX) {
return 0;
}
ctx->emax = emax;
return 1;
}
int
mpd_qsetemin(mpd_context_t *ctx, mpd_ssize_t emin)
{
if (emin > 0 || emin < MPD_MIN_EMIN) {
return 0;
}
ctx->emin = emin;
return 1;
}
int
mpd_qsetround(mpd_context_t *ctx, int round)
{
if (!(0 <= round && round < MPD_ROUND_GUARD)) {
return 0;
}
ctx->round = round;
return 1;
}
int
mpd_qsettraps(mpd_context_t *ctx, uint32_t traps)
{
if (traps > MPD_Max_status) {
return 0;
}
ctx->traps = traps;
return 1;
}
int
mpd_qsetstatus(mpd_context_t *ctx, uint32_t flags)
{
if (flags > MPD_Max_status) {
return 0;
}
ctx->status = flags;
return 1;
}
int
mpd_qsetclamp(mpd_context_t *ctx, int c)
{
if (c != 0 && c != 1) {
return 0;
}
ctx->clamp = c;
return 1;
}
int
mpd_qsetcr(mpd_context_t *ctx, int c)
{
if (c != 0 && c != 1) {
return 0;
}
ctx->allcr = c;
return 1;
}
void
mpd_addstatus_raise(mpd_context_t *ctx, uint32_t flags)
{
ctx->status |= flags;
if (flags&ctx->traps) {
ctx->newtrap = (flags&ctx->traps);
mpd_traphandler(ctx);
}
}
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include "mpdecimal.h"
#include <stdio.h>
#include "bits.h"
#include "constants.h"
#include "fnt.h"
#include "fourstep.h"
#include "numbertheory.h"
#include "sixstep.h"
#include "umodarith.h"
#include "convolute.h"
/* Bignum: Fast convolution using the Number Theoretic Transform. Used for
the multiplication of very large coefficients. */
/* Convolute the data in c1 and c2. Result is in c1. */
int
fnt_convolute(mpd_uint_t *c1, mpd_uint_t *c2, mpd_size_t n, int modnum)
{
int (*fnt)(mpd_uint_t *, mpd_size_t, int);
int (*inv_fnt)(mpd_uint_t *, mpd_size_t, int);
#ifdef PPRO
double dmod;
uint32_t dinvmod[3];
#endif
mpd_uint_t n_inv, umod;
mpd_size_t i;
SETMODULUS(modnum);
n_inv = POWMOD(n, (umod-2));
if (ispower2(n)) {
if (n > SIX_STEP_THRESHOLD) {
fnt = six_step_fnt;
inv_fnt = inv_six_step_fnt;
}
else {
fnt = std_fnt;
inv_fnt = std_inv_fnt;
}
}
else {
fnt = four_step_fnt;
inv_fnt = inv_four_step_fnt;
}
if (!fnt(c1, n, modnum)) {
return 0;
}
if (!fnt(c2, n, modnum)) {
return 0;
}
for (i = 0; i < n-1; i += 2) {
mpd_uint_t x0 = c1[i];
mpd_uint_t y0 = c2[i];
mpd_uint_t x1 = c1[i+1];
mpd_uint_t y1 = c2[i+1];
MULMOD2(&x0, y0, &x1, y1);
c1[i] = x0;
c1[i+1] = x1;
}
if (!inv_fnt(c1, n, modnum)) {
return 0;
}
for (i = 0; i < n-3; i += 4) {
mpd_uint_t x0 = c1[i];
mpd_uint_t x1 = c1[i+1];
mpd_uint_t x2 = c1[i+2];
mpd_uint_t x3 = c1[i+3];
MULMOD2C(&x0, &x1, n_inv);
MULMOD2C(&x2, &x3, n_inv);
c1[i] = x0;
c1[i+1] = x1;
c1[i+2] = x2;
c1[i+3] = x3;
}
return 1;
}
/* Autoconvolute the data in c1. Result is in c1. */
int
fnt_autoconvolute(mpd_uint_t *c1, mpd_size_t n, int modnum)
{
int (*fnt)(mpd_uint_t *, mpd_size_t, int);
int (*inv_fnt)(mpd_uint_t *, mpd_size_t, int);
#ifdef PPRO
double dmod;
uint32_t dinvmod[3];
#endif
mpd_uint_t n_inv, umod;
mpd_size_t i;
SETMODULUS(modnum);
n_inv = POWMOD(n, (umod-2));
if (ispower2(n)) {
if (n > SIX_STEP_THRESHOLD) {
fnt = six_step_fnt;
inv_fnt = inv_six_step_fnt;
}
else {
fnt = std_fnt;
inv_fnt = std_inv_fnt;
}
}
else {
fnt = four_step_fnt;
inv_fnt = inv_four_step_fnt;
}
if (!fnt(c1, n, modnum)) {
return 0;
}
for (i = 0; i < n-1; i += 2) {
mpd_uint_t x0 = c1[i];
mpd_uint_t x1 = c1[i+1];
MULMOD2(&x0, x0, &x1, x1);
c1[i] = x0;
c1[i+1] = x1;
}
if (!inv_fnt(c1, n, modnum)) {
return 0;
}
for (i = 0; i < n-3; i += 4) {
mpd_uint_t x0 = c1[i];
mpd_uint_t x1 = c1[i+1];
mpd_uint_t x2 = c1[i+2];
mpd_uint_t x3 = c1[i+3];
MULMOD2C(&x0, &x1, n_inv);
MULMOD2C(&x2, &x3, n_inv);
c1[i] = x0;
c1[i+1] = x1;
c1[i+2] = x2;
c1[i+3] = x3;
}
return 1;
}
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef CONVOLUTE_H
#define CONVOLUTE_H
#include "mpdecimal.h"
#include <stdio.h>
#define SIX_STEP_THRESHOLD 4096
int fnt_convolute(mpd_uint_t *c1, mpd_uint_t *c2, mpd_size_t n, int modnum);
int fnt_autoconvolute(mpd_uint_t *c1, mpd_size_t n, int modnum);
#endif
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include "mpdecimal.h"
#include <stdio.h>
#include <assert.h>
#include "numbertheory.h"
#include "umodarith.h"
#include "crt.h"
/* Bignum: Chinese Remainder Theorem, extends the maximum transform length. */
/* Multiply P1P2 by v, store result in w. */
static inline void
_crt_mulP1P2_3(mpd_uint_t w[3], mpd_uint_t v)
{
mpd_uint_t hi1, hi2, lo;
_mpd_mul_words(&hi1, &lo, LH_P1P2, v);
w[0] = lo;
_mpd_mul_words(&hi2, &lo, UH_P1P2, v);
lo = hi1 + lo;
if (lo < hi1) hi2++;
w[1] = lo;
w[2] = hi2;
}
/* Add 3 words from v to w. The result is known to fit in w. */
static inline void
_crt_add3(mpd_uint_t w[3], mpd_uint_t v[3])
{
mpd_uint_t carry;
mpd_uint_t s;
s = w[0] + v[0];
carry = (s < w[0]);
w[0] = s;
s = w[1] + (v[1] + carry);
carry = (s < w[1]);
w[1] = s;
w[2] = w[2] + (v[2] + carry);
}
/* Divide 3 words in u by v, store result in w, return remainder. */
static inline mpd_uint_t
_crt_div3(mpd_uint_t *w, const mpd_uint_t *u, mpd_uint_t v)
{
mpd_uint_t r1 = u[2];
mpd_uint_t r2;
if (r1 < v) {
w[2] = 0;
}
else {
_mpd_div_word(&w[2], &r1, u[2], v); /* GCOV_NOT_REACHED */
}
_mpd_div_words(&w[1], &r2, r1, u[1], v);
_mpd_div_words(&w[0], &r1, r2, u[0], v);
return r1;
}
/*
* Chinese Remainder Theorem:
* Algorithm from Joerg Arndt, "Matters Computational",
* Chapter 37.4.1 [http://www.jjj.de/fxt/]
*
* See also Knuth, TAOCP, Volume 2, 4.3.2, exercise 7.
*/
/*
* CRT with carry: x1, x2, x3 contain numbers modulo p1, p2, p3. For each
* triple of members of the arrays, find the unique z modulo p1*p2*p3, with
* zmax = p1*p2*p3 - 1.
*
* In each iteration of the loop, split z into result[i] = z % MPD_RADIX
* and carry = z / MPD_RADIX. Let N be the size of carry[] and cmax the
* maximum carry.
*
* Limits for the 32-bit build:
*
* N = 2**96
* cmax = 7711435591312380274
*
* Limits for the 64 bit build:
*
* N = 2**192
* cmax = 627710135393475385904124401220046371710
*
* The following statements hold for both versions:
*
* 1) cmax + zmax < N, so the addition does not overflow.
*
* 2) (cmax + zmax) / MPD_RADIX == cmax.
*
* 3) If c <= cmax, then c_next = (c + zmax) / MPD_RADIX <= cmax.
*/
void
crt3(mpd_uint_t *x1, mpd_uint_t *x2, mpd_uint_t *x3, mpd_size_t rsize)
{
mpd_uint_t p1 = mpd_moduli[P1];
mpd_uint_t umod;
#ifdef PPRO
double dmod;
uint32_t dinvmod[3];
#endif
mpd_uint_t a1, a2, a3;
mpd_uint_t s;
mpd_uint_t z[3], t[3];
mpd_uint_t carry[3] = {0,0,0};
mpd_uint_t hi, lo;
mpd_size_t i;
for (i = 0; i < rsize; i++) {
a1 = x1[i];
a2 = x2[i];
a3 = x3[i];
SETMODULUS(P2);
s = ext_submod(a2, a1, umod);
s = MULMOD(s, INV_P1_MOD_P2);
_mpd_mul_words(&hi, &lo, s, p1);
lo = lo + a1;
if (lo < a1) hi++;
SETMODULUS(P3);
s = dw_submod(a3, hi, lo, umod);
s = MULMOD(s, INV_P1P2_MOD_P3);
z[0] = lo;
z[1] = hi;
z[2] = 0;
_crt_mulP1P2_3(t, s);
_crt_add3(z, t);
_crt_add3(carry, z);
x1[i] = _crt_div3(carry, carry, MPD_RADIX);
}
assert(carry[0] == 0 && carry[1] == 0 && carry[2] == 0);
}
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef CRT_H
#define CRT_H
#include "mpdecimal.h"
#include <stdio.h>
void crt3(mpd_uint_t *x1, mpd_uint_t *x2, mpd_uint_t *x3, mpd_size_t nmemb);
#endif
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include "mpdecimal.h"
#include <stdio.h>
#include <assert.h>
#include "bits.h"
#include "numbertheory.h"
#include "umodarith.h"
#include "difradix2.h"
/* Bignum: The actual transform routine (decimation in frequency). */
/*
* Generate index pairs (x, bitreverse(x)) and carry out the permutation.
* n must be a power of two.
* Algorithm due to Brent/Lehmann, see Joerg Arndt, "Matters Computational",
* Chapter 1.14.4. [http://www.jjj.de/fxt/]
*/
static inline void
bitreverse_permute(mpd_uint_t a[], mpd_size_t n)
{
mpd_size_t x = 0;
mpd_size_t r = 0;
mpd_uint_t t;
do { /* Invariant: r = bitreverse(x) */
if (r > x) {
t = a[x];
a[x] = a[r];
a[r] = t;
}
/* Flip trailing consecutive 1 bits and the first zero bit
* that absorbs a possible carry. */
x += 1;
/* Mirror the operation on r: Flip n_trailing_zeros(x)+1
high bits of r. */
r ^= (n - (n >> (mpd_bsf(x)+1)));
/* The loop invariant is preserved. */
} while (x < n);
}
/* Fast Number Theoretic Transform, decimation in frequency. */
void
fnt_dif2(mpd_uint_t a[], mpd_size_t n, struct fnt_params *tparams)
{
mpd_uint_t *wtable = tparams->wtable;
mpd_uint_t umod;
#ifdef PPRO
double dmod;
uint32_t dinvmod[3];
#endif
mpd_uint_t u0, u1, v0, v1;
mpd_uint_t w, w0, w1, wstep;
mpd_size_t m, mhalf;
mpd_size_t j, r;
assert(ispower2(n));
assert(n >= 4);
SETMODULUS(tparams->modnum);
/* m == n */
mhalf = n / 2;
for (j = 0; j < mhalf; j += 2) {
w0 = wtable[j];
w1 = wtable[j+1];
u0 = a[j];
v0 = a[j+mhalf];
u1 = a[j+1];
v1 = a[j+1+mhalf];
a[j] = addmod(u0, v0, umod);
v0 = submod(u0, v0, umod);
a[j+1] = addmod(u1, v1, umod);
v1 = submod(u1, v1, umod);
MULMOD2(&v0, w0, &v1, w1);
a[j+mhalf] = v0;
a[j+1+mhalf] = v1;
}
wstep = 2;
for (m = n/2; m >= 2; m>>=1, wstep<<=1) {
mhalf = m / 2;
/* j == 0 */
for (r = 0; r < n; r += 2*m) {
u0 = a[r];
v0 = a[r+mhalf];
u1 = a[m+r];
v1 = a[m+r+mhalf];
a[r] = addmod(u0, v0, umod);
v0 = submod(u0, v0, umod);
a[m+r] = addmod(u1, v1, umod);
v1 = submod(u1, v1, umod);
a[r+mhalf] = v0;
a[m+r+mhalf] = v1;
}
for (j = 1; j < mhalf; j++) {
w = wtable[j*wstep];
for (r = 0; r < n; r += 2*m) {
u0 = a[r+j];
v0 = a[r+j+mhalf];
u1 = a[m+r+j];
v1 = a[m+r+j+mhalf];
a[r+j] = addmod(u0, v0, umod);
v0 = submod(u0, v0, umod);
a[m+r+j] = addmod(u1, v1, umod);
v1 = submod(u1, v1, umod);
MULMOD2C(&v0, &v1, w);
a[r+j+mhalf] = v0;
a[m+r+j+mhalf] = v1;
}
}
}
bitreverse_permute(a, n);
}
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef DIF_RADIX2_H
#define DIF_RADIX2_H
#include "mpdecimal.h"
#include <stdio.h>
#include "numbertheory.h"
void fnt_dif2(mpd_uint_t a[], mpd_size_t n, struct fnt_params *tparams);
#endif
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include "mpdecimal.h"
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include "bits.h"
#include "difradix2.h"
#include "numbertheory.h"
#include "fnt.h"
/* Bignum: Fast transform for medium-sized coefficients. */
/* forward transform, sign = -1 */
int
std_fnt(mpd_uint_t *a, mpd_size_t n, int modnum)
{
struct fnt_params *tparams;
assert(ispower2(n));
assert(n >= 4);
assert(n <= 3*MPD_MAXTRANSFORM_2N);
if ((tparams = _mpd_init_fnt_params(n, -1, modnum)) == NULL) {
return 0;
}
fnt_dif2(a, n, tparams);
mpd_free(tparams);
return 1;
}
/* reverse transform, sign = 1 */
int
std_inv_fnt(mpd_uint_t *a, mpd_size_t n, int modnum)
{
struct fnt_params *tparams;
assert(ispower2(n));
assert(n >= 4);
assert(n <= 3*MPD_MAXTRANSFORM_2N);
if ((tparams = _mpd_init_fnt_params(n, 1, modnum)) == NULL) {
return 0;
}
fnt_dif2(a, n, tparams);
mpd_free(tparams);
return 1;
}
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef FNT_H
#define FNT_H
#include "mpdecimal.h"
#include <stdio.h>
int std_fnt(mpd_uint_t a[], mpd_size_t n, int modnum);
int std_inv_fnt(mpd_uint_t a[], mpd_size_t n, int modnum);
#endif
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include "mpdecimal.h"
#include <assert.h>
#include "numbertheory.h"
#include "sixstep.h"
#include "transpose.h"
#include "umodarith.h"
#include "fourstep.h"
/* Bignum: Cache efficient Matrix Fourier Transform for arrays of the
form 3 * 2**n (See literature/matrix-transform.txt). */
#ifndef PPRO
static inline void
std_size3_ntt(mpd_uint_t *x1, mpd_uint_t *x2, mpd_uint_t *x3,
mpd_uint_t w3table[3], mpd_uint_t umod)
{
mpd_uint_t r1, r2;
mpd_uint_t w;
mpd_uint_t s, tmp;
/* k = 0 -> w = 1 */
s = *x1;
s = addmod(s, *x2, umod);
s = addmod(s, *x3, umod);
r1 = s;
/* k = 1 */
s = *x1;
w = w3table[1];
tmp = MULMOD(*x2, w);
s = addmod(s, tmp, umod);
w = w3table[2];
tmp = MULMOD(*x3, w);
s = addmod(s, tmp, umod);
r2 = s;
/* k = 2 */
s = *x1;
w = w3table[2];
tmp = MULMOD(*x2, w);
s = addmod(s, tmp, umod);
w = w3table[1];
tmp = MULMOD(*x3, w);
s = addmod(s, tmp, umod);
*x3 = s;
*x2 = r2;
*x1 = r1;
}
#else /* PPRO */
static inline void
ppro_size3_ntt(mpd_uint_t *x1, mpd_uint_t *x2, mpd_uint_t *x3, mpd_uint_t w3table[3],
mpd_uint_t umod, double *dmod, uint32_t dinvmod[3])
{
mpd_uint_t r1, r2;
mpd_uint_t w;
mpd_uint_t s, tmp;
/* k = 0 -> w = 1 */
s = *x1;
s = addmod(s, *x2, umod);
s = addmod(s, *x3, umod);
r1 = s;
/* k = 1 */
s = *x1;
w = w3table[1];
tmp = ppro_mulmod(*x2, w, dmod, dinvmod);
s = addmod(s, tmp, umod);
w = w3table[2];
tmp = ppro_mulmod(*x3, w, dmod, dinvmod);
s = addmod(s, tmp, umod);
r2 = s;
/* k = 2 */
s = *x1;
w = w3table[2];
tmp = ppro_mulmod(*x2, w, dmod, dinvmod);
s = addmod(s, tmp, umod);
w = w3table[1];
tmp = ppro_mulmod(*x3, w, dmod, dinvmod);
s = addmod(s, tmp, umod);
*x3 = s;
*x2 = r2;
*x1 = r1;
}
#endif
/* forward transform, sign = -1; transform length = 3 * 2**n */
int
four_step_fnt(mpd_uint_t *a, mpd_size_t n, int modnum)
{
mpd_size_t R = 3; /* number of rows */
mpd_size_t C = n / 3; /* number of columns */
mpd_uint_t w3table[3];
mpd_uint_t kernel, w0, w1, wstep;
mpd_uint_t *s, *p0, *p1, *p2;
mpd_uint_t umod;
#ifdef PPRO
double dmod;
uint32_t dinvmod[3];
#endif
mpd_size_t i, k;
assert(n >= 48);
assert(n <= 3*MPD_MAXTRANSFORM_2N);
/* Length R transform on the columns. */
SETMODULUS(modnum);
_mpd_init_w3table(w3table, -1, modnum);
for (p0=a, p1=p0+C, p2=p0+2*C; p0<a+C; p0++,p1++,p2++) {
SIZE3_NTT(p0, p1, p2, w3table);
}
/* Multiply each matrix element (addressed by i*C+k) by r**(i*k). */
kernel = _mpd_getkernel(n, -1, modnum);
for (i = 1; i < R; i++) {
w0 = 1; /* r**(i*0): initial value for k=0 */
w1 = POWMOD(kernel, i); /* r**(i*1): initial value for k=1 */
wstep = MULMOD(w1, w1); /* r**(2*i) */
for (k = 0; k < C-1; k += 2) {
mpd_uint_t x0 = a[i*C+k];
mpd_uint_t x1 = a[i*C+k+1];
MULMOD2(&x0, w0, &x1, w1);
MULMOD2C(&w0, &w1, wstep); /* r**(i*(k+2)) = r**(i*k) * r**(2*i) */
a[i*C+k] = x0;
a[i*C+k+1] = x1;
}
}
/* Length C transform on the rows. */
for (s = a; s < a+n; s += C) {
if (!six_step_fnt(s, C, modnum)) {
return 0;
}
}
#if 0 /* An unordered transform is sufficient for convolution. */
/* Transpose the matrix. */
transpose_3xpow2(a, R, C);
#endif
return 1;
}
/* backward transform, sign = 1; transform length = 3 * 2**n */
int
inv_four_step_fnt(mpd_uint_t *a, mpd_size_t n, int modnum)
{
mpd_size_t R = 3; /* number of rows */
mpd_size_t C = n / 3; /* number of columns */
mpd_uint_t w3table[3];
mpd_uint_t kernel, w0, w1, wstep;
mpd_uint_t *s, *p0, *p1, *p2;
mpd_uint_t umod;
#ifdef PPRO
double dmod;
uint32_t dinvmod[3];
#endif
mpd_size_t i, k;
assert(n >= 48);
assert(n <= 3*MPD_MAXTRANSFORM_2N);
#if 0 /* An unordered transform is sufficient for convolution. */
/* Transpose the matrix, producing an R*C matrix. */
transpose_3xpow2(a, C, R);
#endif
/* Length C transform on the rows. */
for (s = a; s < a+n; s += C) {
if (!inv_six_step_fnt(s, C, modnum)) {
return 0;
}
}
/* Multiply each matrix element (addressed by i*C+k) by r**(i*k). */
SETMODULUS(modnum);
kernel = _mpd_getkernel(n, 1, modnum);
for (i = 1; i < R; i++) {
w0 = 1;
w1 = POWMOD(kernel, i);
wstep = MULMOD(w1, w1);
for (k = 0; k < C; k += 2) {
mpd_uint_t x0 = a[i*C+k];
mpd_uint_t x1 = a[i*C+k+1];
MULMOD2(&x0, w0, &x1, w1);
MULMOD2C(&w0, &w1, wstep);
a[i*C+k] = x0;
a[i*C+k+1] = x1;
}
}
/* Length R transform on the columns. */
_mpd_init_w3table(w3table, 1, modnum);
for (p0=a, p1=p0+C, p2=p0+2*C; p0<a+C; p0++,p1++,p2++) {
SIZE3_NTT(p0, p1, p2, w3table);
}
return 1;
}
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef FOUR_STEP_H
#define FOUR_STEP_H
#include "mpdecimal.h"
#include <stdio.h>
int four_step_fnt(mpd_uint_t *a, mpd_size_t n, int modnum);
int inv_four_step_fnt(mpd_uint_t *a, mpd_size_t n, int modnum);
#endif
This diff is collapsed.
/*
* Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef IO_H
#define IO_H
#include <errno.h>
#include "mpdecimal.h"
#if SIZE_MAX == MPD_SIZE_MAX
#define mpd_strtossize _mpd_strtossize
#else
static inline mpd_ssize_t
mpd_strtossize(const char *s, char **end, int base)
{
int64_t retval;
errno = 0;
retval = _mpd_strtossize(s, end, base);
if (errno == 0 && (retval > MPD_SSIZE_MAX || retval < MPD_SSIZE_MIN)) {
errno = ERANGE;
}
if (errno == ERANGE) {
return (retval < 0) ? MPD_SSIZE_MIN : MPD_SSIZE_MAX;
}
return (mpd_ssize_t)retval;
}
#endif
#endif
This document contains links to the literature used in the process of
creating the library. The list is probably not complete.
Mike Cowlishaw: General Decimal Arithmetic Specification
http://speleotrove.com/decimal/decarith.html
Jean-Michel Muller: On the definition of ulp (x)
lara.inist.fr/bitstream/2332/518/1/LIP-RR2005-09.pdf
T. E. Hull, A. Abrham: Properly rounded variable precision square root
http://portal.acm.org/citation.cfm?id=214413
T. E. Hull, A. Abrham: Variable precision exponential function
http://portal.acm.org/citation.cfm?id=6498
Roman E. Maeder: Storage allocation for the Karatsuba integer multiplication
algorithm. http://www.springerlink.com/content/w15058mj6v59t565/
J. M. Pollard: The fast Fourier transform in a finite field
http://www.ams.org/journals/mcom/1971-25-114/S0025-5718-1971-0301966-0/home.html
David H. Bailey: FFTs in External or Hierarchical Memory
http://crd.lbl.gov/~dhbailey/dhbpapers/
W. Morven Gentleman: Matrix Multiplication and Fast Fourier Transforms
http://www.alcatel-lucent.com/bstj/vol47-1968/articles/bstj47-6-1099.pdf
Mikko Tommila: Apfloat documentation
http://www.apfloat.org/apfloat/2.41/apfloat.pdf
Joerg Arndt: "Matters Computational"
http://www.jjj.de/fxt/
Karl Hasselstrom: Fast Division of Large Integers
www.treskal.com/kalle/exjobb/original-report.pdf
Bignum support (Fast Number Theoretic Transform or FNT):
========================================================
Bignum arithmetic in libmpdec uses the scheme for fast convolution
of integer sequences from:
J. M. Pollard: The fast Fourier transform in a finite field
http://www.ams.org/journals/mcom/1971-25-114/S0025-5718-1971-0301966-0/home.html
The transform in a finite field can be used for convolution in the same
way as the Fourier Transform. The main advantages of the Number Theoretic
Transform are that it is both exact and very memory efficient.
Convolution in pseudo-code:
~~~~~~~~~~~~~~~~~~~~~~~~~~~
fnt_convolute(a, b):
x = fnt(a) # forward transform of a
y = fnt(b) # forward transform of b
z = pairwise multiply x[i] and y[i]
result = inv_fnt(z) # backward transform of z.
Extending the maximum transform length (Chinese Remainder Theorem):
-------------------------------------------------------------------
The maximum transform length is quite limited when using a single
prime field. However, it is possible to use multiple primes and
recover the result using the Chinese Remainder Theorem.
Multiplication in pseudo-code:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
_mpd_fntmul(u, v):
c1 = fnt_convolute(u, v, P1) # convolute modulo prime1
c2 = fnt_convolute(u, v, P2) # convolute modulo prime2
c3 = fnt_convolute(u, v, P3) # convolute modulo prime3
result = crt3(c1, c2, c3) # Chinese Remainder Theorem
Optimized transform functions:
------------------------------
There are three different fnt() functions:
std_fnt: "standard" decimation in frequency transform for array lengths
of 2**n. Performs well up to 1024 words.
sixstep: Cache-friendly algorithm for array lengths of 2**n. Outperforms
std_fnt for large arrays.
fourstep: Algorithm for array lengths of 3 * 2**n. Also cache friendly
in large parts.
List of bignum-only files:
--------------------------
Functions from these files are only used in _mpd_fntmul().
umodarith.h -> fast low level routines for unsigned modular arithmetic
numbertheory.c -> routines for setting up the FNT
difradix2.c -> decimation in frequency transform, used as the
"base case" by the following three files:
fnt.c -> standard transform for smaller arrays
sixstep.c -> transform large arrays of length 2**n
fourstep.c -> transform arrays of length 3 * 2**n
convolute.c -> do the actual fast convolution, using one of
the three transform functions.
transpose.c -> transpositions needed for the sixstep algorithm.
crt.c -> Chinese Remainder Theorem: use information from three
transforms modulo three different primes to get the
final result.
#
# Copyright (c) 2008-2012 Stefan Krah. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
######################################################################
# This file lists and checks some of the constants and limits used #
# in libmpdec's Number Theoretic Transform. At the end of the file #
# there is an example function for the plain DFT transform. #
######################################################################
#
# Number theoretic transforms are done in subfields of F(p). P[i]
# are the primes, D[i] = P[i] - 1 are highly composite and w[i]
# are the respective primitive roots of F(p).
#
# The strategy is to convolute two coefficients modulo all three
# primes, then use the Chinese Remainder Theorem on the three
# result arrays to recover the result in the usual base RADIX
# form.
#
# ======================================================================
# Primitive roots
# ======================================================================
#
# Verify primitive roots:
#
# For a prime field, r is a primitive root if and only if for all prime
# factors f of p-1, r**((p-1)/f) =/= 1 (mod p).
#
def prod(F, E):
"""Check that the factorization of P-1 is correct. F is the list of
factors of P-1, E lists the number of occurrences of each factor."""
x = 1
for y, z in zip(F, E):
x *= y**z
return x
def is_primitive_root(r, p, factors, exponents):
"""Check if r is a primitive root of F(p)."""
if p != prod(factors, exponents) + 1:
return False
for f in factors:
q, control = divmod(p-1, f)
if control != 0:
return False
if pow(r, q, p) == 1:
return False
return True
# =================================================================
# Constants and limits for the 64-bit version
# =================================================================
RADIX = 10**19
# Primes P1, P2 and P3:
P = [2**64-2**32+1, 2**64-2**34+1, 2**64-2**40+1]
# P-1, highly composite. The transform length d is variable and
# must divide D = P-1. Since all D are divisible by 3 * 2**32,
# transform lengths can be 2**n or 3 * 2**n (where n <= 32).
D = [2**32 * 3 * (5 * 17 * 257 * 65537),
2**34 * 3**2 * (7 * 11 * 31 * 151 * 331),
2**40 * 3**2 * (5 * 7 * 13 * 17 * 241)]
# Prime factors of P-1 and their exponents:
F = [(2,3,5,17,257,65537), (2,3,7,11,31,151,331), (2,3,5,7,13,17,241)]
E = [(32,1,1,1,1,1), (34,2,1,1,1,1,1), (40,2,1,1,1,1,1)]
# Maximum transform length for 2**n. Above that only 3 * 2**31
# or 3 * 2**32 are possible.
MPD_MAXTRANSFORM_2N = 2**32
# Limits in the terminology of Pollard's paper:
m2 = (MPD_MAXTRANSFORM_2N * 3) // 2 # Maximum length of the smaller array.
M1 = M2 = RADIX-1 # Maximum value per single word.
L = m2 * M1 * M2
P[0] * P[1] * P[2] > 2 * L
# Primitive roots of F(P1), F(P2) and F(P3):
w = [7, 10, 19]
# The primitive roots are correct:
for i in range(3):
if not is_primitive_root(w[i], P[i], F[i], E[i]):
print("FAIL")
# =================================================================
# Constants and limits for the 32-bit version
# =================================================================
RADIX = 10**9
# Primes P1, P2 and P3:
P = [2113929217, 2013265921, 1811939329]
# P-1, highly composite. All D = P-1 are divisible by 3 * 2**25,
# allowing for transform lengths up to 3 * 2**25 words.
D = [2**25 * 3**2 * 7,
2**27 * 3 * 5,
2**26 * 3**3]
# Prime factors of P-1 and their exponents:
F = [(2,3,7), (2,3,5), (2,3)]
E = [(25,2,1), (27,1,1), (26,3)]
# Maximum transform length for 2**n. Above that only 3 * 2**24 or
# 3 * 2**25 are possible.
MPD_MAXTRANSFORM_2N = 2**25
# Limits in the terminology of Pollard's paper:
m2 = (MPD_MAXTRANSFORM_2N * 3) // 2 # Maximum length of the smaller array.
M1 = M2 = RADIX-1 # Maximum value per single word.
L = m2 * M1 * M2
P[0] * P[1] * P[2] > 2 * L
# Primitive roots of F(P1), F(P2) and F(P3):
w = [5, 31, 13]
# The primitive roots are correct:
for i in range(3):
if not is_primitive_root(w[i], P[i], F[i], E[i]):
print("FAIL")
# ======================================================================
# Example transform using a single prime
# ======================================================================
def ntt(lst, dir):
"""Perform a transform on the elements of lst. len(lst) must
be 2**n or 3 * 2**n, where n <= 25. This is the slow DFT."""
p = 2113929217 # prime
d = len(lst) # transform length
d_prime = pow(d, (p-2), p) # inverse of d
xi = (p-1)//d
w = 5 # primitive root of F(p)
r = pow(w, xi, p) # primitive root of the subfield
r_prime = pow(w, (p-1-xi), p) # inverse of r
if dir == 1: # forward transform
a = lst # input array
A = [0] * d # transformed values
for i in range(d):
s = 0
for j in range(d):
s += a[j] * pow(r, i*j, p)
A[i] = s % p
return A
elif dir == -1: # backward transform
A = lst # input array
a = [0] * d # transformed values
for j in range(d):
s = 0
for i in range(d):
s += A[i] * pow(r_prime, i*j, p)
a[j] = (d_prime * s) % p
return a
def ntt_convolute(a, b):
"""convolute arrays a and b."""
assert(len(a) == len(b))
x = ntt(a, 1)
y = ntt(b, 1)
for i in range(len(a)):
y[i] = y[i] * x[i]
r = ntt(y, -1)
return r
# Example: Two arrays representing 21 and 81 in little-endian:
a = [1, 2, 0, 0]
b = [1, 8, 0, 0]
assert(ntt_convolute(a, b) == [1, 10, 16, 0])
assert(21 * 81 == (1*10**0 + 10*10**1 + 16*10**2 + 0*10**3))
This diff is collapsed.
(* Copyright (c) 2011 Stefan Krah. All rights reserved. *)
==========================================================================
Calculate (a * b) % p using special primes
==========================================================================
A description of the algorithm can be found in the apfloat manual by
Tommila [1].
Definitions:
------------
In the whole document, "==" stands for "is congruent with".
Result of a * b in terms of high/low words:
(1) hi * 2**64 + lo = a * b
Special primes:
(2) p = 2**64 - z + 1, where z = 2**n
Single step modular reduction:
(3) R(hi, lo) = hi * z - hi + lo
Strategy:
---------
a) Set (hi, lo) to the result of a * b.
b) Set (hi', lo') to the result of R(hi, lo).
c) Repeat step b) until 0 <= hi' * 2**64 + lo' < 2*p.
d) If the result is less than p, return lo'. Otherwise return lo' - p.
The reduction step b) preserves congruence:
-------------------------------------------
hi * 2**64 + lo == hi * z - hi + lo (mod p)
Proof:
~~~~~~
hi * 2**64 + lo = (2**64 - z + 1) * hi + z * hi - hi + lo
= p * hi + z * hi - hi + lo
== z * hi - hi + lo (mod p)
Maximum numbers of step b):
---------------------------
# To avoid unneccessary formalism, define:
def R(hi, lo, z):
return divmod(hi * z - hi + lo, 2**64)
# For simplicity, assume hi=2**64-1, lo=2**64-1 after the
# initial multiplication a * b. This is of course impossible
# but certainly covers all cases.
# Then, for p1:
hi=2**64-1; lo=2**64-1; z=2**32
p1 = 2**64 - z + 1
hi, lo = R(hi, lo, z) # First reduction
hi, lo = R(hi, lo, z) # Second reduction
hi * 2**64 + lo < 2 * p1 # True
# For p2:
hi=2**64-1; lo=2**64-1; z=2**34
p2 = 2**64 - z + 1
hi, lo = R(hi, lo, z) # First reduction
hi, lo = R(hi, lo, z) # Second reduction
hi, lo = R(hi, lo, z) # Third reduction
hi * 2**64 + lo < 2 * p2 # True
# For p3:
hi=2**64-1; lo=2**64-1; z=2**40
p3 = 2**64 - z + 1
hi, lo = R(hi, lo, z) # First reduction
hi, lo = R(hi, lo, z) # Second reduction
hi, lo = R(hi, lo, z) # Third reduction
hi * 2**64 + lo < 2 * p3 # True
Step d) preserves congruence and yields a result < p:
-----------------------------------------------------
Case hi = 0:
Case lo < p: trivial.
Case lo >= p:
lo == lo - p (mod p) # result is congruent
p <= lo < 2*p -> 0 <= lo - p < p # result is in the correct range
Case hi = 1:
p < 2**64 /\ 2**64 + lo < 2*p -> lo < p # lo is always less than p
2**64 + lo == 2**64 + (lo - p) (mod p) # result is congruent
= lo - p # exactly the same value as the previous RHS
# in uint64_t arithmetic.
p < 2**64 + lo < 2*p -> 0 < 2**64 + (lo - p) < p # correct range
[1] http://www.apfloat.org/apfloat/2.40/apfloat.pdf
This diff is collapsed.
(* Copyright (c) 2011 Stefan Krah. All rights reserved. *)
The Six Step Transform:
=======================
In libmpdec, the six-step transform is the Matrix Fourier Transform (See
matrix-transform.txt) in disguise. It is called six-step transform after
a variant that appears in [1]. The algorithm requires that the input
array can be viewed as an R*C matrix.
Algorithm six-step (forward transform):
---------------------------------------
1a) Transpose the matrix.
1b) Apply a length R FNT to each row.
1c) Transpose the matrix.
2) Multiply each matrix element (addressed by j*C+m) by r**(j*m).
3) Apply a length C FNT to each row.
4) Transpose the matrix.
Note that steps 1a) - 1c) are exactly equivalent to step 1) of the Matrix
Fourier Transform. For large R, it is faster to transpose twice and do
a transform on the rows than to perform a column transpose directly.
Algorithm six-step (inverse transform):
---------------------------------------
0) View the matrix as a C*R matrix.
1) Transpose the matrix, producing an R*C matrix.
2) Apply a length C FNT to each row.
3) Multiply each matrix element (addressed by i*C+n) by r**(i*n).
4a) Transpose the matrix.
4b) Apply a length R FNT to each row.
4c) Transpose the matrix.
Again, steps 4a) - 4c) are equivalent to step 4) of the Matrix Fourier
Transform.
--
[1] David H. Bailey: FFTs in External or Hierarchical Memory
http://crd.lbl.gov/~dhbailey/dhbpapers/
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment