Pascal’s Triangle¶
Summary¶
This article is about Pascal’s Triangle building algorithm implementations, their performance and limitations analysis. The article shows some techniques for performance optimization that can be used for solving real life software optimization problems.Programming languages:
- Python (CPython and PyPy implementations)
- Cython
- C
- Assembler
- Microoptimizations
- Replace recursive algorithm with non-recursive
- Replace Python with other programming languages
Conclusions¶
- It is possible to improve performance for more than 200 times if you are ready to switch from CPython to C or Assembler
- You probably should not try to implement your algorithm in Assembler for better performance unless you are as proficient in Assembler as a C-compiler developer, otherwise just use compiler optimization options
- It was possible to improve CPython implementation by 40% by switching to non-recursive implementation and other language specific optimizations
- It is possible to improve performance by 20-50% by switching to Cython with no changes to source code
- It is possible to improve performance for 2-7 times by switching to PyPy with no changes to source code
- As expected from fastest to slowest: Assembler and C, PyPy, Cython, CPython
- You should always check if your performance optimization leads to improvement, because sometime unexpectedly they lead to performance degradation
- Faster implementations may have limitations that make them impractical. For example CPascalTriangleFullAsm has the best time performance, but limited to 34 height. Therefore it is more practical to cache or precalculate Pascal’s Triangle of this height and return result from memory which would would run even faster than the Assembler implementation.
Implementation Notes¶
- Complete source code can be found in pascal_triangle github repository
- This article is automatically generated with Sphinx
- Metaclasses are used for unit-testing variety of solutions of the same problem
- Specific property of Pascal’s Triangle is used to test the solutions: sum of line’s integers is equal to 2 to the power of line number
- cythonize() tries to compile *.pyx files before Cython is installed according to setup_requires argument of setup(), so there is a need to have a LazyList() as work around
- Cython is incompatible with PyPy, so there is a need to check for python implementation being used
- It is important to set constant CPU frequency to get stable benchmark results
Full story¶
The basic definition of Pascal’s Triangle is the following:Pascal’s Triangle is a triangle formed of lines of integer numbers. Each line has one more integer than the previous line and centered with the previous line. First line has only one integer with value of 1. Every integer in each line is a sum of two above integers.
Example:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
Here is the code that I produced on a job interview when I was asked to code a program in Python that prints Pascal’s Triangle:
def print_pascal_triangle_original(n):
"""
Print Pascal's Triangle of height n: original interview solution.
:param n: height of the triangle to print
:return: n-th line of Pascal's Triangle
"""
if n <= 1:
print [1]
return [1]
a = [0] + print_pascal_triangle_original(n - 1) + [0]
b = [a[i] + a[i + 1] for i in xrange(len(a) - 1)]
print b
return b
from .base import PyPascalTriangleBase
class PyPascalTriangleOriginal(PyPascalTriangleBase):
"""
Modified original solution presented on interview.
"""
original = True
max_height = 900
def build(self, height):
if height <= 0:
self._print([1])
return [1]
a = [0] + self.build(height - 1) + [0]
b = [a[i] + a[i + 1] for i in xrange(len(a) - 1)]
self._print(b)
return b # return last line for testing purposes
Limitations¶
When I started implementation of the Pascal’s Triangle building algorithm it turned out that different implementations have their own limitations. Therefore limitations should be described first.Here is the result of limitation test:
$ python -m pascal_triangle.tests.limitation_test
# | Implementation | Language | Height limit | Duration, s | Reason | Info |
1 | PyPascalTriangleOriginal | Python | 993 | 0.053755 | exception | RuntimeError(‘maximum recursion depth exceeded’,) |
2 | PyPascalTriangleOriginalPlus | Python | 993 | 0.053977 | exception | RuntimeError(‘maximum recursion depth exceeded’,) |
3 | PyPascalTriangleConstantLists | Python | 993 | 0.054690 | exception | RuntimeError(‘maximum recursion depth exceeded’,) |
4 | PyPascalTriangleConstantTuples | Python | 993 | 0.070621 | exception | RuntimeError(‘maximum recursion depth exceeded’,) |
5 | PyPascalTriangleIterators | Python | 993 | 0.047875 | exception | RuntimeError(‘maximum recursion depth exceeded’,) |
6 | PyPascalTriangleNonRecursive | Python | 3072 | 0.746115 | timeout | 1.000000 s, (duration: 1.474587 s) |
7 | PyPascalTriangleNonRecursiveCLike | Python | 3072 | 0.935708 | timeout | 1.000000 s, (duration: 1.883892 s) |
8 | PyPascalTriangleNonRecursiveCLikeImproved | Python | 3072 | 0.846512 | timeout | 1.000000 s, (duration: 1.728939 s) |
9 | PyPascalTriangleNonRecursiveLessCLike | Python | 3072 | 0.729132 | timeout | 1.000000 s, (duration: 1.533341 s) |
10 | PyPascalTriangleNonRecursiveArray | Python | 34 | 0.000096 | exception | OverflowError(‘unsigned int is greater than maximum’,) |
11 | PyPascalTriangleNonRecursiveIterators | Python | 3072 | 0.678832 | timeout | 1.000000 s, (duration: 1.382548 s) |
12 | CyPascalTriangleIterators | Cython | 992 | 0.036476 | exception | RuntimeError(‘maximum recursion depth exceeded’,) |
13 | CyPascalTriangleConstantLists | Cython | 992 | 0.043471 | exception | RuntimeError(‘maximum recursion depth exceeded’,) |
14 | CyPascalTriangleNonRecursive | Cython | 3072 | 0.610665 | timeout | 1.000000 s, (duration: 1.275582 s) |
15 | CyPascalTriangleNonRecursiveIterators | Cython | 3072 | 0.552308 | timeout | 1.000000 s, (duration: 1.181184 s) |
16 | CyPascalTriangleNonRecursiveCLikeImproved | Cython | 3072 | 0.690947 | timeout | 1.000000 s, (duration: 1.453662 s) |
17 | CyPascalTriangleNonRecursiveCLikeMoreImproved | Cython | 3072 | 0.728800 | timeout | 1.000000 s, (duration: 1.485473 s) |
18 | CyPascalTriangleNonRecursiveLessCLike | Cython | 3072 | 0.614456 | timeout | 1.000000 s, (duration: 1.314016 s) |
19 | CyPascalTriangleNonRecursiveCTypes | Cython | 34 | 0.000044 | exception | OverflowError(‘unsigned int is greater than maximum’,) |
20 | CyPascalTriangleNonRecursiveCTypesULong | Cython | 67 | 0.000178 | exception | OverflowError(‘long int too large to convert’,) |
21 | CPascalTriangleInitial | C | 34 | 0.000002 | exception | ValueError(“Unable to build Pascal’s Triangle higher than 34”,) |
22 | CPascalTriangleULong | C | 67 | 0.000006 | exception | ValueError(“Unable to build Pascal’s Triangle higher than 67”,) |
23 | CPascalTrianglePartialAsm | C/Assembler | 34 | 0.000003 | exception | ValueError(“Unable to build Pascal’s Triangle higher than 34”,) |
24 | CPascalTriangleFullAsm | C/Assembler | 34 | 0.000002 | exception | ValueError(“Unable to build Pascal’s Triangle higher than 34”,) |
- 34 is a height limit for all algorithms that use unsigned 32-bit integers to store resulting lines.
- 67 is a height limit for all algorithms that use unsigned 64-bit integers to store resulting lines.
- A just below 1000 height limit is valid for all recursive algorithms in Python since the default recursion limit in python is 1000 frames.
- The other implementations are virtually limited by the time required for calculation (1 second).
So all optimizations that are limited by 64-height make only scientific interest.
Another conclusion from limitation test is that we can only compare implementations for particular heights. There will be 4 benchmarks of different heights: 34, 67, 900, 3000.
Performance¶
CPython¶
- The best implementation is Assembler x86 implementation
- Non-recursive implementations do better than recursive implementations
- Cython implementations do better than Python implementations
- C implementations do better than Cython implementations
- Assembler x86 implementation does better than C implementations
- Best implementation is about 100 times faster than the original implementation
- Best C implementation is about 50 times faster than the original implementation
- Best Cython implementation is about 90% faster than the original implementation
- Best Python implementation is about 40% faster than the original implementation
- PyPascalTriangleNonRecursive, PyPascalTriangleNonRecursiveLessCLike and PyPascalTriangleNonRecursiveCLikeImproved are three best Python implementations
- Cython implementation that used C type is not the best Cython implementation
- 32-bit vs 64-bit integers do not make notable difference
- Sometimes more optimized Python implementation can do better than less optimized Cython implementation
- Some “optimizations” lead to up to 30% performance loss
# | Implementation | Language | Duration, secs | Fraction of (*) duration | % faster than (*) |
1 | CPascalTriangleFullAsm | C/Assembler | 0.000718 | 0.009341 | 10606.042497 |
2 | CPascalTrianglePartialAsm | C/Assembler | 0.001081 | 0.014060 | 7012.174680 |
3 | CPascalTriangleULong | C | 0.001115 | 0.014504 | 6794.718837 |
4 | CPascalTriangleInitial | C | 0.001393 | 0.018117 | 5419.787744 |
5 | CyPascalTriangleNonRecursive | Cython | 0.040292 | 0.524077 | 90.811671 |
6 | CyPascalTriangleNonRecursiveCTypes | Cython | 0.041818 | 0.543927 | 83.848162 |
7 | CyPascalTriangleNonRecursiveCLikeImproved | Cython | 0.042011 | 0.546433 | 83.005119 |
8 | CyPascalTriangleNonRecursiveLessCLike | Cython | 0.042353 | 0.550883 | 81.526787 |
9 | CyPascalTriangleNonRecursiveCLikeMoreImproved | Cython | 0.042596 | 0.554046 | 80.490426 |
10 | CyPascalTriangleNonRecursiveCTypesULong | Cython | 0.045652 | 0.593793 | 68.408903 |
11 | CyPascalTriangleNonRecursiveIterators | Cython | 0.049919 | 0.649293 | 54.013612 |
12 | CyPascalTriangleIterators | Cython | 0.052587 | 0.683998 | 46.199324 |
13 | CyPascalTriangleConstantLists | Cython | 0.053835 | 0.700229 | 42.810452 |
14 | PyPascalTriangleNonRecursive | Python | 0.054130 | 0.704068 | 42.031730 |
15 | PyPascalTriangleNonRecursiveLessCLike | Python | 0.055244 | 0.718556 | 39.167925 |
16 | PyPascalTriangleNonRecursiveCLikeImproved | Python | 0.064518 | 0.839183 | 19.163513 |
17 | PyPascalTriangleNonRecursiveIterators | Python | 0.065121 | 0.847029 | 18.059735 |
18 | PyPascalTriangleConstantLists | Python | 0.071139 | 0.925304 | 8.072619 |
19 | PyPascalTriangleIterators | Python | 0.071390 | 0.928566 | 7.692924 |
20 | PyPascalTriangleOriginalPlus | Python | 0.073988 | 0.962359 | 3.911345 |
21 | PyPascalTriangleNonRecursiveCLike | Python | 0.074031 | 0.962920 | 3.850774 |
22 | PyPascalTriangleOriginal | Python | 0.076882 | 1.000000 | 0.000000 |
23 | PyPascalTriangleNonRecursiveArray | Python | 0.092010 | 1.196771 | -16.441835 |
24 | PyPascalTriangleConstantTuples | Python | 0.108509 | 1.411371 | -29.146882 |
# | Implementation | Language | Duration, secs | Fraction of (*) duration | % faster than (*) |
1 | CPascalTriangleULong | C | 0.004018 | 0.017983 | 5460.716829 |
2 | CyPascalTriangleNonRecursive | Cython | 0.132844 | 0.594593 | 68.182373 |
3 | CyPascalTriangleNonRecursiveLessCLike | Cython | 0.136494 | 0.610929 | 63.685050 |
4 | CyPascalTriangleNonRecursiveIterators | Cython | 0.145995 | 0.653455 | 53.032839 |
5 | CyPascalTriangleNonRecursiveCLikeImproved | Cython | 0.149224 | 0.667908 | 49.721277 |
6 | CyPascalTriangleIterators | Cython | 0.151895 | 0.679863 | 47.088499 |
7 | CyPascalTriangleNonRecursiveCLikeMoreImproved | Cython | 0.163172 | 0.730337 | 36.923084 |
8 | CyPascalTriangleConstantLists | Cython | 0.167823 | 0.751154 | 33.128569 |
9 | CyPascalTriangleNonRecursiveCTypesULong | Cython | 0.178073 | 0.797033 | 25.465361 |
10 | PyPascalTriangleNonRecursiveLessCLike | Python | 0.185789 | 0.831567 | 20.254859 |
11 | PyPascalTriangleNonRecursive | Python | 0.186182 | 0.833327 | 20.000922 |
12 | PyPascalTriangleNonRecursiveIterators | Python | 0.196337 | 0.878779 | 13.794221 |
13 | PyPascalTriangleIterators | Python | 0.211882 | 0.948357 | 5.445494 |
14 | PyPascalTriangleConstantLists | Python | 0.220266 | 0.985882 | 1.432032 |
15 | PyPascalTriangleOriginal | Python | 0.223420 | 1.000000 | 0.000000 |
16 | PyPascalTriangleOriginalPlus | Python | 0.224271 | 1.003808 | -0.379308 |
17 | PyPascalTriangleNonRecursiveCLikeImproved | Python | 0.234445 | 1.049345 | -4.702476 |
18 | PyPascalTriangleNonRecursiveCLike | Python | 0.270704 | 1.211637 | -17.467005 |
19 | PyPascalTriangleConstantTuples | Python | 0.322150 | 1.441902 | -30.647168 |
# | Implementation | Language | Duration, secs | Fraction of (*) duration | % faster than (*) |
1 | CyPascalTriangleNonRecursiveIterators | Cython | 0.140652 | 0.651107 | 53.584705 |
2 | CyPascalTriangleIterators | Cython | 0.141067 | 0.653028 | 53.132785 |
3 | CyPascalTriangleNonRecursiveLessCLike | Cython | 0.164642 | 0.762161 | 31.205923 |
4 | CyPascalTriangleNonRecursive | Cython | 0.169332 | 0.783872 | 27.571773 |
5 | CyPascalTriangleConstantLists | Cython | 0.170383 | 0.788738 | 26.784890 |
6 | PyPascalTriangleNonRecursiveIterators | Python | 0.186486 | 0.863282 | 15.837040 |
7 | PyPascalTriangleIterators | Python | 0.186617 | 0.863888 | 15.755793 |
8 | CyPascalTriangleNonRecursiveCLikeImproved | Cython | 0.194658 | 0.901112 | 10.974028 |
9 | PyPascalTriangleOriginalPlus | Python | 0.213122 | 0.986585 | 1.359775 |
10 | PyPascalTriangleOriginal | Python | 0.216020 | 1.000000 | 0.000000 |
11 | PyPascalTriangleConstantLists | Python | 0.217825 | 1.008356 | -0.828678 |
12 | PyPascalTriangleNonRecursiveLessCLike | Python | 0.219190 | 1.014675 | -1.446241 |
13 | CyPascalTriangleNonRecursiveCLikeMoreImproved | Cython | 0.220848 | 1.022351 | -2.186215 |
14 | PyPascalTriangleNonRecursive | Python | 0.222758 | 1.031192 | -3.024890 |
15 | PyPascalTriangleNonRecursiveCLikeImproved | Python | 0.273294 | 1.265133 | -20.956958 |
16 | PyPascalTriangleConstantTuples | Python | 0.284513 | 1.317069 | -24.073813 |
17 | PyPascalTriangleNonRecursiveCLike | Python | 0.310638 | 1.438006 | -30.459280 |
# | Implementation | Language | Duration, secs | Fraction of (*) duration | % faster than (*) |
1 | CyPascalTriangleNonRecursiveIterators | Cython | 2.492651 | ||
2 | CyPascalTriangleNonRecursive | Cython | 2.849843 | ||
3 | CyPascalTriangleNonRecursiveLessCLike | Cython | 2.858547 | ||
4 | PyPascalTriangleNonRecursiveIterators | Python | 3.019004 | ||
5 | CyPascalTriangleNonRecursiveCLikeImproved | Cython | 3.216158 | ||
6 | CyPascalTriangleNonRecursiveCLikeMoreImproved | Cython | 3.429079 | ||
7 | PyPascalTriangleNonRecursiveLessCLike | Python | 3.462906 | ||
8 | PyPascalTriangleNonRecursive | Python | 3.464945 | ||
9 | PyPascalTriangleNonRecursiveCLikeImproved | Python | 4.003433 | ||
10 | PyPascalTriangleNonRecursiveCLike | Python | 4.414496 |
PyPy¶
- C-like Python implementations interpreted with PyPy do better than interpreted with CPython
- PyPascalTriangleNonRecursiveCLike, PyPascalTriangleNonRecursiveLessCLike and PyPascalTriangleNonRecursive are three best Python implementations interpreted with PyPy
- Best Python implementation is about 300% faster than the original implementation with PyPy
- It is easier to gain higher performance increase for Python with PyPy than with CPython
# | Implementation | Language | Duration, secs | Fraction of (*) duration | % faster than (*) |
1 | PyPascalTriangleNonRecursiveCLike | Python | 0.009965 | 0.255112 | 291.985071 |
2 | PyPascalTriangleNonRecursiveLessCLike | Python | 0.011276 | 0.288669 | 246.417169 |
3 | PyPascalTriangleNonRecursive | Python | 0.011767 | 0.301237 | 231.964988 |
4 | PyPascalTriangleNonRecursiveCLikeImproved | Python | 0.014874 | 0.380779 | 162.619819 |
5 | PyPascalTriangleNonRecursiveArray | Python | 0.018903 | 0.483923 | 106.644384 |
6 | PyPascalTriangleConstantLists | Python | 0.033238 | 0.850908 | 17.521573 |
7 | PyPascalTriangleOriginalPlus | Python | 0.038394 | 0.982898 | 1.739984 |
8 | PyPascalTriangleOriginal | Python | 0.039062 | 1.000000 | 0.000000 |
9 | PyPascalTriangleNonRecursiveIterators | Python | 0.041938 | 1.073628 | -6.857835 |
10 | PyPascalTriangleConstantTuples | Python | 0.050661 | 1.296940 | -22.895411 |
11 | PyPascalTriangleIterators | Python | 0.057787 | 1.479370 | -32.403652 |
# | Implementation | Language | Duration, secs | Fraction of (*) duration | % faster than (*) |
1 | PyPascalTriangleNonRecursiveCLikeImproved | Python | 0.021272 | 0.340252 | 193.899487 |
2 | PyPascalTriangleNonRecursiveLessCLike | Python | 0.021738 | 0.347700 | 187.604058 |
3 | PyPascalTriangleNonRecursiveCLike | Python | 0.022324 | 0.357078 | 180.051050 |
4 | PyPascalTriangleNonRecursive | Python | 0.023544 | 0.376592 | 165.539589 |
5 | PyPascalTriangleConstantLists | Python | 0.061327 | 0.980936 | 1.943442 |
6 | PyPascalTriangleOriginalPlus | Python | 0.062375 | 0.997700 | 0.230487 |
7 | PyPascalTriangleOriginal | Python | 0.062519 | 1.000000 | 0.000000 |
8 | PyPascalTriangleNonRecursiveIterators | Python | 0.076089 | 1.217060 | -17.834750 |
9 | PyPascalTriangleConstantTuples | Python | 0.101974 | 1.631093 | -38.691403 |
10 | PyPascalTriangleIterators | Python | 0.103806 | 1.660396 | -39.773401 |
# | Implementation | Language | Duration, secs | Fraction of (*) duration | % faster than (*) |
1 | PyPascalTriangleNonRecursiveLessCLike | Python | 0.124087 | 0.269902 | 270.505017 |
2 | PyPascalTriangleNonRecursive | Python | 0.127564 | 0.277465 | 260.405912 |
3 | PyPascalTriangleNonRecursiveCLikeImproved | Python | 0.129059 | 0.280717 | 256.230684 |
4 | PyPascalTriangleNonRecursiveCLike | Python | 0.134702 | 0.292991 | 241.307584 |
5 | PyPascalTriangleNonRecursiveIterators | Python | 0.206257 | 0.448630 | 122.900715 |
6 | PyPascalTriangleConstantLists | Python | 0.358617 | 0.780030 | 28.200265 |
7 | PyPascalTriangleOriginalPlus | Python | 0.362363 | 0.788177 | 26.875041 |
8 | PyPascalTriangleIterators | Python | 0.364880 | 0.793652 | 25.999760 |
9 | PyPascalTriangleConstantTuples | Python | 0.374680 | 0.814968 | 22.704246 |
10 | PyPascalTriangleOriginal | Python | 0.459748 | 1.000000 | 0.000000 |
# | Implementation | Language | Duration, secs | Fraction of (*) duration | % faster than (*) |
1 | PyPascalTriangleNonRecursiveCLikeImproved | Python | 4.349007 | ||
2 | PyPascalTriangleNonRecursiveLessCLike | Python | 4.362742 | ||
3 | PyPascalTriangleNonRecursive | Python | 4.366922 | ||
4 | PyPascalTriangleNonRecursiveCLike | Python | 4.501581 | ||
5 | PyPascalTriangleNonRecursiveIterators | Python | 5.453328 |
CPython vs Cython vs PyPy¶
- PyPy is faster than Cython
- Cython is faster than CPython
- Cython increases performance for tens of percents
- PyPy increase performance in several times
- CPython, Cython and PyPy do their best at different implementations
- PyPascalTriangleNonRecursive is the best for CPython
- PyPascalTriangleNonRecursiveLessCLike is the best for Cython
- PyPascalTriangleNonRecursiveCLike is the best for PyPy
# | Implementation | CPython, s | Cython, s / % faster | PyPy, s / % faster |
1 | PyPascalTriangleNonRecursive | 0.054124 | 0.040927 / +32.24 | 0.011875 / +355.78 |
2 | PyPascalTriangleNonRecursiveLessCLike | 0.054172 | 0.040580 / +33.49 | 0.011593 / +367.28 |
3 | PyPascalTriangleNonRecursiveCLikeImproved | 0.065096 | 0.043069 / +51.14 | 0.014917 / +336.39 |
4 | PyPascalTriangleNonRecursiveIterators | 0.065820 | 0.049754 / +32.29 | 0.036597 / +79.85 |
5 | PyPascalTriangleIterators | 0.072989 | 0.053422 / +36.63 | 0.059628 / +22.41 |
6 | PyPascalTriangleOriginalPlus | 0.073143 | 0.038411 / +90.42 | |
7 | PyPascalTriangleNonRecursiveCLike | 0.073623 | 0.009877 / +645.40 | |
8 | PyPascalTriangleConstantLists | 0.074328 | 0.054876 / +35.45 | 0.033842 / +119.63 |
9 | PyPascalTriangleOriginal | 0.075939 | 0.039253 / +93.46 | |
10 | PyPascalTriangleNonRecursiveArray | 0.092238 | 0.018299 / +404.06 | |
11 | PyPascalTriangleConstantTuples | 0.105243 | 0.049954 / +110.68 |
Pure C implementations¶
- Most optimal version is more than 200 times faster than original CPython version
- Compiler-optimized (-O3 and -Ofast options) C code runs faster than Assembler code
- Python types conversion required for C extensions come with up to 10% overhead on function call
- gcc optimization options may improve run time up to 3 times
#!/usr/bin/env bash
set -x
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -o ccpascal_triangle_normal ./pascal_triangle/ccpascal_triangle.c
chmod a+x ./ccpascal_triangle_normal
gcc -O2 -o ccpascal_triangle_O2 ./pascal_triangle/ccpascal_triangle.c
chmod a+x ./ccpascal_triangle_O2
gcc -O3 -o ccpascal_triangle_O3 ./pascal_triangle/ccpascal_triangle.c
chmod a+x ./ccpascal_triangle_O3
gcc -Ofast -o ccpascal_triangle_Ofast ./pascal_triangle/ccpascal_triangle.c
chmod a+x ./ccpascal_triangle_Ofast
$ ./ccpascal_triangle_normal
0.001524 seconds: 1000 times: c_print_pascal_triangle(34)
0.001165 seconds: 1000 times: c_print_pascal_triangle_ulong(34)
0.000761 seconds: 1000 times: c_print_pascal_triangle_full_asm_implementation(34)
$ ./ccpascal_triangle_O2
0.001530 seconds: 1000 times: c_print_pascal_triangle(34)
0.001279 seconds: 1000 times: c_print_pascal_triangle_ulong(34)
0.000578 seconds: 1000 times: c_print_pascal_triangle_full_asm_implementation(34)
$ ./ccpascal_triangle_O3
0.000290 seconds: 1000 times: c_print_pascal_triangle(34)
0.000340 seconds: 1000 times: c_print_pascal_triangle_ulong(34)
0.000480 seconds: 1000 times: c_print_pascal_triangle_full_asm_implementation(34)
$ ./ccpascal_triangle_Ofast
0.000291 seconds: 1000 times: c_print_pascal_triangle(34)
0.000352 seconds: 1000 times: c_print_pascal_triangle_ulong(34)
0.000538 seconds: 1000 times: c_print_pascal_triangle_full_asm_implementation(34)
Environment¶
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 69
Stepping: 1
CPU MHz: 2601.000
BogoMIPS: 5188.20
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3
current CPU frequency is 2.60 GHz.
current CPU frequency is 2.60 GHz.
current CPU frequency is 2.60 GHz.
current CPU frequency is 2.60 GHz.
Linux 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Linux Mint 17.2 Rafaela \n \l
Python 2.7.6
Cython version 0.23.4
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Source code¶
Please, see class’s docstring and diff to review the difference with base implementation.PascalTriangleBase¶
class PascalTriangleBase(object):
language = NotImplemented
original = False
max_height = None
EMPTY_LIST = []
ZERO_LIST = [0]
ONE_LIST = [1]
EMPTY_TUPLE = ()
ZERO_TUPLE = (0,)
ONE_TUPLE = (1,)
def __init__(self, verbose=False, return_list=False):
self.verbose = verbose
self.return_list = return_list
def _test_print(self, arg):
print arg
def _print(self, arg):
"""
Print Pascal's Triangle line
:param arg: triangle line
"""
if not self.verbose:
return
if not isinstance(arg, list):
arg = list(arg)
self._test_print(arg)
def build(self, height):
"""Build Pascal's Triangle of given height.
:param height: height of the triangle to print (zero-based)
:return: last line of Pascal's Triangle
"""
raise NotImplementedError()
PyPascalTriangleBase¶
from pascal_triangle.implementations.base import PascalTriangleBase
class PyPascalTriangleBase(PascalTriangleBase):
language = 'Python'
PyPascalTriangleOriginal¶
from .base import PyPascalTriangleBase
class PyPascalTriangleOriginal(PyPascalTriangleBase):
"""
Modified original solution presented on interview.
"""
original = True
max_height = 900
def build(self, height):
if height <= 0:
self._print([1])
return [1]
a = [0] + self.build(height - 1) + [0]
b = [a[i] + a[i + 1] for i in xrange(len(a) - 1)]
self._print(b)
return b # return last line for testing purposes
PyPascalTriangleOriginalPlus¶
from .base import PyPascalTriangleBase
class PyPascalTriangleOriginalPlus(PyPascalTriangleBase):
"""
Based on :py:class::`PascalTriangleOriginal`.
Difference:
- Better naming of local variables
- Fixed corner cases
"""
max_height = 900
def build(self, height):
if height == 0:
self._print([1])
return [1]
elif height < 0:
return []
prev_line = [0] + self.build(height - 1) + [0]
line = [prev_line[i] + prev_line[i + 1] for i in xrange(len(prev_line) - 1)]
self._print(line)
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/original.py
+++ ../implementations/python/original_plus.py
@@ -1,20 +1,25 @@
from .base import PyPascalTriangleBase
-class PyPascalTriangleOriginal(PyPascalTriangleBase):
+class PyPascalTriangleOriginalPlus(PyPascalTriangleBase):
"""
- Modified original solution presented on interview.
+ Based on :py:class::`PascalTriangleOriginal`.
+
+ Difference:
+ - Better naming of local variables
+ - Fixed corner cases
"""
- original = True
max_height = 900
def build(self, height):
- if height <= 0:
+ if height == 0:
self._print([1])
return [1]
+ elif height < 0:
+ return []
- a = [0] + self.build(height - 1) + [0]
- b = [a[i] + a[i + 1] for i in xrange(len(a) - 1)]
- self._print(b)
- return b # return last line for testing purposes
+ prev_line = [0] + self.build(height - 1) + [0]
+ line = [prev_line[i] + prev_line[i + 1] for i in xrange(len(prev_line) - 1)]
+ self._print(line)
+ return line # return last line for testing purposes
PyPascalTriangleConstantLists¶
from .base import PyPascalTriangleBase
class PyPascalTriangleConstantLists(PyPascalTriangleBase):
"""
Based on :py:class::`PascalTriangleOriginalPlus`.
Difference:
- Constant lists replaced with named constants
"""
max_height = 900
def build(self, height):
if height == 0:
self._print(self.ONE_LIST)
return self.ONE_LIST
elif height < 0:
return self.EMPTY_LIST
prev_line = self.ZERO_LIST + self.build(height - 1) + self.ZERO_LIST
line = [prev_line[i] + prev_line[i + 1] for i in xrange(len(prev_line) - 1)]
self._print(line)
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/original_plus.py
+++ ../implementations/python/constant_lists.py
@@ -1,25 +1,24 @@
from .base import PyPascalTriangleBase
-class PyPascalTriangleOriginalPlus(PyPascalTriangleBase):
+class PyPascalTriangleConstantLists(PyPascalTriangleBase):
"""
- Based on :py:class::`PascalTriangleOriginal`.
+ Based on :py:class::`PascalTriangleOriginalPlus`.
Difference:
- - Better naming of local variables
- - Fixed corner cases
+ - Constant lists replaced with named constants
"""
max_height = 900
def build(self, height):
if height == 0:
- self._print([1])
- return [1]
+ self._print(self.ONE_LIST)
+ return self.ONE_LIST
elif height < 0:
- return []
+ return self.EMPTY_LIST
- prev_line = [0] + self.build(height - 1) + [0]
+ prev_line = self.ZERO_LIST + self.build(height - 1) + self.ZERO_LIST
line = [prev_line[i] + prev_line[i + 1] for i in xrange(len(prev_line) - 1)]
self._print(line)
return line # return last line for testing purposes
PyPascalTriangleConstantTuples¶
from .base import PyPascalTriangleBase
class PyPascalTriangleConstantTuples(PyPascalTriangleBase):
"""
Based on :py:class::`PascalTriangleConstantLists`.
Difference:
- Lists replaced with tuples
"""
max_height = 900
def build(self, height):
if height == 0:
self._print(self.ONE_TUPLE)
return self.ONE_TUPLE
elif height < 0:
return self.EMPTY_TUPLE
prev_line = self.ZERO_TUPLE + self.build(height - 1) + self.ZERO_TUPLE
line = tuple(prev_line[i] + prev_line[i + 1] for i in xrange(len(prev_line) - 1))
self._print(line)
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/constant_lists.py
+++ ../implementations/python/constant_tuples.py
@@ -1,24 +1,24 @@
from .base import PyPascalTriangleBase
-class PyPascalTriangleConstantLists(PyPascalTriangleBase):
+class PyPascalTriangleConstantTuples(PyPascalTriangleBase):
"""
- Based on :py:class::`PascalTriangleOriginalPlus`.
+ Based on :py:class::`PascalTriangleConstantLists`.
Difference:
- - Constant lists replaced with named constants
+ - Lists replaced with tuples
"""
max_height = 900
def build(self, height):
if height == 0:
- self._print(self.ONE_LIST)
- return self.ONE_LIST
+ self._print(self.ONE_TUPLE)
+ return self.ONE_TUPLE
elif height < 0:
- return self.EMPTY_LIST
+ return self.EMPTY_TUPLE
- prev_line = self.ZERO_LIST + self.build(height - 1) + self.ZERO_LIST
- line = [prev_line[i] + prev_line[i + 1] for i in xrange(len(prev_line) - 1)]
+ prev_line = self.ZERO_TUPLE + self.build(height - 1) + self.ZERO_TUPLE
+ line = tuple(prev_line[i] + prev_line[i + 1] for i in xrange(len(prev_line) - 1))
self._print(line)
return line # return last line for testing purposes
PyPascalTriangleIterators¶
import itertools
from .base import PyPascalTriangleBase
class PyPascalTriangleIterators(PyPascalTriangleBase):
"""
Based on :py:class::`PascalTriangleConstantLists`.
Difference:
- Iterators are used instead of lists where it is possible
"""
max_height = 900
def build(self, height):
if height == 0:
self._print(self.ONE_LIST)
return self.ONE_LIST
elif height < 0:
return self.EMPTY_LIST
prev_line = self.build(height - 1)
iterator = itertools.chain(self.ZERO_LIST, prev_line)
ahead_iterator = itertools.chain(prev_line, self.ZERO_LIST)
line = [x + y for x, y in itertools.izip(iterator, ahead_iterator)]
self._print(line)
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/constant_lists.py
+++ ../implementations/python/iterators.py
@@ -1,12 +1,14 @@
+import itertools
+
from .base import PyPascalTriangleBase
-class PyPascalTriangleConstantLists(PyPascalTriangleBase):
+class PyPascalTriangleIterators(PyPascalTriangleBase):
"""
- Based on :py:class::`PascalTriangleOriginalPlus`.
+ Based on :py:class::`PascalTriangleConstantLists`.
Difference:
- - Constant lists replaced with named constants
+ - Iterators are used instead of lists where it is possible
"""
max_height = 900
@@ -18,7 +20,9 @@
elif height < 0:
return self.EMPTY_LIST
- prev_line = self.ZERO_LIST + self.build(height - 1) + self.ZERO_LIST
- line = [prev_line[i] + prev_line[i + 1] for i in xrange(len(prev_line) - 1)]
+ prev_line = self.build(height - 1)
+ iterator = itertools.chain(self.ZERO_LIST, prev_line)
+ ahead_iterator = itertools.chain(prev_line, self.ZERO_LIST)
+ line = [x + y for x, y in itertools.izip(iterator, ahead_iterator)]
self._print(line)
return line # return last line for testing purposes
PyPascalTriangleNonRecursive¶
from .base import PyPascalTriangleBase
class PyPascalTriangleNonRecursive(PyPascalTriangleBase):
"""
A non-recursive implementation of Pascal Triangle printing algorithm.
"""
def build(self, height):
line = []
for x in xrange(height + 1):
line = self.ONE_LIST + line
for i in xrange(1, x):
line[i] += line[i + 1]
self._print(line)
return line # return last line for testing purposes
PyPascalTriangleNonRecursiveIterators¶
import itertools
from .base import PyPascalTriangleBase
class PyPascalTriangleNonRecursiveIterators(PyPascalTriangleBase):
"""
Based on :py:class::`PascalTriangleNonRecursive`.
Difference:
- iterators are used instead of lists
"""
def build(self, height):
if height < 0:
return self.EMPTY_LIST
line = self.ONE_LIST
self._print(line)
for _ in xrange(height):
iterator = itertools.chain(self.ZERO_LIST, line)
ahead_iterator = itertools.chain(line, self.ZERO_LIST)
line = [x + y for x, y in itertools.izip(iterator, ahead_iterator)]
self._print(line)
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/non_recursive.py
+++ ../implementations/python/non_recursive_iterators.py
@@ -1,17 +1,26 @@
+import itertools
+
from .base import PyPascalTriangleBase
-class PyPascalTriangleNonRecursive(PyPascalTriangleBase):
+class PyPascalTriangleNonRecursiveIterators(PyPascalTriangleBase):
"""
- A non-recursive implementation of Pascal Triangle printing algorithm.
+ Based on :py:class::`PascalTriangleNonRecursive`.
+
+ Difference:
+ - iterators are used instead of lists
"""
def build(self, height):
- line = []
- for x in xrange(height + 1):
- line = self.ONE_LIST + line
- for i in xrange(1, x):
- line[i] += line[i + 1]
+ if height < 0:
+ return self.EMPTY_LIST
+
+ line = self.ONE_LIST
+ self._print(line)
+ for _ in xrange(height):
+ iterator = itertools.chain(self.ZERO_LIST, line)
+ ahead_iterator = itertools.chain(line, self.ZERO_LIST)
+ line = [x + y for x, y in itertools.izip(iterator, ahead_iterator)]
self._print(line)
return line # return last line for testing purposes
PyPascalTriangleNonRecursiveCLike¶
from .base import PyPascalTriangleBase
class PyPascalTriangleNonRecursiveCLike(PyPascalTriangleBase):
"""
Based on :py:class::`PascalTriangleNonRecursive`.
Difference:
- All required memory is allocated at once
- List concatenations are replaced with value assignment by index
"""
def build(self, height):
line = self.ZERO_LIST * (height + 1)
start = height
size = 0
while size <= height:
line[start] = 1
i = 1
while i < size:
index = start + i
line[index] += line[index + 1]
i += 1
print_line = line[start:start + size + 1]
self._print(print_line)
start -= 1
size += 1
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/non_recursive.py
+++ ../implementations/python/non_recursive_c_like.py
@@ -1,17 +1,29 @@
from .base import PyPascalTriangleBase
-class PyPascalTriangleNonRecursive(PyPascalTriangleBase):
+class PyPascalTriangleNonRecursiveCLike(PyPascalTriangleBase):
"""
- A non-recursive implementation of Pascal Triangle printing algorithm.
+ Based on :py:class::`PascalTriangleNonRecursive`.
+
+ Difference:
+ - All required memory is allocated at once
+ - List concatenations are replaced with value assignment by index
"""
def build(self, height):
- line = []
- for x in xrange(height + 1):
- line = self.ONE_LIST + line
- for i in xrange(1, x):
- line[i] += line[i + 1]
- self._print(line)
+ line = self.ZERO_LIST * (height + 1)
+ start = height
+ size = 0
+ while size <= height:
+ line[start] = 1
+ i = 1
+ while i < size:
+ index = start + i
+ line[index] += line[index + 1]
+ i += 1
+ print_line = line[start:start + size + 1]
+ self._print(print_line)
+ start -= 1
+ size += 1
return line # return last line for testing purposes
PyPascalTriangleNonRecursiveCLikeImproved¶
from .base import PyPascalTriangleBase
class PyPascalTriangleNonRecursiveCLikeImproved(PyPascalTriangleBase):
"""
Based on :py:class::`PascalTriangleNonRecursiveCLike`.
Difference:
- Removed some unneeded computations
"""
def build(self, height):
line = self.ONE_LIST * (height + 1)
start = height
size = 0
while size <= height:
index = start + 1
while index < height:
line[index] += line[index + 1]
index += 1
print_line = line[start:start + size + 1]
self._print(print_line)
start -= 1
size += 1
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/non_recursive_c_like.py
+++ ../implementations/python/non_recursive_c_like_improved.py
@@ -1,26 +1,23 @@
from .base import PyPascalTriangleBase
-class PyPascalTriangleNonRecursiveCLike(PyPascalTriangleBase):
+class PyPascalTriangleNonRecursiveCLikeImproved(PyPascalTriangleBase):
"""
- Based on :py:class::`PascalTriangleNonRecursive`.
+ Based on :py:class::`PascalTriangleNonRecursiveCLike`.
Difference:
- - All required memory is allocated at once
- - List concatenations are replaced with value assignment by index
+ - Removed some unneeded computations
"""
def build(self, height):
- line = self.ZERO_LIST * (height + 1)
+ line = self.ONE_LIST * (height + 1)
start = height
size = 0
while size <= height:
- line[start] = 1
- i = 1
- while i < size:
- index = start + i
+ index = start + 1
+ while index < height:
line[index] += line[index + 1]
- i += 1
+ index += 1
print_line = line[start:start + size + 1]
self._print(print_line)
start -= 1
PyPascalTriangleNonRecursiveLessCLike¶
from .base import PyPascalTriangleBase
class PyPascalTriangleNonRecursiveLessCLike(PyPascalTriangleBase):
"""
Based on :py:class::`PascalTriangleNonRecursiveCLike`.
Difference:
- xrange objects are used instead of while-loops
"""
def build(self, height):
line = self.ONE_LIST * (height + 1)
start = height
for size in xrange(height + 1):
for index in xrange(start + 1, height):
line[index] += line[index + 1]
print_line = line[start:]
self._print(print_line)
start -= 1
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/non_recursive_c_like.py
+++ ../implementations/python/non_recursive_less_c_like.py
@@ -1,29 +1,22 @@
from .base import PyPascalTriangleBase
-class PyPascalTriangleNonRecursiveCLike(PyPascalTriangleBase):
+class PyPascalTriangleNonRecursiveLessCLike(PyPascalTriangleBase):
"""
- Based on :py:class::`PascalTriangleNonRecursive`.
+ Based on :py:class::`PascalTriangleNonRecursiveCLike`.
Difference:
- - All required memory is allocated at once
- - List concatenations are replaced with value assignment by index
+ - xrange objects are used instead of while-loops
"""
def build(self, height):
- line = self.ZERO_LIST * (height + 1)
+ line = self.ONE_LIST * (height + 1)
start = height
- size = 0
- while size <= height:
- line[start] = 1
- i = 1
- while i < size:
- index = start + i
+ for size in xrange(height + 1):
+ for index in xrange(start + 1, height):
line[index] += line[index + 1]
- i += 1
- print_line = line[start:start + size + 1]
+ print_line = line[start:]
self._print(print_line)
start -= 1
- size += 1
return line # return last line for testing purposes
PyPascalTriangleNonRecursiveArray¶
import itertools
from array import array
from .base import PyPascalTriangleBase
class PyPascalTriangleNonRecursiveArray(PyPascalTriangleBase):
"""
Based on :py:class::`PascalTriangleNonRecursiveLessCLike`.
Difference:
- array object is used instead of list
"""
max_height = 34
def build(self, height):
line = array('I', itertools.repeat(1, height + 1))
start = height
for size in xrange(height + 1):
for index in xrange(start + 1, height):
line[index] += line[index + 1]
print_line = line[start:]
self._print(print_line)
start -= 1
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/non_recursive_less_c_like.py
+++ ../implementations/python/non_recursive_array.py
@@ -1,16 +1,21 @@
+import itertools
+from array import array
+
from .base import PyPascalTriangleBase
-class PyPascalTriangleNonRecursiveLessCLike(PyPascalTriangleBase):
+class PyPascalTriangleNonRecursiveArray(PyPascalTriangleBase):
"""
- Based on :py:class::`PascalTriangleNonRecursiveCLike`.
+ Based on :py:class::`PascalTriangleNonRecursiveLessCLike`.
Difference:
- - xrange objects are used instead of while-loops
+ - array object is used instead of list
"""
+ max_height = 34
+
def build(self, height):
- line = self.ONE_LIST * (height + 1)
+ line = array('I', itertools.repeat(1, height + 1))
start = height
for size in xrange(height + 1):
for index in xrange(start + 1, height):
CyPascalTriangleBase¶
from pascal_triangle.implementations.base import PascalTriangleBase
class CyPascalTriangleBase(PascalTriangleBase):
language = 'Cython'
CyPascalTriangleConstantLists¶
from .base import CyPascalTriangleBase
class CyPascalTriangleConstantLists(CyPascalTriangleBase):
"""
Based on :py:class::`PascalTriangleConstantLists`.
Difference:
- Compiled with Cython
"""
max_height = 900
def build(self, height):
if height == 0:
self._print(self.ONE_LIST)
return self.ONE_LIST
elif height < 0:
return self.EMPTY_LIST
prev_line = self.ZERO_LIST + self.build(height - 1) + self.ZERO_LIST
line = [prev_line[i] + prev_line[i + 1] for i in xrange(len(prev_line) - 1)]
self._print(line)
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/constant_lists.py
+++ ../implementations/cython/constant_lists.pyx
@@ -1,12 +1,12 @@
-from .base import PyPascalTriangleBase
+from .base import CyPascalTriangleBase
-class PyPascalTriangleConstantLists(PyPascalTriangleBase):
+class CyPascalTriangleConstantLists(CyPascalTriangleBase):
"""
- Based on :py:class::`PascalTriangleOriginalPlus`.
+ Based on :py:class::`PascalTriangleConstantLists`.
Difference:
- - Constant lists replaced with named constants
+ - Compiled with Cython
"""
max_height = 900
CyPascalTriangleIterators¶
import itertools
from .base import CyPascalTriangleBase
class CyPascalTriangleIterators(CyPascalTriangleBase):
"""
Based on :py:class::`PascalTriangleIterators`.
Difference:
- Compiled with Cython
"""
max_height = 900
def build(self, height):
if height == 0:
self._print(self.ONE_LIST)
return self.ONE_LIST
elif height < 0:
return self.EMPTY_LIST
prev_line = self.build(height - 1)
iterator = itertools.chain(self.ZERO_LIST, prev_line)
ahead_iterator = itertools.chain(prev_line, self.ZERO_LIST)
line = [x + y for x, y in itertools.izip(iterator, ahead_iterator)]
self._print(line)
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/iterators.py
+++ ../implementations/cython/iterators.pyx
@@ -1,14 +1,14 @@
import itertools
-from .base import PyPascalTriangleBase
+from .base import CyPascalTriangleBase
-class PyPascalTriangleIterators(PyPascalTriangleBase):
+class CyPascalTriangleIterators(CyPascalTriangleBase):
"""
- Based on :py:class::`PascalTriangleConstantLists`.
+ Based on :py:class::`PascalTriangleIterators`.
Difference:
- - Iterators are used instead of lists where it is possible
+ - Compiled with Cython
"""
max_height = 900
CyPascalTriangleNonRecursive¶
from .base import CyPascalTriangleBase
class CyPascalTriangleNonRecursive(CyPascalTriangleBase):
"""
Based on :py:class::`PyPascalTriangleNonRecursive`.
Difference:
- Compiled with Cython
"""
def build(self, height):
line = []
for x in xrange(height + 1):
line = self.ONE_LIST + line
for i in xrange(1, x):
line[i] += line[i + 1]
self._print(line)
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/non_recursive.py
+++ ../implementations/cython/non_recursive.pyx
@@ -1,9 +1,12 @@
-from .base import PyPascalTriangleBase
+from .base import CyPascalTriangleBase
-class PyPascalTriangleNonRecursive(PyPascalTriangleBase):
+class CyPascalTriangleNonRecursive(CyPascalTriangleBase):
"""
- A non-recursive implementation of Pascal Triangle printing algorithm.
+ Based on :py:class::`PyPascalTriangleNonRecursive`.
+
+ Difference:
+ - Compiled with Cython
"""
def build(self, height):
CyPascalTriangleNonRecursiveIterators¶
import itertools
from .base import CyPascalTriangleBase
class CyPascalTriangleNonRecursiveIterators(CyPascalTriangleBase):
"""
Based on :py:class::`PyPascalTriangleNonRecursiveIterators`.
Difference:
- Compiled with Cython
"""
def build(self, height):
if height < 0:
return self.EMPTY_LIST
line = self.ONE_LIST
self._print(line)
for _ in xrange(height):
iterator = itertools.chain(self.ZERO_LIST, line)
ahead_iterator = itertools.chain(line, self.ZERO_LIST)
line = [x + y for x, y in itertools.izip(iterator, ahead_iterator)]
self._print(line)
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/non_recursive_iterators.py
+++ ../implementations/cython/non_recursive_iterators.pyx
@@ -1,14 +1,14 @@
import itertools
-from .base import PyPascalTriangleBase
+from .base import CyPascalTriangleBase
-class PyPascalTriangleNonRecursiveIterators(PyPascalTriangleBase):
+class CyPascalTriangleNonRecursiveIterators(CyPascalTriangleBase):
"""
- Based on :py:class::`PascalTriangleNonRecursive`.
+ Based on :py:class::`PyPascalTriangleNonRecursiveIterators`.
Difference:
- - iterators are used instead of lists
+ - Compiled with Cython
"""
def build(self, height):
CyPascalTriangleNonRecursiveCLikeImproved¶
from .base import CyPascalTriangleBase
class CyPascalTriangleNonRecursiveCLikeImproved(CyPascalTriangleBase):
"""
Based on :py:class::`PyPascalTriangleNonRecursiveCLikeImproved`.
Difference:
- Compiled with Cython
"""
def build(self, height):
line = self.ONE_LIST * (height + 1)
start = height
size = 0
while size <= height:
index = start + 1
while index < height:
line[index] += line[index + 1]
index += 1
print_line = line[start:start + size + 1]
self._print(print_line)
start -= 1
size += 1
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/non_recursive_c_like_improved.py
+++ ../implementations/cython/non_recursive_c_like_improved.pyx
@@ -1,12 +1,12 @@
-from .base import PyPascalTriangleBase
+from .base import CyPascalTriangleBase
-class PyPascalTriangleNonRecursiveCLikeImproved(PyPascalTriangleBase):
+class CyPascalTriangleNonRecursiveCLikeImproved(CyPascalTriangleBase):
"""
- Based on :py:class::`PascalTriangleNonRecursiveCLike`.
+ Based on :py:class::`PyPascalTriangleNonRecursiveCLikeImproved`.
Difference:
- - Removed some unneeded computations
+ - Compiled with Cython
"""
def build(self, height):
CyPascalTriangleNonRecursiveCLikeMoreImproved¶
import sys
from .base import CyPascalTriangleBase
class CyPascalTriangleNonRecursiveCLikeMoreImproved(CyPascalTriangleBase):
"""
Based on :py:class::`CyPascalTriangleNonRecursiveCLikeImproved`.
Difference:
- List slicing replaced with character by character output
"""
def build(self, height, verbose=False):
line = self.ONE_LIST * (height + 1)
start = height
size = 0
while size <= height:
index = start + 1
while index < height:
line[index] += line[index + 1]
index += 1
i = start
finish = start + size + 1
while i < finish:
if verbose:
sys.stdout.write(str(line[i]) + " ")
i += 1
if verbose:
print
start -= 1
size += 1
return line # return last line for testing purposes
Diff¶
--- ../implementations/cython/non_recursive_c_like_improved.pyx
+++ ../implementations/cython/non_recursive_c_like_more_improved.pyx
@@ -1,15 +1,17 @@
+import sys
+
from .base import CyPascalTriangleBase
-class CyPascalTriangleNonRecursiveCLikeImproved(CyPascalTriangleBase):
+class CyPascalTriangleNonRecursiveCLikeMoreImproved(CyPascalTriangleBase):
"""
- Based on :py:class::`PyPascalTriangleNonRecursiveCLikeImproved`.
+ Based on :py:class::`CyPascalTriangleNonRecursiveCLikeImproved`.
Difference:
- - Compiled with Cython
+ - List slicing replaced with character by character output
"""
- def build(self, height):
+ def build(self, height, verbose=False):
line = self.ONE_LIST * (height + 1)
start = height
size = 0
@@ -18,8 +20,16 @@
while index < height:
line[index] += line[index + 1]
index += 1
- print_line = line[start:start + size + 1]
- self._print(print_line)
+
+ i = start
+ finish = start + size + 1
+ while i < finish:
+ if verbose:
+ sys.stdout.write(str(line[i]) + " ")
+ i += 1
+ if verbose:
+ print
+
start -= 1
size += 1
CyPascalTriangleNonRecursiveLessCLike¶
from .base import CyPascalTriangleBase
class CyPascalTriangleNonRecursiveLessCLike(CyPascalTriangleBase):
"""
Based on :py:class::`PyPascalTriangleNonRecursiveLessCLike`.
Difference:
- Compiled with Cython
"""
def build(self, height):
line = self.ONE_LIST * (height + 1)
start = height
for size in xrange(height + 1):
for index in xrange(start + 1, height):
line[index] += line[index + 1]
print_line = line[start:]
self._print(print_line)
start -= 1
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/non_recursive_less_c_like.py
+++ ../implementations/cython/non_recursive_less_c_like.pyx
@@ -1,12 +1,12 @@
-from .base import PyPascalTriangleBase
+from .base import CyPascalTriangleBase
-class PyPascalTriangleNonRecursiveLessCLike(PyPascalTriangleBase):
+class CyPascalTriangleNonRecursiveLessCLike(CyPascalTriangleBase):
"""
- Based on :py:class::`PascalTriangleNonRecursiveCLike`.
+ Based on :py:class::`PyPascalTriangleNonRecursiveLessCLike`.
Difference:
- - xrange objects are used instead of while-loops
+ - Compiled with Cython
"""
def build(self, height):
CyPascalTriangleNonRecursiveCTypes¶
import sys
import itertools
from array import array
from cpython cimport array as c_array
from .base import CyPascalTriangleBase
class CyPascalTriangleNonRecursiveCTypes(CyPascalTriangleBase):
"""
Based on :py:class::`PyPascalTriangleNonRecursiveArray`.
Difference:
- Compiled with Cython
- C type are used instead of Python types
"""
max_height = 34
def build(self, height, verbose=False):
cdef c_array.array line = array('I', itertools.repeat(1, height + 1))
cdef int start = height
cdef int size = 0
cdef int index
cdef int i
cdef int finish
while size <= height:
index = start + 1
while index < height:
line[index] += line[index + 1]
index += 1
i = start
finish = start + size + 1
while i < finish:
if verbose:
sys.stdout.write(str(line[i]) + ' ')
i += 1
if verbose:
print
start -= 1
size += 1
return line # return last line for testing purposes
Diff¶
--- ../implementations/python/non_recursive_array.py
+++ ../implementations/cython/c_types.pyx
@@ -1,27 +1,45 @@
+import sys
+
import itertools
from array import array
+from cpython cimport array as c_array
-from .base import PyPascalTriangleBase
+from .base import CyPascalTriangleBase
-class PyPascalTriangleNonRecursiveArray(PyPascalTriangleBase):
+class CyPascalTriangleNonRecursiveCTypes(CyPascalTriangleBase):
"""
- Based on :py:class::`PascalTriangleNonRecursiveLessCLike`.
+ Based on :py:class::`PyPascalTriangleNonRecursiveArray`.
Difference:
- - array object is used instead of list
+ - Compiled with Cython
+ - C type are used instead of Python types
"""
max_height = 34
- def build(self, height):
- line = array('I', itertools.repeat(1, height + 1))
- start = height
- for size in xrange(height + 1):
- for index in xrange(start + 1, height):
+ def build(self, height, verbose=False):
+ cdef c_array.array line = array('I', itertools.repeat(1, height + 1))
+ cdef int start = height
+ cdef int size = 0
+ cdef int index
+ cdef int i
+ cdef int finish
+ while size <= height:
+ index = start + 1
+ while index < height:
line[index] += line[index + 1]
- print_line = line[start:]
- self._print(print_line)
+ index += 1
+
+ i = start
+ finish = start + size + 1
+ while i < finish:
+ if verbose:
+ sys.stdout.write(str(line[i]) + ' ')
+ i += 1
+ if verbose:
+ print
+
start -= 1
-
+ size += 1
return line # return last line for testing purposes
CyPascalTriangleNonRecursiveCTypesULong¶
import sys
import itertools
from array import array
from cpython cimport array as c_array
from .base import CyPascalTriangleBase
class CyPascalTriangleNonRecursiveCTypesULong(CyPascalTriangleBase):
"""
Based on :py:class::`CyPascalTriangleNonRecursiveCTypes`.
Difference:
- 64-bit integers are used to store values instead of 32-bit integers
"""
max_height = 67
def build(self, height, verbose=False):
cdef c_array.array line = array('L', itertools.repeat(1, height + 1))
cdef int start = height
cdef int size = 0
cdef int index
cdef int i
cdef int finish
while size <= height:
index = start + 1
while index < height:
line[index] += line[index + 1]
index += 1
i = start
finish = start + size + 1
while i < finish:
if verbose:
sys.stdout.write(str(line[i]) + ' ')
i += 1
if verbose:
print
start -= 1
size += 1
return line # return last line for testing purposes
Diff¶
--- ../implementations/cython/c_types.pyx
+++ ../implementations/cython/c_types_ulong.pyx
@@ -7,19 +7,18 @@
from .base import CyPascalTriangleBase
-class CyPascalTriangleNonRecursiveCTypes(CyPascalTriangleBase):
+class CyPascalTriangleNonRecursiveCTypesULong(CyPascalTriangleBase):
"""
- Based on :py:class::`PyPascalTriangleNonRecursiveArray`.
+ Based on :py:class::`CyPascalTriangleNonRecursiveCTypes`.
Difference:
- - Compiled with Cython
- - C type are used instead of Python types
+ - 64-bit integers are used to store values instead of 32-bit integers
"""
- max_height = 34
+ max_height = 67
def build(self, height, verbose=False):
- cdef c_array.array line = array('I', itertools.repeat(1, height + 1))
+ cdef c_array.array line = array('L', itertools.repeat(1, height + 1))
cdef int start = height
cdef int size = 0
cdef int index
CPascalTriangleInitial¶
from .base import CPascalTriangleBase
from .cinitial import c_pascal_triangle_initial
class CPascalTriangleInitial(CPascalTriangleBase):
"""
Based on :py:class::`CyPascalTriangleNonRecursiveCTypes`.
Difference:
- C implementation
"""
max_height = 34
def build(self, height):
return c_pascal_triangle_initial(height, 0, self.return_list)
#include <Python.h>
#include <stdio.h>
// TODO LOW: Allocate memory on heap using malloc() instead of stack
#define MAX_HEIGHT_UINT 34
static PyObject* c_pascal_triangle_initial(PyObject* self, PyObject* args) {
int height, verbose=0, return_list=0;
if (!PyArg_ParseTuple(args, "i|ii", &height, &verbose, &return_list)) return NULL;
unsigned int line[height + 1];
int start = height;
int size;
int index;
if(height > MAX_HEIGHT_UINT) {
PyErr_Format(PyExc_ValueError, "Unable to build Pascal's Triangle higher than %d", MAX_HEIGHT_UINT);
return NULL;
}
for (size = 0; size <= height; size++) {
line[start] = 1;
for (index = start + 1; index < height; index++) {
line[index] += line[index + 1];
}
for (index = start + size; index >= start; index--) {
if(verbose) printf("%u ", line[index]);
}
if(verbose) printf("\n");
start--;
}
if(return_list) {
PyObject* py_line = PyList_New(height + 1);
for (index = height; index >= 0; index--) {
PyList_SetItem(py_line, index, PyInt_FromLong((long) line[index]));
}
return py_line;
} else {
Py_INCREF(Py_None);
return Py_None;
}
}
static PyMethodDef cpascal_triangle_methods[] = {
{"c_pascal_triangle_initial", c_pascal_triangle_initial, METH_VARARGS},
{NULL, NULL}
};
void initcinitial(void) {
(void) Py_InitModule("cinitial", cpascal_triangle_methods);
}
CPascalTriangleULong¶
from .base import CPascalTriangleBase
from .culong import c_pascal_triangle_ulong
class CPascalTriangleULong(CPascalTriangleBase):
"""
Based on :py:class::`CPascalTriangleInitial`.
Difference:
- Unsigned long is used instead of unsigned int
"""
max_height = 67
def build(self, height):
return c_pascal_triangle_ulong(height, 0, self.return_list)
#include <Python.h>
#include <stdio.h>
// TODO LOW: Allocate memory on heap using malloc() instead of stack
#define MAX_HEIGHT_ULONG 67
static PyObject* c_pascal_triangle_ulong(PyObject* self, PyObject* args) {
int height, verbose=0, return_list=0;
if (!PyArg_ParseTuple(args, "i|ii", &height, &verbose, &return_list)) return NULL;
unsigned long line[height + 1];
int start = height;
int size;
int index;
if(height > MAX_HEIGHT_ULONG) {
PyErr_Format(PyExc_ValueError, "Unable to build Pascal's Triangle higher than %d", MAX_HEIGHT_ULONG);
return NULL;
}
for (size = 0; size <= height; size++) {
line[start] = 1;
for (index = start + 1; index < height; index++) {
line[index] += line[index + 1];
}
for (index = start + size; index >= start; index--) {
if(verbose) printf("%lu ", line[index]);
}
if(verbose) printf("\n");
start--;
}
if(return_list) {
PyObject* py_line = PyList_New(height + 1);
for (index = height; index >= 0; index--) {
PyList_SetItem(py_line, index, PyLong_FromUnsignedLong(line[index]));
}
return py_line;
} else {
Py_INCREF(Py_None);
return Py_None;
}
}
static PyMethodDef cpascal_triangle_methods[] = {
{"c_pascal_triangle_ulong", c_pascal_triangle_ulong, METH_VARARGS},
{NULL, NULL}
};
void initculong(void) {
(void) Py_InitModule("culong", cpascal_triangle_methods);
}
Diff¶
--- ../implementations/cextension/initial.py
+++ ../implementations/cextension/ulong.py
@@ -1,16 +1,16 @@
from .base import CPascalTriangleBase
-from .cinitial import c_pascal_triangle_initial
+from .culong import c_pascal_triangle_ulong
-class CPascalTriangleInitial(CPascalTriangleBase):
+class CPascalTriangleULong(CPascalTriangleBase):
"""
- Based on :py:class::`CyPascalTriangleNonRecursiveCTypes`.
+ Based on :py:class::`CPascalTriangleInitial`.
Difference:
- - C implementation
+ - Unsigned long is used instead of unsigned int
"""
- max_height = 34
+ max_height = 67
def build(self, height):
- return c_pascal_triangle_initial(height, 0, self.return_list)
+ return c_pascal_triangle_ulong(height, 0, self.return_list)
--- ../implementations/cextension/cinitial.c
+++ ../implementations/cextension/culong.c
@@ -2,21 +2,22 @@
#include <stdio.h>
// TODO LOW: Allocate memory on heap using malloc() instead of stack
-#define MAX_HEIGHT_UINT 34
+
+#define MAX_HEIGHT_ULONG 67
-static PyObject* c_pascal_triangle_initial(PyObject* self, PyObject* args) {
+static PyObject* c_pascal_triangle_ulong(PyObject* self, PyObject* args) {
int height, verbose=0, return_list=0;
if (!PyArg_ParseTuple(args, "i|ii", &height, &verbose, &return_list)) return NULL;
- unsigned int line[height + 1];
+ unsigned long line[height + 1];
int start = height;
int size;
int index;
- if(height > MAX_HEIGHT_UINT) {
- PyErr_Format(PyExc_ValueError, "Unable to build Pascal's Triangle higher than %d", MAX_HEIGHT_UINT);
+ if(height > MAX_HEIGHT_ULONG) {
+ PyErr_Format(PyExc_ValueError, "Unable to build Pascal's Triangle higher than %d", MAX_HEIGHT_ULONG);
return NULL;
}
@@ -27,7 +28,7 @@
}
for (index = start + size; index >= start; index--) {
- if(verbose) printf("%u ", line[index]);
+ if(verbose) printf("%lu ", line[index]);
}
if(verbose) printf("\n");
@@ -37,7 +38,7 @@
if(return_list) {
PyObject* py_line = PyList_New(height + 1);
for (index = height; index >= 0; index--) {
- PyList_SetItem(py_line, index, PyInt_FromLong((long) line[index]));
+ PyList_SetItem(py_line, index, PyLong_FromUnsignedLong(line[index]));
}
return py_line;
} else {
@@ -48,10 +49,10 @@
static PyMethodDef cpascal_triangle_methods[] = {
- {"c_pascal_triangle_initial", c_pascal_triangle_initial, METH_VARARGS},
+ {"c_pascal_triangle_ulong", c_pascal_triangle_ulong, METH_VARARGS},
{NULL, NULL}
};
-void initcinitial(void) {
- (void) Py_InitModule("cinitial", cpascal_triangle_methods);
+void initculong(void) {
+ (void) Py_InitModule("culong", cpascal_triangle_methods);
}
CPascalTrianglePartialAsm¶
from .base import CPascalTriangleBase
from .cpartial_asm import c_pascal_triangle_partial_asm
class CPascalTrianglePartialAsm(CPascalTriangleBase):
"""
Based on :py:class::`CPascalTriangleInitial`.
Difference:
- Array initialization is implemented in Assembler
"""
language = 'C/Assembler'
max_height = 34
def build(self, height):
return c_pascal_triangle_partial_asm(height, 0, self.return_list)
#include <Python.h>
#include <stdio.h>
// TODO LOW: Allocate memory on heap using malloc() instead of stack
#define MAX_HEIGHT_UINT 34
static PyObject* c_pascal_triangle_partial_asm(PyObject* self, PyObject* args) {
int height, verbose=0, return_list=0;
if (!PyArg_ParseTuple(args, "i|ii", &height, &verbose, &return_list)) return NULL;
unsigned int line[height + 1];
int start = height;
int size;
int index;
if(height > MAX_HEIGHT_UINT) {
PyErr_Format(PyExc_ValueError, "Unable to build Pascal's Triangle higher than %d", MAX_HEIGHT_UINT);
return NULL;
}
asm (
// Fill in line with 1 and put 0 in the end
"movl %0, %%ecx\n\t"
"movq %1, %%rdi\n\t"
"incl %%ecx\n\t"
"movl $1, %%eax\n\t"
"cld\n\t"
"rep\n\t"
"stosl\n\t"
: /**/
: "g" (height), "g" (line)
: "eax", "ecx", "rdi"
);
for (size = 0; size <= height; size++) {
for (index = start + 1; index < height; index++) {
line[index] += line[index + 1];
}
for (index = start + size; index >= start; index--) {
if(verbose) printf("%u ", line[index]);
}
if(verbose) printf("\n");
start--;
}
if(return_list) {
PyObject* py_line = PyList_New(height + 1);
for (index = height; index >= 0; index--) {
PyList_SetItem(py_line, index, PyInt_FromLong((long) line[index]));
}
return py_line;
} else {
Py_INCREF(Py_None);
return Py_None;
}
}
static PyMethodDef cpascal_triangle_methods[] = {
{"c_pascal_triangle_partial_asm", c_pascal_triangle_partial_asm, METH_VARARGS},
{NULL, NULL}
};
void initcpartial_asm(void) {
(void) Py_InitModule("cpartial_asm", cpascal_triangle_methods);
}
Diff¶
--- ../implementations/cextension/initial.py
+++ ../implementations/cextension/partial_asm.py
@@ -1,16 +1,17 @@
from .base import CPascalTriangleBase
-from .cinitial import c_pascal_triangle_initial
+from .cpartial_asm import c_pascal_triangle_partial_asm
-class CPascalTriangleInitial(CPascalTriangleBase):
+class CPascalTrianglePartialAsm(CPascalTriangleBase):
"""
- Based on :py:class::`CyPascalTriangleNonRecursiveCTypes`.
+ Based on :py:class::`CPascalTriangleInitial`.
Difference:
- - C implementation
+ - Array initialization is implemented in Assembler
"""
+ language = 'C/Assembler'
max_height = 34
def build(self, height):
- return c_pascal_triangle_initial(height, 0, self.return_list)
+ return c_pascal_triangle_partial_asm(height, 0, self.return_list)
--- ../implementations/cextension/cinitial.c
+++ ../implementations/cextension/cpartial_asm.c
@@ -2,10 +2,11 @@
#include <stdio.h>
// TODO LOW: Allocate memory on heap using malloc() instead of stack
+
#define MAX_HEIGHT_UINT 34
-static PyObject* c_pascal_triangle_initial(PyObject* self, PyObject* args) {
+static PyObject* c_pascal_triangle_partial_asm(PyObject* self, PyObject* args) {
int height, verbose=0, return_list=0;
if (!PyArg_ParseTuple(args, "i|ii", &height, &verbose, &return_list)) return NULL;
@@ -20,8 +21,21 @@
return NULL;
}
+ asm (
+ // Fill in line with 1 and put 0 in the end
+ "movl %0, %%ecx\n\t"
+ "movq %1, %%rdi\n\t"
+ "incl %%ecx\n\t"
+ "movl $1, %%eax\n\t"
+ "cld\n\t"
+ "rep\n\t"
+ "stosl\n\t"
+ : /**/
+ : "g" (height), "g" (line)
+ : "eax", "ecx", "rdi"
+ );
+
for (size = 0; size <= height; size++) {
- line[start] = 1;
for (index = start + 1; index < height; index++) {
line[index] += line[index + 1];
}
@@ -48,10 +62,10 @@
static PyMethodDef cpascal_triangle_methods[] = {
- {"c_pascal_triangle_initial", c_pascal_triangle_initial, METH_VARARGS},
+ {"c_pascal_triangle_partial_asm", c_pascal_triangle_partial_asm, METH_VARARGS},
{NULL, NULL}
};
-void initcinitial(void) {
- (void) Py_InitModule("cinitial", cpascal_triangle_methods);
+void initcpartial_asm(void) {
+ (void) Py_InitModule("cpartial_asm", cpascal_triangle_methods);
}
CPascalTriangleFullAsm¶
from .base import CPascalTriangleBase
from .cfull_asm import c_pascal_triangle_full_asm
class CPascalTriangleFullAsm(CPascalTriangleBase):
"""
Based on :py:class::`CPascalTrianglePartialAsm`.
Difference:
- The algorithm is implemented in Assembler completely
"""
language = 'C/Assembler'
max_height = 34
def build(self, height):
return c_pascal_triangle_full_asm(height, 0, self.return_list)
#include <Python.h>
#include <stdio.h>
// TODO LOW: Allocate memory on heap using malloc() instead of stack
#define MAX_HEIGHT_UINT 34
static PyObject* c_pascal_triangle_full_asm(PyObject* self, PyObject* args) {
int height, verbose=0, return_list=0;
if (!PyArg_ParseTuple(args, "i|ii", &height, &verbose, &return_list)) return NULL;
// TODO MEDIUM: Make a better fix to support zero-based Pascal's Triangle
height++;
unsigned int line[height + 1];
int index;
if(height > MAX_HEIGHT_UINT + 1) {
PyErr_Format(PyExc_ValueError, "Unable to build Pascal's Triangle higher than %d", MAX_HEIGHT_UINT);
return NULL;
}
asm (
// Fill in line with 1 and put 0 in the end
"movl %0, %%ecx\n\t"
"movq %1, %%rdi\n\t"
"movl $1, %%eax\n\t"
"cld\n\t"
"rep\n\t"
"stosl\n\t"
// End of filling
"movq $0, (%%rdi)\n\t" // TODO MEDIUM: There is no need for zero-padding
"movq $1, %%rcx\n\t" // line number being processed
"xorq %%rdx, %%rdx\n\t"
"movl %0, %%edx\n\t" // maximum number of lines
"movq %1, %%rdi\n\t" // start of "line" array
"outer:\n\t"
"movq %%rdx, %%rbx\n\t" // offset of current element
"subq %%rcx, %%rbx\n\t"
"xorq %%r8, %%r8\n\t"
// print %%rcx dwords starting from (%%rdi,%%rbx,4)
"inner:\n\t"
"movl 4(%%rdi,%%rbx,4), %%eax\n\t" // read line[index + 1]
"addl %%eax, (%%rdi,%%rbx,4)\n\t" // line[index] += line[index + 1]
"addq $1, %%rbx\n\t"
"incq %%r8\n\t"
"cmpq %%r8, %%rcx\n\t"
"jne inner\n\t"
"incq %%rcx\n\t"
"cmpq %%rdx, %%rcx\n\t"
"jl outer\n\t"
// print %%rdx dwords starting from (%%rdi)
: /**/
: "g" (height), "g" (line)
: "eax", "rbx", "rcx", "rdx", "rdi", "r8"
);
// Some code that used results to ensure that optimization is not applied and assembly code is actually executing
if(verbose) {
for (index = height - 1; index >= 0; index--) {
if(verbose) printf("%u ", line[index]);
}
if(verbose) printf("\n");
}
if(return_list) {
PyObject* py_line = PyList_New(height);
for (index = height - 1; index >= 0; index--) {
PyList_SetItem(py_line, index, PyInt_FromLong((long) line[index]));
}
return py_line;
} else {
Py_INCREF(Py_None);
return Py_None;
}
}
static PyMethodDef cpascal_triangle_methods[] = {
{"c_pascal_triangle_full_asm", c_pascal_triangle_full_asm, METH_VARARGS},
{NULL, NULL}
};
void initcfull_asm(void) {
(void) Py_InitModule("cfull_asm", cpascal_triangle_methods);
}
Diff¶
--- ../implementations/cextension/partial_asm.py
+++ ../implementations/cextension/full_asm.py
@@ -1,17 +1,17 @@
from .base import CPascalTriangleBase
-from .cpartial_asm import c_pascal_triangle_partial_asm
+from .cfull_asm import c_pascal_triangle_full_asm
-class CPascalTrianglePartialAsm(CPascalTriangleBase):
+class CPascalTriangleFullAsm(CPascalTriangleBase):
"""
- Based on :py:class::`CPascalTriangleInitial`.
+ Based on :py:class::`CPascalTrianglePartialAsm`.
Difference:
- - Array initialization is implemented in Assembler
+ - The algorithm is implemented in Assembler completely
"""
language = 'C/Assembler'
max_height = 34
def build(self, height):
- return c_pascal_triangle_partial_asm(height, 0, self.return_list)
+ return c_pascal_triangle_full_asm(height, 0, self.return_list)
--- ../implementations/cextension/cpartial_asm.c
+++ ../implementations/cextension/cfull_asm.c
@@ -6,17 +6,16 @@
#define MAX_HEIGHT_UINT 34
-static PyObject* c_pascal_triangle_partial_asm(PyObject* self, PyObject* args) {
+static PyObject* c_pascal_triangle_full_asm(PyObject* self, PyObject* args) {
int height, verbose=0, return_list=0;
if (!PyArg_ParseTuple(args, "i|ii", &height, &verbose, &return_list)) return NULL;
+ // TODO MEDIUM: Make a better fix to support zero-based Pascal's Triangle
+ height++;
unsigned int line[height + 1];
-
- int start = height;
- int size;
int index;
- if(height > MAX_HEIGHT_UINT) {
+ if(height > MAX_HEIGHT_UINT + 1) {
PyErr_Format(PyExc_ValueError, "Unable to build Pascal's Triangle higher than %d", MAX_HEIGHT_UINT);
return NULL;
}
@@ -25,32 +24,48 @@
// Fill in line with 1 and put 0 in the end
"movl %0, %%ecx\n\t"
"movq %1, %%rdi\n\t"
- "incl %%ecx\n\t"
"movl $1, %%eax\n\t"
"cld\n\t"
"rep\n\t"
"stosl\n\t"
+ // End of filling
+ "movq $0, (%%rdi)\n\t" // TODO MEDIUM: There is no need for zero-padding
+ "movq $1, %%rcx\n\t" // line number being processed
+ "xorq %%rdx, %%rdx\n\t"
+ "movl %0, %%edx\n\t" // maximum number of lines
+ "movq %1, %%rdi\n\t" // start of "line" array
+ "outer:\n\t"
+ "movq %%rdx, %%rbx\n\t" // offset of current element
+ "subq %%rcx, %%rbx\n\t"
+ "xorq %%r8, %%r8\n\t"
+ // print %%rcx dwords starting from (%%rdi,%%rbx,4)
+ "inner:\n\t"
+ "movl 4(%%rdi,%%rbx,4), %%eax\n\t" // read line[index + 1]
+ "addl %%eax, (%%rdi,%%rbx,4)\n\t" // line[index] += line[index + 1]
+ "addq $1, %%rbx\n\t"
+ "incq %%r8\n\t"
+ "cmpq %%r8, %%rcx\n\t"
+ "jne inner\n\t"
+ "incq %%rcx\n\t"
+ "cmpq %%rdx, %%rcx\n\t"
+ "jl outer\n\t"
+ // print %%rdx dwords starting from (%%rdi)
: /**/
: "g" (height), "g" (line)
- : "eax", "ecx", "rdi"
+ : "eax", "rbx", "rcx", "rdx", "rdi", "r8"
);
- for (size = 0; size <= height; size++) {
- for (index = start + 1; index < height; index++) {
- line[index] += line[index + 1];
- }
-
- for (index = start + size; index >= start; index--) {
+ // Some code that used results to ensure that optimization is not applied and assembly code is actually executing
+ if(verbose) {
+ for (index = height - 1; index >= 0; index--) {
if(verbose) printf("%u ", line[index]);
}
if(verbose) printf("\n");
-
- start--;
}
if(return_list) {
- PyObject* py_line = PyList_New(height + 1);
- for (index = height; index >= 0; index--) {
+ PyObject* py_line = PyList_New(height);
+ for (index = height - 1; index >= 0; index--) {
PyList_SetItem(py_line, index, PyInt_FromLong((long) line[index]));
}
return py_line;
@@ -62,10 +77,10 @@
static PyMethodDef cpascal_triangle_methods[] = {
- {"c_pascal_triangle_partial_asm", c_pascal_triangle_partial_asm, METH_VARARGS},
+ {"c_pascal_triangle_full_asm", c_pascal_triangle_full_asm, METH_VARARGS},
{NULL, NULL}
};
-void initcpartial_asm(void) {
- (void) Py_InitModule("cpartial_asm", cpascal_triangle_methods);
+void initcfull_asm(void) {
+ (void) Py_InitModule("cfull_asm", cpascal_triangle_methods);
}
TODO¶
- Provide Assembler x86 implementation for 64-bit integer values
- Support Python 3.x (port Python, Cython and C extensions code)
No comments:
Post a Comment