-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathadding-new-primitive-operations.trac
354 lines (287 loc) · 14.8 KB
/
adding-new-primitive-operations.trac
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
= Adding new primitive operations to GHC Haskell: An amateur explains =
Quoting from "The Glasgow Haskell Copiler User's Guide, Version 6.2",
section 7.2, "GHC is built on a raft of primitive data types and
operations".
Here, I walk the reader through the steps of adding a particular
new primitive operation to GHC Haskell,
one that calls the GMP function {{{mpz_powm}}}.
This is for non-.NET platforms, since .NET uses
{{{BigInteger}}} rather than GMP.
I should warn the reader that I, in writing this piece, am a rank
amateur in many of the involved issues, including working with the
GHC compiler, darcs, and trac. Hence, I will often digress in various
directions, reflecting that this is a learning experience for me.
== Why do I want to add new primitive operations to GHC Haskell? ==
My basic interest is computing with rather large integers. This means
integers with hundreds or even thousands of decimal digits.
Haskell's {{{Integer}}} type fits this purpose
perfectly. However, I am ultimately concerned about performance and
some of the more complex algorithms for operating on large integers
require the use of operations, such as shifting, that cannot
be implemented efficiently in standard Haskell.
The {{{Integer}}} operations of GHC Haskell for non-.Net platforms use GMP.
And, actually, GMP includes some of the functions on large integers that I
am looking for. So, adding new primitive operations that call
these GMP functions solves my problem.
== Why do I not use FFI? ==
Quite simply because I have so far been unable to figure out how
to pass {{{Integer}}} operands efficiently between Haskell and C.
I am still, however, in the market for a solution of this nature.
== Why do I need {{{mpz_powm}}}? ==
The GMP {{{mpz_powm}}} function computes "(BASE raised to EXP) modulo MOD"
for some integers BASE, EXP, and MOD. So this is really not the sort
of low-level function (like shifting) that I am looking for. But it
provides a nice example for the present text. And also allows me to
compare my Haskell implementation of the same function with the GMP
implementation and thereby gain some general insight into the overhead
incurred when using Haskell rather than GMP called directly from C.
== Basics: Versions of GHC ==
The GHC that I am working on is ghc-6.4.1-src.tar.bz2 that I
downloaded 2006-March-01. I also found it valuable to consult the
version that I downloaded from the darcs respository on 2006-March-21.
== Steps required to add a new primop ==
The steps are actually described in "ghc/compiler/prelude/primops.txt.pp"
of the 2006-March-21 darcs version of GHC as follows:
{{{
-- To add a new primop, you currently need to update the following files:
--
-- - this file (ghc/compiler/prelude/primops.txt.pp), which includes
-- the type of the primop, and various other properties (its
-- strictness attributes, whether it is defined as a macro
-- or as out-of-line code, etc.)
--
-- - if the primop is inline (i.e. a macro), then:
-- ghc/compiler/AbsCUtils.lhs (dscCOpStmt)
-- defines the translation of the primop into simpler
-- abstract C operations.
--
-- - or, for an out-of-line primop:
-- ghc/includes/StgMiscClosures.h (just add the declaration)
-- ghc/rts/PrimOps.cmm (define it here)
-- ghc/rts/Linker.c (declare the symbol for GHCi)
--
-- - the User's Guide
}}}
The part about {{{ghc/compiler/AbsCUtils.lhs}}} is no longer valid, at least I
cannot figure out the details related to this. But is any case, the
present piece
is about adding {{{Integer}}} primitive operations and they are not inline
(as we will see in a minute), so I don't care right now.
The part about the User's Guide is also not accurate: In "The Glasgow
Haskell Copiler User's Guide, Version 6.2", section 7.2, I read that
There used to be long section about [the primitives] here in the User
Guide, but it became out of date and wrong information is worse than
none.
So that's a relief.
== Digression: How to refer to things? ==
The basic mechanism is, of course, the hypertext link. Unfortunately, I cannot
seem to find a way of referring to things that won't easily break. It is
comforting to know that others appear to have the same problem: I noted the
other day in passing a letter from Simon Marlow (I believe it was) that
explained how he had now eliminated a level (ghc) of the directory tree. As
a result, several links in DebuggingGhcCrashes are now broken.
(And please forgive me here: Although an amateur in many respects, I am a
professional programmer and I have had years of experience dealing with
multiple versions of things, relations that shouldn't break, etc. So I
know that the solution is not easy. But I need some guidance in this case:
Should I just quietly go and correct these few links that are wrong?
The problem is that I know I could find dozens, if not hundreds, of broken
links, so I could spend the rest of my life doing this sort of thing.)
== {{{ghc/compiler/prelude/primops.txt.pp}}} changes ==
This file "should first be preprocessed" and is then "processed by the
utility program genprimopcode to produce a number of include files
within the compiler"
In this file we have, for example, the following:
{{{
primop IntegerQuotOp "quotInteger#" GenPrimOp
Int# -> ByteArr# -> Int# -> ByteArr# -> (# Int#, ByteArr# #)
{Rounds towards zero.}
with out_of_line = True
}}}
which is the basis of the {{{quot::Integer->Integer->Integer}}} function.
A Haskell {{{Integer}}} "is represented as a pair consisting of an {{{Int#}}}
representing the number of 'limbs' in use and the sign, and a
{{{ByteArr#}}} containing the 'limbs' themselves."
Mimicking {{{IntegerQuotOp}}}, we add:
{{{
primop IntegerPowerModOp "powerModInteger#" GenPrimOp
Int# -> ByteArr# -> Int# -> ByteArr# -> Int# -> ByteArr# -> (# Int#, ByteArr# #)
{Base raised to a power modulo a modulus.}
with out_of_line = True
}}}
for our power operation: It needs three {{{Integer}}} operands and returns one
{{{Integer}}} result, hence the type has an {{{Int#}}} and a {{{ByteArr#}}}
for each.
== {{{ghc/includes/StgMiscClosures.h}}} changes ==
In this file, we should just "add the declaration". Hence, we add the line
{{{
RTS_FUN(powerModIntegerzh_fast);
}}}
somewhere suitable in the (alphabetically sorted) list of other symbols.
The name used here ({{{powerModIntegerzh_fast}}}) is what is called
the Z-encoding of the primitive operation {{{powerModInteger#}}}.
A nice description of Z-encoding can be found in DebuggingGhcCrashes.
== {{{ghc/rts/PrimOps.cmm}}} changes ==
This is where the code that actually implements (out of line)
primitive operations go. In this file we find, for example:
{{{
#define GMP_TAKE2_RET1(name,mp_fun) \
name \
{ \
CInt s1, s2; \
W_ d1, d2; \
\
/* call doYouWantToGC() */ \
MAYBE_GC(R2_PTR & R4_PTR, name); \
\
s1 = W_TO_INT(R1); \
d1 = R2; \
s2 = W_TO_INT(R3); \
d2 = R4; \
\
MP_INT__mp_alloc(mp_tmp1) = W_TO_INT(StgArrWords_words(d1)); \
MP_INT__mp_size(mp_tmp1) = (s1); \
MP_INT__mp_d(mp_tmp1) = BYTE_ARR_CTS(d1); \
MP_INT__mp_alloc(mp_tmp2) = W_TO_INT(StgArrWords_words(d2)); \
MP_INT__mp_size(mp_tmp2) = (s2); \
MP_INT__mp_d(mp_tmp2) = BYTE_ARR_CTS(d2); \
\
foreign "C" mpz_init(result1); \
\
/* Perform the operation */ \
foreign "C" mp_fun(result1,mp_tmp1,mp_tmp2); \
\
RET_NP(TO_W_(MP_INT__mp_size(result1)), \
MP_INT__mp_d(result1) - SIZEOF_StgArrWords); \
}
}}}
defining a macro which is called, for example, like this:
{{{
GMP_TAKE2_RET1(quotIntegerzh_fast, mpz_tdiv_q)
}}}
to provide an implementation of the {{{quotInteger#}}} primitive
operation. So we go ahead and define
{{{
#define GMP_TAKE3_RET1(name,mp_fun) \
name \
{ \
CInt s1, s2, s3; \
W_ d1, d2, d3; \
\
/* call doYouWantToGC() */ \
MAYBE_GC(R2_PTR & R4_PTR & R6_PTR, name); \
\
s1 = W_TO_INT(R1); \
d1 = R2; \
s2 = W_TO_INT(R3); \
d2 = R4; \
s3 = W_TO_INT(R5); \
d3 = R6; \
\
MP_INT__mp_alloc(mp_tmp1) = W_TO_INT(StgArrWords_words(d1)); \
MP_INT__mp_size(mp_tmp1) = (s1); \
MP_INT__mp_d(mp_tmp1) = BYTE_ARR_CTS(d1); \
MP_INT__mp_alloc(mp_tmp2) = W_TO_INT(StgArrWords_words(d2)); \
MP_INT__mp_size(mp_tmp2) = (s2); \
MP_INT__mp_d(mp_tmp2) = BYTE_ARR_CTS(d2); \
MP_INT__mp_alloc(mp_tmp3) = W_TO_INT(StgArrWords_words(d3)); \
MP_INT__mp_size(mp_tmp3) = (s3); \
MP_INT__mp_d(mp_tmp3) = BYTE_ARR_CTS(d3); \
\
foreign "C" mpz_init(result1); \
\
/* Perform the operation */ \
foreign "C" mp_fun(result1,mp_tmp1,mp_tmp2,mp_tmp3); \
\
RET_NP(TO_W_(MP_INT__mp_size(result1)), \
MP_INT__mp_d(result1) - SIZEOF_StgArrWords); \
}
}}}
and a call
{{{
GMP_TAKE3_RET1(powerModIntegerzh_fast, mpz_powm)
}}}
(Actually, defining a macro for this single call may seem needless,
but it is not particularly more troublesome, and leaves some reduced
work ahead.)
We have also added
{{{
section "bss" {
mp_tmp3:
bits8 [SIZEOF_MP_INT];
}
}}}
to cater for the (new) situation of having to deal with 3 {{{Integer}}}
operands.
(Caring for efficiency, I am somewhat dismayed by the remark
{{{
/* ToDo: this is shockingly inefficient */
}}}
describing the !PrimOps.cmm implementation of {{{Integer}}} primitive operations.
But that will have to be dealt with later.)
== {{{ghc/rts/Linker.c}}} changes ==
In this file, we should just "declare the symbol for GHCi".
Adding
{{{
SymX(powerModIntegerzh_fast) \
}}}
takes care of that.
== {{{libraries/base/GHC/Num.lhs}}} changes ==
The internal {{{Integer}}} representation is defined here as:
{{{
data Integer
= S# Int# -- small integers
#ifndef ILX
| J# Int# ByteArray# -- large integers
#else
| J# Void BigInteger -- .NET big ints
foreign type dotnet "BigInteger" BigInteger
#endif
}}}
that is, for the non-.NET situation envisioned here, either a single
precision integer or a GMP integer represented by a limb count and a limb
pointer. The {{{Integer}}} functions defined in this file provide an interface
between the primitive operations defined in terms of this internal {{{Integer}}}
representation and the externally visible {{{Integer}}} operations in GHC.Num
(like {{{quotInteger}}}). So, similar to
{{{
quotInteger :: Integer -> Integer -> Integer
quotInteger ia 0
= error "Prelude.Integral.quot{Integer}: divide by 0"
quotInteger a@(S# (-LEFTMOST_BIT#)) b = quotInteger (toBig a) b
quotInteger (S# a) (S# b) = S# (quotInt# a b)
{- Special case disabled, see remInteger above
quotInteger (S# a) (J# sb b)
| sb ==# 1# = S# (quotInt# a (word2Int# (integer2Word# sb b)))
| sb ==# -1# = S# (quotInt# a (0# -# (word2Int# (integer2Word# sb b))))
| otherwise = zeroInteger
-}
quotInteger ia@(S# _) ib@(J# _ _) = quotInteger (toBig ia) ib
quotInteger (J# sa a) (S# b)
= case int2Integer# b of { (# sb, b #) ->
case quotInteger# sa a sb b of (# sq, q #) -> J# sq q }
quotInteger (J# sa a) (J# sb b)
= case quotInteger# sa a sb b of (# sg, g #) -> J# sg g
}}}
we define
{{{
powerModInteger :: Integer -> Integer -> Integer -> Integer
powerModInteger a@(S# _) b c = powerModInteger (toBig a) b c
powerModInteger a b@(S# _) c = powerModInteger a (toBig b) c
powerModInteger a b c@(S# _) = powerModInteger a b (toBig c)
powerModInteger (J# sa a) (J# sb b) (J# sc c)
= case powerModInteger# sa a sb b sc c of (# sr, r #) -> ( J# sr r )
}}}
to gain access to our power modulo operation.
== "Make" the changes ==
'''First: make clean! ''' GHC doesn't correctly track changes to the `GHC.Prim`, so it won't recompile enough things if you just type `make`, so you need to `make clean` first. (strictly speaking you don't need to clean the stage 1 compiler, so if you know what you're doing you might want to try cleaning just the right bits).
Then we {{{make all}}} in the top-level directory. The error message that
we get is
{{{
GHC/Num.lhs:507:9: Not in scope: `powerModInteger#'
}}}
So dependencies are missing. I have not bothered to track down
why this happens in detail. But, guided by wise persons, I simply say
{{{make clean}}} in both {{{ghc/compiler}}} and {{{libraries/base}}}
and that helps: The compiler builds and actually works, including the
newly implemented primitive operation {{{powerModInteger}}}.