Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inexplicable arithmetic overflow ?? #15341

Open
jzakiya opened this issue Jan 13, 2025 · 11 comments · May be fixed by #15347
Open

Inexplicable arithmetic overflow ?? #15341

jzakiya opened this issue Jan 13, 2025 · 11 comments · May be fixed by #15347

Comments

@jzakiya
Copy link

jzakiya commented Jan 13, 2025

I'm encountering a seemingly random arithmetic overflow error message in code that shouldn't overflow.

Below is the code.

def prime_pairs(n, flag = false)
  return puts "Input not even n > 2" unless n.even? && n > 2
  return (pp [n, 1]; (pp [n//2, n//2] if flag)) if n <= 6

  # generate the low-half-residues (lhr) r < n/2
  lhr = 3u64.step(to: n//2, by: 2).select { |r| r if r.gcd(n) == 1 }.to_a
  ndiv2, rhi = n//2, n-2           # set process constants
  lhr_mults = [] of typeof(n)      # for lhr values not part of a pcp

  # store all the powers of the lhr members < n-2
  lhr.each do |r|                  # step thru the lhr members
    r_pwr = r                      # set initial power of r
    break if r > rhi // r_pwr      # exit if r^2 > n-2, as all others are too
    while    r < rhi // r_pwr      # while r^e < n-2
      lhr_mults << (r_pwr *= r)    # store its current power of r
    end
  end

  # store all the cross-products of the lhr members < n-2
  lhr_dup = lhr.dup                # make copy of the lhr members list
  while (r = lhr_dup.shift) && !lhr_dup.empty? # do mults of 1st list r w/others
    ri_max = rhi // r              # ri can't multiply r with values > this
    break if lhr_dup[0] > ri_max   # exit if product of consecutive r’s > n-2
    lhr_dup.each do |ri|           # for each residue in reduced list
      puts "r = #{r}, ri = #{ri}, ri_max = #{ri_max}"
      break if ri > ri_max         # exit for r if cross-product with ri > n-2
      lhr_mults << r * ri          # store value if < n-2
    end                            # check cross-products of next lhr  member
  end

  # convert lhr_mults vals > n/2 to their lhr complements n-r,
  # store them, those < n/2, in lhr_del; it now holds non-pcp lhr vals
  lhr_del = lhr_mults.map { |r_del| r_del > ndiv2 ? n - r_del : r_del }

  pp [n, (lhr.size - lhr_del.uniq.size)]                      # show pcp count
  (print (lhr -= lhr_del).map { |p| [p, n-p] }; puts) if flag # show pcp values
end

def tm; t = Time.monotonic; yield; Time.monotonic - t end     # time code execution

def gen_pcp
  n = (ARGV[0].to_u64 underscore: true)
  flag = ARGV.size > 1 ? true : false
  puts tm { prime_pairs(n, flag)}
end

gen_pcp

I compile it like this.

$ crystal build --release --mcpu native prime_pairs.cr

Run it like this,

$  ./prime_pairs 123_456

The overflow message occurs in this part.

  # store all the cross-products of the lhr members < n-2
  lhr_dup = lhr.dup                # make copy of the lhr members list
  while (r = lhr_dup.shift) && !lhr_dup.empty? # do mults of 1st list r w/others
    ri_max = rhi // r              # ri can't multiply r with values > this
    break if lhr_dup[0] > ri_max   # exit if product of consecutive r’s > n-2
    lhr_dup.each do |ri|           # for each residue in reduced list
      puts "r = #{r}, ri = #{ri}, ri_max = #{ri_max}"
      break if ri > ri_max         # exit for r if cross-product with ri > n-2
      lhr_mults << r * ri          # store value if < n-2
    end                            # check cross-products of next lhr  member
  end

At no time does the math create values > n-2.

Here is truncated output with the print statement for n = 150_000_006 that doesn't overflow.

r = 12241, ri = 12247, ri_max = 12253
r = 12241, ri = 12251, ri_max = 12253
r = 12241, ri = 12253, ri_max = 12253
r = 12241, ri = 12257, ri_max = 12253
r = 12245, ri = 12247, ri_max = 12249
r = 12245, ri = 12251, ri_max = 12249
[150000006, 708844]
00:03:16.280705215

This executes correctly for n = 150_000_006, when ri > ri_max at the end.
You see, ri_max is continually getting smaller.

This overflows for a smaller value of n = 148_000_006, when ri > ri_max at the end.
All I can think of is > is somehow the culprit?

r = 12161, ri = 12167, ri_max = 12170
r = 12161, ri = 12169, ri_max = 12170
r = 12161, ri = 12171, ri_max = 12170
r = 12163, ri = 12165, ri_max = 12168
r = 12163, ri = 12167, ri_max = 12168
r = 12163, ri = 12169, ri_max = 12168
Unhandled exception: Arithmetic overflow (OverflowError)
  from ./prime_pairs1 in '??'
  from ./prime_pairs1 in '??'
  from ./prime_pairs1 in '??'
  from ./prime_pairs1 in '__crystal_main'
  from ./prime_pairs1 in '??'
  from ./prime_pairs1 in 'main'
  from /lib64/libc.so.6 in '??'
  from /lib64/libc.so.6 in '__libc_start_main'
  from ./prime_pairs1 in '_start'
  from ???
➜  ~

For some reason, it bombs for n= 148_000_006 when ri = 12169 > ri_max = 12168

Both these values are way, way less than 148_000_006, so how can there be an overflow?
And it occurs with apparently random values of different sizes (larger|smaller).

The problem occurs using Crystal versions 1.15.0, 1.14.0, 1.13.2 and 1.12.2.

The Ruby version, which is almost code identical, works for all values with no problems.

@jzakiya jzakiya added the kind:bug A bug in the code. Does not apply to documentation, specs, etc. label Jan 13, 2025
@straight-shoota
Copy link
Member

Thanks for this bug report. Unfortunately, it's not clear how to reproduce this issue.
Could you please provide exact instructions with code, build command and runtime arguments for how to reproduce the error.

Also it would help if you could try to reduce the code (i.e. throw away stuff that isn't necessary) and add some instrumentation in to figure out where exactly the exception is thrown. Enabling --debug information could also help for pinpointing the origin.

@Blacksmoke16
Copy link
Member

It's not super clear, but you can reproduce this by:

crystal build --release --mcpu native prime_pairs.cr
./prime_pairs 148_000_006

Which results in the error within a few minutes. I ran it with a locally built compiler, which resulted in this trace:

Unhandled exception: Arithmetic overflow (OverflowError)
  from /Users/george/crystal/src/hash.cr:833:10 in 'indices_malloc_size'
  from /Users/george/crystal/src/hash.cr:828:5 in 'malloc_indices'
  from /Users/george/crystal/src/hash.cr:252:9 in 'initialize:initial_capacity'
  from /Users/george/crystal/src/hash.cr:225:3 in 'new:initial_capacity'
  from /Users/george/crystal/src/set.cr:37:5 in 'initialize'
  from /Users/george/crystal/src/set.cr:36:3 in 'new'
  from /Users/george/crystal/src/set.cr:45:5 in 'new'
  from /Users/george/crystal/src/set.cr:509:5 in 'to_set'
  from /Users/george/crystal/src/array.cr:1908:5 in 'uniq'
  from prime_pairs.cr:35:22 in 'prime_pairs'
  from prime_pairs.cr:40:30 in 'gen_pcp'
  from prime_pairs.cr:49:1 in '__crystal_main'
  from /Users/george/crystal/src/crystal/main.cr:118:5 in 'main_user_code'
  from /Users/george/crystal/src/crystal/main.cr:104:7 in 'main'
  from /Users/george/crystal/src/crystal/system/unix/main.cr:9:3 in 'main'

So seems to actually be a limitation of Hash?

@jzakiya
Copy link
Author

jzakiya commented Jan 13, 2025

Here is a shortened version that runs just the problem section

def test_code(n)
  # generate the low-half-residues (lhr) r < n/2
  lhr = 3u64.step(to: n//2, by: 2).select { |r| r if r.gcd(n) == 1 }.to_a

  # store all the cross-products of the lhr members < n-2
  lhr_dup = lhr.dup                # make copy of the lhr members list
  while (r = lhr_dup.shift) && !lhr_dup.empty? # do mults of 1st list r w/others
    ri_max = (n-2) // r            # ri can't multiply r with values > this
    break if lhr_dup[0] > ri_max   # exit if product of consecutive r’s > n-2
    lhr_dup.each do |ri|           # for each residue in reduced list
      puts "r = #{r}, ri = #{ri}, ri_max = #{ri_max}"
      break if ri > ri_max         # exit for r if cross-product with ri > n-2
    end                            # check cross-products of next lhr  member
  end
end

def tm; t = Time.monotonic; yield; Time.monotonic - t end     # time code execution

def gen_test
  n = (ARGV[0].to_u64 underscore: true)
  puts tm { test_code(n) }
end

gen_test

Sample output.

➜  ~ ./test_code 100                                      
r = 3, ri = 7, ri_max = 32
r = 3, ri = 9, ri_max = 32
r = 3, ri = 11, ri_max = 32
r = 3, ri = 13, ri_max = 32
r = 3, ri = 17, ri_max = 32
r = 3, ri = 19, ri_max = 32
r = 3, ri = 21, ri_max = 32
r = 3, ri = 23, ri_max = 32
r = 3, ri = 27, ri_max = 32
r = 3, ri = 29, ri_max = 32
r = 3, ri = 31, ri_max = 32
r = 3, ri = 33, ri_max = 32
r = 7, ri = 9, ri_max = 14
r = 7, ri = 11, ri_max = 14
r = 7, ri = 13, ri_max = 14
r = 7, ri = 17, ri_max = 14
00:00:00.000062056

@straight-shoota
Copy link
Member

So looks like lhr_del is so big that allocating a Hash to hold its items fails. I'm sure the error message should be improved to clearly point out the issue.

@jzakiya
Copy link
Author

jzakiya commented Jan 13, 2025

Yes, it wasn't an arithmetic overflow, which makes me feel sane again.

I ran test_code with n = 148_000_006 and it ran and completed (like it should have).

Here's truncated test data for it to show it completed without a problem.

r = 12161, ri = 12165, ri_max = 12170
r = 12161, ri = 12167, ri_max = 12170
r = 12161, ri = 12169, ri_max = 12170
r = 12161, ri = 12171, ri_max = 12170
r = 12163, ri = 12165, ri_max = 12168
r = 12163, ri = 12167, ri_max = 12168
r = 12163, ri = 12169, ri_max = 12168
00:07:27.839411344

@straight-shoota straight-shoota added kind:feature and removed kind:bug A bug in the code. Does not apply to documentation, specs, etc. labels Jan 13, 2025
@jzakiya
Copy link
Author

jzakiya commented Jan 14, 2025

FYI
My system is Lenovo Legion 5 Slim laptop, 16GB memory, running Linux.

I'm able to run the Ruby version for n = 148_000_006 for Ruby 3.4.1|3.3.6, JRuby 9.4.9.0 (Truffleruby 24.1.1 gives out of mem ).

Ruby 3.4.1 (Dec 25, 2025 release)

➜  ~ ruby prime_pairs.rb 148_000_006

[148000006, 436612]
42.623468019


Ruby 3.3.6

➜  ~ ruby prime_pairs.rb 148_000_006 

[148000006, 436612]
45.15575637

JRuby 9.4.9.0

➜  ~ ruby -J-Xmx14000M prime_pairs.rb 148_000_006 

[148000006, 436612]
108.151445

So 16GB was enough for them to run this in.

Here's the almost identical Ruby code.

def prime_pairs(n, flag = false)
  return puts "Input not even n > 2" unless n.even? && n > 2
  return (pp [n, 1]; (pp [n/2, n/2] if flag)) if n <= 6

  # generate the low-half-residues (lhr) r < n/2
  lhr = 3.step(n/2, 2).select { |r| r if r.gcd(n) == 1 }
  ndiv2, rhi = n/2, n-2            # set process constants
  lhr_mults = []                   # for lhr values not part of a pcp

  # store all the powers of the lhr members < n-2
  lhr.each do |r|                  # step thru the lhr members
    r_pwr = r                      # set to first power of r
    break if r > rhi / r_pwr       # exit if r^2 > n-2, as all others are too
    while    r < rhi / r_pwr       # while r^e < n-2
      lhr_mults << (r_pwr *= r)    # store its current power of r
    end
  end

  # store all the cross-products of the lhr members < n-2
  lhr_dup = lhr.dup                # make copy of the lhr members list
  while (r = lhr_dup.shift) && !lhr_dup.empty? # do mults of 1st list r w/others
    ri_max = rhi / r               # ri can't multiply r with values > this
    break if lhr_dup[0] > ri_max   # exit if product of consecutive r’s > n-2
    lhr_dup.each do |ri|           # for each residue in reduced list
      break if ri > ri_max         # exit for r if cross-product with ri > n-2
      lhr_mults << r * ri          # store value if < n-2
    end                            # check cross-products of next lhr member
  end

  # convert lhr_mults vals > n/2 to their lhr complements n-r,
  # store them, those < n/2, in lhr_del; it now holds non-pcp lhr vals
  lhr_del = lhr_mults.map { |r_del| r_del > ndiv2 ? n - r_del : r_del }

  pp [n, (lhr.size - lhr_del.uniq.size)]                      # show pcp count
  (print (lhr -= lhr_del).map { |p| [p, n-p] }; puts) if flag # show pcp valuess
end

def tm; t = Time.now; yield; Time.now - t end                 # time code execution

n = ARGV[0].to_i
flag = ARGV.size > 1 ? true : false
puts tm { prime_pairs(n, flag) }   # show execution runtime as last output

I would think if there's enough memory for Ruby to run and completer then so should Crystal.

@straight-shoota
Copy link
Member

straight-shoota commented Jan 14, 2025

The exception raises on the multiplication 536870912 * 4, which would be 2147483648. But that's Int32::MAX + 1, hence it doesn't fit into the default number type.
So the underlying issue is that collection types in Crystal are limited to the size of Int32 (cf. #4011).

This works fine in Ruby because Ruby's Integer type uses arbitrary precision, similar to BigInt in Crystal (and potentially, the Hash implementation works differently).

@ysbaddaden
Copy link
Contributor

More specifically: you can only address as many as Int32::MAX entries in a stdlib collection in Crystal (be it Slice, Array, Hash, ...).

You can allocate more than Int32::MAX bytes in a collection but the maximum contiguous memory will always be Int32::MAX * sizeof(T).

To overcome this limitation, you need specific types (or wrapper types).

@straight-shoota
Copy link
Member

straight-shoota commented Jan 14, 2025

Hm, actually it seems we can enable this by performing the multiplication in indices_malloc_size in a larger type:

@@ -831,5 +831,5 @@ class Hash(K, V)
   # The actual number of bytes needed to allocate `@indices`.
   private def indices_malloc_size(size)
-    size * @indices_bytesize
+    size.to_u32 * @indices_bytesize
    end

This works because @indices is not a collection type. It's just a pointer and uses pointer arithmetics directly. The actual number of collection entries (536870912) is way below Int32::MAX.

@ysbaddaden
Copy link
Contributor

Oh right. I guess we could even go for size.to_u64 since it's the byte allocation size 🤔

@crysbot
Copy link

crysbot commented Jan 14, 2025

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/goldbachs-conjecture-and-crystal-code/7579/18

ysbaddaden added a commit to ysbaddaden/crystal that referenced this issue Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants