Error summary: ChecksumException: Checksum error #3584

Open
iris-garden opened this issue May 6, 2024 · 0 comments

Labels: discourse migrated from discuss.hail.is

Comments

@iris-garden
Note

The following post was exported from discuss.hail.is, a now-deprecated forum for asking questions about Hail.

(Jan 11, 2024 at 02:19) hipark said:

Hi All,

I’m currently running into a ChecksumException while filtering a table by intervals. Interestingly, the same error has also surfaced during other processing steps, such as table annotation.

I’m wondering if there’s an option to bypass the checksum verification, or if anyone has a solution for this particular error.

Details provided below.
Thank you.
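
Note

The question above asks whether checksum verification can be bypassed. The exception here is raised by Hadoop's FSInputChecker, which verifies files read from local paths against their .crc sidecar files. Below is a minimal sketch of one possible workaround, assuming the table lives on a local or network filesystem path (as the /home01/... path suggests); the RawLocalFileSystem swap is a general Hadoop configuration trick, not a Hail-specific option, and the path is taken from the error message in this post.

import hail as hl

# Assumption: the .ht is read via a file:// path, so I/O goes through
# Hadoop's checksummed LocalFileSystem. Swapping in RawLocalFileSystem
# for the file:// scheme skips the .crc sidecar verification entirely.
hl.init(
    spark_conf={
        'spark.hadoop.fs.file.impl': 'org.apache.hadoop.fs.RawLocalFileSystem',
    }
)

# Reads no longer consult the .crc files. Caution: this only silences the
# error; it does not repair whatever corrupted the data or its checksum.
korcm_tb_dir = '/home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht'
kor_cm_tb = hl.read_table(korcm_tb_dir)

Bear in mind that if the part file itself is damaged, rather than just its .crc sidecar being stale, reads may still fail or return corrupt rows.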

code

kor_cm_tb = hl.read_table(korcm_tb_dir)
kor_cm_tb = kor_cm_tb.key_by('locus', 'alleles')

first_intervals = ['chr1', 'chr2', 'chr3', 'chr4']
kor_cm_tb1 = hl.filter_intervals(kor_cm_tb, [hl.parse_locus_interval(x) for x in first_intervals])
print('kor_cm_tb1 count: ', kor_cm_tb1.count())
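
Note

Before bypassing anything, it may help to confirm which shard is actually damaged. A small diagnostic sketch, run in the same session, assuming hl.hadoop_ls entries expose path and size_bytes fields (true in recent Hail 0.2 releases):

import hail as hl

ht_dir = '/home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht'

# List the row partition files with their sizes; a zero-length or obviously
# truncated shard (or a stale .crc sidecar next to it) points at on-disk
# corruption rather than a problem in the query itself.
for entry in hl.hadoop_ls(f'{ht_dir}/rows/parts'):
    print(entry['path'], entry['size_bytes'])

The part file named in the error summary below (part-082-...) is the one worth inspecting first.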

error message summary

Error summary: ChecksumException: Checksum error: /home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht/rows/parts/part-082-43d5dd77-36f0-41c5-885d-21ff349ad590 at 79933440 exp: 993925245 got: 1930839029

full error messages

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.3.4
SparkUI available at http://cpu64-only-001:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.115-10932c754edb
LOGGING: writing to /home01/k099a02/kor_retro/log/hail_231128_cm_test3.log
Traceback (most recent call last):
  File "/home01/k099a02/script/kor_retro/kor_retro_cm.py", line 134, in <module>
    print('kor_cm_tb1 count: ',kor_cm_tb1.count())
  File "/home01/k099a02/.conda/envs/test2/lib/python3.7/site-packages/hail/table.py", line 434, in count
    return Env.backend().execute(ir.TableCount(self._tir))
  File "/home01/k099a02/.conda/envs/test2/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 82, in execute
    raise e.maybe_user_error(ir) from None
  File "/home01/k099a02/.conda/envs/test2/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 76, in execute
    result_tuple = self._jbackend.executeEncode(jir, stream_codec, timed)
  File "/home01/k099a02/.conda/envs/test2/lib/python3.7/site-packages/py4j/java_gateway.py", line 1322, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home01/k099a02/.conda/envs/test2/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 35, in deco
    raise fatal_error_from_java_error_triplet(deepest, full, error_id) from None
hail.utils.java.FatalError: ChecksumException: Checksum error: /home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht/rows/parts/part-082-43d5dd77-36f0-41c5-885d-21ff349ad590 at 79933440 exp: 993925245 got: 1930839029

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 82 in stage 0.0 failed 1 times, most recent failure: Lost task 82.0 in stage 0.0 (TID 82) (cpu64-only-001 executor driver): org.apache.hadoop.fs.ChecksumException: Checksum error: /home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht/rows/parts/part-082-43d5dd77-36f0-41c5-885d-21ff349ad590 at 79933440 exp: 993925245 got: 1930839029
	at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:347)
	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:303)
	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:252)
	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:197)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
	at is.hail.io.fs.HadoopFS$$anon$2.read(HadoopFS.scala:55)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
	at is.hail.utils.richUtils.RichInputStream$.readRepeatedly$extension0(RichInputStream.scala:21)
	at is.hail.utils.richUtils.RichInputStream$.readFully$extension1(RichInputStream.scala:12)
	at is.hail.io.StreamBlockInputBuffer.readBlock(InputBuffers.scala:550)
	at is.hail.io.LZ4InputBlockBuffer.readBlock(InputBuffers.scala:584)
	at is.hail.io.BlockingInputBuffer.readBlock(InputBuffers.scala:382)
	at is.hail.io.BlockingInputBuffer.ensure(InputBuffers.scala:388)
	at is.hail.io.BlockingInputBuffer.skipDouble(InputBuffers.scala:499)
	at is.hail.io.LEB128InputBuffer.skipDouble(InputBuffers.scala:270)
	at __C485stream_Let.__m507SKIP_o_float64(Emit.scala)
	at __C485stream_Let.__m497DECODE_r_struct_of_r_struct_of_r_binaryANDr_int32ENDANDr_array_of_r_binaryANDo_binaryANDo_binaryANDr_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDr_binaryANDr_int32ANDo_binaryANDr_float64ANDo_int32ANDo_int32ANDo_float64ANDo_int32ANDo_float64ANDo_int32ANDo_float64ANDo_float64ANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_float64ANDo_binaryANDo_binaryANDo_array_of_o_struct_of_o_binaryANDo_binaryENDANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_int32ANDo_binaryANDo_binaryANDo_float64ANDo_binaryANDo_float64ANDo_binaryANDo_int32ANDo_int32ANDo_int32ANDo_binaryANDo_binaryANDo_binaryANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_int32ANDo_float64ANDo_int32ANDo_float64ANDo_int32ANDo_binaryANDo_binaryEND_TO_SBaseStructPointer(Emit.scala)
	at __C485stream_Let.apply(Emit.scala)
	at is.hail.expr.ir.CompileIterator$$anon$2.step(Compile.scala:303)
	at is.hail.expr.ir.CompileIterator$LongIteratorWrapper.hasNext(Compile.scala:156)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at is.hail.rvd.RVDPartitionInfo$.$anonfun$apply$1(RVDPartitionInfo.scala:70)
	at is.hail.utils.package$.using(package.scala:635)
	at is.hail.rvd.RVDPartitionInfo$.apply(RVDPartitionInfo.scala:42)
	at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2(RVD.scala:1049)
	at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2$adapted(RVD.scala:1047)
	at is.hail.sparkextras.ContextRDD.$anonfun$crunJobWithIndex$1(ContextRDD.scala:242)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2668)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2604)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2603)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2603)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1178)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1178)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1178)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2856)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2798)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2787)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2238)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2259)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2291)
	at is.hail.sparkextras.ContextRDD.crunJobWithIndex(ContextRDD.scala:238)
	at is.hail.rvd.RVD$.getKeyInfo(RVD.scala:1047)
	at is.hail.rvd.RVD$.makeCoercer(RVD.scala:1122)
	at is.hail.rvd.RVD$.coerce(RVD.scala:1078)
	at is.hail.rvd.RVD.changeKey(RVD.scala:142)
	at is.hail.rvd.RVD.changeKey(RVD.scala:135)
	at is.hail.backend.spark.SparkBackend.lowerDistributedSort(SparkBackend.scala:735)
	at is.hail.backend.Backend.lowerDistributedSort(Backend.scala:100)
	at is.hail.expr.ir.lowering.LowerAndExecuteShuffles$.$anonfun$apply$1(LowerAndExecuteShuffles.scala:23)
	at is.hail.expr.ir.RewriteBottomUp$.$anonfun$apply$4(RewriteBottomUp.scala:26)
	at is.hail.utils.StackSafe$More.advance(StackSafe.scala:60)
	at is.hail.utils.StackSafe$.run(StackSafe.scala:16)
	at is.hail.utils.StackSafe$StackFrame.run(StackSafe.scala:32)
	at is.hail.expr.ir.RewriteBottomUp$.apply(RewriteBottomUp.scala:36)
	at is.hail.expr.ir.lowering.LowerAndExecuteShuffles$.apply(LowerAndExecuteShuffles.scala:20)
	at is.hail.expr.ir.lowering.LowerAndExecuteShufflesPass.transform(LoweringPass.scala:157)
	at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
	at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
	at is.hail.expr.ir.lowering.LowerAndExecuteShufflesPass.apply(LoweringPass.scala:151)
	at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:22)
	at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:20)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:20)
	at is.hail.expr.ir.lowering.EvalRelationalLets$.execute$1(EvalRelationalLets.scala:10)
	at is.hail.expr.ir.lowering.EvalRelationalLets$.lower$1(EvalRelationalLets.scala:18)
	at is.hail.expr.ir.lowering.EvalRelationalLets$.apply(EvalRelationalLets.scala:37)
	at is.hail.expr.ir.lowering.EvalRelationalLetsPass.transform(LoweringPass.scala:147)
	at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
	at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
	at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
	at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
	at is.hail.expr.ir.lowering.EvalRelationalLetsPass.apply(LoweringPass.scala:141)
	at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:22)
	at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:20)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:20)
	at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:50)
	at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:463)
	at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:499)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:75)
	at is.hail.utils.package$.using(package.scala:635)
	at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:75)
	at is.hail.utils.package$.using(package.scala:635)
	at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
	at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:63)
	at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:351)
	at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:496)
	at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
	at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:495)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:834)

org.apache.hadoop.fs.ChecksumException: Checksum error: /home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht/rows/parts/part-082-43d5dd77-36f0-41c5-885d-21ff349ad590 at 79933440 exp: 993925245 got: 1930839029
	at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:347)
	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:303)
	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:252)
	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:197)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
	at is.hail.io.fs.HadoopFS$$anon$2.read(HadoopFS.scala:55)
	at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
	at is.hail.utils.richUtils.RichInputStream$.readRepeatedly$extension0(RichInputStream.scala:21)
	at is.hail.utils.richUtils.RichInputStream$.readFully$extension1(RichInputStream.scala:12)
	at is.hail.io.StreamBlockInputBuffer.readBlock(InputBuffers.scala:550)
	at is.hail.io.LZ4InputBlockBuffer.readBlock(InputBuffers.scala:584)
	at is.hail.io.BlockingInputBuffer.readBlock(InputBuffers.scala:382)
	at is.hail.io.BlockingInputBuffer.ensure(InputBuffers.scala:388)
	at is.hail.io.BlockingInputBuffer.skipDouble(InputBuffers.scala:499)
	at is.hail.io.LEB128InputBuffer.skipDouble(InputBuffers.scala:270)
	at __C485stream_Let.__m507SKIP_o_float64(Emit.scala)
	at __C485stream_Let.__m497DECODE_r_struct_of_r_struct_of_r_binaryANDr_int32ENDANDr_array_of_r_binaryANDo_binaryANDo_binaryANDr_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDr_binaryANDr_int32ANDo_binaryANDr_float64ANDo_int32ANDo_int32ANDo_float64ANDo_int32ANDo_float64ANDo_int32ANDo_float64ANDo_float64ANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_float64ANDo_binaryANDo_binaryANDo_array_of_o_struct_of_o_binaryANDo_binaryENDANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_binaryANDo_int32ANDo_binaryANDo_binaryANDo_float64ANDo_binaryANDo_float64ANDo_binaryANDo_int32ANDo_int32ANDo_int32ANDo_binaryANDo_binaryANDo_binaryANDo_float64ANDo_float64ANDo_float64ANDo_float64ANDo_int32ANDo_float64ANDo_int32ANDo_float64ANDo_int32ANDo_binaryANDo_binaryEND_TO_SBaseStructPointer(Emit.scala)
	at __C485stream_Let.apply(Emit.scala)
	at is.hail.expr.ir.CompileIterator$$anon$2.step(Compile.scala:303)
	at is.hail.expr.ir.CompileIterator$LongIteratorWrapper.hasNext(Compile.scala:156)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at is.hail.rvd.RVDPartitionInfo$.$anonfun$apply$1(RVDPartitionInfo.scala:70)
	at is.hail.utils.package$.using(package.scala:635)
	at is.hail.rvd.RVDPartitionInfo$.apply(RVDPartitionInfo.scala:42)
	at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2(RVD.scala:1049)
	at is.hail.rvd.RVD$.$anonfun$getKeyInfo$2$adapted(RVD.scala:1047)
	at is.hail.sparkextras.ContextRDD.$anonfun$crunJobWithIndex$1(ContextRDD.scala:242)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Hail version: 0.2.115-10932c754edb
Error summary: ChecksumException: Checksum error: /home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht/rows/parts/part-082-43d5dd77-36f0-41c5-885d-21ff349ad590 at 79933440 exp: 993925245 got: 1930839029
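
Note

The Java stack trace places the failure inside org.apache.hadoop.fs.FSInputChecker, before any of Hail's decoding logic runs, so the mismatch is between the part file and its stored checksum on disk rather than a bug in Hail itself. If the upstream data is still available, regenerating the table is a safer fix than disabling verification. A sketch, where the re-import step is a placeholder for however the table was originally produced:

import hail as hl

# Rebuild the table from its upstream source (this import_table call and
# its path are hypothetical; substitute whatever pipeline produced the
# original .ht), then overwrite the damaged on-disk copy.
ht = hl.import_table('/path/to/original_source.tsv', impute=True)
ht.write('/home01/k099a02/kor_retro/Inputs/kor_retro_cm_231128_tidy.ht',
         overwrite=True)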
iris-garden added the discourse migrated from discuss.hail.is label on May 6, 2024