java.lang.ClassCastException: java.lang.Object cannot be cast to java.lang.String #353
Comments
This is what I see in browser:
|
This is larger log fragment:
|
After restarting the serving layer the bug went away (temporarily?). |
But some time later the error came back. |
BTW you can just post a small relevant snippet of logs, rather than a huge section of them. |
I don't see a workaround. How can I diagnose this to help you? |
I honestly could not figure out how it could possibly happen. Any hint about a reproduction would help a lot. |
The collection is assumed to contain only Strings, and an Object in it provokes the crash, right? I propose adding logging to every place that puts objects into this collection and checking the data type there. Can you do it? |
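One way to get the type check proposed above without hand-written logging is the JDK's Collections.checkedMap wrapper. This is only a sketch under assumptions: it uses a plain HashMap as a stand-in, the class name CheckedPutSketch is made up for illustration, and it presumes the real collection can be wrapped as a java.util.Map<String, float[]>. It would only catch a bad key entering through put; it cannot see corruption that happens inside the collection's own internal array.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class CheckedPutSketch {

    // Returns true if the checked wrapper rejects a non-String key at insertion time.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean rejectsBadKey() {
        // HashMap is a stand-in here; the real map type is an assumption.
        Map<String, float[]> checked =
            Collections.checkedMap(new HashMap<>(), String.class, float[].class);

        // A raw reference simulates a caller that bypasses compile-time checks.
        Map raw = checked;
        try {
            raw.put(new Object(), new float[] {1.0f});
            return false; // not reached: the wrapper validates the key's runtime type
        } catch (ClassCastException e) {
            return true;  // the offending put is reported where it happens
        }
    }

    public static void main(String[] args) {
        System.out.println("bad key rejected at put: " + rejectsBadKey());
    }
}
```

With a wrapper like this, the stack trace of the ClassCastException points at the code that inserted the wrong type, rather than at a later read.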
I don't know how to reproduce it. I can send you the HDFS data. |
I've observed that Kafka was dead (probably because /tmp ran out of space), so maybe Kafka sent some object that is not a String but rather an error message, which was serialized into the collection. |
I've also observed the problem without the disk filling up. My guess is that I was calling /ingest from multiple threads, so the data streams of different blocks collided. Do you have a synchronized block in the /ingest handler? |
I don't think it's related to ingest directly; that's not where the issue occurs. It's actually in the path that reads updates from the speed layer. You can check the source code, but yes the data structure here is protected by a read/write lock. I checked it again and can't see any issue. It's not a concurrent modification issue, but somehow a raw Object gets in here, where nothing in the whole framework should use or allocate an Object. No idea how it happens. You're right that it's worth looking at how values are read from Kafka and parsed, but anything that I can imagine causing an Object to be created would cause other failures first. I think an internal array is resized somewhere with the wrong type or something. Truly puzzling, I looked at this for a long time and could never reproduce it or guess the problem. |
So what about adding checking code? |
I've tried lots of that sort of thing before and never been able to reproduce it locally. Other reporters couldn't reproduce it even with similar debug code. |
That is something else; best not to mix issues in one thread. |
I've meditated a bit on FeatureVectorsPartitions and MutableLHashParallelKVObjObjMapGO, and I have a hypothesis explaining what's going on during the crash. The crash happens near this line: if ((key = (K) tab[i]) != FREE) { and you can see from the MutableLHashParallelKVObjObjMapGO code that this data structure keeps its data in the table[] array like this: [K1,V1,K2,V2,...]. For some unknown reason, key gets assigned from tab[i] while tab[i] is a float[]. Generics are erased at compile time, so inside the .class file key is just an Object. On the next line we are about to construct a MutableEntry object and Java tries to cast key to String, but this is impossible because it contains a float[]. Now, the question is what might have caused this corruption of the hashtable? I don't know. It may be an algorithmic bug that happens under some special circumstances. Or it may be a concurrency problem (I see that all calls to vectors are under a lock, but I don't have 100% understanding of the code). So I'd propose simply replacing this hashmap with a traditional Java HashMap or even a concurrent hashmap, which doesn't have these internal problems. What do you think? |
Objects in the JVM still have a type at runtime, after erasure, and an Object is not the same type as a float[]. The exception shows the actual runtime type of the object, and it's an Object, not a float[] or String. I don't think a float[] can be put into this data structure; it would cause a compile-time error or require a cast, and I don't see that. An Object is typically only used as a marker object in internal data structures. There's little reason to allocate an Object otherwise. It's likely somehow related to this: the internal state is corrupted and a special marker object is treated as data. I can't yet see how it happens, nor whether it's a client-side or a library-side problem. |
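The point about runtime types can be checked with a small standalone sketch. It is not the library's code: Table, keyAt and describeFailure are hypothetical names, and the class only mimics a generic container backed by a raw Object[]. The unchecked (K) cast inside the container never fails, the cast to String happens at the call site, and the exception message names the real runtime class of whatever was in the slot.

```java
public class ErasureCastDemo {

    // Hypothetical generic container that stores keys in a raw Object[].
    static final class Table<K> {
        final Object[] tab;

        Table(Object[] tab) {
            this.tab = tab;
        }

        @SuppressWarnings("unchecked")
        K keyAt(int i) {
            // After erasure this cast is a no-op, so it cannot throw here.
            return (K) tab[i];
        }
    }

    // Puts slotValue into a Table<String> and reports what happens on read.
    static String describeFailure(Object slotValue) {
        Table<String> table = new Table<>(new Object[] {slotValue});
        try {
            // The compiler inserts the real cast to String at this assignment.
            String key = table.keyAt(0);
            return "ok: " + key;
        } catch (ClassCastException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure("user-1"));
        // Message names java.lang.Object as the runtime class.
        System.out.println(describeFailure(new Object()));
        // Message names [F (the JVM name for float[]), not java.lang.Object.
        System.out.println(describeFailure(new float[] {1.0f}));
    }
}
```

The exact wording of the message differs between Java versions, but a float[] in the slot would be reported as [F, so a message naming java.lang.Object is consistent with a marker object being read as a key.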
Hi! Yes, I think this is the piece of code where this Object is created: static final Object REMOVED = new Object(), FREE = new Object(); and these FREE/REMOVED markers are used to flag slots in tab[]. |
This issue is a bug inside this collection; I propose replacing it with a regular HashMap: |
Essentially, it looks like the hashtable contains a REMOVED entry, and when you call removeIf, it has no check for REMOVED; it only checks for FREE, which looks like a bug. Consider removing these unsafe collections completely: they have not been tested under really heavy load and have not been supported since 2016. |
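The suspected failure mode can be illustrated with a simplified sketch. This is not the library's implementation: SentinelTableSketch, sampleTable, removeIfBuggy and removeIfFixed are hypothetical names, and the table is a flat [K1,V1,K2,V2,...] array without any hashing. It only shows that a scan which skips FREE but not REMOVED ends up casting the marker object to String.

```java
import java.util.function.BiPredicate;

public class SentinelTableSketch {
    // Marker objects for empty and deleted slots, as in the quoted declaration.
    static final Object REMOVED = new Object(), FREE = new Object();

    // Keys at even indices, values at odd indices: [K1, V1, K2, V2, ...].
    final Object[] tab;

    SentinelTableSketch(Object[] tab) {
        this.tab = tab;
    }

    // Small table with two live entries, one REMOVED slot and one FREE slot.
    static Object[] sampleTable() {
        return new Object[] {
            "a", new float[] {1.0f},
            REMOVED, null,
            FREE, null,
            "b", new float[] {2.0f}
        };
    }

    // Buggy scan: every slot that is not FREE is treated as a live key.
    int removeIfBuggy(BiPredicate<String, float[]> filter) {
        int removed = 0;
        for (int i = 0; i < tab.length; i += 2) {
            Object key = tab[i];
            if (key != FREE) {
                // Throws ClassCastException when key is the REMOVED marker.
                if (filter.test((String) key, (float[]) tab[i + 1])) {
                    tab[i] = REMOVED;
                    tab[i + 1] = null;
                    removed++;
                }
            }
        }
        return removed;
    }

    // Fixed scan: both markers are skipped before the key is cast.
    int removeIfFixed(BiPredicate<String, float[]> filter) {
        int removed = 0;
        for (int i = 0; i < tab.length; i += 2) {
            Object key = tab[i];
            if (key != FREE && key != REMOVED) {
                if (filter.test((String) key, (float[]) tab[i + 1])) {
                    tab[i] = REMOVED;
                    tab[i + 1] = null;
                    removed++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        BiPredicate<String, float[]> big = (k, v) -> v[0] > 1.5f;
        try {
            new SentinelTableSketch(sampleTable()).removeIfBuggy(big);
        } catch (ClassCastException e) {
            System.out.println("buggy scan failed: " + e.getMessage());
        }
        System.out.println("fixed scan removed: "
            + new SentinelTableSketch(sampleTable()).removeIfFixed(big));
    }
}
```

If the real removeIf behaves like the buggy variant, the crash would only appear after at least one entry has been removed from the map, which may explain why it is intermittent and hard to reproduce.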
Hey good find. That does quite sound like the same problem. We can change the implementation, though it will cost some performance. I'll have a look later and see if there's any other way to work around this. |
Sometimes I observe crashes like this: