SIGSEGV during goroutine switch while utils.int32MaxMinAVX2 is running #279
Comments
Is it consistently reproducible in any way (perhaps by running it N times in a loop until it happens)? If so, are you able to share the Parquet file you're using and any code? You could also try running with the […]
I'll take a look into this in the morning; it's really odd that locking the OS thread fixes it. Maybe something weird in the register usage?
That seemed to drop performance rather significantly, and performance is the primary concern in my use case. I don't mind managing the OS thread locking personally, but it felt hacky enough that I thought double-checking might be a good idea.
I'm unfortunately not at liberty to give the Parquet files away without checking with some other people first. There are nested columns in some cases, but the leaves are the only things that are nullable.
I wrote a simplified version of the code that's being used that does essentially the same thing:

const batchSize = 4096

var errWrongType = errors.New("encountered column of wrong type")

// ParseFloatCol reads one float64 column into a dense slice, storing NaN for nulls.
func ParseFloatCol(rdr *file.Reader, colName string) ([]float64, error) {
	idx := rdr.MetaData().Schema.ColumnIndexByName(colName)
	floatBatch := [batchSize]float64{}
	defLevels := [batchSize]int16{}
	current := int64(0) // next write position in result
	result := make([]float64, rdr.NumRows())
	for i := range rdr.NumRowGroups() {
		rgr := rdr.RowGroup(i)
		col, err := rgr.Column(idx)
		if err != nil {
			return nil, err
		}
		f64Col, ok := col.(*file.Float64ColumnChunkReader)
		if !ok {
			return nil, errWrongType
		}
		maxDefLevel := col.Descriptor().MaxDefinitionLevel()
		// Drain the column chunk one batch at a time.
		for f64Col.HasNext() {
			total, _, err := f64Col.ReadBatch(batchSize, floatBatch[:], defLevels[:], nil)
			if err != nil {
				return nil, err
			}
			// floatBatch holds only the non-null values, so it has its own index.
			batchIdx := 0
			for j := range total {
				if defLevels[j] == maxDefLevel {
					result[current+j] = floatBatch[batchIdx]
					batchIdx++
				} else {
					result[current+j] = math.NaN()
				}
			}
			current += total
		}
	}
	return result, nil
}
Can you give me any information on the Parquet file (number of rows/rowgroups/rows per rowgroup? number of columns?) Since you can't share the file, I have a few ideas that might affect things if you are able to use a local version of the library and make some code changes and test them out. For instance:

func int32MaxMinAVX2(values []int32) (int32, int32) {
	var min, max int32
	_int32_max_min_avx2(unsafe.Pointer(unsafe.SliceData(values)), len(values), unsafe.Pointer(&min), unsafe.Pointer(&max))
	return min, max
}

Maybe the compiler is doing something weird with registers because of the named returns? What version of Go are you using?
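To make the two wrapper shapes concrete, here is a small self-contained sketch that contrasts named results with local variables. `fakeKernel` is a hypothetical pure-Go stand-in for the assembly routine (it only mimics the out-pointer calling convention), and both wrappers assume a non-empty slice. It shows the source-level difference only; it makes no claim about what the compiler does with registers in either case.

```go
package main

import (
	"fmt"
	"unsafe"
)

// fakeKernel is a hypothetical stand-in for the assembly routine: it scans n
// int32 values and writes the minimum and maximum through the out-pointers.
// It assumes n > 0.
func fakeKernel(values unsafe.Pointer, n int, minOut, maxOut unsafe.Pointer) {
	vals := unsafe.Slice((*int32)(values), n)
	lo, hi := vals[0], vals[0]
	for _, v := range vals[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	*(*int32)(minOut) = lo
	*(*int32)(maxOut) = hi
}

// maxMinNamed passes the addresses of its named results to the kernel.
func maxMinNamed(values []int32) (min, max int32) {
	fakeKernel(unsafe.Pointer(unsafe.SliceData(values)), len(values),
		unsafe.Pointer(&min), unsafe.Pointer(&max))
	return
}

// maxMinLocal passes the addresses of ordinary locals and returns copies.
func maxMinLocal(values []int32) (int32, int32) {
	var min, max int32
	fakeKernel(unsafe.Pointer(unsafe.SliceData(values)), len(values),
		unsafe.Pointer(&min), unsafe.Pointer(&max))
	return min, max
}

func main() {
	values := []int32{3, -7, 42, 0}
	nMin, nMax := maxMinNamed(values)
	lMin, lMax := maxMinLocal(values)
	fmt.Println("named:", nMin, nMax)
	fmt.Println("local:", lMin, lMax)
}
```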
Further update, the […] We're using Go 1.23. I found this issue, which seems eerily similar except for the cgo usage. I don't know how the BP would get nil'd out without it, but the assembly function is using that register, so I updated to Go 1.24 (needed an excuse to do that anyway), just in case the compiler is doing something screwy that nobody was expecting. I'm going to monitor for a bit to see if I can find another example (we're down to ~1 per day for millions of requests at the moment). As much info on the parquet as I can share:
I'll see what I can do about getting a local copy of the library used. Unfortunately my dev platform is darwin/arm64, so I'm not confident of my ability to reproduce this exact scenario locally. If a deploy/wait cycle becomes too long, I'm just going to spin up something that calls the min/max function in a loop on as many goroutines as possible and leave it running for a while (assuming moving past Go 1.23.0 doesn't eliminate this).
Describe the bug, including details regarding any error messages, version, and platform.
I'm using this library to parse individual columns out of parquet files that have a dynamic schema into a struct of arrays.
This library works really nicely for that, and the examples + some debugging were enough for me to use the parquet package.
I ran into an issue, however. I had several runtime exits that seem to happen when goroutine switches happened while utils.int32MaxMinAVX2 is running. I gave the underlying assembly function (convenience link) a once over and it seemed fine. Thought it might be the lack of NOSPLIT, but it seems like you're not actually using anything on the stack save for the function arguments.
I'm only seeing this on linux/amd64, but without a way to consistently trigger switches while the function is running, I wouldn't bet my life that it's the only place this is happening.
I managed to get around it with a runtime.LockOSThread call, and that seems to have solved the issue for now. There's not a lot of documentation in that package that I could find. I was wondering if this implied I was doing something wrong.
I've included a cleaned stack trace of what I'm getting whenever this happens, it always seems to be during an IsValid call.
Component(s)
Parquet