Rename `main` branch #3

dey4ss · 2025-02-06T09:15:06Z

This PR adapts the GH action to the renamed main branch, adds a comparison of generated data and query parameters to the action, and fixes small compiler warnings encountered with gcc-13.

Note on the change in `phash.c`

jcch-dbgen/skew/phash.c

Lines 85 to 91 in 55da3ba

    // Hyrise: cast subtrahend as long. Otherwise, we got inconsistent offsets on macOS/clang/ARM. E.g., for
    // `key = 79`, `tbl_size = 100`, we got `row = 3`, `(0.18 + row * 0.2) * tbl_size) = 78.0`, but `offset = 0`
    // (instead of 1). With the cast, data is consistent across our tested systems.
    // Note that this means we have a few customers/suppliers in the dataset (5 for SF 0.01, 8 for SFs 1 and 10)
    // that have a different `nationkey` compared to the version without the cast. We chose to trade off these
    // negligible discrepancies for consistency.
    long offset = key - (long)((0.18 + row * 0.2) * tbl_size);

I generated some data and added a comparison to the GH action. Thus, I noticed that there were discrepancies between my machine (+ GH macOS runners) and the GH Ubuntu runners (+ nemea). I am not 100% sure if it's an STL or a CPU architecture thing.
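The effect described in the code comment can be reproduced in isolation. The following is a minimal sketch, not the code from `phash.c`: the helper names are hypothetical, `key` is assumed to be an integer, and the intermediate product is assumed to land one representable step above 78.0 on the affected platform (the actual value observed there is not stated in this PR).

```c
/* Hypothetical helper (not in phash.c): subtract in floating point, then
 * truncate the difference when converting to long. This mirrors the
 * expression without the cast, assuming `key` is an integer. */
static long offset_without_cast(long key, double scaled)
{
    return (long)((double)key - scaled);
}

/* Hypothetical helper (not in phash.c): truncate the subtrahend first, then
 * subtract in integer arithmetic. This mirrors the expression with the cast. */
static long offset_with_cast(long key, double scaled)
{
    return key - (long)scaled;
}

/* Assumed intermediate result: the smallest double greater than 78.0.
 * Doubles in [64, 128) are spaced 2^-46 apart, hence the hex-float term. */
static const double SCALED_JUST_ABOVE_78 = 78.0 + 0x1p-46;
```

With `key = 79`, `offset_without_cast(79, SCALED_JUST_ABOVE_78)` yields 0, because the difference is just below 1.0 and the conversion to `long` truncates it. `offset_with_cast(79, SCALED_JUST_ABOVE_78)` yields 1, and both helpers yield 1 for an exact 78.0, so the cast makes the result insensitive to this kind of last-bit difference.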

The generated data is not identical to the output of the current master branch, but at least it is consistent now. For the customer/supplier tuples in the skewed dataset, we introduce a systematic bias in the nationkey assignment, since the cast always truncates toward zero, but it is not clear if that error also favors a higher bin here:

jcch-dbgen/skew/phash.c

Lines 62 to 67 in a7cfdd2

    static uint16_t nations_map[25] = /* mapping between countries and their keys */
    {15, 0, 5, 14, 16,   /* AFRICA      (MOROCCO | ALGERIA, ETHIOPIA, KENYA, MOZAMBIQUE) */
     24, 1, 2, 3, 17,    /* AMERICA     (UNITED STATES | ARGENTINA, BRAZIL, CANADA, PERU)*/
     18, 9, 12, 21, 8,   /* ASIA        (CHINA | INDONESIA, JAPAN, VIETNAM, INDIA)*/
     7, 6, 22, 19, 23,   /* EUROPE      (GERMANY | FRANCE, RUSSIA, ROMANIA, UNITED KINGDOM*/
     4, 10, 11, 13, 20}; /* MIDDLE EAST (EGYPT | IRAN, IRAQ, JORDAN, SAUDI ARABIA)*/

However, as it is a consistent bias affecting all tuples/nationkeys equally, it should not make a difference in the end.

An alternative that avoids the systematic error would be to use `lround()`. Its results are also consistent across the systems. Compared to master, both variants differ by the same number of tuples: 5 for SF 0.01, 8 for SFs 1 and 10.

skew/phash.c

dey4ss added 7 commits February 6, 2025 10:12

update workflow, test generation

219ab5c

fix miscalculation under macOS

f09f1c2

silence warnings of gcc-13

e80dfd6

remove RNG_TEST flag in Makefile

3fe06d6

refine comment

55da3ba

-.-

90e7241

let CI fail on compiler warnings

57c7006

dey4ss requested a review from Bouncner February 12, 2025 14:08

dey4ss added 2 commits February 12, 2025 15:12

output condensed diffs

05e1845

add comment for removing test flag

e4f037d

Bouncner reviewed Feb 12, 2025

Bouncner approved these changes Feb 12, 2025

case

db20a48

dey4ss changed the title ~~Prepare branch renaming~~ Rename main branch Feb 12, 2025

dey4ss merged commit e6c7df6 into main Feb 12, 2025
3 checks passed
