Hello, is this the expected behavior?

I'm running the code below, which compares a composition of groupBy, select and inflate against a single pivot call; both return the same result. The groupBy composition runs in 0.235 ms while the pivot call runs in 146.8 ms, roughly 625 times slower (about 62,000%). The first call to "toArray" then takes 51.27 ms on the groupBy result and 34.456 ms on the pivot result, so at that step pivot is about 1.5 times faster.
The dataset is a 1.5 MB CSV file containing 27k rows.
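For reference, the ratios quoted above can be recomputed from the raw timings reported in this issue. This is only a small arithmetic sketch; the variable names are made up for illustration and are not part of data-forge.

```javascript
// Timings in milliseconds, copied from the output reported in this issue.
const groupByBuildMs = 0.235;
const pivotBuildMs = 146.789;
const groupByToArrayMs = 51.270;
const pivotToArrayMs = 34.456;

// How many times longer the pivot call takes than the groupBy composition.
const buildRatio = pivotBuildMs / groupByBuildMs;

// The same ratio expressed as "percent slower".
const buildPercentSlower = (buildRatio - 1) * 100;

// How many times longer the first toArray takes on the groupBy result.
const toArrayRatio = groupByToArrayMs / pivotToArrayMs;

console.log(`pivot build: ${buildRatio.toFixed(1)}x the groupBy build time`);
console.log(`pivot build: ${buildPercentSlower.toFixed(0)}% slower`);
console.log(`groupBy toArray: ${toArrayRatio.toFixed(2)}x the pivot toArray time`);
```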
```js
const dataForge = require('data-forge');
require('data-forge-fs');

let start = process.hrtime();

const elapsed_time = function (note) {
    const precision = 3; // 3 decimal places
    const elapsed = process.hrtime(start)[1] / 1000000; // divide by a million to get nano to milli
    console.log(process.hrtime(start)[0] + " s, " + elapsed.toFixed(precision) + " ms - " + note); // print message + time
    start = process.hrtime(); // reset the timer
}

const df = dataForge.readFileSync('./data.csv')
    .parseCSV({ dynamicTyping: true })
    .withIndex((row) => `${row.meeting_id}_${row.item_id}_${row.user_id}_${row.source_id}`)
elapsed_time('parsecsv')

const sintetico = df.groupBy((row) => `${row.meeting_id}_${row.item_id}_${row.vote}`)
    .select((group) => ({
        meeting_id: group.first().meeting_id,
        item_id: group.first().item_id,
        vote: group.first().vote,
        stock: group.deflate(row => row.stock).sum(),
    }))
    .inflate()
elapsed_time('groupBy, select, inflate')

const sinteticoPivot = df.pivot(['meeting_id', 'item_id', 'vote'], { stock: dataForge.Series.sum })
elapsed_time('pivot')

const data = sintetico.head(5).toArray()
elapsed_time('groupBy, select, inflate => toArray')

const data2 = sintetico.head(5).toArray()
elapsed_time('groupBy, select, inflate => toArray again')

const data3 = sinteticoPivot.head(5).toArray()
elapsed_time('pivot => toArray')

const data4 = sinteticoPivot.head(5).toArray()
elapsed_time('pivot => toArray again')
```
These are the outputs:
0 s, 183.236 ms - parsecsv
0 s, 0.235 ms - groupBy, select, inflate
0 s, 146.789 ms - pivot
0 s, 51.270 ms - groupBy, select, inflate => toArray
0 s, 1.200 ms - groupBy, select, inflate => toArray again
0 s, 34.456 ms - pivot => toArray
0 s, 13.261 ms - pivot => toArray again
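One possible reading of these numbers, assuming the groupBy/select/inflate chain is evaluated lazily (an assumption on my part, not something verified against the data-forge source), is that the 0.235 ms only covers building the pipeline and the real work is deferred to the first toArray. The sketch below is a hypothetical stand-in written in plain JavaScript, not data-forge code; `LazySeq` and its methods are invented for illustration.

```javascript
// Hypothetical minimal lazy sequence; NOT the data-forge implementation.
class LazySeq {
    constructor(iterableFactory) {
        // A function that returns a fresh iterable each time it is called.
        this.factory = iterableFactory;
    }

    // Records the transformation without running it.
    select(fn) {
        const source = this.factory;
        return new LazySeq(function* () {
            for (const item of source()) {
                yield fn(item);
            }
        });
    }

    // Forces evaluation of the whole chain.
    toArray() {
        return Array.from(this.factory());
    }
}

let selectorCalls = 0;
const rows = Array.from({ length: 27000 }, (_, i) => ({ stock: i }));

// Building the pipeline does not invoke the selector at all.
const pipeline = new LazySeq(() => rows).select((row) => {
    selectorCalls++;
    return row.stock * 2;
});
const callsAfterBuild = selectorCalls;

// Only materializing the result runs the selector once per row.
const result = pipeline.toArray();
const callsAfterToArray = selectorCalls;

console.log({ callsAfterBuild, callsAfterToArray, resultLength: result.length });
```

If that assumption holds, a fairer comparison would force evaluation inside each timed section, for example by calling toArray() on both results before reading the timer, so that the two approaches are measured over the same amount of work.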
Is this intended? Should I dig deeper to fix it and make a pull request?
Thanks,