
GSoC 2022: Multiweight integration #125

Draft: wants to merge 57 commits into base branch substructure

Changes from 55 commits (57 commits total)
b4bdd79
added multiweight functions to RegionSelection and Cutflows
kfan326 Jul 1, 2022
d62b55e
added initialize for multiweight to regionselectionmanager
kfan326 Jul 7, 2022
20a2e6c
fixed regionSelectionManager destructor
kfan326 Jul 23, 2022
5b68fbe
Update tools/SampleAnalyzer/Process/RegionSelection/RegionSelectionMa…
kfan326 Jul 24, 2022
fecbffc
Update changelog-dev.md
kfan326 Jul 24, 2022
8451685
update changelog-dev.md
jackaraz Jul 25, 2022
ea1789b
Merge branch 'kfan326-main'
jackaraz Jul 25, 2022
beb3b7a
commented out debug for cutflow
kfan326 Jul 31, 2022
0f52721
Merge pull request #3 from MadAnalysis/substructure
kfan326 Aug 1, 2022
8552480
integrated SQLite3 output format for cutflows
kfan326 Aug 2, 2022
a1797b3
added database manager header file
kfan326 Aug 6, 2022
01cb86a
delegated WriteSQL to CounterManager from SampleAnalyzer::Finalize
kfan326 Aug 8, 2022
ac6d797
Merge branch 'MadAnalysis:main' into multi_weight/multi_thread
kfan326 Aug 23, 2022
fad5125
added operators to weight collections
kfan326 Aug 23, 2022
f39ceb6
added histogramming to multiweight integration with SQLite3 output
kfan326 Aug 24, 2022
48eac29
fixed Histogramming Statistics table/Histo WriteSQL to write unique s…
kfan326 Aug 24, 2022
b659986
fixed bug with missing 0 entries
kfan326 Sep 1, 2022
5e3a36c
fixed missing histogram data when entries are 0
kfan326 Sep 1, 2022
922ece8
removed databse entry insertion debug messages
kfan326 Sep 1, 2022
1e39fd2
added weight names to DB
kfan326 Sep 2, 2022
8a3b4d7
get weight names from first sample only, weight names should be ident…
kfan326 Sep 2, 2022
2882fc5
fixed typo in database HistoDescription table xmax
kfan326 Sep 2, 2022
9b879e0
changed cutflow db output file name
kfan326 Sep 8, 2022
c37c293
added detect script for sqlite3
kfan326 Sep 12, 2022
d48cae3
add checkup.py modification
kfan326 Sep 12, 2022
4b7b8a7
update detect sqlite
kfan326 Sep 18, 2022
698ea22
added multiweight to execute function writer
kfan326 Sep 22, 2022
dc1d3bd
edited makefile writers for sqlite3
kfan326 Sep 23, 2022
4e5abe0
interface currently links global version of SQLite3 if detected, Mult…
kfan326 Sep 28, 2022
3777adb
readded databasemanager to interfaces
kfan326 Sep 28, 2022
b393229
removed .DS file
kfan326 Sep 28, 2022
c505e01
Update madanalysis/system/architecture_info.py
kfan326 Oct 20, 2022
9c22841
Update madanalysis/IOinterface/library_writer.py
kfan326 Oct 20, 2022
430bf63
Update madanalysis/core/main.py
kfan326 Oct 20, 2022
3d94e2d
Update madanalysis/system/detect_sqlite.py
kfan326 Oct 20, 2022
af5fb18
Update madanalysis/system/session_info.py
kfan326 Oct 20, 2022
274e0d9
made base class for SQL
kfan326 Oct 21, 2022
0ce8e89
fixed interface for SQLite
kfan326 Oct 21, 2022
777dead
changed SQLite interface to use Pointer to implementation design patt…
kfan326 Nov 25, 2022
1d4762d
refactored database manager functionality to output manager, sample a…
kfan326 Dec 19, 2022
f719875
read sqlite db for histo data instead of SAF
kfan326 Feb 3, 2023
4aebe61
added HistoRequency Fill method for multiweight
kfan326 Feb 8, 2023
63b9f37
added HistoLogX Fill
kfan326 Feb 8, 2023
bc20d21
append stdev array to positive and negative HistogramCore objects
kfan326 Feb 10, 2023
5d326ee
added error bar to plots, not sure if scale is correct
kfan326 Feb 10, 2023
d71e17b
changed histo mean and variation calculation in sqlite reader, there …
kfan326 Feb 23, 2023
624a974
added weight statistics averages to sqlite loader and load from sqlit…
kfan326 Mar 14, 2023
4d0d1b7
fixed sqlite reader bug
kfan326 Mar 14, 2023
be33465
fix bugs with sqlite reader query
kfan326 Mar 14, 2023
a22ffc0
statistics table now uses averages of all weights
kfan326 Mar 15, 2023
9331848
Merge branch 'substructure' into multi_weight/multi_thread
jackaraz Mar 20, 2023
594af5d
duplicated weight names to histo db file
kfan326 Mar 20, 2023
7211fea
regularized mean/stdev by sumw
kfan326 Mar 21, 2023
faee6ff
fixed underflow and overflow bins
kfan326 Mar 21, 2023
984ec51
fixed histologx underflow/overflow
kfan326 Mar 21, 2023
c4619ce
minor fixes
kfan326 Mar 28, 2023
46c7292
removed .DS_store from sqlite interface"
kfan326 Mar 28, 2023
10 changes: 9 additions & 1 deletion doc/releases/changelog-dev.md
@@ -6,6 +6,14 @@

## Bug fixes

* Fixed destructor in `RegionSelectionManager` so that `RegionSelection`
objects allocated inside the `region_vector` are properly destructed upon
exiting the scope/destruction of `RegionSelectionManager`.
([#113](https://github.com/MadAnalysis/madanalysis5/pull/113))


## Contributors

This release contains contributions from (in alphabetical order):

[Kyle Fan](https://github.com/kfan326)
216 changes: 177 additions & 39 deletions madanalysis/IOinterface/job_reader.py

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions madanalysis/IOinterface/job_writer.py
@@ -750,6 +750,7 @@ def WriteMakefiles(self, option="", **kwargs):

options.has_root_inc = self.main.archi_info.has_root
options.has_root_lib = self.main.archi_info.has_root
options.has_sqlite = self.main.archi_info.has_sqlite3
#options.has_userpackage = True
toRemove=['Log/compilation.log','Log/linking.log','Log/cleanup.log','Log/mrproper.log']

12 changes: 10 additions & 2 deletions madanalysis/IOinterface/library_writer.py
@@ -131,7 +131,7 @@ def WriteMakefileForInterfaces(self,package):
filename = self.path+"/SampleAnalyzer/Test/Makefile_delphesMA5tune"
elif package=='test_root':
filename = self.path+"/SampleAnalyzer/Test/Makefile_root"

# Header
title=''
if package=='commons':
@@ -239,10 +239,13 @@ def WriteMakefileForInterfaces(self,package):
options.ma5_fastjet_mode = self.main.archi_info.has_fastjet
options.has_fastjet_inc = self.main.archi_info.has_fastjet
options.has_fastjet_lib = self.main.archi_info.has_fastjet
#options.has_sqlite_lib = self.main.archi_info.has_sqlite3
options.has_sqlite_tag = self.main.archi_info.has_sqlite3
# options.has_fastjet_ma5lib = self.main.archi_info.has_fastjet
toRemove.extend(['compilation.log','linking.log','cleanup.log','mrproper.log'])
elif package=='test_commons':
options.has_commons = True
options.has_sqlite_tag = self.main.archi_info.has_sqlite3
toRemove.extend(['compilation_commons.log','linking_commons.log','cleanup_commons.log','mrproper_commons.log','../Bin/TestCommons.log'])
elif package=='zlib':
options.has_commons = True
@@ -252,6 +255,7 @@
elif package=='test_zlib':
options.has_commons = True
options.has_zlib_ma5lib = True
options.has_sqlite_tag = self.main.archi_info.has_sqlite3
# options.has_zlib_lib = True
toRemove.extend(['compilation_zlib.log','linking_zlib.log','cleanup_zlib.log','mrproper_zlib.log','../Bin/TestZlib.log'])
elif package=='delphes':
@@ -324,6 +328,8 @@ def WriteMakefileForInterfaces(self,package):
options.has_fastjet_lib = self.main.archi_info.has_fastjet
options.ma5_fastjet_mode = self.main.archi_info.has_fastjet
options.has_substructure = self.main.archi_info.has_fjcontrib and self.main.archi_info.has_fastjet
options.has_sqlite_tag = self.main.archi_info.has_sqlite3
options.has_sqlite_lib = self.main.archi_info.has_sqlite3

toRemove.extend(['compilation.log','linking.log','cleanup.log','mrproper.log'])
elif package=='test_process':
@@ -342,6 +348,8 @@ def WriteMakefileForInterfaces(self,package):
# options.has_delphesMA5tune_tag = self.main.archi_info.has_delphesMA5tune
# options.has_zlib_tag = self.main.archi_info.has_zlib
toRemove.extend(['compilation_process.log','linking_process.log','cleanup_process.log','mrproper_process.log','../Bin/TestSampleAnalyzer.log'])
elif package=='sqlite':
options.has_sqlite = self.main.archi_info.has_sqlite3

# file pattern
if package in ['commons','process','configuration']:
@@ -373,7 +381,7 @@ def WriteMakefileForInterfaces(self,package):
hfiles = ['DelphesMA5tune/*.h']
elif package=='test_root':
cppfiles = ['Root/*.cpp']
hfiles = ['Root/*.h']
else:
cppfiles = [package+'/*.cpp']
hfiles = [package+'/*.h']
189 changes: 189 additions & 0 deletions madanalysis/IOinterface/sqlite_reader.py
@@ -0,0 +1,189 @@
import sqlite3
from matplotlib import pyplot as plt
import numpy as np
import math
import statistics


def getMeanAndStdevOld(path):

con = sqlite3.connect(path)
cursor = con.cursor()

bin_data = cursor.execute("select * from data;").fetchall()

pos_bins = dict()
neg_bins = dict()

## bin_data has all data for the histogram, need to get mean and standard deviation for each bin
## each row of the query is a tuple of 5 elements [histo name, weight id, bin #, positive value, negative value]
## sort them into +bin/-bin[name] -> bin # -> [mean, standard deviation]

for row in bin_data:
## if the histo name is not inside the bin dictionaries, create a new dictionary for each of +/- bin dictionary
## append values to +/-bin[name][bin#]

if row[0] not in pos_bins or row[0] not in neg_bins:
pos_bins[row[0]] = dict()
neg_bins[row[0]] = dict()
pos_bins[row[0]][row[2]] = [float(row[3])]
neg_bins[row[0]][row[2]] = [float(row[4])]

else:
if row[2] in pos_bins[row[0]] or row[2] in neg_bins[row[0]]:
pos_bins[row[0]][row[2]].append(float(row[3]))
neg_bins[row[0]][row[2]].append(float(row[4]))
else :
pos_bins[row[0]][row[2]] = [float(row[3])]
neg_bins[row[0]][row[2]] = [float(row[4])]

output = dict()

for histo_name in pos_bins:
output[histo_name] = dict()
for bin_i in pos_bins[histo_name]:
output[histo_name][bin_i] = [statistics.mean(pos_bins[histo_name][bin_i]), statistics.stdev(pos_bins[histo_name][bin_i])]

for histo_name in neg_bins:
for bin_i in neg_bins[histo_name]:
output[histo_name][bin_i].extend([statistics.mean(neg_bins[histo_name][bin_i]), statistics.stdev(neg_bins[histo_name][bin_i])])

return output


def getStatistics(stats):
histoname_dict = dict()
for entry in stats:
if entry[0] not in histoname_dict:
histoname_dict[entry[0]] = dict()
histoname_dict[entry[0]][entry[1]] = float(entry[2]) - float(entry[3])
return histoname_dict


def getMeanAndStdev(path):

con = sqlite3.connect(path)
cursor = con.cursor()
bin_data = cursor.execute("select * from data;").fetchall()
stats_data = cursor.execute("select name, id, pos_sum_event_weights_over_events, neg_sum_event_weights_over_events from Statistics").fetchall()

statsdict = getStatistics(stats_data)


## parse data in the form of parsed_data[histo_name][bin #][{positive value, negative value}]
parsed_data = dict()
for row in bin_data:

histo_name = row[0]
weight_id = row[1]
bin_number = row[2]
sumw = statsdict[histo_name][str(weight_id)]
value = (float(row[3]) - abs(float(row[4]))) / sumw
if histo_name not in parsed_data:
## if histo name is not in the parsed_data dictionary, create a new bin dictionary for that histo, then a value list for that bin
parsed_data[histo_name] = dict()
parsed_data[histo_name][bin_number] = []

else:
## histo name is already in the parsed_data dictionary; check whether the bin is in its dictionary and, if not, create a value list for that bin
if bin_number not in parsed_data[histo_name]:
parsed_data[histo_name][bin_number] = []

parsed_data[histo_name][bin_number].append(value)

output = dict()
for histo_name in parsed_data:
output[histo_name] = dict()
for bin_number in parsed_data[histo_name]:
output[histo_name][bin_number] = [statistics.mean(parsed_data[histo_name][bin_number]), statistics.stdev(parsed_data[histo_name][bin_number])]

return output

def getHistoStatisticsAvg(path):

con = sqlite3.connect(path)
cursor = con.cursor()


statistics = cursor.execute("select name, avg(pos_num_events), avg(neg_num_events), avg(pos_sum_event_weights_over_events), avg(neg_sum_event_weights_over_events), avg(pos_entries), avg(neg_entries), avg(pos_sum_event_weights_over_entries), avg(neg_sum_event_weights_over_entries), avg(pos_sum_squared_weights), avg(neg_sum_squared_weights), avg(pos_value_times_weight), avg(neg_value_times_weight), avg(pos_value_squared_times_weight), avg(neg_value_squared_times_weight) from Statistics group by name;").fetchall()

statdict = dict()
for i in range(len(statistics)):
statdict[statistics[i][0]] = statistics[i][1:]

return statdict
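
The `avg(...) GROUP BY name` query above collapses the per-weight rows of the `Statistics` table into one averaged row per histogram. A minimal standalone sketch of that reduction, using a cut-down two-column `Statistics` table (the real table has many more averaged columns; the schema here is an assumption for illustration):

```python
import sqlite3

# In-memory database with a reduced Statistics table: one row per
# (histogram, weight variation), holding only pos_entries.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Statistics (name TEXT, pos_entries REAL)")
cur.executemany(
    "INSERT INTO Statistics VALUES (?, ?)",
    [("pt", 100.0), ("pt", 140.0), ("eta", 50.0)],
)

# Same shape as getHistoStatisticsAvg: average each column across weights,
# grouped by histogram name.
rows = cur.execute(
    "SELECT name, avg(pos_entries) FROM Statistics GROUP BY name;"
).fetchall()
statdict = {name: avg for name, avg in rows}
print(statdict)  # pt averages to 120.0, eta to 50.0
```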






## debug for printing out output dictionary
## structure is as follows:
## output[histogram_name][bin #] = [positive mean, positive stdev, negative mean, negative stddev]


def DBreader_debug(output):

for name in output:
print(name)
for eachbin in output[name]:
print(eachbin)
for val in output[name][eachbin]:
print(val)


for histo in output:
num_of_keys = len(output[histo].keys())
labels = [None] * num_of_keys
for i in range(1,num_of_keys):
labels[i] = i
labels[0] = 'underflow'
labels[num_of_keys-1] = 'overflow'
positives = [None] * num_of_keys
negatives = [None] * num_of_keys
for row in output[histo]:
if(row == 'underflow'):
positives[0] = output[histo][row][0]
negatives[0] = output[histo][row][2]
elif(row == 'overflow'):
positives[num_of_keys-1] = output[histo][row][0]
negatives[num_of_keys-1] = output[histo][row][2]
else:
positives[int(row)] = output[histo][row][0]
negatives[int(row)] = output[histo][row][2]
#for label in labels:
# print(label)
#for val in positives:
# print(val)
#for val in negatives:
# print(val)
x = np.arange(num_of_keys)
width = 0.5
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/3, positives, width, label="positives avg")
rects2 = ax.bar(x + width/3, negatives, width, label="negatives avg")

ax.set_ylabel('Events Luminosity = ')
ax.set_title(histo)
ax.set_xticks(x, labels, rotation = 65)
ax.legend()

#ax.bar_label(rects1, padding=3)
#ax.bar_label(rects2, padding=3)

fig.tight_layout()
plt.show()


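As a sanity check of the per-bin reduction implemented in `getMeanAndStdev`, a minimal standalone sketch on an in-memory database; the `data` and `Statistics` schemas here are assumptions inferred from the queries in this file, not the exact tables written by the SQLite3 output format:

```python
import sqlite3
import statistics

# Tiny in-memory database mimicking the assumed schema:
# data(name, weight id, bin, positive value, negative value) and a
# Statistics table providing the per-weight sum of weights for normalisation.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE data (name TEXT, id TEXT, bin TEXT, pos REAL, neg REAL)")
cur.execute(
    "CREATE TABLE Statistics (name TEXT, id TEXT, "
    "pos_sum_event_weights_over_events REAL, "
    "neg_sum_event_weights_over_events REAL)"
)
# One histogram ('pt'), one bin, two weight variations.
cur.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?)",
    [("pt", "0", "1", 10.0, 0.0), ("pt", "1", "1", 12.0, 0.0)],
)
cur.executemany(
    "INSERT INTO Statistics VALUES (?, ?, ?, ?)",
    [("pt", "0", 2.0, 0.0), ("pt", "1", 2.0, 0.0)],
)

# Reduction as in getMeanAndStdev: per weight, value = (pos - |neg|) / sumw,
# then mean and stdev across the weight variations of each bin.
sumw = {
    (n, i): p - m
    for n, i, p, m in cur.execute(
        "SELECT name, id, pos_sum_event_weights_over_events, "
        "neg_sum_event_weights_over_events FROM Statistics"
    )
}
bins = {}
for name, wid, b, pos, neg in cur.execute("SELECT * FROM data"):
    value = (pos - abs(neg)) / sumw[(name, wid)]
    bins.setdefault((name, b), []).append(value)

result = {k: (statistics.mean(v), statistics.stdev(v)) for k, v in bins.items()}
print(result)  # values 5.0 and 6.0 give mean 5.5, stdev ~0.707
```

With two weights giving normalised bin values 5.0 and 6.0, the bin is reported as mean 5.5 with a sample standard deviation of sqrt(0.5), matching what the reader would hand to the plotting code as the central value and error bar.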
35 changes: 32 additions & 3 deletions madanalysis/build/makefile_writer.py
@@ -41,6 +41,7 @@ def __init__(self):
self.has_fastjet = False
self.has_delphes = False
self.has_delphesMA5tune = False
self.has_sqlite3 = False


@staticmethod
@@ -98,7 +99,10 @@ def UserfriendlyMakefileForSampleAnalyzer(filename,options):
file.write('\tcd Test && $(MAKE) -f Makefile_delphesMA5tune\n')
if options.has_process:
file.write('\tcd Process && $(MAKE) -f Makefile\n')
file.write('\tcd Test && $(MAKE) -f Makefile_process\n')
if options.has_sqlite3:
file.write('\tcd Interfaces && $(MAKE) -f Makefile_sqlite\n')
file.write('\tcd Test && $(MAKE) -f Makefile_sqlite\n')
file.write('\n')

# Clean
@@ -125,6 +129,9 @@ def UserfriendlyMakefileForSampleAnalyzer(filename,options):
if options.has_process:
file.write('\tcd Process && $(MAKE) -f Makefile clean\n')
file.write('\tcd Test && $(MAKE) -f Makefile_process clean\n')
if options.has_sqlite3:
file.write('\tcd Interfaces && $(MAKE) -f Makefile_sqlite clean\n')
file.write('\tcd Test && $(MAKE) -f Makefile_sqlite clean\n')
file.write('\n')

# Mrproper
@@ -152,6 +159,9 @@ def UserfriendlyMakefileForSampleAnalyzer(filename,options):
if options.has_process:
file.write('\tcd Process && $(MAKE) -f Makefile mrproper\n')
file.write('\tcd Test && $(MAKE) -f Makefile_process mrproper\n')
if options.has_sqlite3:
file.write('\tcd Interfaces && $(MAKE) -f Makefile_sqlite mrproper\n')
file.write('\tcd Test && $(MAKE) -f Makefile_sqlite mrproper\n')
file.write('\n')

# Closing the file
@@ -194,6 +204,9 @@ def __init__(self):
self.has_root_tag = False
self.has_root_lib = False
self.has_root_ma5lib = False
self.has_sqlite = False
self.has_sqlite_tag = False
self.has_sqlite_lib = False



@@ -321,7 +334,9 @@ def Makefile(
for header in archi_info.delphesMA5tune_inc_paths:
cxxflags.extend(['-I'+header])
file.write('CXXFLAGS += '+' '.join(cxxflags)+'\n')

# - tags
cxxflags=[]
if options.has_root_tag:
@@ -338,6 +353,8 @@
cxxflags.extend(['-DDELPHES_USE'])
if options.has_delphesMA5tune_tag:
cxxflags.extend(['-DDELPHESMA5TUNE_USE'])
if options.has_sqlite_tag:
cxxflags.extend(['-DSQLITE3_USE'])
Review comment (Member) on lines +356 to +357, suggested change (remove these lines):
    if options.has_sqlite_tag:
        cxxflags.extend(['-DSQLITE3_USE'])
See the comments above.
if len(cxxflags)!=0:
file.write('CXXFLAGS += '+' '.join(cxxflags)+'\n')
file.write('\n')
@@ -347,7 +364,9 @@

# - general
libs=[]
file.write('LIBFLAGS = \n')

# added SQL
#file.write('LIBFLAGS = -l sqlite3\n')

# - commons
if options.has_commons:
@@ -429,6 +448,14 @@
if options.has_heptoptagger:
file.write('LIBFLAGS += -lHEPTopTagger_for_ma5\n')

# SQLite3
if options.has_sqlite:
file.write('LIBFLAGS += -l sqlite3\n')
Review comment (Member), suggested change:
-    file.write('LIBFLAGS += -l sqlite3\n')
+    file.write('LIBFLAGS += -lsqlite_for_ma5\n')
Let's keep the has_sqlite option for the analysis and has_sqlite_lib for compiling the interface. The analysis should not be directly compiled with sqlite.

if options.has_sqlite_lib:
file.write('LIBFLAGS += -l sqlite_for_ma5\n')


# - Commons
if options.has_commons:
libs=[]
@@ -464,6 +491,8 @@ def Makefile(
libs.append('$(MA5_BASE)/tools/SampleAnalyzer/Lib/libsubstructure_for_ma5.so')
if options.has_heptoptagger:
libs.append('$(MA5_BASE)/tools/SampleAnalyzer/Lib/libHEPTopTagger_for_ma5.so')
if options.has_sqlite_lib:
libs.append('$(MA5_BASE)/tools/SampleAnalyzer/Lib/libsqlite_for_ma5.so')
if len(libs)!=0:
file.write('# Requirements to check before building\n')
for ind in range(0,len(libs)):
4 changes: 4 additions & 0 deletions madanalysis/core/library_builder.py
@@ -80,6 +80,10 @@ def checkMA5(self):
libraries.append(self.archi_info.ma5dir+'/tools/SampleAnalyzer/Lib/libdelphes_for_ma5.so')
if self.archi_info.has_delphesMA5tune:
libraries.append(self.archi_info.ma5dir+'/tools/SampleAnalyzer/Lib/libdelphesMA5tune_for_ma5.so')
if self.archi_info.has_sqlite3:
libraries.append(self.archi_info.ma5dir+'/tools/SampleAnalyzer/Lib/libsqlite_for_ma5.so')


for library in libraries:
if not os.path.isfile(library):
self.logger.debug('\t-> library '+ library + " not found.")