GSoC 2022: Multiweight integration #125 (Draft)
kfan326 wants to merge 57 commits into MadAnalysis:substructure from kfan326:multi_weight/multi_thread
Commits (showing changes from 55 of 57 commits)
b4bdd79 added multiweight functions to RegionSelection and Cutflows (kfan326)
d62b55e added initialize for multiweight to regionselectionmanager (kfan326)
20a2e6c fixed regionSelectionManager destructor (kfan326)
5b68fbe Update tools/SampleAnalyzer/Process/RegionSelection/RegionSelectionMa… (kfan326)
fecbffc Update changelog-dev.md (kfan326)
8451685 update changelog-dev.md (jackaraz)
ea1789b Merge branch 'kfan326-main' (jackaraz)
beb3b7a commented out debug for cutflow (kfan326)
0f52721 Merge pull request #3 from MadAnalysis/substructure (kfan326)
8552480 integrated SQLite3 output format for cutflows (kfan326)
a1797b3 added database manager header file (kfan326)
01cb86a delegated WriteSQL to CounterManager from SampleAnalyzer::Finalize (kfan326)
ac6d797 Merge branch 'MadAnalysis:main' into multi_weight/multi_thread (kfan326)
fad5125 added operators to weight collections (kfan326)
f39ceb6 added histogramming to multiweight integration with SQLite3 output (kfan326)
48eac29 fixed Histogramming Statistics table/Histo WriteSQL to write unique s… (kfan326)
b659986 fixed bug with missing 0 entries (kfan326)
5e3a36c fixed missing histogram data when entries are 0 (kfan326)
922ece8 removed databse entry insertion debug messages (kfan326)
1e39fd2 added weight names to DB (kfan326)
8a3b4d7 get weight names from first sample only, weight names should be ident… (kfan326)
2882fc5 fixed typo in database HistoDescription table xmax (kfan326)
9b879e0 changed cutflow db output file name (kfan326)
c37c293 added detect script for sqlite3 (kfan326)
d48cae3 add checkup.py modification (kfan326)
4b7b8a7 update detect sqlite (kfan326)
698ea22 added multiweight to execute function writer (kfan326)
dc1d3bd edited makefile writers for sqlite3 (kfan326)
4e5abe0 interface currently links global version of SQLite3 if detected, Mult… (kfan326)
3777adb readded databasemanager to interfaces (kfan326)
b393229 removed .DS file (kfan326)
c505e01 Update madanalysis/system/architecture_info.py (kfan326)
9c22841 Update madanalysis/IOinterface/library_writer.py (kfan326)
430bf63 Update madanalysis/core/main.py (kfan326)
3d94e2d Update madanalysis/system/detect_sqlite.py (kfan326)
af5fb18 Update madanalysis/system/session_info.py (kfan326)
274e0d9 made base class for SQL (kfan326)
0ce8e89 fixed interface for SQLite (kfan326)
777dead changed SQLite interface to use Pointer to implementation design patt… (kfan326)
1d4762d refactored database manager functionality to output manager, sample a… (kfan326)
f719875 read sqlite db for histo data instead of SAF (kfan326)
4aebe61 added HistoRequency Fill method for multiweight (kfan326)
63b9f37 added HistoLogX Fill (kfan326)
bc20d21 append stdev array to positive and negative HistogramCore objects (kfan326)
5d326ee added error bar to plots, not sure if scale is correct (kfan326)
d71e17b changed histo mean and variation calculation in sqlite reader, there … (kfan326)
624a974 added weight statistics averages to sqlite loader and load from sqlit… (kfan326)
4d0d1b7 fixed sqlite reader bug (kfan326)
be33465 fix bugs with sqlite reader query (kfan326)
a22ffc0 statistics table now uses averages of all weights (kfan326)
9331848 Merge branch 'substructure' into multi_weight/multi_thread (jackaraz)
594af5d duplicated weight names to histo db file (kfan326)
7211fea regularized mean/stdev by sumw (kfan326)
faee6ff fixed underflow and overflow bins (kfan326)
984ec51 fixed histologx underflow/overflow (kfan326)
c4619ce minor fixes (kfan326)
46c7292 removed .DS_store from sqlite interface (kfan326)
New file added by this PR (Python, +189 lines): a standalone SQLite reader that extracts per-bin multiweight statistics and plots them.

@@ -0,0 +1,189 @@
import sqlite3
from matplotlib import pyplot as plt
import numpy as np
import math
import statistics


def getMeanAndStdevOld(path):
    con = sqlite3.connect(path)
    cursor = con.cursor()
    bin_data = cursor.execute("select * from data;").fetchall()

    pos_bins = dict()
    neg_bins = dict()

    ## bin_data holds all data for the histogram; compute the mean and standard
    ## deviation for each bin. Each row of the query is a tuple of 5 elements:
    ## [histo name, weight id, bin #, positive value, negative value].
    ## Sort them into +/-bins[name] -> bin # -> [values].
    for row in bin_data:
        ## If the histo name is not yet in the bin dictionaries, create a new inner
        ## dictionary in each of the +/- bin dictionaries; otherwise append the
        ## values to +/-bins[name][bin #].
        if row[0] not in pos_bins or row[0] not in neg_bins:
            pos_bins[row[0]] = dict()
            neg_bins[row[0]] = dict()
            pos_bins[row[0]][row[2]] = [float(row[3])]
            neg_bins[row[0]][row[2]] = [float(row[4])]
        else:
            if row[2] in pos_bins[row[0]] or row[2] in neg_bins[row[0]]:
                pos_bins[row[0]][row[2]].append(float(row[3]))
                neg_bins[row[0]][row[2]].append(float(row[4]))
            else:
                pos_bins[row[0]][row[2]] = [float(row[3])]
                neg_bins[row[0]][row[2]] = [float(row[4])]

    output = dict()
    for histo_name in pos_bins:
        output[histo_name] = dict()
        for bin_i in pos_bins[histo_name]:
            output[histo_name][bin_i] = [statistics.mean(pos_bins[histo_name][bin_i]),
                                         statistics.stdev(pos_bins[histo_name][bin_i])]

    for histo_name in neg_bins:
        for bin_i in neg_bins[histo_name]:
            output[histo_name][bin_i].extend([statistics.mean(neg_bins[histo_name][bin_i]),
                                              statistics.stdev(neg_bins[histo_name][bin_i])])

    return output
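The grouping logic above assumes a `data` table whose rows are (histo name, weight id, bin number, positive value, negative value). A self-contained sketch of that reduction using an in-memory database with invented numbers (the table name and column order follow the queries above; everything else is illustrative):

```python
import sqlite3
import statistics

# Throwaway in-memory database mimicking the assumed "data" table layout.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE data (name TEXT, id INT, bin TEXT, pos REAL, neg REAL)")
rows = [
    ("ptj1", 0, "1", 10.0, 0.0),  # weight 0, bin 1
    ("ptj1", 1, "1", 12.0, 0.0),  # weight 1, bin 1
    ("ptj1", 0, "2", 4.0, 1.0),
    ("ptj1", 1, "2", 6.0, 1.0),
]
cur.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)

# Collect the per-weight value of each bin, then reduce each bin's list of
# weight variations to (mean, stdev).
bins = {}
for name, wid, b, pos, neg in cur.execute("SELECT * FROM data"):
    bins.setdefault(b, []).append(pos - abs(neg))

envelope = {b: (statistics.mean(v), statistics.stdev(v)) for b, v in bins.items()}
print(envelope)  # per-bin (mean, stdev) across the two weights
```

With two weights per bin, `statistics.stdev` is the sample standard deviation over the two variations, which is what gives the error band its spread.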
def getStatistics(stats):
    ## Map histo name -> weight id -> (positive minus negative) sum of event weights.
    histoname_dict = dict()
    for entry in stats:
        if entry[0] not in histoname_dict:
            histoname_dict[entry[0]] = dict()
        histoname_dict[entry[0]][entry[1]] = float(entry[2]) - float(entry[3])
    return histoname_dict
def getMeanAndStdev(path):
    con = sqlite3.connect(path)
    cursor = con.cursor()
    bin_data = cursor.execute("select * from data;").fetchall()
    stats_data = cursor.execute(
        "select name, id, pos_sum_event_weights_over_events, "
        "neg_sum_event_weights_over_events from Statistics"
    ).fetchall()

    statsdict = getStatistics(stats_data)

    ## Parse data into parsed_data[histo_name][bin #] -> [normalised value per weight].
    parsed_data = dict()
    for row in bin_data:
        histo_name = row[0]
        weight_id = row[1]
        bin_number = row[2]
        sumw = statsdict[histo_name][str(weight_id)]
        value = (float(row[3]) - abs(float(row[4]))) / sumw
        if histo_name not in parsed_data:
            ## Histo name not seen yet: create a new bin dictionary for that histo.
            parsed_data[histo_name] = dict()
            parsed_data[histo_name][bin_number] = []
        elif bin_number not in parsed_data[histo_name]:
            ## Histo name exists but this bin does not: create its value list.
            parsed_data[histo_name][bin_number] = []
        parsed_data[histo_name][bin_number].append(value)

    output = dict()
    for histo_name in parsed_data:
        output[histo_name] = dict()
        for bin_number in parsed_data[histo_name]:
            output[histo_name][bin_number] = [statistics.mean(parsed_data[histo_name][bin_number]),
                                              statistics.stdev(parsed_data[histo_name][bin_number])]

    return output
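The normalisation step in `getMeanAndStdev` divides each bin's (positive − |negative|) content by that weight's sum of event weights before averaging across weights, matching the "regularized mean/stdev by sumw" commit. A toy check of that arithmetic (all numbers invented):

```python
import statistics

# One bin seen under two weight variations: weight id -> (pos, neg) contents,
# and weight id -> sum of event weights from the Statistics table (invented).
raw = {0: (20.0, 2.0), 1: (33.0, 3.0)}
sumw = {0: 2.0, 1: 3.0}

# Normalise each variation by its own sumw, then reduce to (mean, stdev).
values = [(pos - abs(neg)) / sumw[wid] for wid, (pos, neg) in raw.items()]
mean, stdev = statistics.mean(values), statistics.stdev(values)
print(mean, stdev)
```

Without the per-weight division, variations with larger overall normalisations would dominate the spread; dividing first compares shapes on an equal footing.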
def getHistoStatisticsAvg(path):
    con = sqlite3.connect(path)
    cursor = con.cursor()

    ## Renamed from 'statistics' to avoid shadowing the statistics module.
    rows = cursor.execute(
        "select name, avg(pos_num_events), avg(neg_num_events), "
        "avg(pos_sum_event_weights_over_events), avg(neg_sum_event_weights_over_events), "
        "avg(pos_entries), avg(neg_entries), "
        "avg(pos_sum_event_weights_over_entries), avg(neg_sum_event_weights_over_entries), "
        "avg(pos_sum_squared_weights), avg(neg_sum_squared_weights), "
        "avg(pos_value_times_weight), avg(neg_value_times_weight), "
        "avg(pos_value_squared_times_weight), avg(neg_value_squared_times_weight) "
        "from Statistics group by name;"
    ).fetchall()

    statdict = dict()
    for row in rows:
        statdict[row[0]] = row[1:]

    return statdict
## Debug helper for printing the output dictionary.
## Structure: output[histogram_name][bin #] = [positive mean, positive stdev, negative mean, negative stdev]
def DBreader_debug(output):
    for name in output:
        print(name)
        for eachbin in output[name]:
            print(eachbin)
            for val in output[name][eachbin]:
                print(val)


def plot_histograms(output):
    ## Originally module-level code referencing an undefined 'output';
    ## wrapped in a function so the script imports cleanly.
    for histo in output:
        num_of_keys = len(output[histo].keys())
        labels = [None] * num_of_keys
        for i in range(1, num_of_keys):
            labels[i] = i
        labels[0] = 'underflow'
        labels[num_of_keys - 1] = 'overflow'
        positives = [None] * num_of_keys
        negatives = [None] * num_of_keys
        for row in output[histo]:
            if row == 'underflow':
                positives[0] = output[histo][row][0]
                negatives[0] = output[histo][row][2]
            elif row == 'overflow':
                positives[num_of_keys - 1] = output[histo][row][0]
                negatives[num_of_keys - 1] = output[histo][row][2]
            else:
                positives[int(row)] = output[histo][row][0]
                negatives[int(row)] = output[histo][row][2]

        x = np.arange(num_of_keys)
        width = 0.5
        fig, ax = plt.subplots()
        rects1 = ax.bar(x - width / 3, positives, width, label="positives avg")
        rects2 = ax.bar(x + width / 3, negatives, width, label="negatives avg")

        ax.set_ylabel('Events Luminosity = ')
        ax.set_title(histo)
        ax.set_xticks(x, labels, rotation=65)
        ax.legend()

        # ax.bar_label(rects1, padding=3)
        # ax.bar_label(rects2, padding=3)

        fig.tight_layout()
        plt.show()
@@ -41,6 +41,7 @@ def __init__(self):
        self.has_fastjet = False
        self.has_delphes = False
        self.has_delphesMA5tune = False
        self.has_sqlite3 = False

@@ -98,7 +99,10 @@ def UserfriendlyMakefileForSampleAnalyzer(filename,options):
            file.write('\tcd Test && $(MAKE) -f Makefile_delphesMA5tune\n')
        if options.has_process:
            file.write('\tcd Process && $(MAKE) -f Makefile\n')
            file.write('\tcd Test && $(MAKE) -f Makefile_process\n')
        if options.has_sqlite3:
            file.write('\tcd Interfaces && $(MAKE) -f Makefile_sqlite\n')
            file.write('\tcd Test && $(MAKE) -f Makefile_sqlite\n')
        file.write('\n')

        # Clean

@@ -125,6 +129,9 @@ def UserfriendlyMakefileForSampleAnalyzer(filename,options):
        if options.has_process:
            file.write('\tcd Process && $(MAKE) -f Makefile clean\n')
            file.write('\tcd Test && $(MAKE) -f Makefile_process clean\n')
        if options.has_sqlite3:
            file.write('\tcd Interfaces && $(MAKE) -f Makefile_sqlite clean\n')
            file.write('\tcd Test && $(MAKE) -f Makefile_sqlite clean\n')
        file.write('\n')

        # Mrproper

@@ -152,6 +159,9 @@ def UserfriendlyMakefileForSampleAnalyzer(filename,options):
        if options.has_process:
            file.write('\tcd Process && $(MAKE) -f Makefile mrproper\n')
            file.write('\tcd Test && $(MAKE) -f Makefile_process mrproper\n')
        if options.has_sqlite3:
            file.write('\tcd Interfaces && $(MAKE) -f Makefile_sqlite mrproper\n')
            file.write('\tcd Test && $(MAKE) -f Makefile_sqlite mrproper\n')
        file.write('\n')

        # Closing the file

@@ -194,6 +204,9 @@ def __init__(self):
        self.has_root_tag = False
        self.has_root_lib = False
        self.has_root_ma5lib = False
        self.has_sqlite = False
        self.has_sqlite_tag = False
        self.has_sqlite_lib = False

@@ -321,7 +334,9 @@ def Makefile(
        for header in archi_info.delphesMA5tune_inc_paths:
            cxxflags.extend(['-I'+header])
        file.write('CXXFLAGS += '+' '.join(cxxflags)+'\n')

        # - tags
        cxxflags=[]
        if options.has_root_tag:

@@ -338,6 +353,8 @@ def Makefile(
            cxxflags.extend(['-DDELPHES_USE'])
        if options.has_delphesMA5tune_tag:
            cxxflags.extend(['-DDELPHESMA5TUNE_USE'])
        if options.has_sqlite_tag:
            cxxflags.extend(['-DSQLITE3_USE'])
        if len(cxxflags)!=0:
            file.write('CXXFLAGS += '+' '.join(cxxflags)+'\n')
        file.write('\n')

@@ -347,7 +364,9 @@ def Makefile(
        # - general
        libs=[]
        file.write('LIBFLAGS = \n')

        # added SQL
        #file.write('LIBFLAGS = -l sqlite3\n')

        # - commons
        if options.has_commons:

@@ -429,6 +448,14 @@ def Makefile(
        if options.has_heptoptagger:
            file.write('LIBFLAGS += -lHEPTopTagger_for_ma5\n')

        # SQLite3
        if options.has_sqlite:
            file.write('LIBFLAGS += -l sqlite3\n')

[Review comment on the line above with a suggested change: "lets keep …"]

        if options.has_sqlite_lib:
            file.write('LIBFLAGS += -l sqlite_for_ma5\n')

        # - Commons
        if options.has_commons:
            libs=[]

@@ -464,6 +491,8 @@ def Makefile(
            libs.append('$(MA5_BASE)/tools/SampleAnalyzer/Lib/libsubstructure_for_ma5.so')
        if options.has_heptoptagger:
            libs.append('$(MA5_BASE)/tools/SampleAnalyzer/Lib/libHEPTopTagger_for_ma5.so')
        if options.has_sqlite_lib:
            libs.append('$(MA5_BASE)/tools/SampleAnalyzer/Lib/libsqlite_for_ma5.so')
        if len(libs)!=0:
            file.write('# Requirements to check before building\n')
            for ind in range(0,len(libs)):
Review comment: See the comments above.