Suppose you were running an Online Electronic site and you need to get some insights from 2019 monthly sales reports. Main idea of the project is coming from this video.
Features:
- Dataset: Sample Folder
- Sample Notebook: Data_Analysis.ipynb
- Sample Period: Calendar Year 2019
See how it works below
The sample dataset is randomly added by this script. We can adjust the weight of each element to get different versions.
Data format
# Data header
columns = ['Order ID', 'Product', 'Quantity', 'Price', 'Order Date', 'Purchase Address']
# Location
street_names = ['1st', ' 2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th', '11th', '12th', '13th', '14th', '15th', '16th', '17th', '18th', '19th', '20th', '21st', '22nd', '23rd', '24th', '25th']
districts = ['01', '02', '03', '04', '05', '06', '07', '08', '10', '11']
districts_weights = [9, 3, 4, 3, 5, 2, 6, 0.5, 3, 1]
# Product (Price and Frequency)
products = {
'iPhone': [700, 10],
'Google Phone': [600, 3],
'Samsung Phone': [500, 8],
'21in Monitor': [110, 6],
'27in Monitor': [150, 11],
'34in 2k Monitor': [380, 9],
'27in 4K Gaming Monitor': [400, 9],
'Dell Laptop': [800, 7],
'ThinkPad Laptop': [1000, 6],
'Macbook Pro Laptop': [1700, 7],
'Power Bank': [60, 30],
'USB-C Charging Cable': [12, 30],
'Lightning Charging Cable': [15, 30],
'Normal Headphones': [12, 30],
'Sony Headphones': [100, 19],
'Airpods Headphones': [150, 15]
}
Result
The sample analysis file is performed by this script.
Concatenate csv files
os.chdir("./Sales Data Examples")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension), recursive=True)]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined_csv.to_csv( "Sales_2019.csv", index=False, encoding='utf-8-sig')
Clear outliers
sales = sales.dropna(how='all')
sales = sales[sales['Order Date'].str[0:4] != 'Null']
Sales by Month
Sales by Territory
Top 5 bundles
sales = sales.dropna(how='all')
sales = sales[sales['Order Date'].str[0:4] != 'Null']
from itertools import combinations
from collections import Counter
count = Counter()
for row in df_pair['Bundle']:
row_list = row.split(', ')
count.update(Counter(combinations(row_list, 3)))
for key, value in count.most_common(5):
print(key, '-', value, 'times')
Please feel free to fork, comment or give feedback to [email protected]