For a desired price range and wine robe (e.g. red, white), which characteristics (Region, Year, Name) are most auspicious for choosing a well-rated wine bottle?
-
In 2017 in France, there were 46,600 vineyards producing AOP (Appellation d’Origine Contrôlée) wines, which covered 446,588 ha of land (Agrimer 2018).
-
AOP represents ⅔ of all wine produced in France; About 60% of all wines produced in France are consumed there (Agrimer 2018).
-
A french person drinks about 45 liters of wine per year on average today (eq. to about 60 standard size bottles), made of 52% red, 31% rosé, and 17% (still or sparkling) white (Vinexpo/IWSR 2020).
-
Wine production in 2017 generated over 11,2bn euros, i.e. about 16 % of the total value of french agriculture (Agrimer 2018).
- Main regions of AOP production: (1) Alsace, (2) Bordeaux, (4) Bourgogne, (6) Champagne, (9) Languedoc-Roussillon, (12) Provence, (11) Val de Loire, (14) Vallée du Rhône
-
Vivino.com is an online marketplace and rating app for wine founded in 2010 and headquartered in San Francisco.
-
Its database contains more than 10 million wines (Wikipedia).
Using a web scraping tool (Selenium) compatible with Python3, we extract the following data:
-
Country, Region, and Estate name
-
Robe (red, still white, sparkling white, rosé) & Vintage (year)
-
Ratings (1 to 5 scale) & number of reviews
-
Price of sale
For more details on how the data was scrapped, please refer to script web_scrapping.py
(a) Items produced and sold in France, (b) cheaper than 100 euros, and (c) reviewed by at least 20 people. Red and white (non-sparkling) wines were analyzed separately. Combining all criteria give 13729 red and 5207 white wines.
For more details on how the data was analyzed, please refer to script vin_analysis.py
Probability distribution function (PDF) of the sale price for (L) red and (R) white wines. Data is shown in an histogram with uniform bin size of 2 euros (black line), and by its Gaussian kernel density estimation [KDE] (orange line). Histogram and KDE are normalized such that all probabilities sum to 1 when integrating over the 0-100 euros price range.
PDF of the sale price for (L) red and (R) white wines, for each region of AOP production. The legend shows the percentage of wines coming from each region for each robe; the percentage of wine coming from all 5 regions with respect to all AOP wines of the same robe is shown in each panel's title.
Probability distribution function (PDF) for the Ratings for (L) red and (R) white wines. Ratings is on a scale of 1 (poor) to 5 (excellent). Same graphical conventions as previous figure apply.
PDF of the Vintage (year of production) for (L) red and (R) white wines. Same graphical conventions as previous figure apply.
Ratings vs. price for (L) red and (R) white wines; each dot is one item. The average (red line), median (magenta line), and the 95% enveloppe of all ratings (blue lines) are shown as a function of the price.
Likelihood of a (L) red or (R) white wine from each region being better rated than the average among its peers in 4 different price range: 5 to 15 euros, 15 to 25, 25 to 35, 35 to 45 euros. A 95% confidence interval on the Likelihood estimate is shown for each region and price range.
Fraction of variance in the item ratings for (L) red and (R) white wines explained by its 3 main predictors (Vintage Year, Province, and Price) in 4 different price range: 5 to 15 euros, 15 to 25, 25 to 35, 35 to 45 euros.