New New World

Objective

Discover which environmental properties are best suited to Sauvignon Blanc grape cultivation, and develop a web application that lets users interact with and gather relevant soil and climate data for a specified location.

Background

Sauvignon blanc is a green-skinned grape variety that originates from the Bordeaux region of France. The grape most likely gets its name from the French words sauvage (“wild”) and blanc (“white”) due to its early origins as an indigenous grape in South West France.

Sauvignon blanc is currently widely cultivated in France, Chile, Canada, Australia, New Zealand, South Africa, and the US states of Washington and California. It can develop desirable flavors, such as grass, green bell pepper, tropical fruit, and floral notes, in both cool and warm environments. It buds early, grows quickly, and can produce several harvests within the same year.

Sauvignon blanc (SB) has a short fermentation cycle, which is why we review only 2016 data.

Why Sauvignon Blanc (SB)?

According to Wine Economist, Sauvignon Blanc is one of the most popular and most profitable wines in the world.

Questions this application will attempt to answer:

What are the various geographical locations in which SB currently thrives?
What are the soil properties at those regions?
What are the climate conditions for those locations over the span of 2016?
Given a location input, what are the soil and climate properties at that site?

Step 1 - Data Gathering

Data was collected from these 2 APIs

and scraped from:

Step 2 - Data Analysis

Analysis done strictly on vineyard environments rated by Wine Enthusiast

As you will notice in the first Jupyter notebook analysis below, the correlations were initially weak because the data was narrow: it only represents places known for moderate to great wine. Adding a desert location in the second analysis, whose climate and soil components are far from ideal, accentuated the correlations.
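Why one far-from-ideal site strengthens the correlations is a general property of Pearson's r: a single extreme point can dominate the coefficient. The sketch below shows this with synthetic numbers, not the project's data. The "desert" point is hypothetical. The same effect suggests that the stronger coefficients in the second analysis owe much to the leverage of that single desert row.

```python
import numpy as np

# Synthetic, illustrative data only: twelve "good vineyard" sites whose ratings
# sit in a narrow band and do not depend on the factor at all.
rng = np.random.default_rng(0)
factor = rng.uniform(40, 60, size=12)      # e.g. a normalized climate factor
rating = 88 + rng.normal(0, 2, size=12)    # ratings clustered around 88

r_narrow = np.corrcoef(factor, rating)[0, 1]

# Add one hypothetical far-from-ideal site (like the desert test row):
# an extreme factor value paired with a rating of 0.
factor_wide = np.append(factor, 100.0)
rating_wide = np.append(rating, 0.0)
r_wide = np.corrcoef(factor_wide, rating_wide)[0, 1]

print(f"without the outlier: r = {r_narrow:.2f}")
print(f"with the outlier:    r = {r_wide:.2f}")
```

The underlying relationship in the twelve real sites is unchanged. Only the single added point moves r toward -1.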

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

filename = 'Normalized Wine Correlations and Dropped.xlsx'
wine = pd.read_excel(filename)
wine.head()

	iso3	appell_city	appell_country	appell_region	appell_state	lat	lon	lat.1	lon.1	price	...	clay_correl	pH	ph_norm	ph_correl	water_at_withering	waw_norm	waw_correl	bulk_density	bulk_norm	bulk_correl
0	ARG	Argentina	Argentina	Mendoza Province	Mendoza Province	-32.889625	-68.852687	-32.889625	-68.852687	18	...	-0.027127	77	78.34375	-0.211055	17	34.0	0.092602	1489	66.089655	0.032192
1	ARG	Mendoza Province	Argentina	Luján de Cuyo	Mendoza Province	-33.039104	-68.879864	-33.039104	-68.879864	42	...	NaN	78	81.43750	NaN	17	34.0	NaN	1518	72.689655	NaN
2	ARG	Mendoza Province	Argentina	Mendoza	Mendoza Province	-32.889459	-68.845839	-32.889459	-68.845839	10	...	NaN	77	78.34375	NaN	17	34.0	NaN	1479	63.813793	NaN
3	ARG	Mendoza Province	Argentina	Uco Valley	Mendoza Province	-33.600036	-69.282703	-33.600036	-69.282703	15	...	NaN	78	81.43750	NaN	18	40.6	NaN	1421	50.613793	NaN
4	ARG	Mendoza Province	Argentina	Uco Valley	Mendoza Province	-33.600036	-69.282703	-33.600036	-69.282703	10	...	NaN	78	81.43750	NaN	18	40.6	NaN	1421	50.613793	NaN

5 rows × 105 columns

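# The *_correl columns appear to be filled in only on the first row (the other rows are NaN),
# so keep just that row: one correlation coefficient per factor
df = wine[['growing_temp_correl','year_temp_correl','max_temp_correl','min_temp_correl','max_min_correl',
      'hum_correl','precip_correl','carbon_correl','cec_correl','clay_correl','ph_correl',
      'waw_correl','bulk_correl']]
new_df = df[0:1]
new_df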

	growing_temp_correl	year_temp_correl	max_temp_correl	min_temp_correl	max_min_correl	hum_correl	precip_correl	carbon_correl	cec_correl	clay_correl	ph_correl	waw_correl	bulk_correl
0	-0.134643	-0.044223	-0.152918	-0.02064	-0.104366	0.148011	0.116698	0.047044	0.044261	-0.027127	-0.211055	0.092602	0.032192

new_df.plot(kind='bar',figsize=(15,5))
plt.title("Correlation coefficient Comparison")
plt.ylabel("Correlation coefficient")
plt.xlabel("Factor")
plt.show()

[Figure: Correlation coefficient Comparison]
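The `_correl` values appear to have been computed in the spreadsheet rather than in the notebook, because they occupy only the first row. The source does not say how they were produced. Assuming they are plain Pearson coefficients between each normalized factor and the normalized rating, pandas can recompute them directly with `DataFrame.corrwith`. The helper name `factor_correlations` and the toy data below are hypothetical.

```python
import pandas as pd


def factor_correlations(df: pd.DataFrame, target: str = "rating _norm") -> pd.Series:
    """Pearson correlation of every *_norm column with the target column.

    Assumption (not confirmed by the source): the spreadsheet's *_correl values
    are Pearson coefficients between each normalized factor and the normalized rating.
    """
    factors = [c for c in df.columns if c.endswith("_norm") and c != target]
    # corrwith correlates each column with the target Series (Pearson by default),
    # dropping missing values pairwise.
    corr = df[factors].corrwith(df[target])
    # Strongest relationships first, regardless of sign.
    return corr.sort_values(key=abs, ascending=False)


# Hypothetical toy data, not values from the spreadsheet.
toy = pd.DataFrame({
    "rating _norm":  [1.0, 40.0, 60.0, 100.0],
    "humidity_norm": [1.0, 35.0, 65.0, 100.0],   # rises with rating
    "ph_norm":       [100.0, 60.0, 40.0, 1.0],   # falls with rating
    "waw_norm":      [50.0, 10.0, 90.0, 50.0],   # mostly unrelated
})
print(factor_correlations(toy))
```

Calling `factor_correlations(wine)` on the loaded spreadsheet is one way to cross-check the bar chart above against the stored `_correl` row.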

df1 = wine[['growing_temp_norm','rating _norm','year_temp_norm','max_temp_norm','min_temp_norm','max_min_norm',
      'humidity_norm','precip_norm','carbon_norm','cec_norm','clay_norm','ph_norm','waw_norm','bulk_norm']]
df1.head()

	growing_temp_norm	rating _norm	year_temp_norm	max_temp_norm	min_temp_norm	max_min_norm	humidity_norm	precip_norm	carbon_norm	cec_norm	clay_norm	ph_norm	waw_norm	bulk_norm
0	66.961771	23.846154	73.608726	66.775003	59.742481	64.481524	33.70719	36.689004	10.658537	10.580645	67.00	78.34375	34.0	66.089655
1	66.961771	46.692308	73.608726	66.775003	59.742481	64.481524	33.70719	36.689004	8.243902	10.580645	64.25	81.43750	34.0	72.689655
2	66.961771	31.461538	73.608726	66.775003	59.742481	64.481524	33.70719	36.689004	10.658537	16.967742	64.25	78.34375	34.0	63.813793
3	66.961771	39.076923	73.608726	66.775003	59.742481	64.481524	33.70719	36.689004	17.902439	32.935484	61.50	81.43750	40.6	50.613793
4	66.961771	31.461538	73.608726	66.775003	59.742481	64.481524	33.70719	36.689004	17.902439	32.935484	61.50	81.43750	40.6	50.613793
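The `_norm` columns look like min-max scaling onto a 1 to 100 range. In the desert analysis, the extreme desert row lands on exactly 1.0 (rating, humidity, carbon) or 100.0 (growing-season temperature). The sketch below assumes that formula, and the function name is hypothetical. Back-solving from the tables, a pH column spanning 52 to 84 reproduces the ph_norm values shown. For example, a reading of 77 maps to 78.34375 and a reading of 78 maps to 81.4375. Those bounds are an inference, not values stated in the source.

```python
import pandas as pd


def minmax_1_100(col: pd.Series) -> pd.Series:
    """Linearly rescale a column so its minimum maps to 1 and its maximum to 100."""
    lo, hi = col.min(), col.max()
    if hi == lo:
        raise ValueError("cannot rescale a constant column")
    return 1 + 99 * (col - lo) / (hi - lo)


# Inferred pH bounds (52 and 84) plus the readings that appear in the tables.
ph = pd.Series([52, 77, 78, 80, 84])
print(minmax_1_100(ph).tolist())  # [1.0, 78.34375, 81.4375, 87.625, 100.0]
```

This scaling depends on each column's minimum and maximum. Adding the desert row therefore shifts the range and every normalized value with it, which is consistent with the `_norm` numbers differing between the two spreadsheets.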

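sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["year_temp_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. year_temp_norm")
plt.ylabel("rating _norm")
plt.xlabel("year_temp_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. year_temp_norm]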

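sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["max_temp_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. max_temp_norm")
plt.ylabel("rating _norm")
plt.xlabel("max_temp_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. max_temp_norm]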

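sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["max_min_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. max_min_norm")
plt.ylabel("rating _norm")
plt.xlabel("max_min_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. max_min_norm]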

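sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["humidity_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. humidity_norm")
plt.ylabel("rating _norm")
plt.xlabel("humidity_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. humidity_norm]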

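sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["precip_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. precip_norm")
plt.ylabel("rating _norm")
plt.xlabel("precip_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. precip_norm]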

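sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["carbon_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. carbon_norm")
plt.ylabel("rating _norm")
plt.xlabel("carbon_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. carbon_norm]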

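sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["ph_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. ph_norm")
plt.ylabel("rating _norm")
plt.xlabel("ph_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. ph_norm]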

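sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["waw_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. waw_norm")
plt.ylabel("rating _norm")
plt.xlabel("waw_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. waw_norm]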

df2 = df1[["rating _norm","year_temp_norm"]]
sns.pairplot(df2, kind="reg")
plt.show()

[Figure: pair plot of rating _norm and year_temp_norm with regression fits]

sns.lmplot(x="ph_norm", y="rating _norm", data=df1[["rating _norm","ph_norm"]], x_jitter=.05)
plt.show()

[Figure: regression of rating _norm on ph_norm]

sns.lmplot(x="humidity_norm", y="rating _norm", data=df1[["rating _norm","humidity_norm"]], x_jitter=.05)
plt.show()

[Figure: regression of rating _norm on humidity_norm]

sns.jointplot(x="ph_norm", y="rating _norm", data=df1[["rating _norm","ph_norm"]], kind="reg")
plt.show()

[Figure: joint plot of rating _norm vs. ph_norm with regression fit]

sns.jointplot(x="humidity_norm", y="rating _norm", data=df1[["rating _norm","humidity_norm"]], kind="reg")
plt.show()

[Figure: joint plot of rating _norm vs. humidity_norm with regression fit]

Analysis done with the inclusion of a “not so ideal” environment - the desert

Here you can see that the diurnal temperature variance during the growing season, rather than over the entire year, has the strongest correlation, followed by humidity.

These two factors are not actually independent: the mapped locations show a prevalence of coastal sites.

Large bodies of water moderate temperatures in the surrounding area. Because they have a large heat storage capacity, they can extend the growing season and keep mid-winter lows from dropping far enough to damage the vines.
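The growing-season versus full-year comparison can be made concrete with daily weather records. The sketch below makes three assumptions. It takes a hypothetical DataFrame of daily `tmax`/`tmin` readings. It treats diurnal variance as the average daily range (max minus min). It uses the common viticulture convention for the growing season: April to October in the Northern Hemisphere, October to April in the Southern Hemisphere. The project's own definition may differ, and the function name is illustrative.

```python
import numpy as np
import pandas as pd


def diurnal_range(daily: pd.DataFrame, southern_hemisphere: bool = False) -> dict:
    """Mean daily temperature range (tmax - tmin) over the growing season and the full year.

    `daily` must have a DatetimeIndex and hypothetical 'tmax'/'tmin' columns.
    The growing season is assumed to be Apr-Oct (north) or Oct-Apr (south).
    """
    spread = daily["tmax"] - daily["tmin"]
    months = daily.index.month
    if southern_hemisphere:
        in_season = (months >= 10) | (months <= 4)
    else:
        in_season = (months >= 4) & (months <= 10)
    return {"growing_season": spread[in_season].mean(), "full_year": spread.mean()}


# Synthetic 2016 data: a 15-degree daily swing from April to October, 5 degrees otherwise.
days = pd.date_range("2016-01-01", "2016-12-31", freq="D")
swing = np.where((days.month >= 4) & (days.month <= 10), 15.0, 5.0)
daily = pd.DataFrame({"tmin": 10.0, "tmax": 10.0 + swing}, index=days)
print(diurnal_range(daily))
```

Averaging over the full year dilutes the growing-season swing with off-season days, which is one reason the two measures can rank differently. For Southern Hemisphere sites such as Mendoza, a single calendar year like 2016 contains the tail of one season and the start of the next.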

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

filename = 'CopyofNormalizedWineCorrelations.xlsx'
wine = pd.read_excel(filename)
wine.head()

	iso3	appell_city	appell_country	appell_region	appell_state	lat	lon	lat.1	lon.1	price	...	clay_correl	pH	ph_norm	ph_correl	water_at_withering	waw_norm	waw_correl	bulk_density	bulk_norm	bulk_correl
0	ARG	Argentina	Argentina	Mendoza Province	Mendoza Province	-32.889625	-68.852687	-32.889625	-68.852687	18	...	0.063138	77	78.34375	-0.178641	17	34.0	0.157543	1489	66.089655	-0.029926
1	TEST	TEST	TEST	DESERT	TEST	35.011000	-115.473460	35.011000	-115.473460	0	...	NaN	80	87.62500	NaN	15	20.8	NaN	1591	89.303448	NaN
2	ARG	Mendoza Province	Argentina	Luján de Cuyo	Mendoza Province	-33.039104	-68.879864	-33.039104	-68.879864	42	...	NaN	78	81.43750	NaN	17	34.0	NaN	1518	72.689655	NaN
3	ARG	Mendoza Province	Argentina	Mendoza	Mendoza Province	-32.889459	-68.845839	-32.889459	-68.845839	10	...	NaN	77	78.34375	NaN	17	34.0	NaN	1479	63.813793	NaN
4	ARG	Mendoza Province	Argentina	Uco Valley	Mendoza Province	-33.600036	-69.282703	-33.600036	-69.282703	15	...	NaN	78	81.43750	NaN	18	40.6	NaN	1421	50.613793	NaN

5 rows × 105 columns

df = wine[['growing_temp_correl','year_temp_correl','max_temp_correl','min_temp_correl','max_min_correl',
      'hum_correl','precip_correl','carbon_correl','cec_correl','clay_correl','ph_correl',
      'waw_correl','bulk_correl']]
new_df = df[0:1]
new_df

	growing_temp_correl	year_temp_correl	max_temp_correl	min_temp_correl	max_min_correl	hum_correl	precip_correl	carbon_correl	cec_correl	clay_correl	ph_correl	waw_correl	bulk_correl
0	-0.441547	-0.403961	-0.230296	0.08805	-0.244262	0.328747	0.146282	0.080658	0.056495	0.063138	-0.178641	0.157543	-0.029926

new_df.plot(kind='bar',figsize=(15,5))
plt.title("Correlation coefficient Comparison")
plt.ylabel("Correlation coefficient")
plt.xlabel("Factor")
plt.show()

[Figure: Correlation coefficient Comparison, desert included]

df1 = wine[['rating _norm','growing_temp_norm','year_temp_norm','max_temp_norm','min_temp_norm','max_min_norm',
      'humidity_norm','precip_norm','carbon_norm','cec_norm','clay_norm','ph_norm','waw_norm','bulk_norm']]
df1.head()

	rating _norm	growing_temp_norm	year_temp_norm	max_temp_norm	min_temp_norm	max_min_norm	humidity_norm	precip_norm	carbon_norm	cec_norm	clay_norm	ph_norm	waw_norm	bulk_norm
0	89.468085	26.573825	31.841776	53.176157	59.742481	48.879743	59.935612	36.689004	10.658537	10.580645	67.00	78.34375	34.0	66.089655
1	1.000000	100.000000	100.000000	100.000000	21.283835	100.000000	1.000000	5.118899	1.000000	29.741935	42.25	87.62500	20.8	89.303448
2	92.627660	26.573825	31.841776	53.176157	59.742481	48.879743	59.935612	36.689004	8.243902	10.580645	64.25	81.43750	34.0	72.689655
3	90.521277	26.573825	31.841776	53.176157	59.742481	48.879743	59.935612	36.689004	10.658537	16.967742	64.25	78.34375	34.0	63.813793
4	91.574468	26.573825	31.841776	53.176157	59.742481	48.879743	59.935612	36.689004	17.902439	32.935484	61.50	81.43750	40.6	50.613793

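sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["max_temp_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. max_temp_norm")
plt.ylabel("rating _norm")
plt.xlabel("max_temp_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. max_temp_norm, desert included]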

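sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["max_min_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. max_min_norm")
plt.ylabel("rating _norm")
plt.xlabel("max_min_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. max_min_norm, desert included]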

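sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["humidity_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. humidity_norm")
plt.ylabel("rating _norm")
plt.xlabel("humidity_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. humidity_norm, desert included]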

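sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["precip_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. precip_norm")
plt.ylabel("rating _norm")
plt.xlabel("precip_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. precip_norm, desert included]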

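sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["carbon_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. carbon_norm")
plt.ylabel("rating _norm")
plt.xlabel("carbon_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. carbon_norm, desert included]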

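sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["ph_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. ph_norm")
plt.ylabel("rating _norm")
plt.xlabel("ph_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. ph_norm, desert included]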

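sns.set()  # apply the seaborn theme before creating the figure so it takes effect
plt.figure(figsize=(15,5))
# x = environmental factor, y = rating, matching the axis labels below
plt.scatter(df1["waw_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. waw_norm")
plt.ylabel("rating _norm")
plt.xlabel("waw_norm")
plt.grid(True)

# Show plot
plt.show()

[Figure: rating _norm vs. waw_norm, desert included]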

sns.lmplot(x="growing_temp_norm", y="rating _norm", data=df1[["rating _norm","growing_temp_norm"]], x_jitter=.05)
plt.show()

[Figure: regression of rating _norm on growing_temp_norm, desert included]

sns.lmplot(x="humidity_norm", y="rating _norm", data=df1[["rating _norm","humidity_norm"]], x_jitter=.05)
plt.show()

[Figure: regression of rating _norm on humidity_norm, desert included]

Wine truly has a Goldilocks zone for every one of these factors. As mentioned on the website, humidity extremes can be very damaging to vine and fruit development, but a moderate amount, combined with the other factors, can produce exceptional wine.

The same is true for diurnal temperature variance. No temperature variance leads to underdeveloped flavors and, ultimately, poor wine. The stress of temperature variation is KEY to producing the sugars needed to create delicious, alcoholic wines.
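A Goldilocks relationship is an inverted U: ratings peak at a moderate value and fall off toward both extremes. Pearson's r measures only linear association, so a factor like this can matter a great deal yet still show a coefficient near zero. That may partly explain the weak correlations in the first analysis. The sketch below uses synthetic data and fits a quadratic with `numpy.polyfit`. This is one simple way to capture and locate a sweet spot, not the project's method.

```python
import numpy as np

# Synthetic data: rating peaks when the (normalized) factor is moderate, at 50,
# and drops symmetrically toward both extremes.
factor = np.linspace(0, 100, 21)
rating = 100 - 0.04 * (factor - 50) ** 2

# Pearson r captures only linear trends, so a symmetric peak yields r of about 0.
r_linear = np.corrcoef(factor, rating)[0, 1]

# A degree-2 fit captures the curve; its vertex -b / (2a) estimates the sweet spot.
a, b, c = np.polyfit(factor, rating, deg=2)
sweet_spot = -b / (2 * a)

print(f"linear r = {r_linear:.3f}")
print(f"quadratic coefficients: a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")
print(f"estimated sweet spot = {sweet_spot:.1f}")
```

A negative `a` signals a peak rather than a trough. Applied to real data, the vertex would only be a rough guide, since the fit assumes the curve is symmetric around its optimum.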

The web application touches on ‘stress’ in temperature and water availability, but to reiterate, a balance of soil and climate stresses is what truly builds the character of this wine. This deserves a modification of the old adage “it grows sweeter with time” to “it grows better with stress”.