Project #2 for USC's data bootcamp. This repo contains a Flask app that uses JavaScript libraries such as D3 and Leaflet to show optimal locations worldwide for producing Sauvignon Blanc wine.
The goal is to discover which environmental properties are best suited to Sauvignon Blanc grape cultivation, and to develop a web application that lets users interact with and gather relevant data on a specified location.
Sauvignon blanc is a green-skinned grape variety that originates from the Bordeaux region of France. The grape most likely gets its name from the French words sauvage (“wild”) and blanc (“white”) due to its early origins as an indigenous grape in South West France.
Sauvignon blanc is now widely cultivated in France, Chile, Canada, Australia, New Zealand, South Africa, and the US states of Washington and California. It can develop desirable flavors in both cool and warm environments, such as grass, green bell pepper, tropical fruit, and floral notes. It buds early, grows quickly, and can produce several harvests within the same year.
Sauvignon Blanc has a short fermentation cycle, which is why we review only 2016 data.
According to Wine Economist, Sauvignon Blanc is one of the most popular and most profitable wines in the world.
Data was collected from these 2 APIs
and scraped from:
As the first Jupyter notebook analysis below shows, the correlations were initially weak because the data was narrow: it only represents places known for moderate to great wine. The second analysis adds a desert location, with climate and soil components from a site that is far from ideal, and this accentuates the correlations.
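The effect described above is often called range restriction: when every sample comes from a narrow band of conditions, a real relationship can look weak. The sketch below illustrates it with synthetic numbers, not the project's data; the names `pearson`, `temps`, and `ratings` are made up for this example.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic stand-in data (NOT the project's dataset): normalized growing
# temperatures in a narrow band, with ratings that barely depend on them.
rng = np.random.default_rng(0)
temps = rng.uniform(25, 35, size=50)
ratings = 90 - 0.1 * temps + rng.normal(0, 3, size=50)

r_narrow = pearson(temps, ratings)

# Add one far-from-ideal "desert" point: maximum temperature, minimum rating.
temps_wide = np.append(temps, 100.0)
ratings_wide = np.append(ratings, 1.0)

r_wide = pearson(temps_wide, ratings_wide)

print(f"narrow range only: r = {r_narrow:+.3f}")
print(f"with desert point: r = {r_wide:+.3f}")
```

A single extreme point dominates the coefficient here, so a stronger correlation after adding it says as much about the added point as about the original sites.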
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
filename = 'Normalized Wine Correlations and Dropped.xlsx'
wine = pd.read_excel(filename)
wine.head()
| | iso3 | appell_city | appell_country | appell_region | appell_state | lat | lon | lat.1 | lon.1 | price | ... | clay_correl | pH | ph_norm | ph_correl | water_at_withering | waw_norm | waw_correl | bulk_density | bulk_norm | bulk_correl |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ARG | Argentina | Argentina | Mendoza Province | Mendoza Province | -32.889625 | -68.852687 | -32.889625 | -68.852687 | 18 | ... | -0.027127 | 77 | 78.34375 | -0.211055 | 17 | 34.0 | 0.092602 | 1489 | 66.089655 | 0.032192 |
1 | ARG | Mendoza Province | Argentina | Luján de Cuyo | Mendoza Province | -33.039104 | -68.879864 | -33.039104 | -68.879864 | 42 | ... | NaN | 78 | 81.43750 | NaN | 17 | 34.0 | NaN | 1518 | 72.689655 | NaN |
2 | ARG | Mendoza Province | Argentina | Mendoza | Mendoza Province | -32.889459 | -68.845839 | -32.889459 | -68.845839 | 10 | ... | NaN | 77 | 78.34375 | NaN | 17 | 34.0 | NaN | 1479 | 63.813793 | NaN |
3 | ARG | Mendoza Province | Argentina | Uco Valley | Mendoza Province | -33.600036 | -69.282703 | -33.600036 | -69.282703 | 15 | ... | NaN | 78 | 81.43750 | NaN | 18 | 40.6 | NaN | 1421 | 50.613793 | NaN |
4 | ARG | Mendoza Province | Argentina | Uco Valley | Mendoza Province | -33.600036 | -69.282703 | -33.600036 | -69.282703 | 10 | ... | NaN | 78 | 81.43750 | NaN | 18 | 40.6 | NaN | 1421 | 50.613793 | NaN |
5 rows × 105 columns
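The `_norm` columns in this preview appear consistent with min-max scaling onto a 1 to 100 range. The sketch below shows that rescaling for the `pH` column. The bounds of 52 and 84 are inferred from the preview values rather than documented by the project, and the function name `min_max_scale` is hypothetical.

```python
def min_max_scale(value, lo, hi, new_lo=1.0, new_hi=100.0):
    """Linearly rescale value from [lo, hi] onto [new_lo, new_hi]."""
    if hi == lo:
        raise ValueError("lo and hi must differ")
    return new_lo + (value - lo) * (new_hi - new_lo) / (hi - lo)

# Assumed bounds: inferred from the preview rows, not stated in the project.
PH_MIN, PH_MAX = 52, 84

for ph in (77, 78, 80):
    print(ph, min_max_scale(ph, PH_MIN, PH_MAX))
```

With those assumed bounds, the function reproduces the `ph_norm` values that appear in the notebook output for pH 77, 78, and 80 (78.34375, 81.4375, and 87.625).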
df = wine[['growing_temp_correl','year_temp_correl','max_temp_correl','min_temp_correl','max_min_correl',
'hum_correl','precip_correl','carbon_correl','cec_correl','clay_correl','ph_correl',
'waw_correl','bulk_correl']]
new_df = df[0:1]  # the coefficients are stored in row 0 only; the other rows are NaN
new_df
| | growing_temp_correl | year_temp_correl | max_temp_correl | min_temp_correl | max_min_correl | hum_correl | precip_correl | carbon_correl | cec_correl | clay_correl | ph_correl | waw_correl | bulk_correl |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -0.134643 | -0.044223 | -0.152918 | -0.02064 | -0.104366 | 0.148011 | 0.116698 | 0.047044 | 0.044261 | -0.027127 | -0.211055 | 0.092602 | 0.032192 |
new_df.plot(kind='bar',figsize=(15,5))
plt.title("Correlation coefficient Comparison")
plt.ylabel("Correlation coefficient")
plt.xlabel("Factor")
plt.show()
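The `_correl` columns hold one precomputed coefficient per factor. As a rough sketch of how such coefficients could be derived from normalized columns with pandas, the example below correlates each factor with the rating. The tiny frame is illustrative only, and whether the project used Pearson correlation is an assumption.

```python
import pandas as pd

# Illustrative values only (NOT the project's data); the column names mirror
# the notebook's naming, including the space in 'rating _norm'.
sample = pd.DataFrame({
    "rating _norm":  [90.0, 85.0, 70.0, 60.0, 40.0],
    "humidity_norm": [60.0, 55.0, 45.0, 30.0, 10.0],
    "ph_norm":       [70.0, 75.0, 80.0, 82.0, 90.0],
})

# Pearson correlation of every factor with the rating column,
# ordered from strongest to weakest by absolute value.
correl = (
    sample.corr(method="pearson")["rating _norm"]
    .drop("rating _norm")
    .sort_values(key=abs, ascending=False)
)
print(correl)
```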
df1 = wine[['growing_temp_norm','rating _norm','year_temp_norm','max_temp_norm','min_temp_norm','max_min_norm',
'humidity_norm','precip_norm','carbon_norm','cec_norm','clay_norm','ph_norm','waw_norm','bulk_norm']]
df1.head()
| | growing_temp_norm | rating _norm | year_temp_norm | max_temp_norm | min_temp_norm | max_min_norm | humidity_norm | precip_norm | carbon_norm | cec_norm | clay_norm | ph_norm | waw_norm | bulk_norm |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 66.961771 | 23.846154 | 73.608726 | 66.775003 | 59.742481 | 64.481524 | 33.70719 | 36.689004 | 10.658537 | 10.580645 | 67.00 | 78.34375 | 34.0 | 66.089655 |
1 | 66.961771 | 46.692308 | 73.608726 | 66.775003 | 59.742481 | 64.481524 | 33.70719 | 36.689004 | 8.243902 | 10.580645 | 64.25 | 81.43750 | 34.0 | 72.689655 |
2 | 66.961771 | 31.461538 | 73.608726 | 66.775003 | 59.742481 | 64.481524 | 33.70719 | 36.689004 | 10.658537 | 16.967742 | 64.25 | 78.34375 | 34.0 | 63.813793 |
3 | 66.961771 | 39.076923 | 73.608726 | 66.775003 | 59.742481 | 64.481524 | 33.70719 | 36.689004 | 17.902439 | 32.935484 | 61.50 | 81.43750 | 40.6 | 50.613793 |
4 | 66.961771 | 31.461538 | 73.608726 | 66.775003 | 59.742481 | 64.481524 | 33.70719 | 36.689004 | 17.902439 | 32.935484 | 61.50 | 81.43750 | 40.6 | 50.613793 |
sns.set()  # apply seaborn styling once, before any figure is drawn

# Plot the normalized rating (y-axis) against each normalized factor (x-axis),
# so the plotted data matches the axis labels.
factors = ["year_temp_norm", "max_temp_norm", "max_min_norm", "humidity_norm",
           "precip_norm", "carbon_norm", "ph_norm", "waw_norm"]
for factor in factors:
    plt.figure(figsize=(15,5))
    plt.scatter(df1[factor], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
    plt.title(f"rating _norm vs. {factor}")
    plt.ylabel("rating _norm")
    plt.xlabel(factor)
    plt.grid(True)
    plt.show()
df2 = df1[["rating _norm","year_temp_norm"]]
sns.pairplot(df2, kind="reg")
plt.show()
# df3 = df1[["rating _norm","ph_norm"]]
# sns.pairplot(df3, kind="reg")
# plt.show()
sns.lmplot(x="ph_norm", y="rating _norm", data=df1[["rating _norm","ph_norm"]], x_jitter=.05)
plt.show()
sns.lmplot(x="humidity_norm", y="rating _norm", data=df1[["rating _norm","humidity_norm"]], x_jitter=.05)
plt.show()
sns.jointplot(x="ph_norm", y="rating _norm", data=df1[["rating _norm","ph_norm"]], kind="reg")
plt.show()
sns.jointplot(x="humidity_norm", y="rating _norm", data=df1[["rating _norm","humidity_norm"]], kind="reg")
plt.show()
Here you can see that the diurnal temperature variance during the growing season, rather than over the entire year, has the strongest correlation, followed by humidity.
These factors are not actually independent: in the mapped locations you will notice the prevalence of coastal sites.
Large bodies of water can moderate temperature effects on surrounding areas. Such bodies of water have a large heat storage capability that can extend the growing season as well as minimize mid-winter temperatures enough to prevent vine damage.
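Because the factors are not independent, it can help to check how strongly the predictors correlate with each other before reading too much into any single coefficient. The sketch below uses made-up values and assumes that `max_min_norm` represents the spread between maximum and minimum temperature. In the notebook, the same `corr()` call could be applied to `df1`.

```python
import pandas as pd

# Made-up values for illustration: coastal sites tend to be more humid and to
# have a smaller spread between maximum and minimum temperature.
sites = pd.DataFrame({
    "humidity_norm": [80.0, 75.0, 60.0, 40.0, 20.0, 5.0],
    "max_min_norm":  [20.0, 30.0, 35.0, 60.0, 80.0, 95.0],
    "ph_norm":       [78.0, 81.0, 78.0, 81.0, 79.0, 80.0],
})

# Pairwise Pearson correlations between the predictors themselves.
factor_corr = sites.corr()
print(factor_corr.round(2))

# Flag predictor pairs whose absolute correlation exceeds a chosen threshold.
THRESHOLD = 0.8  # arbitrary cut-off chosen for this sketch
cols = list(factor_corr.columns)
linked = [
    (a, b)
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(factor_corr.loc[a, b]) > THRESHOLD
]
print(linked)
```

Pairs flagged this way carry overlapping information, so their individual correlations with the rating should not be read as separate effects.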
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
filename = 'CopyofNormalizedWineCorrelations.xlsx'
wine = pd.read_excel(filename)
wine.head()
| | iso3 | appell_city | appell_country | appell_region | appell_state | lat | lon | lat.1 | lon.1 | price | ... | clay_correl | pH | ph_norm | ph_correl | water_at_withering | waw_norm | waw_correl | bulk_density | bulk_norm | bulk_correl |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ARG | Argentina | Argentina | Mendoza Province | Mendoza Province | -32.889625 | -68.852687 | -32.889625 | -68.852687 | 18 | ... | 0.063138 | 77 | 78.34375 | -0.178641 | 17 | 34.0 | 0.157543 | 1489 | 66.089655 | -0.029926 |
1 | TEST | TEST | TEST | DESERT | TEST | 35.011000 | -115.473460 | 35.011000 | -115.473460 | 0 | ... | NaN | 80 | 87.62500 | NaN | 15 | 20.8 | NaN | 1591 | 89.303448 | NaN |
2 | ARG | Mendoza Province | Argentina | Luján de Cuyo | Mendoza Province | -33.039104 | -68.879864 | -33.039104 | -68.879864 | 42 | ... | NaN | 78 | 81.43750 | NaN | 17 | 34.0 | NaN | 1518 | 72.689655 | NaN |
3 | ARG | Mendoza Province | Argentina | Mendoza | Mendoza Province | -32.889459 | -68.845839 | -32.889459 | -68.845839 | 10 | ... | NaN | 77 | 78.34375 | NaN | 17 | 34.0 | NaN | 1479 | 63.813793 | NaN |
4 | ARG | Mendoza Province | Argentina | Uco Valley | Mendoza Province | -33.600036 | -69.282703 | -33.600036 | -69.282703 | 15 | ... | NaN | 78 | 81.43750 | NaN | 18 | 40.6 | NaN | 1421 | 50.613793 | NaN |
5 rows × 105 columns
df = wine[['growing_temp_correl','year_temp_correl','max_temp_correl','min_temp_correl','max_min_correl',
'hum_correl','precip_correl','carbon_correl','cec_correl','clay_correl','ph_correl',
'waw_correl','bulk_correl']]
new_df = df[0:1]  # the coefficients are stored in row 0 only; the other rows are NaN
new_df
| | growing_temp_correl | year_temp_correl | max_temp_correl | min_temp_correl | max_min_correl | hum_correl | precip_correl | carbon_correl | cec_correl | clay_correl | ph_correl | waw_correl | bulk_correl |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -0.441547 | -0.403961 | -0.230296 | 0.08805 | -0.244262 | 0.328747 | 0.146282 | 0.080658 | 0.056495 | 0.063138 | -0.178641 | 0.157543 | -0.029926 |
new_df.plot(kind='bar',figsize=(15,5))
plt.title("Correlation coefficient Comparison")
plt.ylabel("Correlation coefficient")
plt.xlabel("Factor")
plt.show()
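To see which factors changed most once the desert location was added, the two sets of coefficients can be placed side by side. The sketch below copies the row 0 values from the two notebook outputs; the short factor labels and the `abs_change` column are naming choices made for this example.

```python
import pandas as pd

factors = ["growing_temp", "year_temp", "max_temp", "min_temp", "max_min",
           "hum", "precip", "carbon", "cec", "clay", "ph", "waw", "bulk"]

# Coefficients copied from row 0 of the two notebook outputs.
first = pd.Series(
    [-0.134643, -0.044223, -0.152918, -0.02064, -0.104366, 0.148011,
     0.116698, 0.047044, 0.044261, -0.027127, -0.211055, 0.092602, 0.032192],
    index=factors)
with_desert = pd.Series(
    [-0.441547, -0.403961, -0.230296, 0.08805, -0.244262, 0.328747,
     0.146282, 0.080658, 0.056495, 0.063138, -0.178641, 0.157543, -0.029926],
    index=factors)

comparison = pd.DataFrame({"first": first, "with_desert": with_desert})
# Change in strength (absolute value) of each coefficient.
comparison["abs_change"] = with_desert.abs() - first.abs()
print(comparison.sort_values("abs_change", ascending=False).round(3))
```

In these numbers the yearly and growing-season temperature coefficients strengthen the most, followed by humidity, while the pH coefficient weakens slightly.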
df1 = wine[['rating _norm','growing_temp_norm','year_temp_norm','max_temp_norm','min_temp_norm','max_min_norm',
'humidity_norm','precip_norm','carbon_norm','cec_norm','clay_norm','ph_norm','waw_norm','bulk_norm']]
df1.head()
| | rating _norm | growing_temp_norm | year_temp_norm | max_temp_norm | min_temp_norm | max_min_norm | humidity_norm | precip_norm | carbon_norm | cec_norm | clay_norm | ph_norm | waw_norm | bulk_norm |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 89.468085 | 26.573825 | 31.841776 | 53.176157 | 59.742481 | 48.879743 | 59.935612 | 36.689004 | 10.658537 | 10.580645 | 67.00 | 78.34375 | 34.0 | 66.089655 |
1 | 1.000000 | 100.000000 | 100.000000 | 100.000000 | 21.283835 | 100.000000 | 1.000000 | 5.118899 | 1.000000 | 29.741935 | 42.25 | 87.62500 | 20.8 | 89.303448 |
2 | 92.627660 | 26.573825 | 31.841776 | 53.176157 | 59.742481 | 48.879743 | 59.935612 | 36.689004 | 8.243902 | 10.580645 | 64.25 | 81.43750 | 34.0 | 72.689655 |
3 | 90.521277 | 26.573825 | 31.841776 | 53.176157 | 59.742481 | 48.879743 | 59.935612 | 36.689004 | 10.658537 | 16.967742 | 64.25 | 78.34375 | 34.0 | 63.813793 |
4 | 91.574468 | 26.573825 | 31.841776 | 53.176157 | 59.742481 | 48.879743 | 59.935612 | 36.689004 | 17.902439 | 32.935484 | 61.50 | 81.43750 | 40.6 | 50.613793 |
sns.set()  # apply seaborn styling once, before any figure is drawn

# Plot the normalized rating (y-axis) against each normalized factor (x-axis),
# so the plotted data matches the axis labels.
factors = ["max_temp_norm", "max_min_norm", "humidity_norm", "precip_norm",
           "carbon_norm", "ph_norm", "waw_norm"]
for factor in factors:
    plt.figure(figsize=(15,5))
    plt.scatter(df1[factor], df1["rating _norm"], marker="o")
    plt.title(f"rating _norm vs. {factor}")
    plt.ylabel("rating _norm")
    plt.xlabel(factor)
    plt.grid(True)
    plt.show()
sns.lmplot(x="growing_temp_norm", y="rating _norm", data=df1[["rating _norm","growing_temp_norm"]], x_jitter=.05)
plt.show()
sns.lmplot(x="humidity_norm", y="rating _norm", data=df1[["rating _norm","humidity_norm"]], x_jitter=.05)
plt.show()
Wine truly has a Goldilocks zone for every factor. As mentioned on the website, humidity extremes can be very damaging to vine and fruit development, but a moderate amount combined with other factors can produce exceptional wine.
The same is true for diurnal temperature variance. With no temperature variance, flavors are underdeveloped and the wine is simply poor. The stress of temperature variation is key to producing the sugars needed to create delicious, alcoholic wines.
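A Goldilocks relationship peaks in the middle, so a straight-line correlation can understate it. One way to probe for this, sketched here on synthetic data rather than the project's, is to fit a quadratic and check whether the curve opens downward. All names and numbers below are illustrative.

```python
import numpy as np

# Synthetic data (NOT the project's): rating peaks at moderate humidity.
rng = np.random.default_rng(1)
humidity = np.linspace(1, 100, 60)
rating = 90 - 0.02 * (humidity - 50) ** 2 + rng.normal(0, 2, size=60)

# A linear correlation sees little, because the rise and fall largely cancel.
r_linear = np.corrcoef(humidity, rating)[0, 1]

# Quadratic fit: rating ~ a*h^2 + b*h + c. A negative a means a peak.
a, b, c = np.polyfit(humidity, rating, deg=2)
peak = -b / (2 * a)  # vertex of the fitted parabola

print(f"linear r = {r_linear:+.2f}")
print(f"quadratic a = {a:+.4f}, estimated optimum = {peak:.1f}")
```

A negative quadratic coefficient with a vertex inside the observed range would be consistent with a moderate optimum, although it does not by itself establish one.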
The web application touches on ‘stress’ in temperature and water availability, but to reiterate, a balance of soil and climate stresses is what truly builds the character of this wine. This deserves a modification of the old adage, from “it grows sweeter with time” to “it grows better with stress”.