new_new_world

Project #2 for USC's data bootcamp. This repo stores a Flask app that uses JavaScript libraries such as D3 and Leaflet to show optimal locations worldwide for the production of Sauvignon Blanc wine.

View the Project on GitHub buitron/new_new_world

New New World

Thirsty Thursday (via GIPHY)

Objective

Discover which environmental properties are prime for Sauvignon Blanc grape cultivation and develop a web application for users to interact with and gather relevant data on a specified location.

Background

Sauvignon blanc is a green-skinned grape variety that originates from the Bordeaux region of France. The grape most likely gets its name from the French words sauvage (“wild”) and blanc (“white”) due to its early origins as an indigenous grape in South West France.

Sauvignon blanc is currently widely cultivated in France, Chile, Canada, Australia, New Zealand, South Africa, and the US states of Washington and California. It can develop desirable flavors in both cool and warm environments, such as grass, green bell pepper, tropical fruit, and floral notes. It buds early, grows quickly, and can produce several harvests within the same year.

Sauvignon Blanc (SB) has a short fermentation cycle, which is why we are only reviewing 2016 data.

Why Sauvignon Blanc (SB)?

According to Wine Economist, Sauvignon Blanc is one of the most popular and most profitable wines in the world.

Questions this application will attempt to answer:



Step 1 - Data Gathering

Data was collected from these two APIs:

soilgrids
worldweather

and scraped from:

winemag

Step 2 - Data Analysis

Analysis done strictly on vineyard environments rated by Wine Enthusiast

As you will notice in the first Jupyter notebook analysis below, the correlations were initially weak because the data was narrow: it only represents places that are known for moderate to great wine. Adding a desert location in the second analysis, with climate and soil components that are far from ideal, accentuated the correlations.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
filename = 'Normalized Wine Correlations and Dropped.xlsx'
wine = pd.read_excel(filename)
wine.head()

| | iso3 | appell_city | appell_country | appell_region | appell_state | lat | lon | lat.1 | lon.1 | price | ... | clay_correl | pH | ph_norm | ph_correl | water_at_withering | waw_norm | waw_correl | bulk_density | bulk_norm | bulk_correl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ARG | Argentina | Argentina | Mendoza Province | Mendoza Province | -32.889625 | -68.852687 | -32.889625 | -68.852687 | 18 | ... | -0.027127 | 77 | 78.34375 | -0.211055 | 17 | 34.0 | 0.092602 | 1489 | 66.089655 | 0.032192 |
| 1 | ARG | Mendoza Province | Argentina | Luján de Cuyo | Mendoza Province | -33.039104 | -68.879864 | -33.039104 | -68.879864 | 42 | ... | NaN | 78 | 81.43750 | NaN | 17 | 34.0 | NaN | 1518 | 72.689655 | NaN |
| 2 | ARG | Mendoza Province | Argentina | Mendoza | Mendoza Province | -32.889459 | -68.845839 | -32.889459 | -68.845839 | 10 | ... | NaN | 77 | 78.34375 | NaN | 17 | 34.0 | NaN | 1479 | 63.813793 | NaN |
| 3 | ARG | Mendoza Province | Argentina | Uco Valley | Mendoza Province | -33.600036 | -69.282703 | -33.600036 | -69.282703 | 15 | ... | NaN | 78 | 81.43750 | NaN | 18 | 40.6 | NaN | 1421 | 50.613793 | NaN |
| 4 | ARG | Mendoza Province | Argentina | Uco Valley | Mendoza Province | -33.600036 | -69.282703 | -33.600036 | -69.282703 | 10 | ... | NaN | 78 | 81.43750 | NaN | 18 | 40.6 | NaN | 1421 | 50.613793 | NaN |

5 rows × 105 columns

df = wine[['growing_temp_correl','year_temp_correl','max_temp_correl','min_temp_correl','max_min_correl',
      'hum_correl','precip_correl','carbon_correl','cec_correl','clay_correl','ph_correl',
      'waw_correl','bulk_correl']]
new_df = df[0:1]
new_df

| | growing_temp_correl | year_temp_correl | max_temp_correl | min_temp_correl | max_min_correl | hum_correl | precip_correl | carbon_correl | cec_correl | clay_correl | ph_correl | waw_correl | bulk_correl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.134643 | -0.044223 | -0.152918 | -0.02064 | -0.104366 | 0.148011 | 0.116698 | 0.047044 | 0.044261 | -0.027127 | -0.211055 | 0.092602 | 0.032192 |

new_df.plot(kind='bar',figsize=(15,5))
plt.title("Correlation coefficient Comparison")
plt.ylabel("Correlation coefficient")
plt.xlabel("Factor")
plt.show()

[Figure: bar chart comparing the correlation coefficient of each factor]
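
The `*_correl` columns plotted above are read from the spreadsheet, where the coefficients appear to be stored only in row 0 (the other rows show `NaN`). As a hedged alternative, here is a sketch of how such coefficients could be computed directly from the normalized columns with pandas `DataFrame.corrwith`. The tiny frame below is made-up data that only borrows three of the project's column names; it is not the authors' method.

```python
import pandas as pd

# Made-up normalized values; only the column names come from the project.
demo = pd.DataFrame({
    "rating _norm":  [90.0, 85.0, 80.0, 75.0, 70.0],
    "ph_norm":       [10.0, 20.0, 30.0, 40.0, 50.0],   # moves exactly opposite to rating
    "humidity_norm": [55.0, 40.0, 60.0, 45.0, 50.0],   # only loosely related to rating
})

# Pearson correlation of every factor column with the rating column.
factors = demo.drop(columns="rating _norm")
correlations = factors.corrwith(demo["rating _norm"])

print(correlations.round(3))
```

Computing the coefficients in the notebook keeps them in sync with the data, so adding or removing a row (such as the desert test location) would not require editing the spreadsheet by hand.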

df1 = wine[['growing_temp_norm','rating _norm','year_temp_norm','max_temp_norm','min_temp_norm','max_min_norm',
      'humidity_norm','precip_norm','carbon_norm','cec_norm','clay_norm','ph_norm','waw_norm','bulk_norm']]
df1.head()

| | growing_temp_norm | rating _norm | year_temp_norm | max_temp_norm | min_temp_norm | max_min_norm | humidity_norm | precip_norm | carbon_norm | cec_norm | clay_norm | ph_norm | waw_norm | bulk_norm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 66.961771 | 23.846154 | 73.608726 | 66.775003 | 59.742481 | 64.481524 | 33.70719 | 36.689004 | 10.658537 | 10.580645 | 67.00 | 78.34375 | 34.0 | 66.089655 |
| 1 | 66.961771 | 46.692308 | 73.608726 | 66.775003 | 59.742481 | 64.481524 | 33.70719 | 36.689004 | 8.243902 | 10.580645 | 64.25 | 81.43750 | 34.0 | 72.689655 |
| 2 | 66.961771 | 31.461538 | 73.608726 | 66.775003 | 59.742481 | 64.481524 | 33.70719 | 36.689004 | 10.658537 | 16.967742 | 64.25 | 78.34375 | 34.0 | 63.813793 |
| 3 | 66.961771 | 39.076923 | 73.608726 | 66.775003 | 59.742481 | 64.481524 | 33.70719 | 36.689004 | 17.902439 | 32.935484 | 61.50 | 81.43750 | 40.6 | 50.613793 |
| 4 | 66.961771 | 31.461538 | 73.608726 | 66.775003 | 59.742481 | 64.481524 | 33.70719 | 36.689004 | 17.902439 | 32.935484 | 61.50 | 81.43750 | 40.6 | 50.613793 |

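
The `*_norm` values in this output are consistent with min-max scaling onto a 1 to 100 range. That is an inference from the printed numbers, not something the notebook states, and the pH bounds of 52 and 84 used below are back-calculated assumptions as well. A minimal sketch:

```python
def min_max_scale(values, low=1.0, high=100.0):
    """Linearly rescale values so min(values) -> low and max(values) -> high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant column")
    step = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * step for v in values]

# 77 and 78 appear in the raw `pH` column; 52 and 84 are ASSUMED dataset-wide
# extremes that are not visible in the printed rows.
ph_raw = [52, 77, 78, 84]
ph_norm = min_max_scale(ph_raw)

for raw, norm in zip(ph_raw, ph_norm):
    print(raw, round(norm, 5))
```

Under those assumed bounds, raw pH readings of 77 and 78 map to 78.34375 and 81.4375, which matches the `ph_norm` column.
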
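# Scatter plot with year_temp_norm on the x-axis and rating _norm on the y-axis,
# so the data order matches the axis labels
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["year_temp_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. year_temp_norm")
plt.ylabel("rating _norm")
plt.xlabel("year_temp_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. year_temp_norm]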

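# Scatter plot with max_temp_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["max_temp_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. max_temp_norm")
plt.ylabel("rating _norm")
plt.xlabel("max_temp_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. max_temp_norm]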

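# Scatter plot with max_min_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["max_min_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. max_min_norm")
plt.ylabel("rating _norm")
plt.xlabel("max_min_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. max_min_norm]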

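# Scatter plot with humidity_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["humidity_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. humidity_norm")
plt.ylabel("rating _norm")
plt.xlabel("humidity_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. humidity_norm]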

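# Scatter plot with precip_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["precip_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. precip_norm")
plt.ylabel("rating _norm")
plt.xlabel("precip_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. precip_norm]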

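# Scatter plot with carbon_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["carbon_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. carbon_norm")
plt.ylabel("rating _norm")
plt.xlabel("carbon_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. carbon_norm]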

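# Scatter plot with ph_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["ph_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. ph_norm")
plt.ylabel("rating _norm")
plt.xlabel("ph_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. ph_norm]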

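# Scatter plot with waw_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["waw_norm"], df1["rating _norm"], facecolors='blue', edgecolors='black', marker="o")
plt.title("rating _norm vs. waw_norm")
plt.ylabel("rating _norm")
plt.xlabel("waw_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. waw_norm]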

df2 = df1[["rating _norm","year_temp_norm"]]
sns.pairplot(df2, kind="reg")
plt.show()

[Figure: pair plot with regression fits for rating _norm and year_temp_norm]

# df3 = df1[["rating _norm","ph_norm"]]
# sns.pairplot(df3, kind="reg")
# plt.show()
sns.lmplot(x="ph_norm", y="rating _norm", data=df1[["rating _norm","ph_norm"]], x_jitter=.05)
plt.show()

[Figure: regression plot of rating _norm against ph_norm]

sns.lmplot(x="humidity_norm", y="rating _norm", data=df1[["rating _norm","humidity_norm"]], x_jitter=.05)
plt.show()

[Figure: regression plot of rating _norm against humidity_norm]

sns.jointplot(x="ph_norm", y="rating _norm", data=df1[["rating _norm","ph_norm"]], kind="reg")
plt.show()

[Figure: joint plot with regression fit of rating _norm against ph_norm]

sns.lmplot(x='humidity_norm', y="rating _norm", data=df1[["rating _norm",'humidity_norm']], x_jitter=.05)
plt.show()

[Figure: regression plot of rating _norm against humidity_norm]

sns.jointplot(x="humidity_norm", y="rating _norm", data=df1[["rating _norm","humidity_norm"]], kind="reg")
plt.show()

[Figure: joint plot with regression fit of rating _norm against humidity_norm]

Analysis done with the inclusion of a “not so ideal” environment - the desert

Here you can see that diurnal temperature variance during the growing season, rather than over the entire year, has the strongest correlation, followed by humidity.

These factors are not actually independent: when you see the mapped locations, you will notice the prevalence of coastal locations.

Large bodies of water can moderate temperature effects on surrounding areas. Such bodies of water have a large heat storage capability that can extend the growing season as well as minimize mid-winter temperatures enough to prevent vine damage.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
filename = 'CopyofNormalizedWineCorrelations.xlsx'
wine = pd.read_excel(filename)
wine.head()

| | iso3 | appell_city | appell_country | appell_region | appell_state | lat | lon | lat.1 | lon.1 | price | ... | clay_correl | pH | ph_norm | ph_correl | water_at_withering | waw_norm | waw_correl | bulk_density | bulk_norm | bulk_correl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ARG | Argentina | Argentina | Mendoza Province | Mendoza Province | -32.889625 | -68.852687 | -32.889625 | -68.852687 | 18 | ... | 0.063138 | 77 | 78.34375 | -0.178641 | 17 | 34.0 | 0.157543 | 1489 | 66.089655 | -0.029926 |
| 1 | TEST | TEST | TEST | DESERT | TEST | 35.011000 | -115.473460 | 35.011000 | -115.473460 | 0 | ... | NaN | 80 | 87.62500 | NaN | 15 | 20.8 | NaN | 1591 | 89.303448 | NaN |
| 2 | ARG | Mendoza Province | Argentina | Luján de Cuyo | Mendoza Province | -33.039104 | -68.879864 | -33.039104 | -68.879864 | 42 | ... | NaN | 78 | 81.43750 | NaN | 17 | 34.0 | NaN | 1518 | 72.689655 | NaN |
| 3 | ARG | Mendoza Province | Argentina | Mendoza | Mendoza Province | -32.889459 | -68.845839 | -32.889459 | -68.845839 | 10 | ... | NaN | 77 | 78.34375 | NaN | 17 | 34.0 | NaN | 1479 | 63.813793 | NaN |
| 4 | ARG | Mendoza Province | Argentina | Uco Valley | Mendoza Province | -33.600036 | -69.282703 | -33.600036 | -69.282703 | 15 | ... | NaN | 78 | 81.43750 | NaN | 18 | 40.6 | NaN | 1421 | 50.613793 | NaN |

5 rows × 105 columns

df = wine[['growing_temp_correl','year_temp_correl','max_temp_correl','min_temp_correl','max_min_correl',
      'hum_correl','precip_correl','carbon_correl','cec_correl','clay_correl','ph_correl',
      'waw_correl','bulk_correl']]
new_df = df[0:1]
new_df

| | growing_temp_correl | year_temp_correl | max_temp_correl | min_temp_correl | max_min_correl | hum_correl | precip_correl | carbon_correl | cec_correl | clay_correl | ph_correl | waw_correl | bulk_correl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.441547 | -0.403961 | -0.230296 | 0.08805 | -0.244262 | 0.328747 | 0.146282 | 0.080658 | 0.056495 | 0.063138 | -0.178641 | 0.157543 | -0.029926 |

new_df.plot(kind='bar',figsize=(15,5))
plt.title("Correlation coefficient Comparison")
plt.ylabel("Correlation coefficient")
plt.xlabel("Factor")
plt.show()

[Figure: bar chart comparing the correlation coefficient of each factor, desert location included]

df1 = wine[['rating _norm','growing_temp_norm','year_temp_norm','max_temp_norm','min_temp_norm','max_min_norm',
      'humidity_norm','precip_norm','carbon_norm','cec_norm','clay_norm','ph_norm','waw_norm','bulk_norm']]
df1.head()

| | rating _norm | growing_temp_norm | year_temp_norm | max_temp_norm | min_temp_norm | max_min_norm | humidity_norm | precip_norm | carbon_norm | cec_norm | clay_norm | ph_norm | waw_norm | bulk_norm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 89.468085 | 26.573825 | 31.841776 | 53.176157 | 59.742481 | 48.879743 | 59.935612 | 36.689004 | 10.658537 | 10.580645 | 67.00 | 78.34375 | 34.0 | 66.089655 |
| 1 | 1.000000 | 100.000000 | 100.000000 | 100.000000 | 21.283835 | 100.000000 | 1.000000 | 5.118899 | 1.000000 | 29.741935 | 42.25 | 87.62500 | 20.8 | 89.303448 |
| 2 | 92.627660 | 26.573825 | 31.841776 | 53.176157 | 59.742481 | 48.879743 | 59.935612 | 36.689004 | 8.243902 | 10.580645 | 64.25 | 81.43750 | 34.0 | 72.689655 |
| 3 | 90.521277 | 26.573825 | 31.841776 | 53.176157 | 59.742481 | 48.879743 | 59.935612 | 36.689004 | 10.658537 | 16.967742 | 64.25 | 78.34375 | 34.0 | 63.813793 |
| 4 | 91.574468 | 26.573825 | 31.841776 | 53.176157 | 59.742481 | 48.879743 | 59.935612 | 36.689004 | 17.902439 | 32.935484 | 61.50 | 81.43750 | 40.6 | 50.613793 |

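# Scatter plot with max_temp_norm on the x-axis and rating _norm on the y-axis,
# so the data order matches the axis labels
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["max_temp_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. max_temp_norm")
plt.ylabel("rating _norm")
plt.xlabel("max_temp_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. max_temp_norm, desert location included]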

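# Scatter plot with max_min_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["max_min_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. max_min_norm")
plt.ylabel("rating _norm")
plt.xlabel("max_min_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. max_min_norm, desert location included]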

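# Scatter plot with humidity_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["humidity_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. humidity_norm")
plt.ylabel("rating _norm")
plt.xlabel("humidity_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. humidity_norm, desert location included]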

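# Scatter plot with precip_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["precip_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. precip_norm")
plt.ylabel("rating _norm")
plt.xlabel("precip_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. precip_norm, desert location included]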

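# Scatter plot with carbon_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["carbon_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. carbon_norm")
plt.ylabel("rating _norm")
plt.xlabel("carbon_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. carbon_norm, desert location included]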

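# Scatter plot with ph_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["ph_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. ph_norm")
plt.ylabel("rating _norm")
plt.xlabel("ph_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. ph_norm, desert location included]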

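# Scatter plot with waw_norm on the x-axis and rating _norm on the y-axis
sns.set()  # apply the seaborn theme before the figure is created
plt.figure(figsize=(15,5))
plt.scatter(df1["waw_norm"], df1["rating _norm"], marker="o")
plt.title("rating _norm vs. waw_norm")
plt.ylabel("rating _norm")
plt.xlabel("waw_norm")
plt.grid(True)
plt.show()

[Figure: scatter plot of rating _norm vs. waw_norm, desert location included]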

sns.lmplot(x="growing_temp_norm", y="rating _norm", data=df1[["rating _norm","growing_temp_norm"]], x_jitter=.05)
plt.show()

[Figure: regression plot of rating _norm against growing_temp_norm, desert location included]

sns.lmplot(x="humidity_norm", y="rating _norm", data=df1[["rating _norm","humidity_norm"]], x_jitter=.05)
plt.show()

[Figure: regression plot of rating _norm against humidity_norm, desert location included]

Wine truly has a Goldilocks zone for all of the factors. As mentioned on the website, humidity extremes can be very damaging to vine and fruit development, but a moderate amount combined with other factors can produce exceptional wine.

The same is true for diurnal temperature variance. No temperature variance will lead to underdeveloped flavors and simply poor wine. The stress of temperature variation is key to producing the sugars needed to create delicious and alcoholic wines.

The web application touches on ‘stress’ in temperature and water availability, but to reiterate, a balance of soil and climate stresses is what truly builds the character of this wine. This deserves a modification of the old adage, from “it grows sweeter with time” to “it grows better with stress”.

Step 3 - Application Build



Sample Page

[Screenshots: sample pages 1 through 11]

Live Heroku Web-application

Click HERE to play!