This is a continuation of my work on the codeacademy data science independant project #1: Watching the Stock Market.
- My first post on the subject explores how I put gathered the stock data.
- In my second post, I explored the data using basic sql queries (and put my sql code out into the world for the first time!)
After completing the basic challenges from my last post on the topic, there are intermediate and advanced challenges too while playing with these stocks data.
Let’s jump into the challenges!
Intermediate Challenges
1. Use aggregate functions to look at key statistics about the data (e.g., min, max, average).
SELECT ROUND(AVG(price), 2) AS average_price,
ROUND(MIN(price), 2) AS min_price,
ROUND(MAX(price), 2) AS max_price
FROM stocks;
average_price | min_price | max_price |
---|---|---|
189.5 | 87.5 | 377.36 |
At the end of my last post, I mentioned that it may be possible that I picked… well expensive stocks. at an average cost of nearly $190 per share, I’d definitely say these aren’t cheap!
2. Group the data by stock and repeat. How do the stocks compare to each other?
So, what’s the variability in stock price within each stock?
SELECT name,
ROUND(AVG(price), 2) AS average_price,
ROUND(MIN(price), 2) AS min_price,
ROUND(MAX(price), 2) AS max_price
FROM stocks
GROUP BY name
ORDER BY average_price;
name | average_price | min_price | max_price |
---|---|---|---|
Electronic Arts | 93.73 | 87.50 | 102.72 |
Spotify | 140.56 | 121.58 | 155.38 |
NVIDIA | 159.38 | 133.78 | 182.98 |
Apple | 196.11 | 173.30 | 210.94 |
Boeing | 357.74 | 337.37 | 377.36 |
So, EA’s stock is lower than all the others and never even overlaps the stock price of any of the other stocks. On the other end of the spectrum is Boeing - more expensive than all the others by leaps over $100 dollars a share! Spotify and Nvidia are closer in price to each other than Nvidia and Apple but Spotify isn’t as expensive as Nvidia.
3. Group the data by day or hour of day. Does day of week or time of day impact prices?
SELECT date,
ROUND(AVG(price), 2) AS average_price
FROM stocks
GROUP BY date
ORDER BY average_price DESC;
date | average_price |
---|---|
5/3/19 | 199.34 |
5/1/19 | 199.20 |
7/23/19 | 198.91 |
5/2/19 | 198.42 |
7/24/19 | 198.18 |
5/6/19 | 197.51 |
7/22/19 | 197.37 |
4/30/19 | 197.33 |
7/12/19 | 196.27 |
7/15/19 | 196.12 |
Well, I don’t know about day of the week but it does look like that things were looking particularly good at the beginning of may!
4. Which of the rows have a price greater than the average of all prices in the dataset?
SELECT *
FROM stocks
WHERE price > 189.5;
symbol | name | date | price |
---|---|---|---|
APPL | Apple | 4/30/19 | 199.9002 |
APPL | Apple | 5/1/19 | 209.7124 |
APPL | Apple | 5/2/19 | 208.3477 |
APPL | Apple | 5/3/19 | 210.9377 |
APPL | Apple | 5/6/19 | 207.6802 |
APPL | Apple | 5/7/19 | 202.0818 |
APPL | Apple | 5/8/19 | 202.1216 |
APPL | Apple | 5/9/19 | 199.9500 |
APPL | Apple | 5/10/19 | 197.1800 |
APPL | Apple | 5/15/19 | 190.9200 |
A lot of rows. A lot of rows have a price greater than the average price (110 to be precise). If you look at the table from challenge 2, you’ll see that the only stocks above the average are from Apple and Boeing.
Advanced Challenge
1. Calculate other key statistics, like median or variance.
SELECT AVG((stocks.price - sub.a) * (stocks.price - sub.a)) AS variance
FROM stocks,
(SELECT AVG(price) AS a
FROM stocks) AS sub;
variance |
---|
8257.67435647567 |
SELECT price AS median
FROM stocks
ORDER BY price
LIMIT 1
OFFSET (SELECT COUNT(price)
FROM stocks) / 2;
median |
---|
159.398056 |
2. Refactor the data into 2 tables
We’ll refactor the stocks table into stock_info to store general info about the stock itself (ie. symbol, name) and stock_prices to store the collected data on price (ie. symbol, date, price).
CREATE TABLE stock_info
AS SELECT DISTINCT symbol, name
FROM stocks;
CREATE TABLE stock_prices
AS SELECT symbol, date, price
FROM stocks;
SELECT *
FROM stock_info;
symbol | name |
---|---|
APPL | Apple |
BA | Boeing |
EA | Electronic Arts |
SPOT | Spotify |
NVDA | NVIDIA |
SELECT *
FROM stock_prices;
symbol | date | price |
---|---|---|
APPL | 4/30/19 | 199.9002 |
APPL | 5/1/19 | 209.7124 |
APPL | 5/2/19 | 208.3477 |
APPL | 5/3/19 | 210.9377 |
APPL | 5/6/19 | 207.6802 |
APPL | 5/7/19 | 202.0818 |
APPL | 5/8/19 | 202.1216 |
APPL | 5/9/19 | 199.9500 |
APPL | 5/10/19 | 197.1800 |
APPL | 5/13/19 | 185.7200 |
3. Now join the 2 tables in order to view more information on the stock with each row of price.
SELECT stock_prices.symbol, date, price, name
FROM stock_prices
LEFT JOIN stock_info ON stock_info.symbol = stock_prices.symbol;
symbol | date | price | name |
---|---|---|---|
APPL | 4/30/19 | 199.9002 | Apple |
APPL | 5/1/19 | 209.7124 | Apple |
APPL | 5/2/19 | 208.3477 | Apple |
APPL | 5/3/19 | 210.9377 | Apple |
APPL | 5/6/19 | 207.6802 | Apple |
APPL | 5/7/19 | 202.0818 | Apple |
APPL | 5/8/19 | 202.1216 | Apple |
APPL | 5/9/19 | 199.9500 | Apple |
APPL | 5/10/19 | 197.1800 | Apple |
APPL | 5/13/19 | 185.7200 | Apple |
Awesome! Now on to the next set of SQL challenges in another post!
DROP TABLE stock_prices;
DROP TABLE stock_info;