This is a continuation of my work on the codeacademy data science independant project #1: Watching the Stock Market.
- My first post on the subject explores how I put gathered the stock data.
- In my second post, I explored the data using basic sql queries (and put my sql code out into the world for the first time!)
After completing the basic challenges from my last post on the topic, there are intermediate and advanced challenges too while playing with these stocks data.
Let’s jump into the challenges!
1. Use aggregate functions to look at key statistics about the data (e.g., min, max, average).
SELECT ROUND(AVG(price), 2) AS average_price, ROUND(MIN(price), 2) AS min_price, ROUND(MAX(price), 2) AS max_price FROM stocks;
At the end of my last post, I mentioned that it may be possible that I picked… well expensive stocks. at an average cost of nearly $190 per share, I’d definitely say these aren’t cheap!
2. Group the data by stock and repeat. How do the stocks compare to each other?
So, what’s the variability in stock price within each stock?
SELECT name, ROUND(AVG(price), 2) AS average_price, ROUND(MIN(price), 2) AS min_price, ROUND(MAX(price), 2) AS max_price FROM stocks GROUP BY name ORDER BY average_price;
So, EA’s stock is lower than all the others and never even overlaps the stock price of any of the other stocks. On the other end of the spectrum is Boeing - more expensive than all the others by leaps over $100 dollars a share! Spotify and Nvidia are closer in price to each other than Nvidia and Apple but Spotify isn’t as expensive as Nvidia.
3. Group the data by day or hour of day. Does day of week or time of day impact prices?
SELECT date, ROUND(AVG(price), 2) AS average_price FROM stocks GROUP BY date ORDER BY average_price DESC;
Well, I don’t know about day of the week but it does look like that things were looking particularly good at the beginning of may!
4. Which of the rows have a price greater than the average of all prices in the dataset?
SELECT * FROM stocks WHERE price > 189.5;
A lot of rows. A lot of rows have a price greater than the average price (110 to be precise). If you look at the table from challenge 2, you’ll see that the only stocks above the average are from Apple and Boeing.
1. Calculate other key statistics, like median or variance.
SELECT AVG((stocks.price - sub.a) * (stocks.price - sub.a)) AS variance FROM stocks, (SELECT AVG(price) AS a FROM stocks) AS sub;
SELECT price AS median FROM stocks ORDER BY price LIMIT 1 OFFSET (SELECT COUNT(price) FROM stocks) / 2;
2. Refactor the data into 2 tables
We’ll refactor the stocks table into stock_info to store general info about the stock itself (ie. symbol, name) and stock_prices to store the collected data on price (ie. symbol, date, price).
CREATE TABLE stock_info AS SELECT DISTINCT symbol, name FROM stocks;
CREATE TABLE stock_prices AS SELECT symbol, date, price FROM stocks;
SELECT * FROM stock_info;
SELECT * FROM stock_prices;
3. Now join the 2 tables in order to view more information on the stock with each row of price.
SELECT stock_prices.symbol, date, price, name FROM stock_prices LEFT JOIN stock_info ON stock_info.symbol = stock_prices.symbol;
Awesome! Now on to the next set of SQL challenges in another post!
DROP TABLE stock_prices;
DROP TABLE stock_info;