This is a continuation of my work on the codeacademy data science independant project #1: Watching the Stock Market.

  • My first post on the subject explores how I put gathered the stock data.
  • In my second post, I explored the data using basic sql queries (and put my sql code out into the world for the first time!)

After completing the basic challenges from my last post on the topic, there are intermediate and advanced challenges too while playing with these stocks data.

Let’s jump into the challenges!

Intermediate Challenges

1. Use aggregate functions to look at key statistics about the data (e.g., min, max, average).

SELECT ROUND(AVG(price), 2) AS average_price, 
       ROUND(MIN(price), 2) AS min_price, 
       ROUND(MAX(price), 2) AS max_price 
FROM stocks;
Table 1: 1 records
average_price min_price max_price
189.5 87.5 377.36

At the end of my last post, I mentioned that it may be possible that I picked… well expensive stocks. at an average cost of nearly $190 per share, I’d definitely say these aren’t cheap!

2. Group the data by stock and repeat. How do the stocks compare to each other?

So, what’s the variability in stock price within each stock?

SELECT name, 
       ROUND(AVG(price), 2) AS average_price, 
       ROUND(MIN(price), 2) AS min_price, 
       ROUND(MAX(price), 2) AS max_price 
FROM stocks
ORDER BY average_price;
Table 2: 5 records
name average_price min_price max_price
Electronic Arts 93.73 87.50 102.72
Spotify 140.56 121.58 155.38
NVIDIA 159.38 133.78 182.98
Apple 196.11 173.30 210.94
Boeing 357.74 337.37 377.36

So, EA’s stock is lower than all the others and never even overlaps the stock price of any of the other stocks. On the other end of the spectrum is Boeing - more expensive than all the others by leaps over $100 dollars a share! Spotify and Nvidia are closer in price to each other than Nvidia and Apple but Spotify isn’t as expensive as Nvidia.

3. Group the data by day or hour of day. Does day of week or time of day impact prices?

SELECT date, 
       ROUND(AVG(price), 2) AS average_price
FROM stocks
ORDER BY average_price DESC;
Table 3: Displaying records 1 - 10
date average_price
5/3/19 199.34
5/1/19 199.20
7/23/19 198.91
5/2/19 198.42
7/24/19 198.18
5/6/19 197.51
7/22/19 197.37
4/30/19 197.33
7/12/19 196.27
7/15/19 196.12

Well, I don’t know about day of the week but it does look like that things were looking particularly good at the beginning of may!

4. Which of the rows have a price greater than the average of all prices in the dataset?

FROM stocks
WHERE price > 189.5;
Table 4: Displaying records 1 - 10
symbol name date price
APPL Apple 4/30/19 199.9002
APPL Apple 5/1/19 209.7124
APPL Apple 5/2/19 208.3477
APPL Apple 5/3/19 210.9377
APPL Apple 5/6/19 207.6802
APPL Apple 5/7/19 202.0818
APPL Apple 5/8/19 202.1216
APPL Apple 5/9/19 199.9500
APPL Apple 5/10/19 197.1800
APPL Apple 5/15/19 190.9200

A lot of rows. A lot of rows have a price greater than the average price (110 to be precise). If you look at the table from challenge 2, you’ll see that the only stocks above the average are from Apple and Boeing.

Advanced Challenge

1. Calculate other key statistics, like median or variance.

SELECT AVG((stocks.price - sub.a) * (stocks.price - sub.a)) AS variance
FROM stocks, 
    (SELECT AVG(price) AS a 
     FROM stocks) AS sub;
Table 5: 1 records
SELECT price AS median
FROM stocks
ORDER BY price
        FROM stocks) / 2;
Table 6: 1 records

2. Refactor the data into 2 tables

We’ll refactor the stocks table into stock_info to store general info about the stock itself (ie. symbol, name) and stock_prices to store the collected data on price (ie. symbol, date, price).

CREATE TABLE stock_info
AS SELECT DISTINCT symbol, name 
   FROM stocks;
CREATE TABLE stock_prices
AS SELECT symbol, date, price
   FROM stocks;
FROM stock_info;
(#tab:show_stock_info)5 records
symbol name
APPL Apple
BA Boeing
EA Electronic Arts
SPOT Spotify
FROM stock_prices;
(#tab:show_stock_prices)Displaying records 1 - 10
symbol date price
APPL 4/30/19 199.9002
APPL 5/1/19 209.7124
APPL 5/2/19 208.3477
APPL 5/3/19 210.9377
APPL 5/6/19 207.6802
APPL 5/7/19 202.0818
APPL 5/8/19 202.1216
APPL 5/9/19 199.9500
APPL 5/10/19 197.1800
APPL 5/13/19 185.7200

3. Now join the 2 tables in order to view more information on the stock with each row of price.

SELECT stock_prices.symbol, date, price, name
FROM stock_prices
LEFT JOIN stock_info ON stock_info.symbol = stock_prices.symbol;
Table 7: Displaying records 1 - 10
symbol date price name
APPL 4/30/19 199.9002 Apple
APPL 5/1/19 209.7124 Apple
APPL 5/2/19 208.3477 Apple
APPL 5/3/19 210.9377 Apple
APPL 5/6/19 207.6802 Apple
APPL 5/7/19 202.0818 Apple
APPL 5/8/19 202.1216 Apple
APPL 5/9/19 199.9500 Apple
APPL 5/10/19 197.1800 Apple
APPL 5/13/19 185.7200 Apple

Awesome! Now on to the next set of SQL challenges in another post!

DROP TABLE stock_prices;
DROP TABLE stock_info;