Hello fellow data enthusiasts! Today we will be exploring a simple method to visualize and analyze the relationship between different stock market indices: the correlation matrix heatmap. You don’t need a background in finance or data science to follow along. We’ll break down complex terms into understandable concepts and walk you through the process using Python’s powerful libraries, pandas and seaborn. So, whether you’re a seasoned trader or a beginner looking to dip your toes into the world of stocks, read on to uncover valuable insights hidden in the market data.
NOTE: code written in JupyterLab
Key Concepts
Before diving into the code, let’s understand some key financial concepts.
Stock Market Indices
A stock market index is a measurement of a portion of the stock market. It is computed from the prices of selected stocks and is often used as a benchmark to track the performance of an industry, sector, market, or region. Examples of well-known stock market indices include the S&P 500 in the U.S., the FTSE 100 in the U.K., the DAX in Germany, and the Nikkei 225 in Japan.
Daily Returns
Daily return is a measure of the gain or loss of a stock from one day to another. It’s calculated as the difference between the closing prices of two consecutive days, divided by the closing price of the first day.
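To make the formula concrete, here is a minimal sketch that computes daily returns by hand and compares the result with pandas' built-in `pct_change()`. The prices and dates are made up purely for illustration.

```python
import pandas as pd

# Hypothetical closing prices for three consecutive trading days
close = pd.Series(
    [100.0, 102.0, 99.96],
    index=pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-04"]),
)

# Manual formula: (today's close - previous close) / previous close
previous_close = close.shift(1)
manual = (close - previous_close) / previous_close

# pandas shortcut that computes the same quantity
shortcut = close.pct_change()

# The first entry is NaN because the first day has no previous close
print(manual)
print(shortcut)
```

Both series should agree up to floating-point rounding: a 2% gain on the second day followed by a 2% loss on the third.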
Correlation Coefficient
In statistics, the correlation coefficient is a measure that determines the degree to which two variables’ movements are associated. The range of values for the correlation coefficient is -1.0 to 1.0. A correlation of -1.0 shows a perfect negative correlation, while a correlation of 1.0 shows a perfect positive correlation. A correlation of 0.0 shows no linear relationship between the movement of the two variables.
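As a rough illustration, the sketch below computes the (Pearson) correlation coefficient from its definition: the covariance of two series divided by the product of their standard deviations. The helper name `pearson` and the return values are assumptions made up for this example.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up daily returns
a = [0.01, -0.02, 0.03, 0.00, -0.01]
same_direction = [2 * v for v in a]    # always moves with a
opposite_direction = [-v for v in a]   # always moves against a

print(pearson(a, same_direction))      # close to 1.0
print(pearson(a, opposite_direction))  # close to -1.0
```

Note that doubling the size of the moves does not change the coefficient: correlation measures how consistently two series move together, not how large the moves are.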
Correlation Matrix
A correlation matrix is a table showing the correlation coefficients between many variables. Each cell in the table shows the correlation between two variables. It is a powerful tool to summarize a large dataset and to identify relationships between variables.
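A small sketch of what such a table looks like in pandas, using made-up returns for three hypothetical indices named A, B and C. `DataFrame.corr()` computes the Pearson correlation for every pair of columns.

```python
import numpy as np
import pandas as pd

# Made-up daily returns for three hypothetical indices
toy_returns = pd.DataFrame({
    "A": [0.010, -0.020, 0.015, 0.005, -0.010],
    "B": [0.012, -0.018, 0.010, 0.007, -0.008],
    "C": [-0.005, 0.010, -0.012, 0.001, 0.006],
})

toy_matrix = toy_returns.corr()  # Pearson correlation by default

print(toy_matrix.shape)  # one row and one column per index
# The matrix is symmetric: corr(A, B) is the same as corr(B, A)
print(np.allclose(toy_matrix.values, toy_matrix.values.T))
```

Because the matrix is symmetric, only the cells on one side of the diagonal carry distinct information.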
Correlation Matrix Heatmap
A correlation matrix heatmap is a color-coded representation of the data in a correlation matrix. It uses color to represent the magnitude of correlation coefficients, making it easier to identify patterns and relationships.
Analyzing Stock Market Indices with a Correlation Matrix Heatmap
Now that we’ve got the basic terms out of the way, let’s use Python to create a correlation matrix heatmap for the daily returns of several stock market indices. We’ll use pandas for data manipulation and seaborn for data visualization.
Python Libraries
First, assuming you’ve already installed the necessary libraries with pip, we need to import them:
import pandas as pd
import seaborn as sns
Importing the Data
Next, let’s assume we have the historical price data of several stock market indices in separate CSV files. I retrieved mine from Yahoo Finance, but there are many other options, such as third-party APIs like Alpha Vantage.
We’ll read these data into pandas DataFrames, calculate the daily returns, and combine them into a single DataFrame:
# Read each CSV and keep only the "Adj Close" column (a pandas Series indexed by date)
BVSP = pd.read_csv("/path/to/BVSP.csv", index_col="Date", parse_dates=True)["Adj Close"]
HSI = pd.read_csv("/path/to/HSI.csv", index_col="Date", parse_dates=True)["Adj Close"]
# ... do the same for the rest of the index csv files
# Calculate the daily returns for each index and save it as a column in a new dataframe
returns = pd.DataFrame({
    "BVSP": BVSP.sort_index().pct_change(),
    "HSI": HSI.sort_index().pct_change(),
    # ... do the same for the rest of the indices
})
# Drop the rows with missing values
returns.dropna(inplace=True)
If we display this in JupyterLab using display(returns), we should see something like:
Date | BVSP | HSI | … |
---|---|---|---|
2018-08-01 | 0.001035 | -0.008476 | … |
2018-08-02 | 0.004224 | -0.022095 | … |
2018-08-06 | -0.004715 | 0.005176 | … |
2018-08-07 | -0.008686 | 0.015432 | … |
2018-08-08 | -0.014873 | 0.003903 | … |
… | … | … | … |
2023-07-19 | -0.002452 | -0.003335 | … |
2023-07-20 | 0.004517 | -0.001282 | … |
2023-07-24 | 0.009358 | -0.021342 | … |
2023-07-25 | 0.005489 | 0.041046 | … |
2023-07-26 | 0.004524 | -0.003564 | … |
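The loading step repeats the same lines for every index, so with many CSV files a loop can be tidier. Below is a minimal sketch, not the exact code used for this post: the helper name `load_returns` is made up, and it assumes every file has "Date" and "Adj Close" columns like the Yahoo Finance exports above.

```python
import pandas as pd

def load_returns(csv_paths):
    """Build a daily-returns DataFrame from a {name: csv_path} mapping.

    Assumes every CSV has "Date" and "Adj Close" columns.
    """
    returns = pd.DataFrame({
        name: pd.read_csv(path, index_col="Date", parse_dates=True)["Adj Close"]
              .sort_index()
              .pct_change()
        for name, path in csv_paths.items()
    })
    # Keep only the dates on which every index has a return
    return returns.dropna()

# Example call with the placeholder paths from above (replace with your own):
# returns = load_returns({"BVSP": "/path/to/BVSP.csv", "HSI": "/path/to/HSI.csv"})
```

As in the original code, dates on which any index lacks a return (for example, local market holidays) are dropped, which is why the date column can skip some days.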
Generating the Correlation Matrix
Now, let’s calculate the correlation matrix for our data and display it on JupyterLab:
correlation_matrix = returns.corr()
display(correlation_matrix)
Index | BVSP | HSI | … |
---|---|---|---|
BVSP | 1.000000 | 0.197386 | … |
HSI | 0.197386 | 1.000000 | … |
… | … | … | … |
N100 | 0.506160 | 0.386777 | … |
N225 | 0.215761 | 0.446882 | … |
NOTE: The diagonal cells all have a correlation coefficient of 1, because each one is the correlation of an index with itself.
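With many indices, scanning the table for the strongest and weakest relationships gets tedious. One possible approach, sketched below, is to keep each pair once (the cells above the diagonal) and sort them. The helper name `rank_pairs` and the toy values are made up for illustration; in practice you would pass in the correlation_matrix computed above.

```python
import numpy as np
import pandas as pd

def rank_pairs(correlation_matrix):
    """Return each unordered pair of columns once, sorted by correlation."""
    # Boolean mask of the cells strictly above the diagonal (k=1 skips the diagonal)
    upper = np.triu(np.ones(correlation_matrix.shape, dtype=bool), k=1)
    # Cells outside the mask become NaN and are dropped after stacking
    pairs = correlation_matrix.where(upper).stack().dropna()
    return pairs.sort_values(ascending=False)

# Toy correlation matrix with hypothetical indices A, B and C
toy = pd.DataFrame(
    [[1.0, 0.2, -0.6],
     [0.2, 1.0, 0.5],
     [-0.6, 0.5, 1.0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)

print(rank_pairs(toy))
```

The result is a Series indexed by pairs of names, with the most positively correlated pair first and the most negatively correlated pair last.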
Generating the Correlation Matrix Heatmap
Finally, we can create the correlation matrix heatmap using seaborn and display it:
# the first argument to the sns.heatmap() function is the correlation_matrix from above
returns_heatmap = sns.heatmap(
    correlation_matrix,
    vmin=-1,
    vmax=1,
    cmap=sns.diverging_palette(0, 255, as_cmap=True),
    cbar_kws={"label": "Correlation Coefficient"},
)
returns_heatmap.set(title="Random Global Stock Market Indices Daily Returns: Correlation Matrix Heatmap")
display(returns_heatmap)
As we might have expected, the red row and column stand out clearly. They represent the VIX index’s relationship with the other indices, which tends to be negative: the VIX measures expected market volatility and is often called the “fear gauge”, so it typically rises when equity indices fall. Darker shades of blue are just as easy to spot and indicate a stronger positive correlation. Cells with little or no color indicate almost no correlation.
Important Note
While this is a very quick and fun way to check how multiple variables such as stock market indices relate and correlate to one another, it is important to remember that this is a very simple and naive approach. Like the mean and standard deviation, the correlation coefficient is sensitive to outliers in the data. Also, the correlation coefficient may not be a useful metric if variables have a non-linear association.
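Both caveats are easy to reproduce. The sketch below uses made-up numbers: in the first case a single extreme value flips the sign of an otherwise perfect correlation, and in the second the correlation is roughly zero even though one variable is completely determined by the other.

```python
import numpy as np
import pandas as pd

# 1) Outlier sensitivity: two identical series, then one value is corrupted
x = pd.Series(np.arange(1.0, 11.0))  # 1, 2, ..., 10
y_clean = x.copy()
y_outlier = x.copy()
y_outlier.iloc[-1] = -100.0          # a single extreme observation

pearson_clean = x.corr(y_clean)      # identical series
pearson_outlier = x.corr(y_outlier)  # same data except for one point

# 2) Non-linear association: v depends entirely on u, but not linearly
u = pd.Series(np.arange(-5.0, 6.0))  # -5, -4, ..., 5
v = u ** 2
pearson_nonlinear = u.corr(v)

print(pearson_clean)      # close to 1.0
print(pearson_outlier)    # negative, despite nine of ten points matching
print(pearson_nonlinear)  # close to 0.0
```

In real market data, the first situation can arise from a crash day or a data error, so it is worth inspecting the returns before trusting a surprising coefficient.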
For a more varied approach, the scikit-learn library is worth exploring.
Conclusion
And that’s it! The heatmap allows us to visually identify the correlation between different indices. This type of analysis can be useful when constructing a diversified portfolio or understanding how different markets react to each other.
Happy coding!
References
- pandas docs: https://pandas.pydata.org/
- seaborn docs: https://seaborn.pydata.org/