
Analyzing Stock Market Indices with a Correlation Matrix Heatmap using Python in JupyterLab

Posted on: July 30, 2023 at 09:57 AM
Reading time: 7 minutes

Hello fellow data enthusiasts! Today we will be exploring a simple method to visualize and analyze the relationship between different stock market indices: the correlation matrix heatmap. You don’t need a background in finance or data science to follow along. We’ll break down complex terms into understandable concepts and walk you through the process using Python’s powerful libraries, pandas and seaborn. So, whether you’re a seasoned trader or a beginner looking to dip your toes into the world of stocks, read on to uncover valuable insights hidden in the market data.

NOTE: The code in this post was written and run in JupyterLab.


Key Concepts

Before diving into the code, let’s understand some key financial concepts.

Stock Market Indices

A stock market index is a measurement of a portion of the stock market. It is computed from the prices of selected stocks and is often used as a benchmark to track the performance of an industry, sector, market, or region. Examples of well-known stock market indices include the S&P 500 in the U.S., the FTSE 100 in the U.K., the DAX in Germany, and the Nikkei 225 in Japan.

Daily Returns

The daily return measures the gain or loss of a stock (or an index) from one trading day to the next. It’s calculated as the difference between the closing prices of two consecutive trading days, divided by the closing price of the first day.
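To make the formula concrete, here’s a minimal sketch with made-up closing prices (the dates and numbers are purely illustrative). It computes the daily returns by hand and compares them with pandas’ `pct_change()`, the method we’ll use on the real data later.

```python
import pandas as pd

# Hypothetical closing prices for three consecutive trading days
closes = pd.Series(
    [100.0, 102.0, 99.96],
    index=pd.to_datetime(["2023-07-24", "2023-07-25", "2023-07-26"]),
)

# Daily return by hand: (today's close - yesterday's close) / yesterday's close
manual = (closes - closes.shift(1)) / closes.shift(1)

# pandas' built-in equivalent
builtin = closes.pct_change()

print(manual.round(4).tolist())  # [nan, 0.02, -0.02]
```

The first value is NaN because the first day has no previous close to compare against, which is why missing values get dropped before any correlations are computed.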

Correlation Coefficient

In statistics, the correlation coefficient is a measure that determines the degree to which two variables’ movements are associated. The range of values for the correlation coefficient is -1.0 to 1.0. A correlation of -1.0 shows a perfect negative correlation, while a correlation of 1.0 shows a perfect positive correlation. A correlation of 0.0 shows no linear relationship between the movement of the two variables.
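The coefficient pandas computes by default is the Pearson correlation coefficient: the covariance of the two variables divided by the product of their standard deviations. Here’s a sketch of that calculation in plain NumPy; `pearson_r` is a hypothetical helper name, and the inputs are toy numbers chosen to hit the landmark values.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Deviations from each variable's mean
    dx = x - x.mean()
    dy = y - y.mean()
    # The 1/n factors of covariance and standard deviation cancel out
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
print(pearson_r(x, [1, 3, 2, 3, 1]))   # 0.0  (no linear relationship)
```

The third series rises and falls in a symmetric up-and-down shape, so it clearly isn’t random, yet its coefficient is 0: a value of 0 means no *linear* relationship, not no relationship at all.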

Correlation Matrix

A correlation matrix is a table showing the correlation coefficients between many variables. Each cell in the table shows the correlation between two variables. It is a powerful tool to summarize a large dataset and to identify relationships between variables.
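Here’s a sketch of a correlation matrix on a toy dataset: three hypothetical assets A, B and C with randomly generated returns, where B is built to move with A and C is built to move against it. `DataFrame.corr()` returns one row and one column per variable.

```python
import numpy as np
import pandas as pd

# Made-up daily returns for three hypothetical assets (seeded for reproducibility)
rng = np.random.default_rng(0)
a = rng.normal(0, 0.01, size=250)
toy = pd.DataFrame({
    "A": a,
    "B": a + rng.normal(0, 0.01, size=250),   # moves with A, plus noise
    "C": -a + rng.normal(0, 0.01, size=250),  # moves against A, plus noise
})

# Pearson by default; the result is a square DataFrame labelled by column name
matrix = toy.corr()
print(matrix.round(2))
```

In the printed matrix, the A/B cell should be clearly positive and the A/C cell clearly negative. The matrix is also symmetric (the A/B cell equals the B/A cell) with 1s on the diagonal, so half of it is redundant.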

Correlation Matrix Heatmap

A correlation matrix heatmap is a color-coded representation of the data in a correlation matrix. It uses color to represent the magnitude of correlation coefficients, making it easier to identify patterns and relationships.

Analyzing Stock Market Indices with a Correlation Matrix Heatmap

Now that we’ve got the basic terms out of the way, let’s use Python to create a correlation matrix heatmap for the daily returns of several stock market indices. We’ll use pandas for data manipulation and seaborn for data visualization.

Python Libraries

First, assuming you’ve installed pandas and seaborn with pip, let’s import them:

```python
import pandas as pd
import seaborn as sns
```

Importing the Data

Next, let’s assume we have the historical price data of several stock market indices in separate CSV files. I personally retrieved mine from Yahoo Finance, but you have many other options, such as third-party APIs like Alpha Vantage.

We’ll read each file with pandas, keep the adjusted closing prices, calculate the daily returns, and combine them into a single DataFrame:

```python
# Read each CSV, index it by date, and keep only the "Adj Close" (adjusted closing price) column
BVSP = pd.read_csv("/path/to/BVSP.csv", index_col="Date", parse_dates=True)["Adj Close"]
HSI = pd.read_csv("/path/to/HSI.csv", index_col="Date", parse_dates=True)["Adj Close"]
# ... do the same for the rest of the index csv files

# Calculate the daily returns for each index and store each one as a column of a new DataFrame;
# pandas aligns the columns on their dates
returns = pd.DataFrame({
    "BVSP": BVSP.sort_index().pct_change(),
    "HSI": HSI.sort_index().pct_change(),
    # ... do the same for the rest of the indices
})

# Drop rows with missing values (the first day, and dates on which any market was closed)
returns.dropna(inplace=True)
```

If we display this in JupyterLab with display(returns), we should see something like:

| Date | BVSP | HSI |
|---|---|---|
| 2018-08-01 | 0.001035 | -0.008476 |
| 2018-08-02 | 0.004224 | -0.022095 |
| 2018-08-06 | -0.004715 | 0.005176 |
| 2018-08-07 | -0.008686 | 0.015432 |
| 2018-08-08 | -0.014873 | 0.003903 |
| ... | ... | ... |
| 2023-07-19 | -0.002452 | -0.003335 |
| 2023-07-20 | 0.004517 | -0.001282 |
| 2023-07-24 | 0.009358 | -0.021342 |
| 2023-07-25 | 0.005489 | 0.041046 |
| 2023-07-26 | 0.004524 | -0.003564 |
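If you’re tracking more than a couple of indices, repeating the read-and-`pct_change()` lines gets tedious. Below is one way to fold them into a loop. `load_daily_returns` is a hypothetical helper, and it assumes every CSV has the same `Date` and `Adj Close` columns as the Yahoo Finance files used here.

```python
import pandas as pd

def load_daily_returns(paths):
    """Build a DataFrame of daily returns from a {name: csv_path} mapping.

    Assumes each CSV has a "Date" column and an "Adj Close" column.
    """
    returns = pd.DataFrame({
        # Read one index, sort it by date, and turn its prices into daily returns
        name: pd.read_csv(path, index_col="Date", parse_dates=True)["Adj Close"]
        .sort_index()
        .pct_change()
        for name, path in paths.items()
    })
    # Keep only the dates on which every index has a return
    return returns.dropna()

# Hypothetical usage; replace the paths with your own files
# returns = load_daily_returns({"BVSP": "/path/to/BVSP.csv", "HSI": "/path/to/HSI.csv"})
```

Each index’s returns are computed on its own trading calendar before the columns are combined, and the final `dropna()` keeps only the dates on which every market has a return, the same behavior as the step-by-step version.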

Generating the Correlation Matrix

Now, let’s calculate the correlation matrix for our data and display it in JupyterLab:

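```python
correlation_matrix = returns.corr()
display(correlation_matrix)
```

An excerpt of the output:

| | BVSP | HSI |
|---|---|---|
| BVSP | 1.000000 | 0.197386 |
| HSI | 0.197386 | 1.000000 |
| N100 | 0.506160 | 0.386777 |
| N225 | 0.215761 | 0.446882 |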

NOTE: The diagonal cells all have a correlation coefficient of 1, because each index is perfectly correlated with itself.

Generating the Correlation Matrix Heatmap

Finally, we can create the correlation matrix heatmap with seaborn and display it:

```python
import matplotlib.pyplot as plt

# The first argument to sns.heatmap() is the correlation_matrix from above;
# vmin/vmax fix the color scale to [-1, 1], with red for negative and blue for positive values
returns_heatmap = sns.heatmap(correlation_matrix, vmin=-1, vmax=1, cmap=sns.diverging_palette(0, 255, as_cmap=True), cbar_kws={"label": "Correlation Coefficient"})
returns_heatmap.set(title="Random Global Stock Market Indices Daily Returns: Correlation Matrix Heatmap")
plt.show()
```

[Figure: Random Global Stock Market Indices Daily Returns: Correlation Matrix Heatmap]

As we might have expected, the red row and column stand out clearly. They represent the VIX index’s relationship with the other indices. The VIX, often called the “fear gauge”, tracks the expected volatility of the S&P 500 and tends to rise when stocks fall, so it is usually negatively correlated with equity indices. Darker shades of blue are also easy to spot, indicating stronger positive correlations, while cells with little or no color indicate little or no correlation.
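Colors are great for a quick scan, but it can also help to list the pairs numerically. Here’s a sketch with a hypothetical helper, `ranked_pairs`, that takes a correlation matrix like `correlation_matrix`, keeps each pair only once (the matrix is symmetric) and sorts the pairs from most negative to most positive. The 3x3 matrix below is made up for illustration.

```python
from itertools import combinations

import pandas as pd

def ranked_pairs(corr):
    """List each pair of columns in a correlation matrix once, sorted ascending."""
    # combinations() yields (a, b) but never (b, a) or (a, a),
    # which skips the mirrored half of the matrix and the diagonal of 1s
    pairs = {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
    return pd.Series(pairs).sort_values()

# Toy 3x3 correlation matrix for illustration
toy = pd.DataFrame(
    [[1.0, 0.2, -0.6], [0.2, 1.0, 0.5], [-0.6, 0.5, 1.0]],
    index=["X", "Y", "Z"],
    columns=["X", "Y", "Z"],
)
print(ranked_pairs(toy))
```

With the real data, `ranked_pairs(correlation_matrix).head()` would list the most negatively correlated pairs first, which is where pairs involving the VIX should appear if it behaves as described above.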

Important Note

While this is a quick and fun way to check how multiple variables, such as stock market indices, relate to one another, it is important to remember that this is a very simple and naive approach. Like the mean and standard deviation, the (Pearson) correlation coefficient that pandas computes by default is sensitive to outliers in the data. It also only measures linear association, so it may not be a useful metric if the variables have a non-linear relationship.
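One simple way to see the outlier problem, and a common mitigation, is to compare Pearson with the Spearman rank correlation, which pandas also supports through `corr(method="spearman")`. Spearman correlates the *ranks* of the values rather than the values themselves, so it captures any monotonic relationship and is less sensitive to extreme values. The two return series below are made up for illustration.

```python
import pandas as pd

# Two made-up daily return series that move closely together
df = pd.DataFrame({
    "X": [0.010, -0.020, 0.015, -0.005, 0.020, -0.010, 0.005, 0.000],
    "Y": [0.012, -0.018, 0.013, -0.004, 0.022, -0.012, 0.006, 0.001],
})

# Append one extreme day on which they move sharply in opposite directions
extreme_day = pd.DataFrame({"X": [0.20], "Y": [-0.20]})
shocked = pd.concat([df, extreme_day], ignore_index=True)

for label, data in [("original", df), ("with outlier", shocked)]:
    pearson = data.corr().loc["X", "Y"]                    # default (Pearson)
    spearman = data.corr(method="spearman").loc["X", "Y"]  # rank-based
    print(f"{label:>12}: Pearson = {pearson:+.2f}, Spearman = {spearman:+.2f}")
```

That single extra day flips Pearson from about +0.99 to about -0.93, while Spearman only falls from +1.00 to +0.40 and keeps the sign of the relationship seen on the other eight days.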

For a more varied approach, we can use the scikit-learn module.

Conclusion

And that’s it! The heatmap allows us to visually identify the correlation between different indices. This type of analysis can be useful when constructing a diversified portfolio or understanding how different markets react to each other.

Happy coding!
