Linear Algebra in Data Science

Published in CodeX, Nov 9, 2021

Data Science sits at the intersection of three core areas: Hacking Skills, Mathematics & Statistics Knowledge, and Substantive Expertise. As the image above shows, a Data Scientist needs at least a basic knowledge of all three. Having Hacking Skills and Substantive Expertise (Domain Knowledge) without knowing Mathematics and Statistics is dangerous (the "Danger Zone!").

You do not need to know every branch of Mathematics to be a practicing Data Scientist, but you do need the basics.

The 3 major fields in Mathematics needed for a Data Scientist are:

Calculus
Linear Algebra
Probability and Statistics

In this blog post, we will look at Linear Algebra. Linear Algebra is one of the most foundational subjects in Mathematics, Statistics, Physics, Data Science and Engineering. In Data Science and Machine Learning, it is used in Principal Component Analysis to reduce the dimensionality of data, and it is also applied in Deep Learning, Neural Networks, Natural Language Processing, etc.

Here, we will only go over the basics you need to know in Linear Algebra, which we will put into practice using Python's Numerical Library, Numpy.

What is Linear Algebra

According to Encyclopedia Britannica, Linear Algebra is a mathematical discipline that deals with Vectors and Matrices and, more generally, with Vector Spaces and Linear Transformations.

A matrix can be defined as a two-dimensional (or rectangular) array of numbers. Each number in this array is called an element of the matrix, and the dimension of a matrix is written as rows by columns. A vector differs from a matrix in that it has just one dimension: either a single column (Column Vector) or a single row (Row Vector). A tensor can have more than 2 dimensions, but we will focus only on Vectors and Matrices in this post.

Creating Vectors and Matrices in Numpy

Numpy, otherwise known as Numerical Python, is a Library for Computational and Numerical Operations in Python. Numpy arrays are stored as objects of type numpy.ndarray.
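The most direct way to create an array is np.array(), which takes a Python list (or a list of lists). The snippet below is a minimal sketch; the names vector and matrix are illustrative and not from the original post.

```python
import numpy as np

# A vector: a 1-D array built from a flat list
vector = np.array([1, 2, 3])

# A matrix: a 2-D array built from a list of rows (2 rows, 3 columns)
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(type(vector))   # <class 'numpy.ndarray'>
print(vector.shape)   # (3,)
print(matrix.shape)   # (2, 3)
```

The shape attribute reports the dimension as (rows, columns) for a matrix, and as a single length for a vector.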

Other methods for Creating Numpy Arrays

Using np.zeros(), we can create an array of a specified dimension whose elements are all zeros.

import numpy as np
# Creates a vector with 10 elements
array1 = np.zeros(10)
# Creates a matrix with 3 rows and 2 columns
array2 = np.zeros((3,2))

Similarly, using np.ones(), we can create an array of a specified dimension whose elements are all ones.

# Creates a vector with 10 elements
array1 = np.ones(10)
# Creates a matrix with 3 rows and 2 columns
array2 = np.ones((3,2))

Using np.full(), we specify both the dimension and the number we want in our array, so the elements are not limited to ones and zeros but can be any number of our choice.

# Creates a vector of twos with 3 elements
array1 = np.full(3,2)
# Creates a matrix of fours with 3 rows and 2 columns
array2 = np.full((3,2),4)
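One detail worth checking: passing a single integer as the shape gives a one-dimensional array, which Numpy treats as neither a row nor a column. The sketch below (with illustrative names) shows one way to ask for an explicit column vector instead, by passing a (rows, 1) shape.

```python
import numpy as np

flat = np.zeros(10)          # 1-D array with 10 elements
column = np.zeros((10, 1))   # 2-D array: 10 rows, 1 column
twos = np.full((3, 1), 2)    # column vector of twos

print(flat.shape)    # (10,)
print(column.shape)  # (10, 1)
print(twos.shape)    # (3, 1)
```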

Using np.arange(), we get equally spaced values within an interval, just like with the range() function. It creates a vector, which can be converted to a matrix using the Numpy reshape() function. Like range(), it follows the pattern np.arange(start,stop,step), and the stop value itself is excluded.

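import numpy as np
# Vector of the integers 0 to 11
array1 = np.arange(12)
# Vector of 10, 20, ..., 120 (the stop value 130 is excluded)
array2 = np.arange(10,130,10)
# Reshape the 12-element vector into a matrix with 3 rows and 4 columns
array2 = array2.reshape(3,4)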
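To see what these calls produce, the sketch below prints the reshaped matrix. It also passes -1 to reshape() so that Numpy infers one dimension from the element count; that option is not covered in the post and is shown here only as an aside. The names matrix and inferred are illustrative.

```python
import numpy as np

array2 = np.arange(10, 130, 10)   # 10, 20, ..., 120 (130 is excluded)
matrix = array2.reshape(3, 4)     # 12 elements -> 3 rows x 4 columns
print(matrix)
# [[ 10  20  30  40]
#  [ 50  60  70  80]
#  [ 90 100 110 120]]

# -1 asks Numpy to work out that dimension from the element count
inferred = array2.reshape(2, -1)
print(inferred.shape)             # (2, 6)
```

The total number of elements must stay the same, so a 12-element vector can become 3x4 or 2x6 but not, say, 5x2.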

Operations in Linear Algebra

Vector Addition

Two vectors (or matrices) can only be added if they are conformable for addition, i.e. if they have the same dimension. This means that a 2x2 matrix can only be added to another 2x2 matrix. Vector addition also obeys the commutative law of addition: adding vectors A and B gives the same result as adding vectors B and A. Below is an illustration of vector addition using Numpy.
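A minimal sketch of the addition rule, using two 2x2 matrices with made-up values (the names a, b and total are illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

total = a + b                         # element-wise addition
print(total)
# [[ 6  8]
#  [10 12]]

# Commutative: a + b equals b + a
print(np.array_equal(a + b, b + a))   # True
```

Note that Numpy can also add arrays of certain different shapes through a feature called broadcasting, which goes beyond the mathematical rule described above.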

Scalar Multiplication

Scalar Multiplication is the multiplication of a vector or matrix by a scalar. The Scalar multiplies every element in the vector or matrix.

array1 = np.array([[1,2],[3,4]])
array2 = 2*array1
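Printing the result makes the element-by-element effect visible. The second check is an extra illustration, not from the original post, that scalar multiplication distributes over addition.

```python
import numpy as np

array1 = np.array([[1, 2], [3, 4]])
array2 = 2 * array1
print(array2)
# [[2 4]
#  [6 8]]

# 2*(A + A) gives the same matrix as 2*A + 2*A
print(np.array_equal(2 * (array1 + array1), 2 * array1 + 2 * array1))  # True
```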

Matrix Multiplication

For two matrices to be multiplied, the number of columns in the first matrix must equal the number of rows in the second matrix. Unlike Vector Addition, Matrix Multiplication does not follow the commutative law of multiplication. Hence, Matrix A multiplied by Matrix B is, in general, not equal to Matrix B multiplied by Matrix A.
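As a rough sketch of both points, the example below uses the @ operator (equivalent to np.matmul() for 2-D arrays) on made-up matrices. The names a, b, c and d are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

ab = a @ b      # matrix product A x B
ba = b @ a      # matrix product B x A
print(ab)
# [[2 1]
#  [4 3]]
print(ba)
# [[3 4]
#  [1 2]]
print(np.array_equal(ab, ba))   # False: the order matters

# Shapes must be compatible: (2, 3) @ (3, 2) gives a (2, 2) result
c = np.ones((2, 3))
d = np.ones((3, 2))
print((c @ d).shape)            # (2, 2)
```

Be careful not to use * here: on Numpy arrays, * multiplies element by element and is not matrix multiplication.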

Determinant of a Matrix

Before computing the determinant of a matrix, it is important to note that determinants are only defined for Square Matrices. A matrix is Square if its number of rows equals its number of columns.

array1 = np.array([[1,2],[3,4]])
determinant = np.linalg.det(array1)
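For a 2x2 matrix with rows [a, b] and [c, d], the determinant is ad - bc. The sketch below compares that hand calculation with Numpy's result; because np.linalg.det() works in floating point, the comparison uses np.isclose() rather than ==. The names by_hand and by_numpy are illustrative.

```python
import numpy as np

array1 = np.array([[1, 2], [3, 4]])
a, b = array1[0]
c, d = array1[1]

by_hand = a * d - b * c                 # 1*4 - 2*3 = -2
by_numpy = np.linalg.det(array1)        # floating-point value close to -2.0

print(np.isclose(by_numpy, by_hand))    # True
```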

Transpose of a Matrix/Vector

The transpose of a matrix/vector is found by turning its rows into columns and vice versa. Hence, it swaps the dimensions of the matrix/vector: transposing converts a Row Vector to a Column Vector and vice versa.

array1 = np.array([[1,2],[3,4]])
transpose = array1.T
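The effect on the dimension is easiest to see with a non-square example. The sketch below (illustrative names) also shows a Numpy detail: a one-dimensional array has no separate rows and columns, so transposing it changes nothing, and a row vector has to be stored as a 2-D array with one row for .T to turn it into a column.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # 2 rows, 3 columns
print(matrix.T.shape)                        # (3, 2)

row = np.array([[1, 2, 3]])                  # row vector, shape (1, 3)
print(row.T.shape)                           # (3, 1): now a column vector

flat = np.array([1, 2, 3])                   # 1-D array, shape (3,)
print(flat.T.shape)                          # (3,): unchanged by transposing
```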

The Inverse of a Matrix

Having found the Determinant of a Matrix, we can derive the Inverse. Just as a nonzero number has a reciprocal, a Square Matrix whose determinant is not zero has an inverse. Multiplying a matrix by its inverse gives the identity matrix.

array1 = np.array([[1,2],[3,4]])
inverse = np.linalg.inv(array1)
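A quick way to check an inverse is to multiply it back and compare against the identity matrix. This sketch uses np.allclose() to allow for floating-point rounding, and also shows what happens with a matrix whose determinant is zero. The names singular and invertible are illustrative.

```python
import numpy as np

array1 = np.array([[1, 2], [3, 4]])
inverse = np.linalg.inv(array1)

# A matrix times its inverse is the identity, up to floating-point rounding
print(np.allclose(array1 @ inverse, np.identity(2)))   # True

# A singular matrix (determinant 0) has no inverse
singular = np.array([[1, 2], [2, 4]])
try:
    np.linalg.inv(singular)
    invertible = True
except np.linalg.LinAlgError:
    invertible = False
print(invertible)   # False
```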

Identity Matrix

The identity matrix is a square matrix whose diagonal elements are ones and whose off-diagonal elements are zeros. It is the result of multiplying a Matrix by its inverse. An identity matrix can easily be obtained in Numpy using np.identity().

np.identity(2) #Creates a 2x2 identity matrix
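The identity matrix plays the same role for matrices that the number 1 plays for ordinary multiplication. A short sketch, with illustrative names:

```python
import numpy as np

array1 = np.array([[1, 2], [3, 4]])
identity = np.identity(2)

# Multiplying by the identity matrix leaves a matrix unchanged,
# whichever side it is on
print(np.array_equal(array1 @ identity, array1))   # True
print(np.array_equal(identity @ array1, array1))   # True
```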

In this blog post, we looked at the importance of Linear Algebra in Data Science, as well as Creating Vectors and Matrices and Performing popular Vector Operations using Numpy.
Do well to Clap and Share if you learnt something from this post. I would also like to hear what you think in the comment section. Thank you!