
译科技『连载』:从可视化线性代数开始机器学习(二)

  人工智能被广泛应用于图像识别、语音识别、自然语言处理、智能推荐、自动驾驶、智能制造、医疗保健等众多领域,对社会、经济、科技的发展产生了深远影响。

  机器学习是人工智能的核心,是使计算机智能化的根本途径。机器学习是一门多领域交叉学科,专门研究计算机怎样模拟或实现人类的学习行为,以获取新的知识和技能,并重新组织已有知识结构使之不断提升自身性能。机器学习的发展已经取得了令人叹为观止的成就,彻底改变了人类对人工智能的认知。想了解机器学习,就需要先学习数学基础,包括线性代数、微积分、概率论等知识,这些数学基础对于深度学习等人工智能领域的理解至关重要。

  数据观从2月27日起连载《从可视化线性代数开始机器学习》系列文章,作者为欧洲航天局机器学习专家Marcello Politi,专业深度讲述机器学习的“黑魔法”。

Visualized Linear Algebra to Get Started with Machine Learning: Part 2

从可视化线性代数开始机器学习(二)

  Master elements of linear algebra, start with simple and visual explanations of basic concepts

  掌握线性代数要素,从基本概念的简明阐释开始。

Introduction

简介

  In this article, we continue the work we started in “Visualized Linear Algebra to Get Started with Machine Learning: Part 1”. We tackle new concepts of linear algebra in a simple and intuitive way. These articles are intended to introduce you to the world of linear algebra and make you understand how strongly the study of this subject and other mathematical subjects is related to data science.

  本文将延续《从可视化线性代数开始机器学习(一)》(跳转链接)的内容,继续以简单直观的方式讲解线性代数中的新概念。本系列文章旨在带你走进线性代数的世界,并让你理解以线性代数为代表的数学学科与数据科学之间千丝万缕的联系。

Index

索引

Solve Equations

解方程

Determinants

行列式

Advanced Changing Basis

高级变基

Eigenvalues and Eigenvectors

特征值和特征向量

Calculating Eigenvalues and Eigenvectors

计算特征值和特征向量

  Solve Equations

  解方程

  Let’s finally try to understand how to solve simultaneous equations. You will by now have become familiar with writing equations compactly using matrices and vectors, as in this example.

  终于,让我们来了解一下如何求解联立方程组。你现在应该已经熟悉了像本例这样用矩阵和向量紧凑地书写方程的方式。

Equation (Image By Author)

方程式(图片来自作者)

  Finding the vector of unknowns r = [a,b] is quite straightforward; we only need to multiply the left and right sides of the equation by the inverse of matrix A.

  找到未知数向量r=[a,b]是非常简单的;我们只需要将方程的左右两边乘以矩阵A的逆。

Solve Equation (Image By Author)

解方程(图片来自作者)

  We see that A^-1 and A cancel, since multiplying a matrix by its inverse always gives the identity matrix (that is, the matrix that has all 1’s on the main diagonal and zeros elsewhere). And so we find the value of r.

  我们看到A^-1和A相互抵消,因为矩阵与其逆矩阵相乘总是得到单位矩阵(即主对角线上全为1、其他位置为0的矩阵)。因此,我们求出了r的值。

  But in order to do this we have to compute A^-1, which may not be so simple. Programming languages often already implement very efficient algorithms for calculating the inverse matrix, so in practice you will rely on those. But in case you want to learn how to do this calculation by hand, you will have to use Gaussian Elimination.

  但为了做到这一点,我们必须计算A^-1,而这可能并不简单。编程语言通常已经实现了计算逆矩阵的高效算法,所以在实践中你可以直接使用这些算法。但如果你想学习如何手工计算,就必须使用高斯消元法。

  This is for example how you compute the inverse by using numpy in Python.

  例如,下面展示了如何在Python中使用numpy来计算逆矩阵。
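
  The author’s original snippet is not reproduced above, so here is a minimal sketch of the same idea: a made-up 2x2 system A r = s, solved both with an explicit inverse and with np.linalg.solve (which avoids forming the inverse).

```python
import numpy as np

# A hypothetical 2x2 system A r = s, with made-up values for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
s = np.array([5.0, 10.0])

A_inv = np.linalg.inv(A)        # the inverse exists because det(A) != 0
r = A_inv @ s                   # r = A^-1 s, the vector of unknowns

# In practice it is usually better to solve the system directly,
# without explicitly forming the inverse.
r_direct = np.linalg.solve(A, s)
print(r, r_direct)              # both print the same solution [1. 3.]
```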

  Determinants

  行列式

  The determinant is another fundamental concept in linear algebra. It is often taught in college how to calculate it but not what it is. We can associate a value with each matrix, and that value is precisely the determinant. Intuitively, you can think of the determinant as the area of the deformed space.

  行列式是线性代数中的另一个基本概念。大学里通常只教如何计算它,却不讲它到底是什么。我们可以给每个矩阵关联一个值,这个值正是行列式。直观地说,你可以把行列式看作变形后空间的面积。

  We have seen how each matrix is simply a deformation of space. Let us give an example.

  我们已经看到每个矩阵是如何简单地对空间进行变形的。让我们举一个例子。

Determinant (Image By Author)

行列式(图片来自作者)

  If we calculate the area of the new space, as shown in the figure, this area is precisely the determinant associated with the starting matrix. In this case the determinant = a*d.

  如果我们计算新空间的面积,如图所示,这个面积正好是与起始矩阵相关的行列式。在这种情况下,行列式=a*d。

  Certainly, we have matrices that can describe somewhat more complex deformations of space, and in that case, it may not be so trivial to calculate the area i.e., the determinant.

  当然,有些矩阵描述的空间变形会更复杂一些,在这种情况下,计算面积(即行列式)可能就不那么简单了。

  For this, there are known formulas for calculating the determinant. For example, let us see the formula for calculating the determinant of a 2x2 matrix.

  为此,有一些已知的行列式计算公式。例如,让我们看看2x2矩阵行列式的计算公式。

Compute Determinant of 2x2 Matrix (Image By Author)

计算2x2矩阵的行列式(图片来自作者)
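
  As a quick, hedged check of the ad - bc formula (the numbers below are made up for illustration), the hand computation can be compared with numpy:

```python
import numpy as np

# 2x2 matrix [[a, b], [c, d]] with illustrative values
a, b, c, d = 2.0, 1.0, 1.0, 3.0
M = np.array([[a, b],
              [c, d]])

det_by_formula = a * d - b * c    # ad - bc = 5.0
det_by_numpy = np.linalg.det(M)   # same value, up to floating-point error
print(det_by_formula, det_by_numpy)
```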

  You can look here to learn how to calculate the determinant in general cases with larger matrices.

  你可以在这里学习如何在一般情况下计算更大矩阵的行列式。

  If you think about it, however, there are transformations that do not create any area. Let’s look at the following example.

  但是,如果仔细考虑一下,有些变换并不产生任何面积。让我们看一下下面的例子。

Det equals zero (Image By Author)

行列式等于零(图片来自作者)

  In this example, the matrix does not allow us to create any area, so we have a determinant equal to zero.

  在这个例子中,这个矩阵无法张成任何面积,所以行列式等于零。

  But what is the use of knowing the determinant? We have seen that to solve simultaneous equations we need to be able to calculate the inverse of a matrix.

  但知道行列式有什么用呢?我们已经看到,为了求解联立方程组,我们需要能够计算出矩阵的逆。

  But the inverse of a matrix does not exist if the determinant is equal to zero! That is why it is important to know how to calculate it, to know if there are solutions to the problem.

  但是,如果行列式等于零,矩阵的逆就不存在!这就是为什么要知道如何计算行列式:它能告诉我们问题是否有解。

  You can think of the inverse matrix as a way of transforming the space back to the original space. But when a matrix creates not an area but only a segment, collapsing the space from 2D to 1D, the inverse matrix does not have enough information and will never be able to take us back from the 1D space to the original 2D space.

  你可以把逆矩阵看作一种把空间变换回原始空间的方法。但是,当一个矩阵产生的不是一块面积,而只是一条线段,也就是把空间从二维压缩到一维时,逆矩阵就没有足够的信息,永远无法把我们从一维空间带回原来的二维空间。
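
  A small sketch of this situation (the matrix below is an arbitrary illustration): a matrix whose columns lie on the same line has determinant zero, and numpy refuses to invert it.

```python
import numpy as np

# Both columns point along the same direction: the plane collapses to a line.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(np.linalg.det(S))   # 0.0 (up to rounding): no area is created

try:
    np.linalg.inv(S)      # a singular matrix has no inverse
except np.linalg.LinAlgError as err:
    print("No inverse:", err)
```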

  Advanced Changing Basis

  高级变基

  We have already seen in the previous article the basic example of changing the basis, but now let’s look at a somewhat more complex example.

  我们已经在上一篇文章中看到了变基的基本例子,现在让我们来看一个稍微复杂一些的例子。

  Let’s imagine the existence of two worlds, ours and Narnia’s. In our world, we use the vectors e1 and e2 as our reference vectors, as the basis. Thanks to these vectors we are able to create others and assign coordinates to them. For example, we can create the vectors [1,1], and [3,1].

  让我们想象存在两个世界:我们的世界和纳尼亚的世界。在我们的世界里,我们使用向量e1和e2作为参考向量,也就是基底。借助这些向量,我们能够构造其他向量并为它们分配坐标。例如,我们可以构造向量[1,1]和[3,1]。

Our world (Image By Author)

我们的世界(图片来自作者)

  In the world of Narnia though, they use different vectors as a basis. Can you guess which ones they use? Just the ones we call [1,1] and [3,1].

  不过在纳尼亚的世界里,他们使用不同的向量作为基底。你能猜到他们用的是哪些吗?就是我们称之为[1,1]和[3,1]的那两个向量。

Narnia’s world (Image By Author)

纳尼亚世界(图片来自作者)

  The people of Narnia will then use this basis of theirs to define other vectors of space, for example, they may define the vector [3/2, 1/2].

  纳尼亚人随后会使用他们的这组基来定义空间中的其他向量,例如,他们可以定义向量[3/2, 1/2]。

Vector in Narnia’s world (Image By Author)

纳尼亚世界中的向量(图片来自作者)

  Well, now what I want to find out is: how do I define that red vector based on the coordinates of my world?

  那么,现在我想弄清楚的是:如何用我的世界的坐标来表示那个红色向量?

  We have already seen this, we take the vectors that form the basis of Narnia but expressed in the coordinates of my world, so [1,1] and [3,1]. We put them in a matrix and multiply this matrix by the red vector.

  我们之前已经见过这种做法:取构成纳尼亚基底的向量,并用我的世界的坐标来表示,即[1,1]和[3,1]。我们把它们放进一个矩阵,然后用这个矩阵乘以那个红色向量。

Changing Basis (Image By Author)

变基(图片来自作者)
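
  A minimal sketch of this multiplication (the basis vectors [1,1] and [3,1] and the vector [3/2, 1/2] are the ones from the example; placing the basis vectors as the columns of the matrix is an assumption of this sketch):

```python
import numpy as np

# Columns are Narnia's basis vectors, written in our coordinates.
N = np.array([[1.0, 3.0],
              [1.0, 1.0]])

v_narnia = np.array([1.5, 0.5])   # the red vector, in Narnia's coordinates
v_ours = N @ v_narnia             # the same vector, in our coordinates
print(v_ours)                     # [3. 2.]
```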

  Now we ask: can we do the reverse as well? Can I express a vector of my world according to the coordinates they use in Narnia? Of course!

  现在我们要问:可以反过来做吗?我可以用他们在纳尼亚使用的坐标来表达我的世界中的向量吗?当然可以!

  It will suffice to do the same process but change the point of view. But why do we do all this? Very often when we have to describe vectors or transformations, we have a much simpler notation if we use a different basis.

  只需要做同样的过程,但调换一下视角即可。但我们为什么要做这一切呢?很多时候,当我们需要描述向量或变换时,换用另一组基往往能得到简单得多的表示。
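
  Continuing the same illustrative numbers, going in the reverse direction just means applying the inverse of that matrix:

```python
import numpy as np

N = np.array([[1.0, 3.0],
              [1.0, 1.0]])      # Narnia's basis vectors as columns, in our coordinates
N_inv = np.linalg.inv(N)

v_ours = np.array([3.0, 2.0])   # the red vector, expressed in our world
v_narnia = N_inv @ v_ours
print(v_narnia)                 # [1.5 0.5] -> its coordinates in Narnia's world
```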

  Suppose we want to apply an R-transformation to a vector. But this transformation turns out to be difficult to apply. Then we can first transform my vector into a vector in the world of Narnia by applying the matrix N. After that we apply the desired transformation R. And then we bring everything back to our original world with N^-1.

  假设我们想对一个向量应用变换R,但这个变换恰好很难直接应用。那么我们可以先用矩阵N把我的向量变换成纳尼亚世界中的向量,接着应用所需的变换R,最后再用N^-1把一切带回我们原来的世界。
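
  A hedged sketch of this recipe, following the steps in the order described above (N and R below are arbitrary illustrative matrices, with R chosen as a simple horizontal stretch):

```python
import numpy as np

N = np.array([[1.0, 3.0],
              [1.0, 1.0]])      # change-of-basis matrix from the example
R = np.array([[2.0, 0.0],
              [0.0, 1.0]])      # the transformation we actually want to apply
N_inv = np.linalg.inv(N)

v = np.array([3.0, 2.0])        # a vector in our world

# Step by step: change basis, apply R, change back.
result = N_inv @ (R @ (N @ v))

# The three matrices can also be composed once and reused.
composed = N_inv @ R @ N
print(result, composed @ v)     # both give the same vector
```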

  This is something that can be very useful and make life easier when we are dealing with complex transformations. I hope I have at least given you some insight; there is so much more to talk about.

  当我们处理复杂的变换时,这一点非常有用,能省去不少麻烦。希望我至少给了你一些启发;值得讨论的内容还有很多。

  Eigenvalues and Eigenvectors

  特征值和特征向量

  We have already repeated several times that applying a linear transformation (a matrix) to a vector transforms that vector.

  我们已经多次重复过:对一个向量应用线性变换(即一个矩阵),会使这个向量发生变换。

  However, there are cases in which the vector remains in the same initial direction. Think for example of the case where we simply scale the space. If we visualize the horizontal and the vertical vector, these remain in the same direction although they get longer or shorter.

  然而,在有些情况下,向量会保持初始方向不变。例如,考虑我们只是对空间进行缩放的情况。如果我们把水平向量和垂直向量画出来,会发现它们虽然变长或变短,但方向保持不变。

Scale Space (Image By Author)

缩放空间(图片来自作者)

  We see in the image above that the linear transformation applied here is that of scaling. But if we try to understand what happens to each individual vector we notice that the red vectors still maintain the same direction.

  我们在上图中看到,这里应用的线性变换是缩放。但如果我们去观察每个向量发生了什么,就会注意到红色向量仍然保持着相同的方向。

  These vectors that maintain the same direction are called Eigenvectors of the matrix that described this transformation.

  这些保持方向不变的向量,被称为描述这一变换的矩阵的特征向量。

  In particular, the vertical red vector has remained unchanged, so let’s say it has eigenvalue =1 while the other red vector has doubled so let’s say it has eigenvalue =2.

  特别地,垂直的红色向量保持不变,所以我们说它的特征值=1;而另一个红色向量长度变为原来的两倍,所以我们说它的特征值=2。
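
  Assuming a concrete version of the figure, say a scaling that doubles the horizontal direction and leaves the vertical one unchanged, this can be checked numerically:

```python
import numpy as np

# An assumed scaling: doubles the horizontal direction, leaves the vertical one alone.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

e_horizontal = np.array([1.0, 0.0])
e_vertical = np.array([0.0, 1.0])

print(A @ e_horizontal)   # [2. 0.] -> same direction, eigenvalue 2
print(A @ e_vertical)     # [0. 1.] -> unchanged, eigenvalue 1
```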

  Obviously depending on the matrix, and thus the transformation, the number of eigenvectors may vary.

  显然,取决于矩阵,也就是取决于变换的不同,特征向量的数量可能会有所不同。

  Calculating Eigenvalues and Eigenvectors

  计算特征值和特征向量

  Let us now try to convert what we have expressed in words into a mathematical formula. So eigenvectors are those vectors to which when a matrix is applied they do not change, at most they lengthen or shorten.

  现在让我们试着把刚才用文字表达的内容转换成数学公式。特征向量就是那些在矩阵作用下方向不变的向量,它们至多被拉长或缩短。

Calculate Eigenvectors (Image By Author)

计算特征向量(图片来自作者)

  In the formula A is a matrix, x is a vector and lambda is a scalar. If the condition is satisfied we say that x is an eigenvector of A with the corresponding eigenvalue lambda.

  公式中A是一个矩阵,x是一个向量,lambda是一个标量。如果条件得到满足,我们就说x是A的一个特征向量,其对应的特征值为lambda。

  By solving the previous equation we can find the value of the eigenvalues that solve the equation, let’s see how to do it.

  通过求解上面的方程,我们就能找到满足它的特征值,下面来看看具体怎么做。

Characteristic polynomial (Image By Author)

特征多项式(图片来自作者)
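
  For a 2x2 matrix, det(A - λI) = 0 works out to λ^2 - trace(A)·λ + det(A) = 0, so the eigenvalues are the roots of that quadratic. A small sketch with an illustrative matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # illustrative symmetric matrix

# Characteristic polynomial of a 2x2 matrix: lambda^2 - trace(A)*lambda + det(A) = 0
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
eigenvalues = np.roots(coeffs)
print(eigenvalues)           # [3. 1.]
```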

  Once the eigenvalues have been found, it will suffice to substitute them into the following equation to find the eigenvectors.

  一旦找到了特征值,只需把它们代入下面的方程,就能求出对应的特征向量。

Find eigenvectors (Image By Author)

查找特征向量(图片来自作者)
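
  In practice numpy does both steps at once; a minimal sketch, using the same illustrative matrix as above:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # [3. 1.]
print(eigenvectors)   # the columns are the corresponding (normalized) eigenvectors

# Check the defining condition A x = lambda x for the first pair.
x = eigenvectors[:, 0]
print(np.allclose(A @ x, eigenvalues[0] * x))   # True
```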

  Final Thoughts

  结语

  I hope you have found some useful insights in this article and that you have understood them without too much effort. The purpose is to get a little familiar with these terms and linear algebra elements. In this way, I hope that the next time you go to look at the documentation of sklearn or some other library, you will be able to better understand what that particular function you are using is actually doing!

  希望你从本文中获得了一些有用的见解,并且不费太多力气就能理解它们。本文的目的,是让你对这些术语和线性代数的基本元素稍微熟悉一些。这样,下次你再去看sklearn或其他库的文档时,就能更好地理解你正在使用的那个特定函数到底在做什么!

责任编辑:张薇
