Dimensionality Reduction Intro: What Even Are Dimensions?

ZDF Digital R&D / Analytics
Mar 10, 2021


These days, one frequently hears about “high-dimensional” data in the context of machine learning, but as three-dimensional beings we find it nearly impossible to imagine anything above three dimensions. Luckily, we can use mathematical definitions to take a familiar idea, for example a circle (usually something that exists in two dimensions), and carry it up to three dimensions or down to one. In this article, we will play with dimensions a bit.

When we talk about a circle, what we mean is a set of points that are all the same distance away from some middle point. To make our equations nice, let’s say that the middle of the circle is at the point (0,0) in the two-dimensional plane R² and let’s say we want our circle to have radius 1, so all the points should be distance 1 away from the center.

We know from geometry that each point p in R² has two components, which we will call x_1 and x_2, that uniquely define its position in the plane: p=(x_1,x_2). Thinking back to geometry class, we remember the Pythagorean theorem: for such a point p on a circle with radius 1 around the origin[1], we have

x_1² + x_2² = 1    (1)

as an equation for this circle.

So, we can describe the points on this circle as exactly those points in the plane that fulfill equation (1). Putting this all together, we have the following definition:

{ p=(x_1,x_2) ∈ R² : x_1² + x_2² = 1 }    (2)

Now we can take our first step in dimensionality. What would happen if we took points not from R² but from the three-dimensional space R³, and asked that they fulfill the equivalent of definition (2) in three dimensions? Well, a point in R³ has three components, p=(x_1,x_2,x_3), and we still want every point to be at distance one from the center, so we would write that definition as

{ p=(x_1,x_2,x_3) ∈ R³ : x_1² + x_2² + x_3² = 1 }    (3)

And if we think for a moment, the set of all points in three-dimensional space that are distance one away from the origin is a sphere, like a basketball with an infinitesimally thin surface. So, to go from a 2D circle to a 3D sphere, all it took was altering our mathematical definition a little. What would happen if we altered it the other way? What if we took our point p not from the 2D plane but from something one-dimensional, the real number line? According to our definition, we would have:

{ p=(x_1) ∈ R : x_1² = 1 }    (4)

On the number line, there just aren’t that many choices for points that are distance one from the origin. So, this “circle” in 1D is just a pair of points, one at x_1=1 and one at x_1=-1.
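The three cases so far can be checked with one small function. The following is a minimal Python sketch, not part of the original article: the function name `on_unit_sphere` and the tolerance `tol` are our own choices for illustration. It tests whether a point of any length is at distance one from the origin by summing the squares of its components.

```python
import math

def on_unit_sphere(point, tol=1e-9):
    """Return True if `point` is at distance 1 from the origin.

    `point` is any sequence of numbers; its length is the dimension
    of the space the point lives in. `tol` absorbs rounding error.
    """
    squared_distance = sum(x * x for x in point)
    return math.isclose(squared_distance, 1.0, abs_tol=tol)

# Two components: points on the circle in the plane.
print(on_unit_sphere((1.0, 0.0)))
print(on_unit_sphere((0.6, 0.8)))

# Three components: a point on the basketball surface.
print(on_unit_sphere((0.0, 0.0, 1.0)))

# One component: only 1 and -1 qualify on the number line.
print(on_unit_sphere((1.0,)), on_unit_sphere((-1.0,)), on_unit_sphere((0.5,)))
```

The same function handles every case because the condition itself never changes; only the number of components in the point does.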

We call all of the shapes we have defined here spheres: the circle (2) is a one-dimensional sphere[2], the surface of a basketball (3) is a two-dimensional sphere, and the two points on the number line (4) constitute a zero-dimensional sphere.

If we allow the mathematical notation to carry us a bit further, we can generalize and know exactly how to define a sphere in 4, 5, or n dimensions. Using the definitions above, we can easily see how to define a sphere in n-space:

{ p=(x_1,…,x_n) ∈ Rⁿ : x_1² + x_2² + … + x_n² = 1 }    (5)

This definition shows us that dimension is just something we can dial up or down. We still have a solid definition for a sphere in any dimension: even if we don’t know how to visualize a sphere in four-dimensional space, we know precisely how to describe it mathematically.
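To make the "dial" concrete, here is a small Python sketch (our own illustration, with the hypothetical helper name `random_point_on_sphere`). It draws a random point with n components and rescales it so that its distance from the origin is one. We cannot picture the result for n=4 or n=12, but we can still compute it and confirm that the sum of squares comes out as 1, up to rounding.

```python
import math
import random

def random_point_on_sphere(n):
    """Draw a random point in n-space and rescale it to distance 1 from the origin."""
    while True:
        p = [random.gauss(0.0, 1.0) for _ in range(n)]
        length = math.sqrt(sum(x * x for x in p))
        if length > 0.0:  # guard against dividing by zero
            return [x / length for x in p]

# Dial the dimension up and down; the sum of squares stays (approximately) 1.
for n in (1, 2, 3, 4, 12):
    p = random_point_on_sphere(n)
    print(n, sum(x * x for x in p))
```

For n=1 the function can only ever return the point 1 or the point -1, which matches the pair of points found on the number line.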

But what does this have to do with big data? It turns out that dimensionality is frequently a part of big data. Imagine representing a movie as one point in film-space. Film-space has lots of dimensions: genre, rating, Rotten Tomatoes score, any awards it has won, director, actors, country of production, budget, language, released in theaters or just on TV, runtime, studio, etc. We might want to find out which movies are similar, and we could do this by finding points near each other in film-space, but we can’t imagine these 12+ dimensions. Is there a way to project those high dimensions down to lower ones so we can look at the similarities without losing important information? That’s what we’ll talk about next time.
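As a rough illustration of "points near each other in film-space", the Python sketch below uses three invented films with only three numeric features each. The film names, the feature values, and the helper names `distance` and `nearest` are all made up for this example, and a real film-space would also need to encode non-numeric features such as genre or director and put the features on comparable scales.

```python
import math

# Invented toy data: (runtime in minutes, budget in million USD, critic score 0-100).
films = {
    "Film A": (95.0, 20.0, 88.0),
    "Film B": (100.0, 25.0, 85.0),
    "Film C": (160.0, 200.0, 40.0),
}

def distance(p, q):
    """Euclidean distance between two points with the same number of components."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest(name):
    """Return the name of the film whose point is closest to the given film's point."""
    others = (other for other in films if other != name)
    return min(others, key=lambda other: distance(films[name], films[other]))

print(nearest("Film A"))  # Film B
```

The distance formula is the same sum of squares used for the spheres above, so it works unchanged whether a film has 3 features or 12.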

For more interesting thoughts on dimension, we highly recommend the YouTube channel 3blue1brown and this video in particular.

This is the first part of our Dimensionality Reduction series. You can find part 2 here.

Author: Emily Searle-White, ZDF Digital

Footnotes:

[1] Note that the Pythagorean theorem is usually written a²+b²=c². Here c is the radius 1, and since 1 is its own square and its own square root, the right-hand side is simply 1.

[2] A circle “lives” in two dimensions, but the curve that constitutes the circle is only one-dimensional.

ZDF Digital R&D / Analytics

We are the R&D/Analytics department at ZDF Digital. In our day-to-day we identify future & innovation topics in media and offer substantiated analysis services.