Visualizing Hollywood Networks

Most of the time, when people think of graphs, they think of something like this:

In the common vernacular of data science this is what we mean too: when someone asks for “a graph of the data”, this is usually what they want.

But there’s another meaning, and when visualized it looks more like this:

In this sense of the word “graph”, we’re not actually talking about the visualization, but rather about the relationships being visualized above.

Here a graph is defined as a collection of “nodes” and “edges”. Each node (visualized above in blue) represents some predetermined kind of object, and each edge (visualized above in black) represents a relationship between two of those objects.

Some common examples of graphs that you encounter in the course of navigating the internet include:

Now that we’ve got a quick introduction out of the way, let’s work through an example of what we can do with graphs! Since I don’t have access to any giant datasets like those, I’m going to create a graph of Hollywood instead: a network where each node is a movie, and an edge joins two movies that have at least one actor in common. This lets us look at the connections between various actors’ filmographies and find connections between movies that you wouldn’t expect.

First, a look at what our graph actually looks like.

You can see we’ve pulled about 24k movies, with 1.6M linkages between all those movies. Note: If two movies share multiple actors we only record one linkage.
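To make the construction concrete, here is a minimal sketch of how a graph like this could be assembled. It is not the code behind the numbers above: the function names and the placeholder movies and actors are invented for illustration, and it assumes the raw data is a mapping from each movie to its cast.

```python
from collections import defaultdict
from itertools import combinations


def build_movie_graph(casts):
    """Build an undirected movie graph from {movie: set of actors}.

    Two movies become neighbours when they share at least one actor.
    """
    # Invert the mapping: actor -> set of movies they appear in.
    movies_by_actor = defaultdict(set)
    for movie, actors in casts.items():
        for actor in actors:
            movies_by_actor[actor].add(movie)

    # Every movie is a node, even if it ends up with no edges.
    graph = {movie: set() for movie in casts}
    for movies in movies_by_actor.values():
        for a, b in combinations(sorted(movies), 2):
            # Sets ignore repeats, so several shared actors
            # still produce a single linkage.
            graph[a].add(b)
            graph[b].add(a)
    return graph


def count_edges(graph):
    # Each undirected edge is stored once per endpoint.
    return sum(len(neighbours) for neighbours in graph.values()) // 2


# Hypothetical placeholder data, not real filmographies.
casts = {
    "Movie A": {"Actor 1", "Actor 2"},
    "Movie B": {"Actor 1", "Actor 2", "Actor 3"},
    "Movie C": {"Actor 3"},
    "Movie D": {"Actor 4"},
}
graph = build_movie_graph(casts)
print(len(graph), "movies,", count_edges(graph), "linkages")
```

Storing neighbours in sets is what keeps "Movie A" and "Movie B" at one linkage even though they share two actors.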

Degree here refers to the number of linkages a movie has. In our use case, the degree of a movie is the number of other movies that its actors have appeared in. The higher the degree, the more likely the film is to have either a lot of star power or an extremely large cast.
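As a rough illustration, degree falls straight out of an adjacency-set representation: it is just the size of each movie's neighbour set. The toy graph below is hypothetical and stands in for the real one.

```python
# Hypothetical toy graph: movie -> set of movies it shares an actor with.
graph = {
    "Movie A": {"Movie B"},
    "Movie B": {"Movie A", "Movie C"},
    "Movie C": {"Movie B"},
    "Movie D": set(),
}

# Degree of a movie = number of distinct neighbouring movies.
degrees = {movie: len(neighbours) for movie, neighbours in graph.items()}

# Rank movies from most to least connected (ties broken alphabetically).
ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
for movie, degree in ranked:
    print(movie, degree)
```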

Movie Linkages

By looking at the shortest path between two nodes we can enumerate the connections. Let’s start with Pulp Fiction, and another blockbuster, Star Wars!
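One standard way to find a shortest path in an unweighted graph like this is breadth-first search. The sketch below assumes the adjacency-set representation and uses placeholder movie names; whether the results shown here were produced this way or with a graph library is not stated, so treat it as one possible approach.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes,
    or None if the two nodes are not connected."""
    if start not in graph or goal not in graph:
        return None
    if start == goal:
        return [start]

    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour in parent:
                continue
            parent[neighbour] = current
            if neighbour == goal:
                # Walk the parent links back to the start.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical chain of movies plus one isolated movie.
graph = {
    "Movie A": {"Movie B"},
    "Movie B": {"Movie A", "Movie C"},
    "Movie C": {"Movie B", "Movie D"},
    "Movie D": {"Movie C"},
    "Movie E": set(),
}
print(shortest_path(graph, "Movie A", "Movie D"))
print(shortest_path(graph, "Movie A", "Movie E"))
```

Each step in the returned list is one linkage, so the number of links is the length of the list minus one.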

Generating another few with random samples we see:

We can also search the graph for the “Longest Shortest Path”, that is, the two connected movies which have the longest optimal linkage between them. It turns out that this is the link from Kishmish, a Bengali romantic drama, to Paskal, a Malaysian action movie, at 13 links. The median longest shortest path is only 8 links though, illustrating how closely connected most films are.
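In graph terms, the longest shortest path starting from a single node is that node's eccentricity, and the largest eccentricity in the graph is its diameter. The sketch below computes both on a hypothetical toy graph. It assumes the median figure is the median of the per-movie eccentricities, which is one reading of the text rather than something it confirms.

```python
from collections import deque
from statistics import median


def distances_from(graph, source):
    """Breadth-first search: link counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


def eccentricities(graph):
    """Longest shortest path starting at each node (unreachable nodes ignored)."""
    return {node: max(distances_from(graph, node).values()) for node in graph}


# Hypothetical chain of five movies.
graph = {
    "Movie A": {"Movie B"},
    "Movie B": {"Movie A", "Movie C"},
    "Movie C": {"Movie B", "Movie D"},
    "Movie D": {"Movie C", "Movie E"},
    "Movie E": {"Movie D"},
}
ecc = eccentricities(graph)
diameter = max(ecc.values())
median_ecc = median(ecc.values())
print(diameter, median_ecc)
```

This runs one breadth-first search per node, so on a graph with tens of thousands of movies and over a million linkages it is slow but still tractable.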

Generating Actor Networks

Now that we have all the data we need, we can also create some fun visualizations, such as the connections between all the movies of a particular actor. Take the eccentric filmography of Nicolas Cage. Note the lack of well-defined clusters, characteristic of an actor like Cage who is famous for taking diverse (and “eccentric”) roles.
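A per-actor network like this can be sketched by keeping only the actor's own movies and linking them through their other cast members. Ignoring the focal actor when drawing edges is an assumption on my part: without it every pair of their movies would be linked and no structure would show. The names and the `actor_network` helper are hypothetical.

```python
from itertools import combinations


def actor_network(casts, actor):
    """Graph of one actor's movies, linked by shared co-stars.

    casts maps movie -> set of actors. The focal actor is excluded when
    looking for shared cast members (an assumption, see above).
    """
    films = sorted(movie for movie, cast in casts.items() if actor in cast)
    network = {film: set() for film in films}
    for a, b in combinations(films, 2):
        shared_costars = (casts[a] & casts[b]) - {actor}
        if shared_costars:
            network[a].add(b)
            network[b].add(a)
    return network


# Hypothetical placeholder data, not a real filmography.
casts = {
    "Movie A": {"Lead", "Actor 1"},
    "Movie B": {"Lead", "Actor 1", "Actor 2"},
    "Movie C": {"Lead", "Actor 3"},
    "Movie D": {"Actor 2"},
}
network = actor_network(casts, "Lead")
print(network)
```

In this toy case "Movie C" ends up isolated: none of its other cast members appear in the lead's other movies.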

Compare that to a network like that of Chris Evans, an actor who has earned the bulk of his fame starring in the Marvel series of movies, but still has some outliers like “Not Another Teen Movie” that require more isolated links to connect to the rest of his filmography.

Or a network somewhere in the middle, like that of Mike Myers, an actor famous for several very different types of movies: his adult-oriented comedies as well as his family-friendly Shrek series.

It even allows for the illustration of odd movie fun facts, like the absolutely insane lack of connections between the movies of the Highlander franchise.

