Machine Learning – Gaurav Agrawal

In this project, we’ll cluster US Senators based on how they voted. We want to find out whether Senators voted along party lines or voted independently of their party.

What is unsupervised learning?

A major type of machine learning is called unsupervised learning. In unsupervised learning, we aren’t trying to predict anything. Instead, we’re finding patterns in data. One of the main unsupervised learning techniques is called clustering.

Clustering

Clustering algorithms group similar rows together. There can be one or more groups in the data, and these groups form the clusters. As we look at the clusters, we can start to better understand the structure of the data. Clustering is a key way to explore unknown data, and it’s a very commonly used machine learning technique.

The Dataset

In the US, the Senate votes on proposed legislation. Getting a bill passed by the Senate is a key step towards getting its provisions enacted. A majority vote is required to get a bill passed. The results of these votes, known as roll call votes, are public.

You can visit my GitHub repo for complete code.

Senators typically vote in accordance with how their political party votes, known as voting along party lines. In the US, the 2 main political parties are the Democrats, who tend to be liberal, and the Republicans, who tend to be conservative. Senators can also choose to be unaffiliated with a party, and vote as Independents, although very few choose to do so.

114_congress.csv contains all of the results of roll call votes from the 114th Senate. Each row represents a single Senator, and each column represents a vote. A 0 in a cell means the Senator voted No on the bill, 1 means the Senator voted Yes, and 0.5 means the Senator abstained.

Here are the relevant columns:

name — The last name of the Senator.
party — the party of the Senator. The valid values are D for Democrat, R for Republican, and I for Independent.
Several columns numbered like 00001, 00004, etc. Each of these columns represents the results of a single roll call vote.

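Below are the first few rows and columns of the data. The file also has a state column holding each Senator’s state abbreviation, so the vote columns start at the fourth column.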

name	party	state	00001	00004	00005	00006
Alexander	R	TN	0	1	1	1
Ayotte	R	NH	0	1	1	1
Baldwin	D	WI	1	0	0	1
Barrasso	R	WY	0	1	1	1
Bennet	D	CO	0	0	0	1

We’ll use an algorithm called k-means clustering to split our data into clusters. k-means clustering uses Euclidean distance to form clusters of similar Senators.
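To make the distance idea concrete, here is a minimal sketch that computes the Euclidean distance between a few Senators using only the four votes shown in the sample rows above. The helper name euclidean_distance is ours, not part of any library, and four votes are far too few to say anything real; the point is only to show the arithmetic.

```python
import numpy as np

# Votes on roll calls 00001, 00004, 00005, 00006, copied from the sample rows above.
alexander = np.array([0, 1, 1, 1])
ayotte = np.array([0, 1, 1, 1])
baldwin = np.array([1, 0, 0, 1])

def euclidean_distance(a, b):
    """Square root of the summed squared differences between two vote vectors."""
    return np.sqrt(((a - b) ** 2).sum())

same_party = euclidean_distance(alexander, ayotte)    # identical votes
cross_party = euclidean_distance(alexander, baldwin)  # three votes differ
print(same_party, cross_party)
```

Two Senators with identical votes are at distance 0, and every vote they disagree on pushes them further apart.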

The k-means algorithm will group Senators who vote similarly on bills together, in clusters. Each cluster is assigned a center, and the Euclidean distance from each Senator to the center is computed. Senators are assigned to clusters based on which one they are closest to. From our background knowledge, we think that Senators will cluster along party lines.
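The assign-then-recenter loop described above can be sketched in a few lines of NumPy. This is a simplified illustration on the five sample rows, not what scikit-learn does internally: the starting centers are hand-picked (an assumption for the example), and there is no handling for a cluster that ends up empty.

```python
import numpy as np

# Five sample rows (Alexander, Ayotte, Baldwin, Barrasso, Bennet), four votes each.
X = np.array([
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

# Hypothetical starting centers: Baldwin's row and Alexander's row.
centers = X[[2, 0]].copy()

for _ in range(10):
    # Assignment step: distance from every Senator to every center, shape (5, 2).
    distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update step: move each center to the mean of the rows assigned to it.
    new_centers = np.array([X[labels == k].mean(axis=0) for k in range(len(centers))])
    if np.allclose(new_centers, centers):
        break  # centers stopped moving, so the assignments are stable
    centers = new_centers

print(labels)
print(centers)
```

On this toy data the loop settles after two passes, with the three Republican rows in one cluster and the two Democratic rows in the other.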

The k-means algorithm requires us to specify the number of clusters upfront. Because we suspect that clusters will occur along party lines, and the vast majority of Senators are either Republicans or Democrats, we’ll pick 2 for our number of clusters.

Let’s get started

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
votes=pd.read_csv("114_congress.csv")
votes.head(5)

votes['party'].value_counts()

R	54
D	44
I	2

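from sklearn.cluster import KMeans
# Two clusters, one per major party; random_state makes the run repeatable.
kmeans_model=KMeans(n_clusters=2,random_state=1)
# Fit on the vote columns only (skip name, party, state); returns each Senator's distance to both centers.
senator_distances=kmeans_model.fit_transform(votes.iloc[:,3:])
labels=kmeans_model.labels_
pd.crosstab(labels,votes['party'])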

party	D	I	R
labels
0	41	2	0
1	3	0	54
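To read a table like this one: each row is a cluster label, each column is a party, and each cell counts the Senators with that combination. Here is a small sketch with made-up labels and parties (toy_labels and toy_party are invented for illustration and are not taken from the dataset).

```python
import numpy as np
import pandas as pd

# Made-up cluster labels and parties for six imaginary Senators.
toy_labels = np.array([0, 0, 1, 1, 1, 0])
toy_party = pd.Series(["D", "I", "R", "R", "D", "D"], name="party")

# Rows are cluster labels, columns are parties, cells are counts.
table = pd.crosstab(toy_labels, toy_party)
print(table)
```

A cell far from the pattern, such as a Democrat counted in the mostly Republican cluster, is exactly the kind of outlier worth looking at by name.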

pd.crosstab(labels,votes['party']).plot(kind='bar',stacked=True)
x=[0,1]
l=['cluster1','cluster2']
plt.xticks(x,l)
plt.title('Clustering')
plt.xlabel('Clusters')
plt.ylabel('No. of Senators')
plt.tick_params(bottom=False,top=False,right=False,left=False)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()

Democrats like Republicans

democratic_outlier=votes[(labels==1) & (votes['party']=='D')]
democratic_outlier

id	name	party
42	Heitkamp	D
56	Manchin	D
74	Reid	D

Independents like Democrats

independents_like_democrats=votes[(labels==0) &(votes['party']=='I')]
independents_like_democrats

id	name	party
50	King	I
79	Sanders	I

Radical Republicans!!

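# Cube each distance so large distances dominate, then add up the distances to both centers.
extremism = (senator_distances ** 3).sum(axis=1)
votes['extremism']=extremism
# Most extreme Senators first.
votes.sort_values('extremism',inplace=True,ascending=False)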

id	name	party	extremism
98	Wicker	R	46.250476
53	Lankford	R	46.046873
69	Paul	R	46.046873
80	Sasse	R	46.046873
26	Cruz	R	46.046873
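Why cube the distances instead of just adding them? The post doesn't spell it out, so take this as one plausible reading: a Senator who sits very close to one center and very far from the other should score higher than a Senator who sits midway between the two. A quick sketch with made-up distances (toy_distances is invented, not taken from senator_distances):

```python
import numpy as np

# Hypothetical distances to the two cluster centers for two made-up Senators.
toy_distances = np.array([
    [1.0, 3.0],  # close to one center, far from the other
    [2.0, 2.0],  # midway between the two centers
])

plain_sum = toy_distances.sum(axis=1)
cubed_sum = (toy_distances ** 3).sum(axis=1)
print(plain_sum)  # both rows add up to 4.0
print(cubed_sum)  # 28.0 for the first row, 16.0 for the second
```

With a plain sum the two rows tie; after cubing, the lopsided row clearly ranks as more extreme.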

Conclusions

Based on the voting patterns, we can conclude that 3 Democrats voted much like the Republicans, and the two Independents voted much like the Democrats. We were also able to find the most radical Republicans. That’s the power of clustering.

Thanks
