Clustering US Senators based on how they voted

In this project, we’ll work on clustering US Senators based on how they voted. We will try to find out whether the Senators voted along party lines or independently of their party.

What is unsupervised learning?

A major type of machine learning is called unsupervised learning. In unsupervised learning, we aren’t trying to predict anything. Instead, we’re finding patterns in data. One of the main unsupervised learning techniques is called clustering.

Clustering

Clustering algorithms group similar rows together. There can be one or more groups in the data, and these groups form the clusters. As we look at the clusters, we can start to better understand the structure of the data. Clustering is a key way to explore unknown data, and it’s a very commonly used machine learning technique.

The Dataset

In the US, the Senate votes on proposed legislation. Getting a bill passed by the Senate is a key step towards getting its provisions enacted. A majority vote is required to get a bill passed. The results of these votes, known as roll call votes, are public.

You can visit my GitHub repo for the complete code.

Senators typically vote in accordance with how their political party votes, known as voting along party lines. In the US, the 2 main political parties are the Democrats, who tend to be liberal, and the Republicans, who tend to be conservative. Senators can also choose to be unaffiliated with a party, and vote as Independents, although very few choose to do so.

114_congress.csv contains all of the results of roll call votes from the 114th Senate. Each row represents a single Senator, and each column represents a vote. A 0 in a cell means the Senator voted No on the bill, 1 means the Senator voted Yes, and 0.5 means the Senator abstained.

Here are the relevant columns:

name — The last name of the Senator.
party — the party of the Senator. The valid values are D for Democrat, R for Republican, and I for Independent.
Several columns numbered like 00001, 00004, etc. Each of these columns represents the results of a single roll call vote.

Below are the first few rows and columns of the data.

name       party  state  00001  00004  00005  00006
Alexander  R      TN     0      1      1      1
Ayotte     R      NH     0      1      1      1
Baldwin    D      WI     1      0      0      1
Barrasso   R      WY     0      1      1      1
Bennet     D      CO     0      0      0      1
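
To make the layout concrete, here is a small sketch that rebuilds the five sample rows above as a DataFrame and separates the identifying columns from the vote columns. The names `sample` and `vote_columns` are illustrative and not part of the original code, and the real file has many more rows and vote columns.

```python
import pandas as pd

# The five sample rows shown above; the full file has one row per Senator.
sample = pd.DataFrame(
    {
        "name": ["Alexander", "Ayotte", "Baldwin", "Barrasso", "Bennet"],
        "party": ["R", "R", "D", "R", "D"],
        "state": ["TN", "NH", "WI", "WY", "CO"],
        "00001": [0, 0, 1, 0, 0],
        "00004": [1, 1, 0, 1, 0],
        "00005": [1, 1, 0, 1, 0],
        "00006": [1, 1, 1, 1, 1],
    }
)

# The first three columns identify the Senator; everything after is a vote.
vote_columns = sample.iloc[:, 3:]
print(vote_columns.shape)

# Average vote value per roll call (1 = Yes, 0 = No, 0.5 = abstained).
print(vote_columns.mean())
```

Only the vote columns are numeric, which is why the clustering step later skips the first three columns.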

We’ll use an algorithm called k-means clustering to split our data into clusters. k-means clustering uses Euclidean distance to form clusters of similar Senators.
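
As a quick illustration of Euclidean distance, the sketch below compares the four sample votes of three Senators from the rows shown earlier. The helper `euclidean_distance` is written here for illustration only; it is not part of the original code.

```python
import math

# Votes on roll calls 00001, 00004, 00005 and 00006 from the sample rows.
alexander = [0, 1, 1, 1]
ayotte = [0, 1, 1, 1]
baldwin = [1, 0, 0, 1]


def euclidean_distance(a, b):
    """Square root of the summed squared differences between two vote records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Identical voting records are at distance 0.
print(euclidean_distance(alexander, ayotte))
# Disagreeing on three of the four votes gives sqrt(3), about 1.73.
print(euclidean_distance(alexander, baldwin))
```

The more votes two Senators disagree on, the larger the distance between them.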

The k-means algorithm will group Senators who vote similarly on bills together, in clusters. Each cluster is assigned a center, and the Euclidean distance from each Senator to the center is computed. Senators are assigned to clusters based on which one they are closest to. From our background knowledge, we think that Senators will cluster along party lines.
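
The assign-and-update loop described above can be sketched in a few lines of NumPy. This is a simplified toy version written for illustration, not scikit-learn's implementation (which uses a smarter initialization and a convergence check). The function names `assign` and `kmeans_sketch` and the array `sample_votes` are assumptions made for this example.

```python
import numpy as np


def assign(points, centers):
    """Return, for each point, the index of the closest center."""
    # Euclidean distance from every point to every center, shape (n_points, k).
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans_sketch(points, k, n_iter=10, seed=0):
    """Toy k-means: alternate between assigning points and moving centers."""
    rng = np.random.default_rng(seed)
    # Start from k randomly chosen rows as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(points, centers)
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return assign(points, centers), centers


# The four sample votes of Alexander, Ayotte, Baldwin, Barrasso and Bennet.
sample_votes = np.array([
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
])
sample_labels, sample_centers = kmeans_sketch(sample_votes, k=2)
print(sample_labels)
```

On this tiny sample the three Republicans end up in one cluster and the two Democrats in the other. Which cluster gets the label 0 depends on the random starting centers.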

The k-means algorithm requires us to specify the number of clusters upfront. Because we suspect that clusters will occur along party lines, and the vast majority of Senators are either Republicans or Democrats, we’ll pick 2 for our number of clusters.

Let’s get started

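import pandas as pd
import matplotlib.pyplot as plt
# Jupyter notebook magic that renders plots inline
%matplotlib inline

# Load the roll call votes: one row per Senator, one column per vote
votes = pd.read_csv("114_congress.csv")
votes.head(5)

# Count the Senators in each party
votes['party'].value_counts()

R    54
D    44
I     2
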
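from sklearn.cluster import KMeans

# Two clusters, one per major party; random_state makes the run repeatable
kmeans_model = KMeans(n_clusters=2, random_state=1)
# fit_transform returns each Senator's distance to both cluster centers
senator_distances = kmeans_model.fit_transform(votes.iloc[:, 3:])

# Compare the cluster labels with the actual party of each Senator
labels = kmeans_model.labels_
pd.crosstab(labels, votes['party'])

party    D  I   R
labels
0       41  2   0
1        3  0  54
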
# Plot the cluster/party breakdown as a stacked bar chart
pd.crosstab(labels, votes['party']).plot(kind='bar', stacked=True)
x = [0, 1]
l = ['cluster1', 'cluster2']
plt.xticks(x, l)
plt.title('Clustering')
plt.xlabel('Clusters')
plt.ylabel('No. of Senators')
# Hide the tick marks (recent matplotlib versions expect booleans, not 'off')
plt.tick_params(bottom=False, top=False, right=False, left=False)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()
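
The cross-tabulation used above counts how many rows fall into each combination of cluster and party. Here is a minimal sketch on made-up values; `toy_labels` and `toy_party` are hypothetical and not taken from the data set.

```python
import numpy as np
import pandas as pd

# Hypothetical cluster labels and parties for six Senators.
toy_labels = np.array([0, 0, 1, 1, 1, 0])
toy_party = pd.Series(["D", "I", "R", "R", "D", "D"], name="party")

# Rows are clusters, columns are parties, cells are counts.
table = pd.crosstab(toy_labels, toy_party)
print(table)

# The most common party in each cluster.
print(table.idxmax(axis=1))
```

Any party with a nonzero count outside its majority cluster points to Senators worth a closer look, which is what the next sections do.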


Democrats like Republicans

# Democrats who were placed in the mostly Republican cluster
democratic_outlier = votes[(labels == 1) & (votes['party'] == 'D')]
democratic_outlier

id  name      party
42  Heitkamp  D
56  Manchin   D
74  Reid      D

Independents like Democrats

# Independents who were placed in the mostly Democratic cluster
independents_like_democrats = votes[(labels == 0) & (votes['party'] == 'I')]
independents_like_democrats

id  name     party
50  King     I
79  Sanders  I

Radical Republicans!!

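# Cube the distances so Senators far from a cluster center stand out, then sum per Senator
extremism = (senator_distances ** 3).sum(axis=1)
votes['extremism'] = extremism
# Sort so the most extreme Senators come first
votes.sort_values('extremism', inplace=True, ascending=False)

id  name      party  extremism
98  Wicker    R      46.250476
53  Lankford  R      46.046873
69  Paul      R      46.046873
80  Sasse     R      46.046873
26  Cruz      R      46.046873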
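
The extremism score cubes each Senator's distance to both cluster centers and sums the result, so a Senator who sits far from a center stands out much more than the raw distances would suggest. A small sketch with made-up distances shows the effect; the numbers in `toy_distances` are hypothetical.

```python
import numpy as np

# Each row: distance to the center of cluster 0, distance to the center of cluster 1.
toy_distances = np.array([
    [1.0, 3.0],  # close to center 0, fairly far from center 1
    [1.0, 3.5],  # same distance to center 0, a bit further from center 1
    [2.0, 2.0],  # halfway between the two centers
])

# Cube every distance, then sum across the two clusters for each row.
toy_extremism = (toy_distances ** 3).sum(axis=1)
print(toy_extremism)

# Row indices ordered from most to least extreme.
print(np.argsort(-toy_extremism))
```

Moving only half a unit further from the second center raises the score from 28 to 43.875, while the row that sits between the two centers scores lowest at 16.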

Conclusions

Based on the voting patterns, we can conclude that 3 Democrats voted much like the Republicans, and that the two Independents voted much like the Democrats. We were also able to find the most radical Republican. That’s the power of clustering.

Thanks

 
