Week 3: Exploring alternate options

Since plotting all of the variables against each other did not reveal any useful correlation, we decided to think outside the box and try any approach that might predict whether somebody will have higher or lower emotional granularity based on the variables we were given. The alternatives below came from ideas discussed in class and from my own research.

The first thing I did was gather summary statistics, or moments, for total electronic use, level of emotional difficulty, and emotional granularity. I have not done anything with these statistics yet, but I expect they will be useful in future attempts to unravel the data.
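As a rough illustration of what "moments" means here, the sketch below computes the mean, variance, skewness, and kurtosis of a single variable. It is written in Python rather than the R I used for the analysis, the function name `moments` is made up for this example, and the numbers are placeholders, not the study data.

```python
from statistics import mean

def moments(values):
    """Return the first four moments of a list of numbers.

    Uses population (divide-by-n) formulas; kurtosis is the plain
    fourth standardized moment, not excess kurtosis.
    """
    n = len(values)
    m = mean(values)

    def central(power):
        # Central moment of the given order.
        return sum((v - m) ** power for v in values) / n

    variance = central(2)
    return {
        "mean": m,
        "variance": variance,
        "skewness": central(3) / variance ** 1.5,
        "kurtosis": central(4) / variance ** 2,
    }

# Placeholder values purely for illustration.
example = moments([1.0, 2.0, 3.0, 4.0, 5.0])
print(example)
```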

Next, Professor Davis told us about a colleague who was asked to predict something and could not, until she had a random idea: scale all of the variables and sum them. The idea worked as a predictor, although it could not tell her client what caused the predictions. Dr. Fugate has likewise only asked us to predict somebody’s emotional granularity, so we could use the same tactic. That is exactly what I did. However, even after removing outliers, I only found an r-squared value of 0.2544991. This means that only about 25.4% of the variance in emotional granularity can be explained by the summation I developed.

Scatterplot of the data described above
R-Squared calculation as described above
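The scale-and-sum idea described above can be sketched as follows. This is a minimal Python version, not the R code I actually ran: the helper names (`zscore`, `scaled_sum`, `r_squared`) are made up, z-scoring is only one assumed way to "scale" the variables, and the numbers are invented for illustration.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize a list to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def scaled_sum(columns):
    """Scale every predictor column, then add them up row by row."""
    scaled = [zscore(col) for col in columns]
    return [sum(row) for row in zip(*scaled)]

def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (squared Pearson r)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Made-up numbers purely for illustration, not the study data.
electronic_use = [2.0, 5.0, 3.5, 8.0, 6.0, 1.0]
emotional_difficulty = [30.0, 42.0, 35.0, 50.0, 47.0, 28.0]
granularity = [3.1, 2.4, 2.9, 1.8, 2.5, 3.3]

score = scaled_sum([electronic_use, emotional_difficulty])
r2 = r_squared(score, granularity)
print(round(r2, 3))
```

The r-squared here is the share of variance in the outcome that a straight line through the summed score accounts for, which is how the 25.4% figure should be read.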

I then started doing my own research and thinking back to my class on R programming. I remembered how we used clusters and dendrograms and wondered if they could explain anything significant about this data. I created an elbow graph to show the ideal number of clusters for the data, which is the point with the largest change in slope. The ideal number of clusters ended up being 3, so I created a cluster plot using the “clusplot()” function. This returned two things: a plot of the clusters and how much of the variability the components explain. The components of the cluster plot explained about 60% of the point variability. I then added a dendrogram and split it into 3 sections, which is similar to the clustering method. I am unsure what to do with this information, or whether 60% is even a good percentage, but I will try to find out.

The elbow chart used to determine how many clusters should be used. As you can see, the largest change in slope is at x=3.
Cluster plot of all variables with the % variability explained
Dendrogram of the same data
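The elbow check described above can be sketched as follows. This is a small Python illustration, not the R code behind the chart: it assumes a basic k-means with a deterministic farthest-point start, picks the elbow as the k with the largest change in slope of the within-cluster sum of squares, and runs on made-up points rather than the study data. The names `kmeans_wss` and `elbow` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans_wss(points, k, iterations=50):
    """Run a basic k-means and return the total within-cluster sum of squares."""
    # Deterministic start: the first point, then repeatedly the point
    # farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a group ended up empty.
        new_centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Made-up data with three obvious groups, purely for illustration.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
          (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
          (9.0, 1.0), (9.2, 1.1), (8.9, 0.8)]

wss = {k: kmeans_wss(points, k) for k in range(1, 7)}

# Largest change in slope: drop going into k minus drop going out of k.
elbow = max(range(2, 6),
            key=lambda k: (wss[k - 1] - wss[k]) - (wss[k] - wss[k + 1]))
print(elbow)
```

Reading the elbow off a chart by eye and computing it this way will not always agree on real data, so the number is best treated as a starting point.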

This cluster plot was slightly confusing to me, so I tried to do another version, just based on 2 variables. I created a cluster plot based on the emotional granularity, or RDEES, and the summation calculation that I described earlier. Again, I am not sure what a cluster plot will be able to tell us with this data, but it was worth a shot.

Different form of a cluster chart, but just as effective and easier to read

Finally, I tried two different dendrograms, both split into 4 groups: one using the standard method and the other using the Agnes method. When doing this, I got an agglomerative coefficient of 0.92, which I’m assuming is good, but I am still unsure about this process that I’m testing.

Dendrogram split into 4 with an agglomerative coefficient of 0.92
Agnes dendrogram that also displays the agglomerative coefficient of 0.92
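To get a feel for what the agglomerative coefficient measures, here is a minimal Python sketch. It assumes average-linkage clustering and the usual definition: for each observation, take the height at which it is first merged, divide by the height of the final merge, and average one minus that ratio over all observations. It is not the Agnes code I ran in R, the function name is made up, and the points are invented.

```python
from itertools import combinations
from math import dist

def agglomerative_coefficient(points):
    """Average-linkage agglomerative clustering; returns the agglomerative coefficient."""
    clusters = [[i] for i in range(len(points))]
    first_merge = {}  # observation index -> height of its first merge
    height = 0.0

    def avg_link(pair):
        # Mean pairwise distance between the members of two clusters.
        a, b = pair
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=avg_link)
        height = avg_link((a, b))
        for side in (a, b):
            if len(side) == 1:
                # A singleton is being merged for the first time.
                first_merge[side[0]] = height
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]

    # After the loop, height is the height of the final merge.
    return sum(1 - h / height for h in first_merge.values()) / len(points)

# Made-up data: three tight, well-separated groups versus evenly spaced points.
blobs = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
         (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
         (9.0, 1.0), (9.2, 1.1), (8.9, 0.8)]
evenly_spaced = [(0.0,), (1.0,), (2.0,), (3.0,)]

ac_blobs = agglomerative_coefficient(blobs)
ac_even = agglomerative_coefficient(evenly_spaced)
print(round(ac_blobs, 3), round(ac_even, 3))
```

Tight, well-separated groups push the coefficient toward 1, while evenly spread points give a lower value, so a result like 0.92 points toward fairly strong clustering structure.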
