-
Exploring the Science of Variable Relationships: A Guide to Structural Equation Modeling
Abstract: This article provides an overview of structural equation modeling (SEM), a statistical method widely used in various domains. The article explains what SEM is, its components, types, the mathematics, as well as the data, assumptions, and challenges associated with it. It also covers the five steps of building an SEM, an example of the… Continue reading
-
Predicting Employee Engagement and Satisfaction with Transformational Leadership
Using measures of transformational leadership, this research seeks to predict levels of employee engagement and employee satisfaction. Employee engagement “describes the level of enthusiasm and dedication a worker feels toward their job” (Smith, 2020, para. 1). Engaged employees “care about their work and about the performance of the company, and feel that their efforts make… Continue reading
-
Behavioral Insights from Website Analytics: An Ecommerce Case Study
Today I will explore several insights derived from my analysis. The focal points of discussion will encompass a range of subjects, including the assessment of browser and device compatibility with the webpage, identification of challenges encountered by users employing mobile devices, the concerning decline in add-to-cart rates, as well as the overall stagnant nature of… Continue reading
-
Bayesian Network: Infant Clinical Presentations
In this analysis I am going to build a Bayesian Network in R based on medical data to identify the likelihood of six possible diseases based on clinical presentations in infants. The idea and data for this analysis came from the CHILD network created by David J. Spiegelhalter, A. Philip Dawid, Steffen L. Lauritzen, Robert… Continue reading
-
Exploring Neonatal and Maternal Predictors of Germinal Matrix Hemorrhage in Low Birth Weight Infants with Logistic Regression and Odds Ratios
A low birth weight dataset records 100 births and whether or not these babies experienced a germinal matrix hemorrhage (grmhem). The babies’ 5-minute Apgar score is recorded (apgar5), as is whether or not the mother had toxemia (tox) during her pregnancy. The odds ratio of whether a baby experienced a germinal matrix hemorrhage associated… Continue reading
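As a rough illustration of how an odds ratio is computed from a 2x2 table, here is a minimal Python sketch. The helper name `odds_ratio` and the counts are hypothetical and are not taken from the low birth weight dataset.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed with the outcome,   b = exposed without the outcome,
    c = unexposed with the outcome, d = unexposed without the outcome.
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    # Odds of the outcome among the exposed divided by odds among the unexposed
    return (a * d) / (b * c)


# Hypothetical counts for illustration only (not from the dataset)
example = odds_ratio(a=6, b=15, c=9, d=70)
print(round(example, 2))
```

A value above 1 would suggest higher odds of the outcome in the exposed group; a confidence interval is still needed before drawing conclusions.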
-
Supply Chain Analytics: Mapping Coffeehouse Locations, Understanding Customers with Cohort Analysis, RFM Analysis, and K-Means Clustering, & Exploring Top-Quality Coffee Producers for Supplier Recommendations
    # Visualizing Starbucks locations
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns; sns.set()
    import cartopy.crs as ccrs
    from mpl_toolkits.basemap import Basemap
    from matplotlib.colors import Normalize
    %matplotlib inline
    # Load Starbucks data
    df=pd.read_csv('Starbucks.csv')
    # View the dataframe
    df.head()
    # Rank the top 10 countries with most number of stores
    top10 = df.Country.value_counts().head(10)
    top10
    # Plot top 10 countries
    fig, ax=plt.subplots(figsize=(10,5))
    ax=sns.barplot(y=top10.index, x=top10.values, ax=ax,… Continue reading
-
Bladder and Brain Cancer Survival Analysis
The Kaplan-Meier survival function gives the probability of surviving past time t. Eighty-six bladder cancer patients had tumors removed. After removal, these patients were separated into two groups: a placebo group (group 0) and a thiotepa treatment group (group 1). The variable time in this dataset represents how many months until a tumor recurred… Continue reading
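To make the idea concrete, here is a minimal sketch of the standard Kaplan-Meier product-limit estimate in plain Python. The function name `kaplan_meier` and the toy follow-up times are assumptions for illustration; they are not the bladder cancer data, and this is not necessarily how the analysis itself was run.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if censored
    """
    # Number of observed events at each distinct time
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(event_counts):
        # Subjects still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1 - event_counts[t] / at_risk
        curve.append((t, survival))
    return curve


# Toy data for illustration only
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Censored subjects contribute to the at-risk count up to their last follow-up time but never reduce the survival estimate themselves.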
-
Comparing Self-Organizing Map, Complete Linkage Hierarchical Clustering, and Principal Component Analysis on the NCI Microarray Data
The NCI microarray data contains expression levels on 6830 genes from 64 cancer cell lines.

    library(ISLR)
    library(kohonen)
    nci<-NCI60$data
    labels<-as.data.frame(NCI60$labs)
    labels
    table(labels)
    nci.scaled <- scale(nci)
    #fit a SOM
    set.seed(21)
    som.grid <- somgrid(xdim = 4, ydim = 4, topo = "hexagonal")
    nci.som <- som(nci.scaled, grid = som.grid, rlen = 1000)
    nci.som$codes
    codes <- nci.som$codes[[1]]
    nci.som$unit.classif
    table(nci.som$unit.classif)
    quartz()
    plot(nci.som, type = "changes", main = "NCI Data")
    #plot vector of changes, the average… Continue reading
-
Comparing Machine Learning Models to Predict Death Due to Heart Failure and Diabetes
Part 1: Use Machine Learning to Predict a Death Event Due to Heart Failure The goal of this analysis is to compare machine learning methods when predicting a death event due to heart failure. The predictor variables of a death event include age, anaemia, creatinine phosphokinase, diabetes, ejection fraction, high blood pressure, platelets, serum creatinine,… Continue reading
-
Predicting Chronic Kidney Disease with Machine Learning in R
My goal with this analysis was to predict chronic kidney disease based on 24 attributes. The dimensions of the data are 400 by 25, including the dependent variable. There are 150 observations of patients who do not have chronic kidney disease and 250 observations of patients with chronic kidney disease. I hoped to create a… Continue reading
-
Exploring the State and Arrests Data with Hierarchical Clustering, Stars Plots, and a Self-Organizing Map
In the first part of this analysis I am going to show an example of hierarchical clustering and how correlation can aid in understanding dendrogram results. I will then explore some star plots. State data released by the US Department of Commerce, Bureau of the Census, is available in R. I will… Continue reading
-
Using the Apriori Algorithm to Discover Association Rules
For this tutorial, I am going to use the transaction Income dataset from the R package arules. This dataset comes from the website for the book The Elements of Statistical Learning. Chapter 14 has information about association rules. Here is a link and a citation for that book: Hastie, T., Tibshirani, R. & Friedman, J. (2001) The Elements… Continue reading
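For readers new to association rules, the two basic quantities behind them, support and confidence, can be sketched in a few lines of Python. The helper names and the toy transactions below are hypothetical; they are not the Income dataset and do not use the arules package.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimated P(rhs | lhs): support(lhs and rhs) / support(lhs)."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        raise ValueError("left-hand side never occurs in the transactions")
    return support(transactions, set(lhs) | set(rhs)) / lhs_support


# Toy transactions for illustration only
toy = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"b"}]
print(support(toy, {"a", "b"}))
print(confidence(toy, {"a"}, {"b"}))
```

Apriori searches for itemsets whose support clears a chosen threshold and then keeps the rules whose confidence is high enough.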
-
Derive Generalized Association Rules by Disguising an Unsupervised Learning Problem as a Supervised Learning Problem with CART
The income data used for this analysis can be found under the marketing database provided by The Elements of Statistical Learning by Hastie, T., Tibshirani, R., & Friedman, J. H. Here is the link: https://hastie.su.domains/ElemStatLearn/data.html Attribute information includes household income, sex, marital status, age, education, occupation, how long the person has lived in the San Francisco/Oakland/San Jose area… Continue reading
-
Hypothesis Testing: ANOVA, Chi-Square Goodness of Fit, One Sample Z-Test, One Sample T-Test, One Sample Variance Test, Two Sample Z-Test, Two Sample T-Test, Paired T-Test, Two Sample Variance Test
Hypothesis Testing: ANOVA The following is a compilation of a series of ANOVA tests conducted over the course of a few years. Analysis of Variance (ANOVA) is a statistical technique used to compare the means of groups to determine if there are any statistically significant differences between them. It helps in understanding whether there are variations… Continue reading
WORK & VOLUNTEER EXPERIENCE
Data Analyst
CenCore, LLC
2024 – Current
Mental Health Crisis Counselor
Crisis Text Line
2023 – 2024
Contributing Data Science Writer
Dev Genius
2022 – 2024
Research Assistant & Academic Writer
Utah State University
2019 – 2020
Behavior Technician
Wasatch Behavioral Health
2018 – 2019
