Side Projects
Well, forthcoming!
Academic Projects
-
4/2024 For my undergraduate thesis I delved into the principles of algorithmic fairness. I began with an overview of the field, which has philosophical roots in the modern conceptions of justice developed by Rawls and his peers, and whose relevance has more recently been underscored by troubling real-world case studies. I then proposed and explored a framework for "differential expressiveness", a new metric for evaluating algorithmic bias that formalizes the idea that the same value (or distribution) of a feature can mean very different things semantically across different groups.
A Hierarchical Model for Degree-Heterogeneous Random Graphs
12/2023 As the final project for a statistics course I completed during my concurrent Master's, a teammate and I formulated a new model for generating random graphs whose nodes have varying degrees (i.e., numbers of connections), based on a setup where each node has a "popularity" drawn from some latent distribution. We then outlined methodologies for performing statistical inference and simulating graph generation under our model.
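The general idea can be sketched in a few lines. This is not our model, just a minimal illustration of the setup it builds on: here I assume a lognormal latent popularity distribution and a Chung-Lu-style link function, where the probability of an edge scales with the product of the endpoints' popularities.

```python
import numpy as np

def sample_popularity_graph(n, rng=None):
    """Sample an undirected random graph where each node draws a latent
    "popularity" and edge probabilities scale with the product of the
    endpoints' popularities (a Chung-Lu-style construction)."""
    rng = np.random.default_rng(rng)
    theta = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # latent popularities
    # Edge probability: product of popularities, normalized and capped at 1.
    p = np.minimum(np.outer(theta, theta) / theta.sum(), 1.0)
    upper = np.triu(rng.random((n, n)) < p, k=1)  # sample upper triangle only
    adj = upper | upper.T  # symmetrize; no self-loops
    return theta, adj

theta, adj = sample_popularity_graph(200, rng=0)
degrees = adj.sum(axis=1)  # nodes with higher popularity tend to have higher degree
```

The appeal of this construction is that degree heterogeneity falls out of the latent distribution: a heavy-tailed choice for the popularities yields a heavy-tailed degree sequence.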
Analyzing CoT Prompting in LLMs via Gradient-Based Feature Attributions
7/2023 I worked with a team to inspect the then-nascent and ill-understood technique of chain-of-thought prompting using simple gradient-based techniques (calculating "saliency scores" from the gradient of a language model's prediction, token by token). But these techniques proved too crude, like performing surgery with a fork and knife, and our results were noisy; as the field of interpretability would learn, more sophisticated tools would be needed to get us closer. Presented at an ICML 2023 workshop.
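To give a flavor of the technique: a common saliency score is "gradient times input", the dot product of each token embedding with the gradient of the model's output with respect to that embedding. The sketch below uses a toy linear-plus-sigmoid "model" in place of an LLM, purely to show the mechanics; the model, dimensions, and score definition here are all illustrative assumptions, not what we actually ran.

```python
import numpy as np

def saliency_scores(E, w):
    """Gradient-times-input saliency for a toy model score = sigmoid(sum_t E_t . w).
    The gradient of the score w.r.t. each token embedding E_t is
    sigmoid'(z) * w, and the per-token saliency is |grad . E_t|."""
    s = (E @ w).sum()                  # scalar "prediction" logit
    sig = 1.0 / (1.0 + np.exp(-s))
    grad = sig * (1.0 - sig) * w       # d(score)/d(E_t), identical for each t here
    return np.abs(E @ grad)            # gradient-times-input, one score per token

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 8))  # 5 "tokens", 8-dimensional embeddings
w = rng.normal(size=8)
scores = saliency_scores(E, w)  # one nonnegative saliency score per token
```

With a real language model the gradient differs per token and is obtained by backpropagation, but the principle (rank tokens by how sharply the output responds to their embeddings) is the same, and so is the crudeness we ran into.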
Weighted Consecutive-Loss Rules for Justified Representation in Perpetual Voting
1/2023 A peer and I studied weighting schemes in perpetual voting settings: given an electorate that votes repeatedly and arbitrarily often, how should we define fairness in terms of who gets elected, and how should votes be weighted in order to satisfy those criteria? We investigated several classes of such voting "rules" and derived some theoretical guarantees; I mainly contributed simulation work for seeing how our rules fared in practice.
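A simulation of the basic mechanic might look like the following. This is a generic consecutive-loss rule of my own choosing for illustration (weight grows linearly with consecutive unsatisfied rounds, resetting on a win), not necessarily one of the rules we analyzed.

```python
import numpy as np

def perpetual_approval(approvals):
    """Simulate perpetual approval voting under a consecutive-loss weighting
    rule: a voter's weight grows by 1 for each consecutive round in which no
    candidate they approved was elected, and resets to 1 after a win.
    approvals: (rounds, voters, candidates) boolean array."""
    rounds, n_voters, _ = approvals.shape
    weights = np.ones(n_voters)
    winners = []
    for t in range(rounds):
        tally = weights @ approvals[t]            # weighted approval counts
        winner = int(np.argmax(tally))            # elect highest weighted support
        winners.append(winner)
        satisfied = approvals[t, :, winner]       # voters who approved the winner
        weights = np.where(satisfied, 1.0, weights + 1.0)
    return winners

# Two fixed blocs: 3 voters always approve candidate 0, 2 voters candidate 1.
A = np.zeros((4, 5, 2), dtype=bool)
A[:, :3, 0] = True
A[:, 3:, 1] = True
winners = perpetual_approval(A)  # -> [0, 1, 0, 1]
```

Even this crude rule exhibits the behavior such schemes are designed for: the minority bloc's weight accumulates until it wins every other round, rather than being shut out forever as under unweighted approval voting.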
Cluster-Robust Standard Errors
12/2022 With a team I produced a review of statistical techniques in cluster-robust regression: a flavor of inference that accounts for the fact that data might fall into different clumps, within which the error terms of a linear model may be correlated, violating the usual independence assumption. We summarized various estimation techniques for more accurate error modeling in these scenarios, replete with a tidy bit of R demo code.
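The workhorse estimator in this area is the Liang-Zeger "sandwich", which replaces the usual OLS variance formula with one that sums residual cross-products within each cluster. A minimal Python sketch (our demo was in R; this is just the same idea restated):

```python
import numpy as np

def cluster_robust_se(X, y, clusters):
    """OLS point estimates with cluster-robust (Liang-Zeger) standard errors.
    Sandwich form: (X'X)^-1 [ sum_g X_g' e_g e_g' X_g ] (X'X)^-1."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        score_g = X[clusters == g].T @ resid[clusters == g]
        meat += np.outer(score_g, score_g)       # within-cluster cross-products
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

# Simulated data with a shared shock inside each cluster.
rng = np.random.default_rng(0)
G, m = 20, 10                                    # 20 clusters of 10 observations
clusters = np.repeat(np.arange(G), m)
X = np.column_stack([np.ones(G * m), rng.normal(size=G * m)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=G)[clusters] + rng.normal(size=G * m)
beta, se = cluster_robust_se(X, y, clusters)
```

Note that this version omits the small-sample degrees-of-freedom corrections (e.g. the G/(G-1) factor) that practical implementations apply; those corrections were part of what our review compared.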
-
5/2022 I performed some miscellaneous lab work with the Data to Actionable Knowledge lab at Harvard's School of Engineering and Applied Sciences. I wrote code to test the behavior of toy neural networks in continual learning settings on binary classification tasks (i.e., where the network is shown, between training epochs, new data that complicates the decision boundary), and earlier I tested the effectiveness of the Bayes factor and related metrics in non-nested model selection, where it's harder to strictly compare the relative complexity of two models.
Surveying Differential Privacy in Regression Methods
4/2022 Differential privacy is a young branch of computer science that seeks to analyze sensitive data in a way that preserves the anonymity of the data source. The challenge therein is doing so with relatively high fidelity, so that the resulting statistics aren't too noisy. I surveyed and implemented some extant techniques for performing linear regression under this framework, discovering empirically that that challenge remains a very real pain.
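One family of techniques in this space perturbs the sufficient statistics of the regression (X'X and X'y) before solving the normal equations. The sketch below shows that shape of approach only; the `noise_scale` parameter is a stand-in for the real calibration step (bounding the data and deriving the noise from the sensitivity and privacy budget), which is exactly where the fidelity pain lives.

```python
import numpy as np

def dp_linear_regression(X, y, noise_scale, rng=None):
    """Sufficient-statistics-perturbation sketch: add Gaussian noise to
    X'X and X'y, then solve the normal equations. noise_scale is a free
    parameter here, NOT a properly calibrated privacy guarantee."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    XtX = X.T @ X + rng.normal(scale=noise_scale, size=(d, d))
    XtX = (XtX + XtX.T) / 2            # keep the noisy Gram matrix symmetric
    Xty = X.T @ y + rng.normal(scale=noise_scale, size=d)
    return np.linalg.solve(XtX, Xty)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
beta_noisy = dp_linear_regression(X, y, noise_scale=5.0, rng=1)
beta_exact = dp_linear_regression(X, y, noise_scale=0.0, rng=1)  # reduces to OLS
```

Cranking `noise_scale` up degrades the estimate quickly, which is a toy version of the fidelity trade-off described above: the noise needed for a meaningful privacy guarantee can easily swamp the signal in the statistics you care about.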