Please review my SOP
I have pasted my SOP. Can you please review and let me know if it is fine?
As a child, I used to watch international cricket and wonder - genuinely wonder - how the broadcasters on television could predict what was about to happen before it did. Not just guessing. Actual prediction, based on something. My coach used to say that the best captains don't react to a match, they read it. I led a few teams myself and came to understand what he meant - that there's information already available in any situation, and the only question is whether you're paying attention to it or not. I did not know at the time that this instinct had a name or a discipline attached to it. What I knew was that patterns, once you learn to look for them, are everywhere. That way of looking is what has guided my academic choices, my work, and now my decision to apply for the Master of Science in Data Science at the University of Maryland.
Growing up in Nadiad, Gujarat, I was - for a long stretch of my early years - a quiet kid. The stammer had a lot to do with that. Speaking in groups was uncomfortable, introductions were awkward, and anything that required talking in front of people needed extra preparation on my part. In that context, the tabla became something of a refuge. Rhythm, after all, doesn't require you to say anything. I played in our school's morning assemblies, and for a while that was the version of participation I was most comfortable with. Cricket captaincy came later and worked differently - it forced me to communicate under pressure, to motivate people when a match was going poorly, and to make decisions with incomplete information. When I eventually changed schools, the social rebuild that followed was the most unglamorous kind of growth: slow, a little uncomfortable, and genuinely useful. My parents, through all of this, kept things uncomplicated. Work hard, stay grounded, don't make excuses. That's more or less what I've tried to do.
My undergraduate years at Vellore Institute of Technology, where I studied Computer Science with a specialisation in Data Science, gave me the formal vocabulary for what I'd been doing intuitively since those cricket-watching days. The coursework - machine learning, natural language processing, statistical modelling, deep learning - covered the fundamentals well, and I finished with a CGPA of 8.92. But the experience that sharpened my thinking most happened outside regular classes. In my second year, Samsung R&D Institute India conducted a campus selection exam and picked fifty students from my college for a project cohort. I was one of them. My team was assigned to train Bixby - Samsung's virtual assistant - to respond accurately to pharmaceutical queries from users. None of us had done anything close to this before. We gathered over a thousand drug-related queries, cleaned the data, built the pipeline, and kept iterating over several months. The assistant eventually reached 95% response accuracy. Of all the groups in the cohort, only two received the Certificate of Excellence - and we were one. What I kept thinking about afterwards, though, wasn't the accuracy figure. It was the times the model underperformed and why. Almost every failure traced back to the data, not the algorithm. That lesson - that the quality of what you feed a model matters more than almost any other variable - stayed with me far longer than the certificate did.
Besides formal projects, I found myself picking up problems I'd noticed around me. During my later undergraduate years, I watched seniors spend a lot of time applying for jobs without much clarity on whether their profiles actually matched what a role required. I started thinking about whether a system could do that matching automatically - read a resume, extract what was actually relevant, and tell you plainly where you stood. So I collected over a thousand publicly available resumes, cleaned and labelled the dataset, and built an NLP-based model to extract details and categorise candidates by job suitability. It reached 85% classification accuracy, and I designed it so that recruiters processing high volumes of applications could use it as well, not just individual applicants. Around the same time, I built a fake news detection model: I sourced about five thousand labelled news articles, represented the text with TF-IDF and word embeddings, trained logistic regression and neural network classifiers, and reached 92% accuracy, with a 15% improvement over baseline after tightening the feature engineering. Neither of these was assigned work. I was simply curious about whether each problem was solvable, and wanted to find out.
My machine learning internship at WebLineIndia showed me for the first time how different building something for actual use is from building something for evaluation. I was asked to develop an Audio Emotion Recognition model - something capable of detecting from a speaker's voice whether they sounded confident, anxious, flat, or distressed. The dataset I eventually settled on had over 5,000 audio samples, but the recordings were in poor condition: background noise, distortion, inconsistent quality throughout. Using Librosa and Wav2Vec2 for preprocessing, I cleaned and prepared the data before training a HuBERT-based model through the Hugging Face Trainer API. Gradient accumulation and padding strategies brought training efficiency up by 20%, and the model settled at 85% accuracy, with training tracked consistently in Weights & Biases. The company deployed it for candidate screening, which meant it had to hold up under actual, repeated use by people who weren't me. That gap - between a model that performs well in testing and one that holds up in deployment - was something I hadn't fully appreciated before that internship. It made me want to understand production systems better, which is part of why, when Ford Motor Company offered a Software Engineer internship, I took it. I joined a team working on a large-scale migration of Ford's Government Bid Management System from a legacy JSF architecture to Spring Boot. I contributed to RESTful API development and participated in security auditing that identified and resolved a significant number of critical vulnerabilities. When a full-time offer followed the internship, I accepted it - not because my interest in machine learning had changed, but because I wanted the engineering grounding that I felt was still missing. After nearly a year in the role, that foundation is now in place.
What I want next is the depth in statistical learning and data systems that a rigorous graduate programme can actually provide, and the environment to do original research rather than just apply existing methods.
My research interests have, over time, settled around Natural Language Processing and the question of model interpretability - specifically, how much we actually understand about why a model produces the outputs it does. The Samsung project raised this question for me early on, and my work on resume parsing and the audio model kept returning to it. In this regard, Professor Jordan Boyd-Graber's work at UMD is directly relevant to what I want to study. His research on making machine learning systems more useful and interpretable, and particularly his work on topic models and question answering systems designed to interact with and learn from humans, connects closely with the kind of problems I've been working around since the Bixby project. I would very much like to work under his guidance. The broader research environment at UMD - the Center for Machine Learning and the CLIP lab in particular - offers the kind of interdisciplinary structure where work across NLP, systems and human-AI interaction can actually happen in the same space.
The structure of UMD's MS in Data Science programme - the way it covers machine learning, statistical foundations, big data systems and data mining alongside communication and applied work - reflects what I think a serious graduate programme in this area should look like. I'm coming in with an undergraduate CGPA of 8.92, AWS Certified Cloud Practitioner and Oracle Cloud AI certifications, and nearly a year of full-time industry experience on top of the internships. Beyond that, I carry a way of working that has been shaped by a lot of iteration, a fair amount of failure, and a consistent habit of going back and figuring out what actually went wrong. UMD is where I want to do the next part of that.