Prospective Students

My home department is the College of Information Science, and I am cross-listed (i.e., able to supervise graduate students) in Computer Science and Linguistics.

I am a member of the Computational Language Understanding Lab (CLULAB) at the University of Arizona.

Prospective Students — A Field Guide

Are you a prospective student (undergraduate, masters, PhD) interested in working with me at the University of Arizona? That’s great! Due to the large volume of e-mails professors receive, this short field guide is designed to help save us both some time.

If you’re reading this, there’s likely space in the lab for another student. But before you fire up ChatGPT to write me that “perfect” email, please read on.

Non-Traditional Supervisory Style

The common style of supervision in computer science is for students to be largely independent, meeting with their faculty advisor briefly every week or two. Often, faculty run large labs, and as a result have to shift most of their focus from low-level scientific tasks like coding and experiments to high-level tasks like management.

By contrast, while I am a Professor, I am also a boots-on-the-ground research scientist, and I’m knee-deep in code and data for most of my days. I prefer to work with students as collaborators, and frequently work alongside them daily toward our shared research goals. This means I invest heavily in each student — both in time and resources — because I’m genuinely interested in making progress on large scientific problems, and advancing your career through that process. The tradeoff is that I typically only work with a small number of students at a time.

Research Goals

Here’s what I care about:

Work on very ambitious and challenging projects that help advance the state of the science.

Be excellent stewards of public research dollars. Fun fact: Depending on the University, it can cost almost $100k/year in grant money to fund a single PhD student in the US. Most of that goes to tuition and overhead, not your stipend, but still — that’s taxpayer money. We should deliver an excellent return on public research funding.

Be unapologetically open source. Every project produces: (1) a paper, (2) public code/data/tools, (3) something others can actually use. If we’re not releasing it, we’re probably not doing it.

Be impactful. I want to build things that people genuinely use. Not “might cite in a related work section” use. Actually-help-solve-their-problem use.

Build careers. You should leave with a portfolio of work that makes people say “oh wow, they really know their stuff.” Papers, tools, skills — transferable credits for whatever comes next.

The Process

Here is how the process usually works:

  1. Learn about my work and see if there’s a fit
  2. Write me a high signal-to-noise e-mail (details below)
  3. If there’s potential, we’ll chat
  4. If we think there’s a good fit, we’ll do a code interview (ChatGPT not invited)

Step 1: Learn About What I Actually Do

I work in automated scientific discovery — broadly speaking. This includes literature-based discovery (like Theorizer), experiment-driven discovery (like CodeScientist), data-driven discovery, virtual environments (DiscoveryWorld, ScienceWorld), and related sub-tasks and agents.

If you’re not genuinely excited about these topics — or at least topics I’ve published on recently — we probably don’t have a good research fit. (And that’s fine! The world needs people working on all sorts of different problems.)

I highly recommend you find a few recent papers I’ve written that look exciting to you, read them in detail, and then dig through the code, data, and other resources. Get a feel for what building these projects actually looks like.

Step 2: Write Me a High Signal-to-Noise Email

Professors get a lot of emails from prospective students. Most are ChatGPT-generated noise that tells me nothing about the actual human on the other end. This section is designed to help you stand out. Whatever you do, please do not use ChatGPT or any other AI assistant to write your e-mail to me. The result might look unique to you, but I promise that when you receive several such emails a day, they are all very easy to identify.

Full disclosure: I am objectively terrible at predicting whether someone will be a great research collaborator based on intuition alone. The only thing I’ve found that correlates with future success is past success. So here’s what I look for:

1. Strong Technical Skills

I work on problems that I personally find hard. Really hard. This means when we collaborate, you need to already have a strong technical foundation so we can spend our time on the exciting, difficult, science-advancing problems, not on teaching fundamentals.

What does “strong” mean for the work I do?

  • Extremely comfortable with data structures and algorithms (at the level of a practiced CS undergrad)
  • Already taken (or have equivalent knowledge of) an NLP course, and you’re well-practiced in these concepts
  • Ideally have AI/ML coursework and practice (though not always required)
  • Can architect software at the scale of thousands of lines of code, with comfort in OOP and basic software engineering

If you’ve only written short (~100-200 line) programs, short experiment scripts, or primarily have a numerical programming background rather than data structures/algorithms experience, there’s likely not a good fit. Similarly, if you’re using ChatGPT to write most of your code, we’re not a match.

2. Evidence of Significant Past Projects

The best predictor of future success is past success. I’d love to know about your wins, particularly as they relate to our shared research interests.

I’m looking for three things:

a) What does your code look like? What does it tell me about your current technical skills, and how you go about solving problems?

b) Have you completed substantial projects? Are they at a similar level to what would be required for a large technical project with me?

c) Do you understand research design fundamentals? At least the basics?

Red flags:

  • Code written by ChatGPT/LLMs
  • Only short projects or basic course assignments (sentiment analysis with BERT, etc.)
  • All projects are confidential/unavailable
  • Only coding exam experience (LeetCode, etc.), no actual projects
  • No research project experience (research is messy, open-ended, and frankly, not for everyone. If you haven’t successfully completed the scientific discovery process before, there’s a real risk you won’t enjoy it once the initial excitement wears off and you’re trudging through the long, boring stretch between coming up with an idea and eventually making it work.)

3. Strong Time Management

While I don’t agree with a lot of Steve Jobs quotes, one does resonate with me: in most professions, the “dynamic range” is low — the fastest taxi driver might be 2x faster than the slowest, and the fastest chef 3x faster. But in software, the best developers can be 100x faster than the slowest.

You need to be somewhere vaguely near the middle or higher of that curve. You should know how to make substantial research progress most days, finish things with plenty of time to spare (because unexpected things always happen in research, and in life), and have the discipline to get everything done in a normal ~8-hour workday so you (and your teammates, including me!) can have work-life balance.

4. You Play Nice With Others

Kind, professional collaboration with the highest scientific integrity. Non-negotiable.

5. You Actually Like These Problems

It takes all kinds to push science forward. For us to work together, you need to be genuinely excited about the specific problems I work on — not just “AI” in the abstract.

It’s not uncommon for students to join a lab and then realize the research isn’t what they thought it would be. It happens, but it’s not a good use of resources. You should already know — from reading papers and playing with the code — that this work genuinely excites you. And you should already know, from whatever lab-based or self-directed research experience you have, that you enjoy research.

6. Be Genuine

Here’s the thing: I already know ChatGPT writes fantastic emails and great code. I’m not interested in interviewing ChatGPT. I’m interested in learning about you.

I want to understand what problems you find exciting. What drives your scientific journey. And yes, I want to see your actual current level of knowledge, technical ability, and communication skills — including how you write emails and code.

Please do not submit anything generated by a language model, coding assistant, or anything other than your own intellectual labor when contacting me.

Yes, I can tell. No, it’s not as subtle as you think.

Step 3: Finding Time to Chat

I get a lot of e-mail. I physically cannot respond to every one. If there’s a strong fit based on your email, I’ll reach out and we’ll take it from there.

If you’ve read this far and you are a human: please put [BANANA] at the beginning of your email subject line. It helps me quickly filter e-mails and confirms you’ve actually read these instructions.

But Wait: Unicorns!

Maybe you don’t fit the traditional mold. Maybe you’ve taken a non-traditional path, or don’t check all the boxes above, but you genuinely believe there’s a research fit.

That’s okay — I had a very non-traditional and interdisciplinary path myself. Please still reach out. Just make sure you can clearly articulate why you think there’s a fit despite the non-traditional path, and follow the instructions above about e-mail subject lines.

Closing Remarks

I’m looking for excellent student collaborators who are excited about automated scientific discovery, have strong technical chops, and are ready to dive into ambitious, challenging projects that advance the state of science.

If that’s you, I’d love to hear from you.