I am a Senior Research Scientist at Google Research in Mountain View, working on Large Language Models. I focus on improving the reasoning capabilities of LLMs and on improving their training and inference efficiency through novel architecture changes. I am also interested in developing a foundational understanding of deep learning and have worked on unsupervised learning, sparsity within Transformers, and external trainable memory modules for Transformer architectures.

I graduated with a Ph.D. from the Computer Science department at MIT in 2020 (Advisor: Prof. Costis Daskalakis). My Ph.D. thesis focused on handling complex graph dependency structures when doing statistics with data. Before MIT, I spent 4 years at IIT Bombay, where I graduated with a B. Tech. in Computer Science and Engineering.

In Summer 2018, I was an intern at Microsoft Research New England, where I worked on Econometrics and Optimization problems with Vasilis Syrgkanis and Greg Lewis. In 2019, I interned as a Quantitative Researcher at Jump Trading, Chicago, where I worked on prediction problems in finance with a focus on high-frequency trading strategies.


  1. Learning Neural Networks with Sparse Activations with Pranjal Awasthi, Pritish Kamath and Raghu Meka, under submission.
  2. On the Benefits of Learning to Route in Mixture of Expert Models with Nikhil Ghosh, Raghu Meka, Rina Panigrahy, Nikhil Vyas and Xin Wang, EMNLP 2023.
  3. Minimax Estimation of Conditional Moment Models with Greg Lewis, Lester Mackey and Vasilis Syrgkanis.
  4. Estimating Ising Models from One Sample with Yuval Dagan, Constantinos Daskalakis and Anthimos Vardis Kandiros.
  5. Logistic Regression with Peer-Group Effects via Inference in Higher-Order Ising Models. with Constantinos Daskalakis and Ioannis Panageas, accepted for publication at AISTATS 2020.
  6. Generalization and Learning under Dobrushin's Condition. with Yuval Dagan, Constantinos Daskalakis and Siddhartha Jayanti, accepted for publication at COLT 2019.
  7. Regression from Dependent Observations with Constantinos Daskalakis and Ioannis Panageas, accepted for publication at STOC 2019.
  8. Post-Processing Calibrated Classifiers. with Ran Canetti, Aloni Cohen, Govind Ramnarayan, Sarah Scheffler and Adam Smith, in the 2019 Conference on Fairness, Accountability, and Transparency (FAT* 2019).
  9. HOGWILD!-Gibbs can be PanAccurate. with Constantinos Daskalakis and Siddhartha Jayanti, in the 32nd Annual Conference on Neural Information Processing Systems (NeurIPS 2018).
  10. Testing Symmetric Markov Chains from a Single Trajectory with Constantinos Daskalakis and Nick Gravin, in the 31st Annual Conference on Learning Theory (COLT 2018).
  11. Concentration of Multilinear Functions of the Ising Model with Applications to Network Data with Constantinos Daskalakis and Gautam Kamath, in the 31st Annual Conference on Neural Information Processing Systems (NeurIPS 2017).
    arXiv, Video
  12. Testing Ising Models with Constantinos Daskalakis and Gautam Kamath,
    in the Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2018).
    Featured in: Property Testing Review
  13. Tight Hardness Results for Maximum Weight Rectangles with Arturs Backurs and Christos Tzamos.
    in the 43rd International Colloquium on Automata, Languages and Programming (ICALP 2016).
  14. Effect of Strategic Grading and Early Offers in Matching Markets with Hedyeh Beyhaghi and Éva Tardos.
    brief announcement in SAGT 2015.
  15. Can Credit Increase Revenue? with Éva Tardos.
    in the 9th ACM international conference on Web and Internet Economics (WINE 2013).