NISHANTH DIKKALA

I am a Senior Research Scientist at Google Research in Mountain View, working on Large Language Models. I focus on improving the reasoning capabilities of LLMs and on improving their training and inference efficiency via novel architecture changes. I am also interested in developing a foundational understanding of deep learning, and have worked on unsupervised learning, sparsity within Transformers, and external trainable memory modules for Transformer architectures. I graduated with a Ph.D. from the Computer Science department at MIT in 2020 (advisor: Prof. Costis Daskalakis). My Ph.D. thesis focused on handling complex graph dependency structures when doing statistics with data. Before MIT, I spent 4 years at IIT Bombay, from where I graduated with a B.Tech. in Computer Science and Engineering. In Summer 2018, I was an intern at Microsoft Research New England, where I worked on econometrics and optimization problems with Vasilis Syrgkanis and Greg Lewis. In 2019, I interned as a Quantitative Researcher at Jump Trading, Chicago, where I worked on prediction problems in finance with a focus on high-frequency trading strategies.

Publications

  1. Causal Language Modeling can Elicit Search and Reasoning Capabilities for Sudoku Puzzles with Kulin Shah, Xin Wang and Rina Panigrahy. Under submission.
  2. Rethinking Chain-of-Thought Reasoning via Looped Models with Nikunj Saunshi, Zhiyuan Li, Sashank J. Reddi and Sanjiv Kumar. Under submission.
  3. ReMI: A Dataset for Reasoning with Multiple Images with Mehran Kazemi, Ankit Anand, Petar Devic, Ishita Dasgupta, Fangyu Liu, Bahare Fatemi, Pranjal Awasthi, Dee Guo, Sreenivas Gollapudi and Ahmed Qureshi. Under submission.
    arXiv
  4. Learning Neural Networks with Sparse Activations with Pranjal Awasthi, Pritish Kamath and Raghu Meka, published at COLT 2024.
    arXiv
  5. Alternating Updates for Efficient Transformers with Cenk Baykal, Dylan Cutler, Nikhil Ghosh, Rina Panigrahy and Xin Wang, published at NeurIPS 2023 (Spotlight).
    arXiv
  6. On the Benefits of Learning to Route in Mixture of Expert Models with Nikhil Ghosh, Raghu Meka, Rina Panigrahy, Nikhil Vyas and Xin Wang, published at EMNLP 2023.
    Paper
  7. A Theoretical View on Sparsely Activated Networks with Cenk Baykal, Rina Panigrahy, Cyrus Rashtchian and Xin Wang, published at NeurIPS 2022.
    arXiv
  8. Sketching based representations for robust image classification with provable guarantees with Sankeerth Rao Karingula, Raghu Meka, Jelani Nelson, Rina Panigrahy and Xin Wang, published at NeurIPS 2022.
    Paper
  9. Do More Negative Samples Necessarily Hurt in Contrastive Learning? with Pranjal Awasthi and Pritish Kamath, published at ICML 2022 (Oral).
    arXiv
  10. Estimating Ising Models from One Sample with Yuval Dagan, Constantinos Daskalakis and Anthimos Vardis Kandiros, published at STOC 2021.
    arXiv
  11. Statistical Estimation from Dependent Data with Yuval Dagan, Constantinos Daskalakis, Surbhi Goel and Anthimos Vardis Kandiros, published at ICML 2021.
    arXiv
  12. Minimax Estimation of Conditional Moment Models with Greg Lewis, Lester Mackey and Vasilis Syrgkanis, published at NeurIPS 2020.
    arXiv
  13. Logistic Regression with Peer-Group Effects via Inference in Higher-Order Ising Models with Constantinos Daskalakis and Ioannis Panageas, published at AISTATS 2020.
    arXiv
  14. Generalization and Learning under Dobrushin's Condition with Yuval Dagan, Constantinos Daskalakis and Siddhartha Jayanti, published at COLT 2019.
    arXiv
  15. Regression from Dependent Observations with Constantinos Daskalakis and Ioannis Panageas, published at STOC 2019.
    arXiv
  16. Post-Processing Calibrated Classifiers with Ran Canetti, Aloni Cohen, Govind Ramnarayan, Sarah Scheffler and Adam Smith, published at the ACM Conference on Fairness, Accountability, and Transparency (FAT* 2019).
    arXiv
  17. HOGWILD!-Gibbs can be PanAccurate. with Constantinos Daskalakis and Siddhartha Jayanti, published at NeurIPS 2018.
    Poster
  18. Testing Symmetric Markov Chains from a Single Trajectory with Constantinos Daskalakis and Nick Gravin, published at COLT 2018.
    arXiv
  19. Concentration of Multilinear Functions of the Ising Model with Applications to Network Data with Constantinos Daskalakis and Gautam Kamath, published at NeurIPS 2017.
    arXiv, Video
  20. Testing Ising Models with Constantinos Daskalakis and Gautam Kamath, published at SODA 2018.
    arXiv
    Featured in: Property Testing Review
  21. Tight Hardness Results for Maximum Weight Rectangles with Arturs Backurs and Christos Tzamos, published at ICALP 2016.
    arXiv
  22. Effect of Strategic Grading and Early Offers in Matching Markets with Hedyeh Beyhaghi and Éva Tardos, brief announcement at SAGT 2015.
    arXiv
  23. Can Credit Increase Revenue? with Éva Tardos, published at the 9th ACM International Conference on Web and Internet Economics (WINE 2013).
    PDF