Draft:Jesse Dodge
Jesse Dodge | |
---|---|
Alma mater | University of Washington (BS) Carnegie Mellon University (MS, PhD) |
Scientific career | |
Institutions | Allen Institute for Artificial Intelligence |
Thesis | Towards Efficient and Reproducible Natural Language Processing (2020) |
Website | https://jessedodge.ai |
Jesse Dodge is an artificial intelligence (AI) research scientist whose research focuses on natural language processing (NLP) and machine learning (ML), with an emphasis on measuring and considering the environmental impact of AI.[1]
Education
After earning his bachelor's degree at the University of Washington in 2013, Dodge joined Carnegie Mellon University as a graduate student. In 2020, after earning his Ph.D. in Language and Information Technology,[2] he began his professional career at the Allen Institute for Artificial Intelligence (AI2), where he currently serves as a research scientist.[3]
Research and career
In 2020, Dodge created the NLP Reproducibility Checklist, a research transparency requirement for all authors submitting to the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).[4] He also co-created the Responsible NLP Checklist, which came into use for submissions to the Association for Computational Linguistics (ACL) Rolling Review in 2022.[5]
In 2021, Dodge published some of the first documentation of the contents of the Colossal Clean Crawled Corpus (C4) dataset and their origin. This work advocated for more transparency and thoughtfulness in the creation of large web-text corpora.[6] That same year, Dodge spoke on a panel at the Conference on Neural Information Processing Systems (NeurIPS) that discussed how a machine learning researcher should think about AI ethics.[7]
In 2021, Dodge was a guest on the Practical AI podcast alongside Chris Benson, Roy Schwartz, and Daniel Whitenack, where they discussed computational efficiency in support of more environmentally friendly research practices in artificial intelligence.[8]
Dodge continued his work on large language model transparency and access by contributing to the 2023 release of Dolma, an open dataset for language model pre-training research,[9][10] as well as the 2024 release of the OLMo language model and its accompanying OLMoTrace system, which enhances transparency in AI by tracing model outputs back to their training data.[11][12]
In 2024, Dodge was part of a team that met with Senator Martin Heinrich and nearly 100 members of the United States Congress at an event on AI and climate, intended to educate attendees about the potential of AI as well as its impact on Earth's climate.[13]
During that time, Dodge also spoke with several major news outlets, such as NPR, The Washington Post, and Fox News, to discuss environmental considerations and the importance of transparency in AI.[14][15][16] Dodge was interviewed by Ira Flatow on the public radio show Science Friday, where he discussed the critical planning around data center placement, the impact of nearby power sources, and the potential policies and regulations to which companies operating data centers could be subjected.[17]
Awards and honors
- 2015 North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL HLT) Best Student Paper[18]
- 2022 Association for Computational Linguistics Test-of-Time Paper Award[19]
- 2024 Association for Computational Linguistics Best Resource Paper Award[20]
- 2024 Association for Computational Linguistics Best Theme Paper Award[21]
References
- Dodge, Jesse; Prewitt, Taylor; Tachet des Combes, Remi; Odmark, Erika; Schwartz, Roy; Strubell, Emma; Luccioni, Alexandra Sasha; Smith, Noah A.; DeCario, Nicole; Buchanan, Will (2022). "Measuring the Carbon Intensity of AI in Cloud Instances". arXiv:2206.05229 [cs.LG].
- Carnegie Mellon University. "Jesse Dodge - Language Technologies Institute - School of Computer Science - Carnegie Mellon University". lti.cmu.edu. Retrieved 2025-05-15.
- "OpenReview". OpenReview. Retrieved 2025-05-15.
- "Call for Papers | EMNLP". Empirical Methods in Natural Language Processing. Archived from the original on 2023-05-30. Retrieved 2025-05-08.
- "ACL Rolling Review". ACL Rolling Review. Retrieved 2025-05-02.
- Dodge, Jesse; Sap, Maarten; Marasović, Ana; Agnew, William; Ilharco, Gabriel; Groeneveld, Dirk; Mitchell, Margaret; Gardner, Matt (2021). "Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus". arXiv:2104.08758 [cs.CL].
- "NeurIPS Panel How Should a Machine Learning Researcher Think About AI Ethics?". neurips.cc. Retrieved 2025-05-15.
- "Practical AI". practicalai.fm. Archived from the original on 2025-03-18. Retrieved 2025-05-16.
- Soldaini, Luca; et al. (2024). "Dolma: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research". arXiv:2402.00159 [cs.CL].
- "Dolma | Ai2". allenai.org. Retrieved 2025-05-08.
- Kerner, Sean Michael (2025-04-10). "What's inside the LLM? Ai2 OLMoTrace will 'trace' the source". VentureBeat. Retrieved 2025-05-13.
- Groeneveld, Dirk; Beltagy, Iz; Walsh, Evan; Bhagia, Akshita; Kinney, Rodney; Tafjord, Oyvind; Jha, Ananya; Ivison, Hamish; Magnusson, Ian; Wang, Yizhong; Arora, Shane; Atkinson, David; Authur, Russell; Chandu, Khyathi; Cohan, Arman (2024-08-11). Ku, Lun-Wei; Martins, Andre; Srikumar, Vivek (eds.). "OLMo: Accelerating the Science of Language Models". Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Bangkok, Thailand: Association for Computational Linguistics: 15789–15809. doi:10.18653/v1/2024.acl-long.841.
- "AI & climate: A first of its kind conversation on the hill | Ai2". allenai.org. Retrieved 2025-05-15.
- "AI Revolution: Unintended consequences for the environment". KTVU FOX 2 San Francisco. 2024-01-05. Retrieved 2025-05-08.
- "Artificial intelligence's thirst for electricity". NPR. Retrieved 2025-05-08.
- Schaul, Kevin; Chen, Szu Yu; Tiku, Nitasha. "Inside the secret list of websites that make AI like ChatGPT sound smart". Washington Post. Retrieved 2025-05-08.
- "Understanding And Curbing Generative AI's Energy Consumption". Archived from the original on 2025-04-03. Retrieved 2025-05-16.
- "NAACL HLT 2015: Welcome". naacl.org. Retrieved 2025-05-02.
- "Announcement of the 2022 ACL Test-of-Time Paper Award | ACL Member Portal". www.aclweb.org. Retrieved 2025-05-02.
- "Best Paper Awards". ACL 2024. Retrieved 2025-05-02.
- "Best Paper Awards". ACL 2024. Retrieved 2025-05-02.