Georgia Tech and Meta Introduce Open Dataset for Carbon Capture AI

To avoid catastrophic impacts on the climate, excess CO2 emissions must be addressed. At this point, reducing emissions alone is not enough. Direct air capture, a technology that removes carbon dioxide from the surrounding air, has great potential to help solve the problem.

But there is a big challenge. For direct air capture technology, each type of environment and location requires a unique, specific design. A direct air capture configuration in Texas, for example, would inevitably be different from one in Iceland. These systems must be designed with precise humidity, temperature and airflow parameters for each location.

Now Georgia Tech and Meta have collaborated to create a comprehensive database that may make developing and implementing direct air capture technologies easier and faster. The open-source database allowed the team to train an AI model that is orders of magnitude faster than existing chemistry simulations. The project, called OpenDAC, could accelerate climate solutions that the planet desperately needs.

The team's research was published in ACS Central Science, a journal of the American Chemical Society.

“For direct air capture, there are many ideas about how to best exploit the airflows and temperature fluctuations of a given environment,” said Andrew J. Medford, an associate professor in the School of Chemical and Biomolecular Engineering (ChBE) and a lead author of the paper. “But a big problem is finding a material that can efficiently capture carbon under the specific conditions of each environment.”

Their idea was to “create a database and a set of tools to generally help engineers find the right material that can work,” Medford said. “We wanted to use computers to take them from not knowing where to start to giving them a solid list of materials that they could synthesize and try out.”

The team believes the database is the largest and most robust dataset of its kind, containing reaction data for 8,400 different materials and based on nearly 40 million quantum mechanical calculations.

Building a partnership (and a database)

Researchers from Meta's Fundamental AI Research (FAIR) team were looking for ways to apply their machine learning capabilities to climate change. They landed on direct air capture as a promising technology and needed a partner with expertise in materials chemistry related to carbon capture. They went straight to Georgia Tech.

David Sholl, ChBE professor, Cecile L. and David I.J. Wang Faculty Fellow and director of the Transformational Decarbonization Initiative at Oak Ridge National Laboratory, is one of the world's leading experts on metal-organic frameworks (MOFs). MOFs are a class of materials that holds promise for direct air capture because of their cage-like structure and proven ability to attract and capture carbon dioxide. Sholl brought in Medford, who specializes in applying machine learning models to atomistic and quantum mechanical simulations related to chemistry.

Sholl, Medford and their students provided all input for the database. Because the database predicts the MOF interactions and the energy output of these interactions, extensive information was required.

They had to know the structure of almost all known MOFs – both the MOF structure itself and the structure of the MOF interacting with carbon dioxide and water molecules.

“To predict what a material might do, you need to know where every single atom is and what chemical element it is,” Medford said. “Figuring out the inputs to the database was half the problem, and that’s where our Georgia Tech team brought core expertise.”

The team used large collections of MOF structures that Sholl and his collaborators had previously developed. They also created a large collection of structures that incorporated imperfections of practical materials.

The power of machine learning

Anuroop Sriram, research engineer at FAIR and first author of the paper, created the database by performing quantum chemical calculations on the inputs provided by the Georgia Tech team. These calculations required approximately 400 million CPU hours, hundreds of times more computing than the average academic computing lab can perform in a year.

FAIR also trained machine learning models on the database. Once trained on the 40 million calculations, the models were able to accurately predict how the thousands of MOFs would interact with carbon dioxide.

The team showed that their AI models are powerful new tools for materials discovery, offering comparable accuracy to traditional quantum chemical calculations while being much faster. These features will allow other researchers to expand their work to explore many other MOFs in the future.

“Our goal was to look at the set of all known MOFs and, using these high-precision quantum calculations, find the ones that attract carbon dioxide most strongly while not attracting other components of air such as water vapor,” Sriram said. “To our knowledge, this is something no other carbon capture database has been able to do to date.”

Using their own database, the Georgia Tech and Meta teams identified approximately 241 MOFs with exceptionally high potential for direct air capture.
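The screening idea Sriram describes, keeping materials that bind carbon dioxide strongly while binding water weakly, can be illustrated with a minimal sketch. This is not the team's code or the OpenDAC API: the `Candidate` class, the `screen` function, the energy thresholds and the four toy materials are all hypothetical and serve only to show the shape of such a filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical screening record; not an OpenDAC data structure."""
    name: str
    e_co2: float  # CO2 adsorption energy in eV (more negative = stronger binding)
    e_h2o: float  # H2O adsorption energy in eV (more negative = stronger binding)


def screen(candidates, co2_max=-0.5, h2o_min=-0.5):
    """Keep materials that bind CO2 strongly and H2O weakly.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    hits = [c for c in candidates if c.e_co2 <= co2_max and c.e_h2o >= h2o_min]
    # Rank the strongest CO2 binders first.
    return sorted(hits, key=lambda c: c.e_co2)


# Made-up numbers for illustration only.
toy = [
    Candidate("MOF-A", e_co2=-0.72, e_h2o=-0.31),
    Candidate("MOF-B", e_co2=-0.25, e_h2o=-0.10),  # binds CO2 too weakly
    Candidate("MOF-C", e_co2=-0.80, e_h2o=-0.95),  # binds water too strongly
    Candidate("MOF-D", e_co2=-0.55, e_h2o=-0.40),
]

shortlist = screen(toy)
print([c.name for c in shortlist])  # ['MOF-A', 'MOF-D']
```

In practice the energies would come from quantum chemical calculations or from a trained model's predictions, and a real screen would likely weigh further criteria such as stability and cost.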

Moving forward with impact

“According to the United Nations and most developed countries, we need to achieve net zero carbon dioxide emissions by 2050,” said Matt Uyttendaele, director of Meta’s FAIR chemistry team and co-author of the paper. “Most of this needs to happen through a complete halt to carbon emissions, but we also need to address historic carbon emissions and economic sectors that are very difficult to decarbonize – like aviation and heavy industry. Therefore, CO2 removal technologies such as direct air capture must be brought online in the next 25 years.”

While direct air capture is still a nascent field, the researchers say it is critical that groundbreaking tools – such as the OpenDAC database made available through the team's work – are being developed now.

“There will be no single solution that gets us to net zero emissions,” Sriram said. “Direct air capture has great potential but needs to be scaled up significantly before we can have a real impact. I think the only way to get there is to find better materials.”

Researchers from both teams hope that the scientific community will join in the search for suitable materials. The entire OpenDAC dataset project is open source, from the data to the models to the algorithms.

“I hope this accelerates the development of negative emissions technologies such as direct air capture that may not otherwise have been possible,” Medford said. “As a species, we have to solve this problem at some point. I hope this work can help get us there and I think it has a real chance of achieving that.”

Note: Georgia Tech ChBE graduate students Sihoon Choi, Logan Brabson, and Xiaohan Yu made important contributions and are co-authors of the article.

Citation: A. Sriram et al., The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture, ACS Central Science (2024).


Anna Harden
