Eco-friendly practices

Being mindful of the environmental impact of our activity is crucial for a sustainable future. The field of big data keeps expanding, and with it come growing energy consumption, growing data storage needs, and a significant carbon footprint.

Information technology and computing clusters have a substantial environmental impact, primarily through their intensive consumption of primary resources. These systems rely heavily on rare and non-renewable resources such as rare earth metals, which are essential for manufacturing semiconductors and high-performance computing components. The extraction and processing of these resources can lead to habitat destruction, pollution, and ecosystem disruption. Additionally, the energy-intensive nature of data centers and clusters, required for the processing and storage of data, places significant demands on water resources for cooling purposes.

Given the urgency of the situation, raising awareness of these issues among research stakeholders is key to achieving the sustainability goals that humanity depends on.

Here are some good practices to save time and reduce your carbon footprint:

  1. Before launching an extensive analysis, consider whether it is truly necessary.

  2. If you have a large number of similar processes (e.g., many FASTQ files to analyze):

    • Launch the process on a single file to assess your memory, CPU, and time requirements.

    • Use the ‘seff’ command to gather information about the completed job.

    • Then launch the job array with the adjusted parameters.

  3. If you are using a workflow manager:
    • Review the default resources in the configuration and reduce them if necessary. You can also use the single-file approach described in item 2 to fine-tune the values.
    • Pause the workflow at quality-check steps, and proceed to the next step only if the quality is sufficient.
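
The single-file approach in item 2 can be sketched as a small helper that turns the figures measured on one test job into request options for the job array. This is only an illustration, under stated assumptions: the function name `slurm_requests`, the 20% safety margin, the script name `analyze.sh`, and the example numbers (2560 MB, 632 s, 48 files) are all hypothetical, and the scheduler is assumed to be Slurm, whose `sbatch` command accepts `--mem`, `--time`, and `--array`. Read the peak memory and wall-clock time from your own test job and adapt the margin to how variable your input files are.

```python
import math


def slurm_requests(peak_mem_mb, wall_seconds, n_tasks, margin_pct=20):
    """Build sbatch options for a job array from one measured test job.

    peak_mem_mb and wall_seconds are the values observed for the single
    test file; margin_pct is an assumed safety margin, not a site rule.
    """
    # Add the margin and round up so the request is never below the measurement.
    mem_mb = math.ceil(peak_mem_mb * (100 + margin_pct) / 100)
    total = math.ceil(wall_seconds * (100 + margin_pct) / 100)

    # Format the time limit as [days-]hours:minutes:seconds.
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    time_limit = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    if days:
        time_limit = f"{days}-{time_limit}"

    return [f"--mem={mem_mb}M", f"--time={time_limit}", f"--array=1-{n_tasks}"]


if __name__ == "__main__":
    # Hypothetical measurements: 2560 MB peak memory, 632 s wall time, 48 files.
    options = slurm_requests(2560, 632, 48)
    print(" ".join(["sbatch", *options, "analyze.sh"]))
```

With the hypothetical values above, the script prints `sbatch --mem=3072M --time=00:12:39 --array=1-48 analyze.sh`. Requesting only slightly more than the measured usage keeps each array task from reserving memory and time it will not use.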

——

The tables below reproduce Table 1, “Carbon Footprint of a Range of Bioinformatic Tasks”, from the article “The Carbon Footprint of Bioinformatics”, published in Molecular Biology and Evolution, Volume 39, Issue 3, March 2022.

This article investigates the carbon footprint associated with bioinformatics, a field that utilizes computational tools to analyze and interpret biological data. The authors explore various stages of the bioinformatics process, including data collection, analysis, storage, and sharing.

The study assesses how these different stages contribute to greenhouse gas emissions and examines factors influencing the carbon footprint of bioinformatics. The authors also discuss the implications of their findings, emphasizing the importance of considering environmental impact in bioinformatics research.

In summary, the article highlights the need to be aware of the carbon footprint associated with bioinformatics and suggests avenues for reducing its environmental impact.

Genome scaffolding

| Tool | Version | Details about the Experiments | Carbon Footprint Increase (%) | Carbon Footprint (kgCO2e) | Tree-months | km in a Car (EU) | Running Time / Memory | Approximate Scaling (if known) |
|---|---|---|---|---|---|---|---|---|
| SSPACE | 2.0 | Scaffolding 2.4 million long reads from human chromosome 14 (Hunt et al. 2014). | | 0.0010 | 0.0011 | 0.01 | 3 min 21 s / 30 GB | |
| SOAPdenovo2 | r223 | | +45% | 0.0015 | 0.0016 | 0.01 | 4 min 52 s / 30 GB | Linearly with number of reads. |
| SOAPdenovo2 | r223 | | +2,752% | 0.029 | 0.032 | 0.17 | 1 h 35 min / 30 GB | |
| SSPACE | 2.0 | Scaffolding 23 million short reads from human chromosome 14 (Hunt et al. 2014). | | 0.0027 | 0.0029 | 0.02 | 8 min 40 s / 30 GB | |
| SOAPdenovo2 | r223 | | +34% | 0.0036 | 0.0039 | 0.02 | 1 min 38 s / 30 GB | Linearly with number of reads. |
| SOAPdenovo2 | r223 | | +4,801% | 0.13 | 0.14 | 0.74 | 7 h 05 min / 30 GB | |

Genome assembly

| Tool | Version | Details about the Experiments | Carbon Footprint Increase (%) | Carbon Footprint (kgCO2e) | Tree-months | km in a Car (EU) | Running Time / Memory | Approximate Scaling (if known) |
|---|---|---|---|---|---|---|---|---|
| Abyss | 2.0 | De novo assembly of a human genome from Illumina sequencing reads (Jackman et al. 2017). | | 11 | 12 | 61 | 20 h / 34 GB | |
| MEGAHIT | 1.0.6 | | +42% | 15 | 16 | 86 | 26 h / 197 GB | |

Metagenome assembly

| Tool | Version | Details about the Experiments | Carbon Footprint Increase (%) | Carbon Footprint (kgCO2e) | Tree-months | km in a Car (EU) | Running Time | Memory | Approximate Scaling (if known) |
|---|---|---|---|---|---|---|---|---|---|
| MetaVelvet k101 | 1.2.01 | | | 14 | 16 | 82 | 1 h 06 min | 130 GB | |
| MEGAHIT | 1.0.3 | | +438% | 77 | 84 | 439 | 15 h 36 min | 12 GB | |
| metaSPAdes | 3.8.0 | | +1,206% | 186 | 203 | 1,065 | 29 h 24 min | 60 GB | |