This is Part 2 of the “Best Computing Practices for Ecology” series. I’ve just been watching the R revolution sweep ecology (one I’m very much a part of, too) and want to shed light on some other aspects of the “computing ecosystem” in ecological terms…insofar as I can make the wordplay work! So, here is the landscape ecology edition. What I mean to highlight here is “the landscape” of your computing environment: are we talking about just your laptop, a desktop connected to a network with resources you can access, or a computer cluster of some sort? What scale should your work be on, and what scales are accessible to you?
Which of these organisms best represents your analysis, and thus your ecosystem needs?
- Western predatory mite on the orange: maybe your own personal computing environment is all you need! Perhaps you have a great computer that can handle your analyses in a timely fashion. Your biggest wait for an analysis to complete is overnight.
- Karner blue butterfly: perhaps you need to “disperse” your computation a little bit, but not far. Maybe you can use a server in your building to take the load off your machine and give you a bit more power. Then you can still use your own machine without it being frozen by an analysis that hogs all its resources, and you can even shut it off while your analysis keeps running remotely.
- Jellyfish: (OK, admittedly this is where the functional connectivity analogy breaks down. I picked this because it’s a colonial organism.) Does your analysis have a lot of moving parts, where each “zooid” ultimately contributes to the whole picture? Maybe parallelization over multiple nodes is right for you (i.e. a computing cluster). For example, you can run many instances of similar, independent jobs at once (there’s a sketch of this pattern right after this list), and you can potentially access a lot of computing resources if your jobs need them.
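If the jellyfish is your spirit animal, the usual trick is to write one script that handles a single chunk of the work and then let the cluster’s scheduler launch many copies of it at once. Here’s a minimal R sketch of that idea, assuming a SLURM-style scheduler that hands each copy an array index through the SLURM_ARRAY_TASK_ID environment variable (the variable name, the species list, and run_model() are all placeholders for whatever your own cluster and analysis look like):

```r
# run_chunk.R -- one independent "zooid" of a bigger analysis.
# A scheduler (e.g. a SLURM job array) launches many copies of this
# script at once; each copy works on a different chunk and writes
# its own output file.

# Which chunk am I? SLURM sets this for array jobs; other schedulers
# use a different variable, so adjust to whatever your cluster uses.
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))

# Hypothetical inputs: one species (or site, or simulation) per task.
species <- c("karner_blue", "predatory_mite", "moon_jelly")
focal   <- species[task_id]

# Placeholder for the real work; swap in your own model or simulation.
run_model <- function(sp) {
  data.frame(species = sp, estimate = rnorm(1))
}

result <- run_model(focal)

# Each task saves its own file; combine them later with readRDS().
saveRDS(result, file = sprintf("result_%02d.rds", task_id))
```

Submit that as an array of, say, three tasks and you get three result files to stitch together afterwards; the same split-by-index idea also fits the high throughput computing option further down.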
So, what do you do if your analysis is bigger than an individual machine can handle?
- Look into resources at your organization. Get in touch with the IT people in your building to ask about resources such as…
  - servers: maybe your department pools resources to have a collection of application servers. Your department IT person can tell you how to access them.
  - local clusters: when I was a physics major in undergrad, our IT guys had clustered the lab computers to run computations at night. So, you could get an account and schedule jobs for the “downtime” of the lab, utilizing the parallel functionality afforded by the cluster.
- Campus-wide resources
  - supercomputing: there tends to be a little more hoop-jumping to request time on the “big guns” at your university, but that’s not to say it can’t be done! If you’re not faculty, you’ll probably need your supervisor to make the request on your behalf, but you can help write it once you’ve worked out the requirements. At worst, you may need to pay for time, in which case find out whether you can work that into your project budget.
  - high throughput computing: Condor lets you distribute jobs that aren’t urgent and don’t each need many resources. Basically, you can queue up a ton of little jobs that get farmed out to a broad network of connected computers as they become available (see the small-job sketch after this list).
  - high performance computing: this is more what people think of when they think of “supercomputing.” It’s a tight cluster, designed with optimized architecture and top-of-the-line hardware, giving you access to fast processors and lots of memory.
  - Open Science Grid: if this option is available, it may not look much different from supercomputing to you, but your jobs are scheduled across a nationwide grid.
- Cloud computing: here’s an interesting area because, depending on your circumstances, you can potentially access this for a fee on your own.
  - Google Cloud
  - Google Earth Engine: this one’s pretty specific to GIS applications, but I love it for that! The coolest thing about it is that it makes remote sensing so much easier. It hosts a huge imagery library (cutting out the need for a lot of preprocessing), provides many handy built-in algorithms, projects all of your work to a common projection, and visualizes results on a Google Maps interface. It works like traditional GIS software, but with JavaScript coding instead of a point-and-click interface (see the sketch after this list for a way to reach it from R). Also awesome: it does basic computations in the cloud, for free!
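For the high throughput computing option above: Condor-style systems typically launch each little job with its own argument (a job number, say), so your R script just needs to read that argument and work on its assigned piece. Here’s a rough sketch under that assumption; the file names and the summary step are hypothetical stand-ins for your own data and analysis:

```r
# summarise_one.R -- one small, low-resource job in a big batch.
# Each job gets its assignment from the command line, e.g.
#   Rscript summarise_one.R 7
# With Condor, the submit file would pass a job number like this.

args   <- commandArgs(trailingOnly = TRUE)
job_id <- as.integer(args[1])

# Hypothetical setup: each job summarises one survey file.
infile  <- sprintf("surveys/plot_%03d.csv", job_id)
outfile <- sprintf("summaries/plot_%03d.csv", job_id)

surveys <- read.csv(infile)

# Placeholder summary; replace with whatever each little job should do.
summary_row <- data.frame(
  plot      = job_id,
  n_records = nrow(surveys),
  richness  = length(unique(surveys$species))
)

write.csv(summary_row, outfile, row.names = FALSE)
```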
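And since this is an R-flavored series, it’s worth noting that you don’t have to leave R to try Earth Engine: the rgee package wraps the same service the JavaScript Code Editor talks to. A rough sketch follows, with the caveats that rgee isn’t covered in the post itself, it needs its own installation and authentication, and the Landsat collection ID, band names, and coordinates below are just illustrative:

```r
# A rough rgee sketch: the imagery lives on Google's servers and the
# median composite below is computed there, not on your laptop.
library(rgee)

ee_Initialize()  # authenticate and start an Earth Engine session

# Hypothetical study site (longitude, latitude).
site <- ee$Geometry$Point(c(-89.4, 43.1))

# A year of Landsat 8 surface reflectance over the site, reduced to a
# median composite in the cloud (collection ID is just illustrative).
composite <- ee$ImageCollection("LANDSAT/LC08/C02/T1_L2")$
  filterBounds(site)$
  filterDate("2022-01-01", "2022-12-31")$
  median()

# Display on an interactive map, much like the JavaScript Code Editor.
Map$centerObject(site, zoom = 9)
Map$addLayer(
  composite,
  list(bands = c("SR_B4", "SR_B3", "SR_B2"), min = 0, max = 30000),
  "Landsat 8 median composite"
)
```

The appeal, just as with the Code Editor, is that your machine only handles the code and the map; the number-crunching happens on Google’s side.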
Let me know in the comments what other options I’ve missed! Let’s start by taking advantage of our computing resources and getting our analyses to the right homes, before moving on to the computing environment itself.