Tuesday, October 13, 2020

Home office setup

One of my colleagues recently asked me about my home office setup, after noting that my video and audio quality is generally quite good on our frequent Zoom calls.  Whenever anyone asks me a question like this, I take it as a good excuse to write a blog post!

We have all spent a lot of time in our home offices since March, and I’m lucky that we have a guest bedroom that I was able to repurpose as my home office.  I’ve ended up spending a bit of money to make it nice, but I think in general the investments have been good.  However, I’ve also gone cheap/DIY when I can.  Here is a photo of my desk setup:



Here’s a quick rundown of the various items:
  1. Camera: Logitech C930e
  2. Microphone: Audio-Technica ATR2100x-USB with Sterling Audio Sterling SM5 Shock Mount
  3. Pop Filter: Stedman Proscreen XL
  4. Mic boom: Rode PSA1
  5. Headphones (over-ear): Audio-Technica ATH-AD700X Open-air
  6. Headphones (in-ear): Apple AirPods
  7. Lighting: Homemade diffusers with Cree 5000K LED bulbs
  8. Green screen: Homemade
  9. Chair: Steelcase Leap
Camera: This webcam was scavenged from my lab at the beginning of the pandemic, back when it was impossible to find a webcam in stock for purchase.  It works fine, though I wouldn't say that the picture quality is amazing.  After seeing one of my colleagues get amazing video quality by using their DSLR as a webcam, I tried it out with our relatively ancient Canon Rebel - the color was much better but its video was way too laggy, and the camera/tripod setup took up too much room on my desk, so I’ve stuck with the Logitech.  I use the Webcam Settings App for Mac to zoom the image so that my head takes up most of the image without having to lean into the camera.

Microphone setup:  I wanted to get a boom mic rather than a stand mic, mostly because I didn’t want a stand mic taking up extra space on my desktop.  I know that many people use either a lapel mic or a mic integrated into their headset, but neither of those sounded attractive to me.  The microphone connects via USB to my computer, and works really well. I went for a nicer mic in part because I was planning to record an audio version of my statistics book to provide to my students, and I’ve been really happy with the sound quality.  The shock mount does a good job of isolating low-frequency noise from the desk, though a tiny bit of keyboard noise is evident when I’m typing, even with the mic pointed directly away from the keyboard.  The Rode mic stand can sometimes be difficult to keep in position, but works fine for my purposes.  I don’t use the popscreen for Zoom calls, but it has been important for recording spoken word material, which otherwise sounds like I’m spitting on the listener.

Headphones: I generally alternate between in-ear and over-ear headphones over the day.  I love the AirPods, but after a while they start hurting my ears, and they don’t have enough battery life to get me through a full day of Zoom meetings.  The Audio-Technica headphones were my first open-back headphones, and I am definitely a convert - they let you hear the outside world, and don’t leave you with that closed-in feel that you get from closed-back headphones.  They are also super comfortable.  These are standard wired headphones, which I like both because they don’t have a lag like Bluetooth headphones (not so important for Zoom calls but essential when I’m playing guitar), and also because I will never be stuck with a dead battery.

Lighting: Everything else involved buying some equipment, so for the lighting setup I decided to go DIY (with lots of help and encouragement from my designer/wife Jen).  I wanted a simple two-point lighting setup from the two sides of my monitor, so we started with a couple of old table lamps that we had around the house.  I took a couple of empty wooden picture frames and attached each one to one of the arms of the lamp using a plastic cable stay, which is not exactly bulletproof but so far has lasted several months without failing.




To create a diffuser I started with some architectural tracing paper which I affixed in a sleeve around the picture frames.  Ultimately this wasn’t quite enough diffusion (I was still seeing strong reflections of the light in my glasses), so I also attached a piece of standard printer paper to the front with a binder clip.  I still get a bit of point glare, but it’s not too bad:



I'll probably try to do some more tweaking to resolve that.  We started with some warmer bulbs but I didn’t love the color, so I replaced them with Cree 5000K LED bulbs which I’m pretty happy with.

Green screen: I don’t usually use a green screen, but sometimes I need it if I want to play with video editing software for lecture videos. This one is also DIY - basically a wheeled clothing rack with a green fleece blanket attached using some large binder clips. 




Definitely not pretty, but gets the job done.

Chair: After spending the first few months of quarantine sitting in a cheapo office chair (and feeling the effects by the end of the day), I decided to splurge on a serious office chair. I already had a Steelcase Leap in my campus office, so I knew I would be happy with it. It has not disappointed - it’s definitely not cheap, but if you need a really good chair and have the budget I would definitely recommend it. Your butt will thank you!

I'm interested to hear your thoughts and any tips on how to further optimize the setup.

Thursday, July 23, 2020

Vacation fun: Making traditional Texas chili con carne

I’m on vacation at home this week, and one afternoon when it was especially gray (because San Francisco in July) I decided to cook up some chili con carne.  The recipe that I use is a modification of one that I can no longer find online, written by Reece Lagnuas back when he was a butcher in Austin.  I documented this cook because the recipe is such a great rendition of the traditional Texas chili that I grew up eating that I thought it should be out there for everyone to try.  And this batch turned out to be especially good!

Your reward at the end of this journey
Be forewarned, this dish requires a pretty substantial time commitment - from start of prep until the dish was cooking it took me about 90 minutes.  Once it’s cooking you just need to check it occasionally to make sure it’s simmering and not cooking too hard - it should be ready to eat within 2-3 hours.  Perfect activity for a cool, gray vacation afternoon!

Also - I've never written a recipe before, so I apologize in advance for how verbose it is...





Ingredients:

  • Meat: I hope it goes without saying that you should only cook with humanely raised meat.  The meat for this cook came from our neighborhood butcher shop, Avedanos, which supports local family farms.  
    • ~ 2.5 pounds pork shoulder
    • ~ 2.5 pounds brisket (preferably from the fattier end, known variously as the point or deckle)
    • if you have your own meat grinder then buy them whole, otherwise ask your butcher to grind them as coarsely as possible 
  • 1 large onion - diced relatively small
  • fresh chiles
    • 2 red bell peppers
    • 2 poblano peppers
    • 2 large jalapeno peppers
  • dried chiles - I use a varying mix, this time it was:
    • 2 chile ancho
    • 3 dried pasilla
    • 2 chile guajillo
  • seasonings: you can mix all of these together as they will be added at the same time
    • salt (start with 1 tbs, we like it salty so usually add more to taste later in the cook)
    • ground black pepper (1 tsp)
    • cayenne pepper (if you want it spicy - for this cook, I added about 1/3 tsp of Penzey’s Black & Red which is a mix of black and cayenne pepper - the end result had just a very tiny bit of spicy kick)
    • Chili Powder (3 tbs)
    • Ground cumin seed (2 tsp)
    • Garlic powder (1 tbs)
    • Onion Powder (1 tbs)

Steps:

Roast the fresh peppers.  The goal here is to char the skins so that they come off easily after steaming.  I used my outdoor gas grill, but you can also do this directly over the burner of a gas range.  If you don’t have gas then it sounds like you can also use an electric range or toaster oven.  You want the skins to be charred black over as much of the pepper as possible, so you will need to turn them regularly; the larger peppers will probably take much longer than the small ones.  Once they are nicely charred, then put them in a loosely sealed container to steam for at least 20 minutes.

Roasting the fresh chiles on the backyard gas grill


Roast and rehydrate the dried chiles.  This will require a hot pan (I used the same Dutch oven that I will use to cook the chili) and about a quart of boiling water.  Heat the pan on high and toss in the chiles, turning them regularly to prevent burning.  When they start to smell roasty, place them in a heatproof bowl for soaking.  Before you soak them, use some scissors to cut small holes in the side of each chile - this will make it easier to get any air out and submerge the chiles fully. After cutting the holes, pour the boiling water over the chiles.


Roasting the dried chiles

Prepare the meat.  If your meat was ground by your butcher then you can skip this step.  I like to grind the meat myself, since butchers often need time to set up their grinder for a coarse grind.  I use the meat grinder attachment for our KitchenAid mixer.  When grinding meat, it’s important for both the meat and grinder to be as cold as possible, so I put both of them in the freezer for about an hour before grinding the meat.  Chop the meat into strips or chunks that are small enough to fit in the grinder feed; I like to leave most of the fat on and remove it later during the cook, but sometimes I will trim away large fat pieces.

Action shot - grinding the brisket


Clean and chop the chiles.  Remove the skins from the fresh chiles (they should come off easily after steaming), and also remove the stem, seeds, and membranes inside each chile.  Don’t wash them!  For the dried chiles, try to remove as much of the seeds and membrane as possible (don’t worry about the skins).  Then chop them until they are nearing the consistency of a paste; this generally takes a lot of work.
Dried chiles after roasting
Another action shot - chopping chiles

Time to start cooking!  Add about 2 Tbs of oil to the large pot, and cook the onions on relatively high heat until they are just starting to brown, stirring constantly.





Blooming the spices - the smell is amazing
Add the spice mixture once the onions are starting to brown, and stir constantly for a minute or two. You should smell the spices bloom, especially the cumin seed.

Browning the meat. After blooming the spices, add the meat and cook for several minutes until it is starting to brown. You should be able to smell the meat browning and start to see fat from the meat rendering out in the pan. 

Add the chili paste and mix into the meat. Then add just enough water to cover the meat; for this cook it was about 6 cups.

About 3 hours in - almost done!

Bring to a boil and then reduce to a simmer.  The chili will then cook for at least 2 hours and preferably 3 or more hours; for this cook, it went a bit more than 3 hours.

Skim extra grease.  A couple of hours into the cook, there will likely be a substantial amount of grease on the top of the chili.  I like to remove some of this before serving, so that the chili isn’t too greasy. There are probably fancy ways to do this, but I simply use a Chinese soup spoon to skim the fat off of the top.  This time around I ended up removing about 1.5 cups of fat.

When you are ready to eat, taste the chili and add salt as needed to taste.

Enjoy!  I don’t generally like adulterating my chili with any additions, but this time I tried it with a bit of guacamole on the side, and it was really good.

This recipe makes a lot of food — we usually have enough left over from this recipe for two additional meals (for two people).  The chili keeps well in the freezer for at least a month, though it rarely lasts that long around here...





Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Friday, January 24, 2020

Talking remotely: Lessons learned so far

Since making my commitment to reduce air travel for academic purposes, I’ve been giving a lot more remote talks.  In the last 5 months I have given 10 remote talks - many thanks to those who have agreed to host me virtually rather than in person:

September:
  • National Academies Data Science in the Cloud workshop, Washington, DC
  • National Academies Brain Health Across the Lifespan workshop, Washington, DC
October:
  • Cognitive Science Colloquium, Institut d'Etudes Cognitives, École Normale Supérieure, Paris, France.
  • Johns Hopkins University Dept. of Electrical and Computer Engineering, Distinguished Lecturer Series, Baltimore, MD
  • NIMH  Talk Series on Machine Learning in Brain Imaging, Neuroscience, and Psychology, Bethesda, MD
November:
  • Santa Fe Institute, Cognitive Regime Shift meeting, Santa Fe, NM
  • Montreal Neurological Institute, Open Science Symposium, Montreal
  • Johns Hopkins University Dept. of Biostatistics,  Bethesda, MD
January:
  • IBI Data Standards and Sharing Working Group, Tokyo, Japan
  • Max Planck School of Cognition, Berlin, Germany
Some of these were already bunched together so they wouldn’t have required separate flights, but even considering that, my back-of-the-envelope calculation shows that these flights would have resulted in almost 7 tons of CO2 being generated (as estimated using https://www.icao.int/environmental-protection/Carbonoffset/Pages/default.aspx).  Not to mention lots of physiological stress from jet lag, and travel costs to be borne by my hosts. So in many ways it’s been a huge win for everyone.

An important issue, however, is what the experience was like, both for my hosts and the attendees and for myself.  The visits varied from talks with a short Q&A session, to extended visits in which my talk was followed by individual meetings with researchers. For me the experience has been very positive — certainly not as good as being there in some ways, but still very satisfying.  The least satisfying experience for me as a speaker has been in situations where I give a talk without time for Q&A afterwards.  I think that my hosts have also largely found it to be a positive experience, at least from the feedback that I’ve received. In one case, I was the pilot test for hosting extended virtual visits, and afterwards they told me that the experience had convinced them to do it regularly.  

Going through these talks has taught me a few lessons about how to improve the experience, both for the speaker and for the audience.
  1. Always set up a time with the host to test things out in advance in the actual venue, preferably at least a few days before the talk.
  2. On the day of the talk, arrange to meet the host online at least 15 minutes before the scheduled talk time.  Even when everything is well oiled, problems can arise, and you don’t want to be debugging them in front of an audience.
  3. Give your host your cell phone number, and keep your phone handy so that they have an alternate way to contact you if necessary. 
  4. In general I think it’s good for a virtual talk to be a bit shorter than a regular talk, simply because it’s easier for people to fade off when you are not present to look them in the eye.  Erring on the side of going short rather than long is also a good general principle: as an audience member I have rarely been upset when a talk went shorter than expected, and it gives more time for questions, which are usually the most interesting part anyway.
  5. For longer talks (over an hour), give the audience a short intermission.  For example, for my talk to the Max Planck School of Cognition (a 90 min talk with 30 mins for questions), I asked the audience to stand up and stretch out about half way through, which they seemed to appreciate.
I also have several suggestions for hosts of virtual visits:
  1. *Please* use a standard commercial conferencing system (like Zoom or Webex) rather than a home-grown system. Especially one that requires me to install special software! Having to install new software or log into a new system is just another potential point of failure for the talk. In general I have had the best experiences when using Zoom or Skype, but I’m sure there are other systems that are also good.  
  2. As a speaker I particularly like being able to see a chat window on my screen as I’m talking, so that people can post questions during the talk.  This works well with systems like Zoom, but often doesn’t exist at all in home-grown systems. 
  3. Please provide a camera so that the speaker can see the audience.  Talking without seeing the audience is much less pleasant and also makes it impossible to tell if people are disengaged, or if there is an unexpected problem with the A/V system.  
  4. Make clear to the audience up front how questions will work.  I prefer having them submitted by chat window, but if they are going to be spoken, then there should be microphones explicitly for the question, and these should be tested beforehand to make sure that the speaker can hear them.
  5. For extended visits, it has worked well to have a single Zoom room for the entire day, which individuals come into or out of throughout the day for their scheduled meetings.  Please remember that people sitting in front of a computer have biological needs just like people who are physically present, so schedule regular bio-breaks during the day.
  6. For events that are more discussion based, it's important to have multiple microphones spread around the room so that the virtual attendees can hear what is being said.  If someone is going to be writing on a whiteboard, it's also important to have a camera on the board.
Please leave other thoughts or suggestions in the comments below!


Thursday, December 12, 2019

Computing models for a neuroimaging lab

I had a conversation with a colleague recently about how to set up computing for a new neuroimaging lab.  I thought that it might be useful for other new investigators to hear the various models that we discussed and my view of their pros and cons.  My guess is that many of the same issues are relevant for other types of labs outside of neuroimaging as well - let me know in the comments below if you have further thoughts or suggestions!

The simplest model: Personal computers on premise

The simplest model is for each researcher in the lab to have their own workstation (or laptop) on which all of their data live and all of their computing is performed.

Pros:

  • Easy to implement
  • Freedom: Each researcher can do whatever they want (within the bounds of the institution’s IT policies) as they have complete control over their machine. NOTE: I have heard of institutions that do not allow anyone on campus to have administrative rights over their own personal computer.  Before one ever agrees to take a position, I would suggest inquiring about the IT policies and making sure that they don’t prevent this; if they do, then ask the chair to add language to your offer letter that explicitly provides you with an exception to that policy.  Otherwise you will be completely at the mercy of the IT staff, and this kind of power breeds the worst behavior in those staff.  More generally, you should discuss IT issues with people at an institution before accepting any job offer, preferably with current students and postdocs, since they will be more likely to be honest about the challenges.

Cons:

  • Lack of scalability: Once they need to run more jobs than there are cores on the machine, the researcher could end up waiting a very long time for those jobs to complete, and/or crash the machine due to resource insufficiency.  These systems also generally have limited disk space.
  • Underuse: One can of course buy workstations with lots of cores/RAM/storage, which can help address the previous point to some degree.  However, then one is paying a lot of money for resources that will sit underutilized most of the time.
  • Admin issues: Individual researchers are responsible for managing their own systems. This means that each researcher in the lab will likely be using different versions of each software package, unless some kind of standardized container system is implemented.  This also means that each researcher needs to spend their precious time dealing with software installation issues, etc, unless there is a dedicated system admin, which costs $$$$.
  • Risk: The systems used for these kinds of operations are generally commodity-level systems, which are more likely to fail compared to enterprise-level systems (discussed below).  Unless the lab has a strict policy for backup or duplication (e.g. on Dropbox or Box) then it’s almost certain that at some point data will be lost.  There is also a non-zero risk of personal computers being stolen or lost.
Verdict: I don’t think this is generally a good model for any serious lab.  The only strong reason that I could see for having local workstations for data analysis is if one’s analysis requires a substantial amount of graphics-intensive manual interaction.

Virtual machines in the cloud

Under this model, researchers in the lab house their data on a commercial cloud service, and spin up virtual machines on that service as needed for data analysis purposes.

Pros:

  • Flexibility: This model allows the researcher to allocate just enough resources for the job at hand.  For the smallest jobs, one can sometimes get by with the free resources available from these providers (I will use Amazon Web Services [AWS] as an example here since it’s the one I’m most familiar with).  On AWS, one can obtain a free t2.micro instance (with 1 GiB of RAM and 1 virtual CPU); this will not be enough to do any real analysis, but could be sufficient for many other functions such as working with files.  At the other end, one can also allocate a c5.24xlarge instance with 96 virtual CPUs and 192 GiB of RAM for about $4/hour.  This range of resources should encompass the needs of many labs.  Similarly, on the space side, you can scale your storage space in an effectively unlimited way.
  • Resource-efficiency: You only pay for what you use.
  • Energy-efficiency: Cloud services are thought to be much more energy-efficient compared to on-premise computers, due to their higher degree of utilization (i.e. they are not sitting idle most of the time) and the fact that they often obtain their power from renewable resources.  AWS estimates that cloud computing can reduce carbon emissions by up to 88% compared to on-premise computers.
  • Resilience: Occasionally the hardware on a cloud VM goes out.  When this happens, you simply spin up a new one --- no hardware replacement cost.
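
To make the hourly pricing concrete, here is a rough sketch of the arithmetic in Python. The $4/hour rate is the approximate figure I quoted above for the c5.24xlarge; the function name and the two scenarios (a 6-hour analysis, and an instance left running for 30 days) are made up for illustration, and a real bill would also include storage and data transfer.

```python
def instance_cost(hourly_rate_usd: float, hours: float) -> float:
    """Return the compute cost of running one instance for a number of hours."""
    if hourly_rate_usd < 0 or hours < 0:
        raise ValueError("rate and hours must be non-negative")
    return hourly_rate_usd * hours


# Approximate rate quoted above for a c5.24xlarge (assumption: flat $4/hour).
RATE = 4.0

six_hour_job = instance_cost(RATE, 6)        # instance shut down when finished
left_running = instance_cost(RATE, 24 * 30)  # instance left on for 30 days

print(f"6-hour analysis:  ${six_hour_job:,.2f}")
print(f"30 days left on:  ${left_running:,.2f}")
```

The gap between those two numbers is the whole argument for only running large instances while a job is actually using them.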

Cons:

  • Administration and training: Since most scientists will not have experience spinning up and administering cloud systems, there will be some necessary training to make this work well; preferably, one would have access to a system administrator with cloud experience.  Researchers need to be taught, for example, to shut down expensive instances after using them, lest the costs begin to skyrocket.
  • Costs: Whereas the cost of a physical computer is one-time, cloud computing has ongoing costs.  If one is going to be a serious user of cloud computing, then they will need to deeply understand the cost structure of their cloud computing services.  For example, there are often substantial costs to upload and download data from the cloud, in addition to the costs of the resources themselves.  Cloud users should also implement billing alarms, particularly to catch any cases where credentials are compromised. In one instance in my lab, criminals obtained our credentials (which were accidentally checked into GitHub) and spent more than $20,000 within about a day; this was subsequently refunded by AWS, but it caused substantial anxiety and extra work.
  • Scalability: There will be many cases in which an analysis cannot be feasibly run on a single cloud instance in reasonable time (e.g., running fMRIprep on a large dataset).  One can scale beyond single instances, but this requires a substantial amount of work, and is really only feasible if one has a serious cloud engineer involved. It is simply not a good use of a scientist’s time to figure out how to spin up and manage a larger cluster on a cloud service; I know this because I’ve done it, and those are many hours that I will never get back that could have been used to do something more productive (like play guitar, do yoga, or go for a nice long walk).  One could of course spin up many individual instances and manually run jobs across them, but this requires a lot of human effort, and there are better solutions available, as I outline below.
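
One cheap safeguard against the kind of credential leak described above is to scan files for strings that look like AWS access key IDs before committing them. The Python sketch below is only an illustration: it assumes the commonly seen key ID shape of `AKIA` followed by 16 uppercase letters or digits, the function name is made up, and dedicated secret-scanning tools do this far more thoroughly.

```python
import re

# Assumption: long-term AWS access key IDs look like "AKIA" followed by
# 16 uppercase letters or digits. This is a heuristic, not a validator.
KEY_ID_PATTERN = re.compile(r"\bAKIA[A-Z0-9]{16}\b")


def find_possible_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs for strings that resemble key IDs."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in KEY_ID_PATTERN.findall(line):
            hits.append((line_number, match))
    return hits


# A fake credentials file using the placeholder key from AWS's documentation.
sample = "\n".join([
    "[default]",
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE",
    "region = us-west-2",
])

for line_number, key_id in find_possible_key_ids(sample):
    print(f"line {line_number}: possible access key ID {key_id}")
```

A check like this could be run by hand or from a pre-commit hook; it complements billing alarms rather than replacing them.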

Verdict: For a relatively small lab with limited analysis needs and reasonably strong system administration skills or support, I think this is a good solution.   Be very careful with your credentials!

Server under a desk (SUAD)

Another approach for many labs is a single powerful on-premise server shared by multiple researchers in the lab, usually located in some out-of-the-way location so that no one (hopefully) spills coffee on it or walks away with it.  It will often have a commodity-grade disk array attached to it for storage.

Pros:
  • Flexibility: As with the on-premise PC model, the administrator has full control.
Cons:
  • Basically all the same cons as the on-premise PC model, with the added con that it's a single point of failure for the entire lab.
  • Same scaling issues as cloud VMs
  • Administration: I know that there are numerous labs where either faculty or graduate students are responsible for server administration.  This is a terrible idea!  Mostly because it's time they could better spend reading, writing, exercising, or simply having a fun conversation over coffee.
Verdict: Don't do it unless you or your grad students really enjoy spending your time diagnosing file system errors and tuning firewall rules.

Cluster in a closet (CIIC)

This is a common model for researchers who have outgrown the single-computer-per-researcher or SUAD model.  It’s the model that we followed when I was a faculty member at UCLA, and that I initially planned to follow when I moved from UCLA to UT Austin in 2009.  The CIIC model generally involves a rack-mounted system with some number of compute nodes and a disk array for storage.  Usually shoved in a closet that is really too small to accommodate it.

Pros:

  • Scalability: CIIC generally allows for much better scalability. With current systems, one can pack more than 1000 compute cores alongside substantial storage within a single full-height rack.  Another big difference that allows much greater scalability is the use of a scheduling (or queueing) system, which allows jobs to be submitted and then run as resources are available.  Thus, one can submit many more jobs than the cluster can handle at any one time, and the scheduler will deal with this gracefully. It also prevents problems that happen often under the SUAD model when multiple users log in and start jobs on the server and overrun its resources.
  • Flexibility: One can configure one’s cluster however they want, because they will have administrative control over the system.
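
To illustrate what a scheduler buys you, here is a toy simulation in Python. It is not how any real scheduler is implemented; the function name, the first-come-first-served policy, and the job sizes are all assumptions, chosen only to show jobs waiting in a queue until enough cores are free instead of overrunning the machine.

```python
import heapq
from collections import deque


def simulate_fifo(total_cores, jobs):
    """Simulate a first-come-first-served queue.

    jobs: list of (name, cores_needed, duration_hours), all submitted at t=0.
    Returns a dict mapping each job name to its start time in hours.
    """
    for name, cores, _ in jobs:
        if cores > total_cores:
            raise ValueError(f"{name} needs more cores than the cluster has")

    queue = deque(jobs)
    running = []  # min-heap of (finish_time, cores) for active jobs
    free = total_cores
    now = 0.0
    start_times = {}

    while queue:
        name, cores, duration = queue[0]
        if cores <= free:
            # Enough cores are idle: start the job at the head of the queue.
            queue.popleft()
            free -= cores
            start_times[name] = now
            heapq.heappush(running, (now + duration, cores))
        else:
            # Not enough cores: wait until the next running job finishes.
            finish_time, freed = heapq.heappop(running)
            now = finish_time
            free += freed
    return start_times


# Hypothetical workload on a 16-core cluster: 36 cores requested in total.
jobs = [("A", 8, 2.0), ("B", 8, 1.0), ("C", 16, 3.0), ("D", 4, 1.0)]
print(simulate_fifo(16, jobs))
```

In this made-up workload, A and B start immediately, C waits until both have finished, and D waits behind C. Real schedulers add priorities, backfilling, and time limits on top of this basic idea.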

Cons:

  • Administration: Administering a cluster well is a complex job that needs a professional system administrator, not a scientist moonlighting as a sysadmin; again, I know this because I lived it.  In particular, as a cluster gets bigger, the temptation for criminals to compromise it grows as well, and only a professional sysadmin is going to be able to keep up with cybercriminals who break into systems for a living.
  • Infrastructure: Even a reasonably sized cluster requires substantial infrastructure that is unlikely to be met by a random closet in the lab.  The first is power: A substantial cluster will likely need a dedicated power line to supply it.  The second is cooling: Computers generate lots of heat, to a degree that most regular rooms will not be able to handle.  On more than one occasion we had to shut down the cluster at UCLA because of overheating, and this can also impact the life of the computer’s components.  The third is fire suppression: If a fire starts in the closet, you don’t want regular sprinklers dumping a bunch of water on your precious cluster. It is for all of these reasons that many campuses are no longer allowing clusters in campus buildings, instead moving them to custom-built data centers that can address all of these needs.
  • Cost: The cost of purchasing and running a cluster can be high. Commercial-level hardware is expensive, and when things break you have to find money to replace them, because your team and colleagues will have come to rely on them.
  • Training: Once you move to a cluster with more than a single node, you will need to use a scheduler to submit and run jobs. This requires a change in mindset about how to do computing, and some researchers find it annoying at first.  It definitely requires letting go of a certain level of control, which is aversive for many people. 
  • Interactivity: It can be more challenging to do interactive work on a remote cluster than on a local workstation, particularly if it is highly graphics-intensive work.  One usually interacts with these systems using a remote window system (like VNC), and these often don’t perform very well.
Verdict: Unless you have the resources and a good sysadmin, I’d shy away from running your own cluster.  If you are going to do so, locate it in a campus data center rather than in a closet.

High-performance computing centers

When I moved from UCLA to UT Austin in 2009, I had initially planned to set up my own CIIC. However, once I arrived I realized that I had another alternative, which was to instead take advantage of the resources at the Texas Advanced Computing Center, which is the local high-performance computing (HPC) center (that also happens to be world-class).  My lab did all of its fMRI analyses using the TACC systems, and I have never looked back. Since moving to Stanford, we also take advantage of the cluster at the Stanford Research Computing Facility, while continuing to use the TACC resources.

Pros:

  • Scalability: Depending on the resources available at one’s HPC center, one can often scale well beyond the resources of any individual lab.  For example, on the Frontera cluster at TACC (its newest, currently the 5th most powerful supercomputer on Earth), a user can request up to 512 nodes (28,672 cores) for up to 48 hrs.  That's a lot of Freesurfer runs. The use of scheduling systems also makes the management of large jobs much easier.  These centers also usually make large-scale storage available for a reasonable cost.  
  • Professional management: HPC centers employ professional system administrators whose expertise lies in making these systems work well and fixing them when they break.  And the best part is that you generally don’t have to pay their salary! (At least not directly).
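
For a sense of scale, here is the back-of-the-envelope arithmetic behind those Frontera numbers as a short Python sketch. The node count, core count, and 48-hour limit are the figures quoted above; the 16-core workstation is a hypothetical comparison machine, and the comparison assumes the work parallelizes perfectly, which real analyses rarely do.

```python
# Figures quoted above for Frontera: 512 nodes, 28,672 cores, 48-hour limit.
NODES = 512
TOTAL_CORES = 28_672
MAX_HOURS = 48

# Hypothetical comparison machine (an assumption, not from the post).
WORKSTATION_CORES = 16

cores_per_node = TOTAL_CORES // NODES
core_hours = TOTAL_CORES * MAX_HOURS
workstation_years = core_hours / WORKSTATION_CORES / (24 * 365)

print(f"cores per node:            {cores_per_node}")
print(f"core-hours in one request: {core_hours:,}")
print(f"same work on workstation:  {workstation_years:.1f} years")
```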

Cons:

  • Training: The efficient usage of HPC resources requires that researchers learn a new model for computing, and a new set of tools required for job submission and management. For individuals with solid UNIX skills this is rarely a problem, but for researchers without those skills it can be a substantial lift.
  • Control: Individual users will not have administrative control (“root”) on HPC systems, which limits the kinds of changes one can make to the system. Conversely, the administrators may decide to make changes that impact one’s research (e.g. software upgrades).  
  • Sharing: Using HPC systems requires good citizenship, since the system is being shared by many users.  Most importantly: Users must *never* run jobs on the login node, as tempting as that might sometimes be.  
  • Waiting: Sometimes the queues will become filled up and one may have to wait a day for one's jobs to run (especially just before the annual Supercomputing conference).  
  • Access:  If one’s institution has an HPC center, then one may have access to those resources.  However, not all such centers are built alike.  I’ve been lucky to work with centers at Texas and Stanford that really want researchers to succeed.  However, I have heard horror stories at other institutions, particularly regarding HPC administrators who see users as an annoyance rather than as customers, or who have a very inflexible approach to system usage that doesn’t accommodate user needs.  For researchers without local HPC access, there may be national resources that one can gain access to, such as the XSEDE network in the US.
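
For readers who have not seen the job submission model mentioned under "Training", here is a minimal sketch of what it can look like. It assumes the Slurm scheduler, a partition named "normal", and FreeSurfer's recon-all as the per-subject command; none of these details come from the post, and the helper name write_job_script is hypothetical. The point is only that the work is described in a script and handed to the scheduler, rather than run directly on the login node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Slurm batch script for one subject.
# Assumptions (not from the post): the scheduler is Slurm, the partition is
# called "normal", and the per-subject command is FreeSurfer's recon-all.
write_job_script() {
    local subject="$1" outfile="$2"
    cat > "$outfile" <<EOF
#!/bin/bash
#SBATCH --job-name=recon_${subject}
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=24:00:00
#SBATCH --partition=normal
#SBATCH --output=recon_${subject}.%j.log

# This runs on a compute node allocated by the scheduler, not the login node.
recon-all -s ${subject} -all
EOF
}

# Usage: generate the script on the login node, then hand it to the scheduler:
#   write_job_script sub-01 recon_sub-01.slurm
#   sbatch recon_sub-01.slurm
```

Resource limits, partition names, and time limits differ between centers, so the values above would need to be adapted to the local documentation.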

Verdict:  For a lab like mine with significant computing needs, I think that HPC is the only way to go, assuming that one has access to a good HPC center.  Once you live through the growing pains, it will free you up to do much larger things and stop worrying about your cluster overheating because an intruder is using it to mine Bitcoin.

These are of course just my opinions, and I'm sure others will disagree.  Please leave your thoughts in the comment section below!


Thursday, June 27, 2019

Why I will be flying less

Since reading David Wallace-Wells’ “The Uninhabitable Earth: Life After Warming” earlier this year, followed by some deep discussions on the issue of climate change with my friend and colleague Adam Aron from UCSD, I no longer feel we can just sit back and hope someone else will fix the problem.  And it’s becoming increasingly clear that if we as individuals want to do something about climate change, changing our travel habits is probably the single most effective action we can take.  Jack Miles made this case in his recent Washington Post article, “For the love of Earth, stop traveling”:

According to former U.N. climate chief Christiana Figueres, we have only three years left in which to “bend the emissions curve downward” and forestall a terrifying cascade of climate-related catastrophes, much worse than what we’re already experiencing. Realistically, is there anything that you or I can do as individuals to make a significant difference in the short time remaining?
The answer is yes, and the good news is it won’t cost us a penny. It will actually save us money, and we won’t have to leave home to do it. Staying home, in fact, is the essence of making a big difference in a big hurry. That’s because nothing that we do pumps carbon dioxide into the atmosphere faster than air travel. Cancel a couple long flights, and you can halve your carbon footprint. Schedule a couple, and you can double or triple it.

I travel a lot - I have almost 1.3 million lifetime miles on United Airlines, and in the last few years have regularly flown over 100,000 miles per year.  This travel has definitely helped advance my scientific career, and has been in many ways deeply fulfilling and enlightening.  However, the toll has been weighing on me and Miles' article really pushed me over the edge towards action.  I used the Myclimate.org carbon footprint calculator to compute the environmental impact of my flights just for the first half of 2019, and it was mind-boggling: more than 23 tons of CO2.  For comparison, my entire household’s yearly carbon footprint (estimated using https://www3.epa.gov/carbon-footprint-calculator/) is just over 10 tons!  

For these reasons, I am committing to eliminate (to the greatest degree possible) academic air travel for the foreseeable future. That means no air travel for talks, conferences, or meetings -- instead participating by telepresence whenever possible.  I am in a fortunate position, as a tenured faculty member who is already well connected within my field.  By taking this action, I hope to help offset the travel impact of early career researchers and researchers from the developing world for whom air travel remains essential in order to get their research known and meet fellow researchers in their field. I wish that there were a better way to help early career researchers network without air travel, but I just haven’t seen anything that works well without in-person contact.  Hopefully the growing concern about conference travel will also help spur the development of more effective tools for virtual meetings.


Other senior researchers who agree should join me in taking the No Fly pledge at https://noflyclimatesci.org/.  You can also learn more here: https://academicflyingblog.wordpress.com/


Monday, December 3, 2018

Productivity stack for 2019


Apparently some people seem to think my level of productivity is simply not humanly possible: 

For the record, there is no cloning lab in my basement.  I attribute my productivity largely to a combination of mild obsessive/compulsive tendencies and a solid set of tools that help me keep from feeling overwhelmed when the to-do list gets too long.  I can’t tell you how to become optimally obsessive, but I can tell you about my productivity stack, which I hope will be helpful for some of you who are feeling increasingly overwhelmed as you gain responsibilities. 

Platform: MacBook Pro 13” + Mac OS X 
  • I have flirted with leaving the Mac as the OS has gotten increasingly annoying and the hardware increasingly crappy, but my month-long trial period with a Windows machine left me running back to the Mac (mostly because the trackpad behavior on the Dell XPS13 was so bad).  Despite the terrible keyboard (I’ve had two of them replaced so far) and the lack of a physical escape key, the 13” Macbook Pro is a very good machine for working on the road - it’s really light and the battery life is good enough that I rarely have to plug in, even on a long flight from SFO to the east coast.  In the old days I would invert my screen colors to reduce power usage, but now I just use Dark Mode in the latest Mac OSX.
  • I keep a hot spare laptop (a previous-generation Macbook Pro) synced to all of my file sharing platforms (Dropbox, Box, and Google Drive) in case my primary machine were to die suddenly.  Which has happened. And will happen again. If you can afford to have a second laptop I would strongly suggest keeping a hot spare in the wings.
  • I don’t have a separate desktop system in my office - when I’m there I just plug into a larger monitor and work from my laptop. In the past I had separate desktop and laptop systems but just found it too easy for things to get desynchronized.
  • Pro Tip: About once a month I run the Onyx maintenance scripts, run the DiskUtility file system repair, and clone my entire system to a lightweight 1TB external drive (encrypted, of course) using CarbonCopyCloner.  Having a full disk backup in my backpack has saved me on a few occasions when things went wrong while traveling.

Mobile: Pixel 2 + Google Fi 
  • I left the iPhone more than a year ago now and have not looked back.  The Pixel 2 is great and Google Fi wireless service is awesome, particularly if you travel a lot internationally, since data costs the same almost everywhere on earth.  If you want to sign up, use my referral link and you’ll get a $20 credit (full disclosure - I will get a $100 credit).

Email: Gmail  
  • For many years I used the Mac Mail.app client, but as it became increasingly crappy I finally gave up and moved to the GMail web client, which has been great.  The segregation of promotion and social emails, and new features like nudges, make it a really awesome mail client.  
  • My email workflow is a lazy adaptation of the GTD system: I try not to leave anything in my inbox for more than a day or so (unless I’m traveling).  I either act on it immediately, decide to ignore/delete it, or put it straight into my todo list (and archive the message so it’s no longer in my inbox).  I’m rarely at inbox zero, but I usually manage to keep it at 25 or fewer messages, so I can see it all in a single screen.

To do list: Todoist 
  • I moved to Todoist a couple of years ago and have been very happy with it. It’s as simple as it needs to be, and no simpler.  The integration with GMail is particularly nice.

Calendar: Google Calendar
  • The integration between my Android device and Gmail across platforms makes this a no-brainer.


Notes: Evernote 
  • Evernote is my go-to for note-taking during meetings, talks, and whenever I just want to write something down.  

Lab messaging: Slack 
  • I really don’t love Slack (because I feel that it’s too easy for things to get lost when a channel is busy), but it has become our lab’s main platform for messaging.   We've tried alternatives but they have never really stuck.

Safe Surfing: Private Internet Access VPN + UBlock Origin/Privacy Badger 
  • Whenever I’m on a public network I stay safe by using the Private Internet Access VPN, which works really well on every platform I have tested it on (and you can pay for it with Bitcoin!).
  • When surfing in Chrome I use UBlock Origin and Privacy Badger extensions to prevent trackers. 

Writing: Google Docs/TexShop 
  • For collaborative writing we generally stick to Google Docs, which just works.  Paperpile is a very effective reference management system.  
  • For my own longer projects (like books) I write in LaTeX using TexShop, with BibDesk for bibliography management, via the MacTex distribution.  If I were writing a dissertation today I would definitely use LaTeX, as I have seen too many students scramble as Microsoft Word screwed up their huge dissertation file.  Some folks in the lab use Overleaf, which I like, but I also do a lot of writing while offline so a web-based solution isn’t optimal for me.

Presentations: Keynote 
  • I have tried at various points to leave Keynote, but always came crawling back.  It’s just so easy to create great-looking presentations, and as cool as it would be to build them in LaTeX, I would have nightmares involving the inability to compile my presentation 3 minutes before my talk.

Art: Affinity Designer 
  • I gave up on Adobe products when they moved to a subscription model.  For vector art, I really like Affinity Designer, though it does have a pretty substantial learning curve.  I've tried various freeware alternatives but none of them work very well.

Coding in R: Rstudio 
  • If you’ve read my statistics book you know that I have a love/hate relationship with R, and most of the love comes from RStudio, which is an excellent IDE.  Except for code meant to run on the supercomputer, I write nearly all of my R code in RMarkdown Notebooks, which are the best solution for literate programming that I have seen.

Coding in Python: Anaconda + Jupyter Lab/Atom 
  • Python is my language of choice for most coding problems, and Anaconda has pretty much everything I need for scientific Python.
  • For interactive coding (e.g. for teaching or exploration) I use Jupyter Lab, which has matured nicely.  
  • For non-interactive coding (e.g. for code that will run on the supercomputer) I generally use Atom, which is nice and simple and gets the job done.

Hopefully these tips are helpful - now back to getting some real work done! 

Tuesday, November 27, 2018

Automated web site generation using Bookdown, CircleCI, and Github

For my new open statistics book (Statistical Thinking for the 21st Century), I used Bookdown, which is a great tool for writing a book using RMarkdown.  However, as the book came together, the time to build the book grew to more than 10 minutes due to the many simulations and Bayesian model estimation.  And since each output type (of which there are currently three: Gitbook, PDF, and EPUB) requires a separate build run, rebuilding the full book distribution became quite an undertaking.  For this reason, I decided to implement an automated solution using the CircleCI continuous integration service. We already use this service for many of the software development projects in our lab (such as fMRIPrep and MRIQC), so it was a natural choice for this project as well.

The use of CircleCI for this project is made particularly easy by the fact that both the book source and the web site for the book are hosted on Github — the ability to set up hooks between Github and CircleCI allows two important features. First, it allows us to automatically trigger a rebuild of the site whenever there is a new push to the source repo.  Second, it allows CircleCI to push a new copy of the book files to the separate repo that the site is served from.

Here are the steps for setting this up; if you have questions, see the Makefile and CircleCI config.yml file in the repo.  And if you come across anything that I missed please leave a comment below!

  1. Create a CircleCI account linked to the relevant GitHub account.
  2. Add the source repo to CircleCI.
  3. Create the CircleCI config.yml file.  Here is the content of my config file, with comments added to explain each step:

version: 2
jobs:
  build:
    docker:
# this is my custom Docker image
      - image: poldrack/statsthinking21

CircleCI spins up a VM specified by a Docker image, to which we can then add any necessary additional software pieces.  I initially started with an image with R and the tidyverse preinstalled (https://hub.docker.com/r/rocker/tidyverse/) but installing all of the R packages as well as the TeX distribution needed to compile the PDF took a very long time, quickly using up the 1,000 build minutes per month that come with the CircleCI free plan.  In order to save this time I built a custom Docker image (Dockerfile) that incorporates all of the dependencies needed to build the book; this way, CircleCI can simply pull the image from my DockerHub repo and run it straight away rather than having to build a bunch of R packages.

    steps:
      - add_ssh_keys:
          fingerprints:
            - "73:90:5e:75:b6:2c:3c:a3:46:51:4a:09:ac:d9:84:0f"

In order to be able to push to a github repo, CircleCI needs a way to authenticate itself.  A relatively easy way to do this is to generate an SSH key and install the public key portion as a “deploy key” on the Github repo, then install the private key as an SSH key on CircleCI.  I had problems with this until I realized that it requires a very specific type of SSH key (a PEM key using RSA encryption), which I generated on my Mac using the following command:

ssh-keygen -m PEM -t rsa -C "poldrack@gmail.com"
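
The same key generation can be scripted so that it runs without prompts. The following is a minimal sketch, not the exact procedure used for the book: the function name make_deploy_key, the empty passphrase, and the output path are assumptions made for illustration. It also prints the key's MD5 fingerprint, which uses the same colon-separated hex format as the fingerprint in the config above.

```shell
#!/usr/bin/env bash
# Sketch: create a passphrase-less RSA deploy key in PEM format and print its
# MD5 fingerprint. The function name and arguments are illustrative only.
make_deploy_key() {
    local keyfile="$1" comment="$2"
    # -m PEM : write the private key in PEM format
    # -t rsa : RSA key type
    # -N ""  : empty passphrase, so that CI can use the key unattended
    # -f     : output path (the public key is written to <keyfile>.pub)
    ssh-keygen -q -m PEM -t rsa -N "" -C "$comment" -f "$keyfile" || return 1
    # Print the fingerprint as colon-separated hex pairs (MD5 format)
    ssh-keygen -l -E md5 -f "${keyfile}.pub"
}

# Usage:
#   make_deploy_key ./deploy_key "you@example.com"
# The contents of deploy_key.pub would then go into the GitHub repo as a
# deploy key with write access, and the contents of deploy_key into CircleCI.
```

A key without a passphrase should be treated as a secret: it is best generated in a scratch directory and deleted locally once both halves have been installed.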


# check out the repo to the VM - it also becomes the working directory
      - checkout
# I forgot to install ssh in the docker image, so install it here as we will need it for the github push below
      - run: apt-get install -y ssh
# now run all of the rendering commands
      - run:
           name: rendering pdf
           command: |
             make render-pdf
      - run:
           name: rendering epub
           command: |
             make render-epub
      - run:
           name: rendering gitbook
           command: |
             make render-gitbook

The Makefile in the source repo contains the commands to render the book in each of the three formats that we distribute: Gitbook, PDF, and EPUB.  Here we build each of those.
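
For readers who do not want to open the Makefile, here is a rough sketch of what the three render targets might boil down to. This is an assumption rather than a copy of the actual Makefile: it supposes that the book's entry point is index.Rmd and that each target calls bookdown::render_book with one of the standard bookdown output formats. The function names and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of what the render targets might run; the real Makefile may differ.
# Set DRY_RUN=1 to print each command instead of invoking R.
render_book() {
    local format="$1"   # one of: pdf_book, epub_book, gitbook
    local cmd=(Rscript -e "bookdown::render_book('index.Rmd', 'bookdown::${format}')")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Printed for inspection only; shell quoting is not preserved here
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

render_all() {
    local format
    # Same order as the CircleCI steps above: PDF, EPUB, then Gitbook
    for format in pdf_book epub_book gitbook; do
        render_book "$format" || return 1
    done
}

# Usage:
#   DRY_RUN=1 render_all   # show the three Rscript commands
#   render_all             # build all three formats (requires R and bookdown)
```

Keeping each format in its own CircleCI step, as the config does, has the advantage that a failure in one format is easy to spot in the build log.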

# push the rendered site files to its repo on github
      - run:
           name: check out site repo
           command: |
             cd /tmp
             ssh-keyscan github.com >> ~/.ssh/known_hosts

The ssh-keyscan command adds GitHub's host key to known_hosts, which allows the git commands below to run over ssh headlessly.  Otherwise the git clone command will sit and wait at the host authentication prompt for a keypress that will never come.

# clone the site repo into a separate directory
             git clone git@github.com:psych10/thinkstats.git
             cd thinkstats
# copy all of the site files into the site repo directory
             cp -r ~/project/_book/* .
             git add .
# necessary config to push
             git config --global user.email poldrack@gmail.com
             git config --global user.name "Russ Poldrack"
             git commit -m"automated update"
             git push origin master
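
One possible refinement of the final step above: git commit exits with a non-zero status when there is nothing to commit, so a rebuild that produces identical output would be reported as a failed step. The sketch below wraps the clone, copy, commit, and push sequence in a function that checks for staged changes first. The function name deploy_site, its arguments, and the placeholder committer identity are assumptions for illustration, not part of the original config.

```shell
#!/usr/bin/env bash
# Sketch: copy rendered files into a fresh clone of the site repo and push,
# skipping the commit when nothing changed. Names here are illustrative.
deploy_site() {
    local src_dir="$1" repo_url="$2" branch="${3:-master}"
    local workdir
    # The scratch directory is not cleaned up; CI containers are discarded anyway
    workdir=$(mktemp -d) || return 1
    git clone --quiet "$repo_url" "$workdir/site" || return 1
    # "src/." copies the directory contents, including dotfiles
    cp -r "$src_dir"/. "$workdir/site"/ || return 1
    (
        cd "$workdir/site" || exit 1
        git add .
        # `git diff --cached --quiet` exits 0 when nothing is staged
        if git diff --cached --quiet; then
            echo "no changes to deploy"
            exit 0
        fi
        # Placeholder identity, passed per command instead of via --global
        git -c user.email="ci@example.com" -c user.name="CI bot" \
            commit --quiet -m "automated update" || exit 1
        git push --quiet origin "HEAD:refs/heads/${branch}"
    )
}

# Usage inside the CircleCI step, after ssh-keyscan:
#   deploy_site ~/project/_book git@github.com:psych10/thinkstats.git master
```

The subshell keeps the cd from affecting the rest of the step, and the function's exit status is that of the push (or 0 when there was nothing to deploy).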

That’s it! CircleCI should now build and deploy the book any time there is a new push to the repo.  Don’t forget to add a CircleCI badge to the README to show off your work!