Wednesday, 28 September 2016

The Life Bioinformatic with Andrew Lonsdale - Episode 0 - "You need a better microphone."

The first episode of The Life Bioinformatic begins in Sydney, 2015, at the inaugural Australian Bioinformatics and Computational Biology Society conference, ABACBS. ABACBS 2015 was hosted at the Garvan Institute, and admittedly, halfway through the conference dinner, I thought it might be a good idea to start a podcast. Initially there was scepticism that this podcast was legitimate:

AL: I'm speaking with David Ma, photographer extraordinaire, David, you've taken photos of ABACBS and COMBINE. What do these conferences and symposiums mean to you? 
[.. silence ..]  
DM: You need a better microphone I think. We have some, but I guess it's a little late now.
AL: It is! I had this idea just 20 minutes ago. 
DM: Wait, do you run a podcast normally? 
AL: Not normally, but this could be the first episode 
DM: What's the name of your podcast? 
AL: Umm, it can be whatever you want it to be? 
DM: OK.
AL: But my idea was The Life Bioinformatic. 
DM: Oh yeah, I love that movie by Wes Anderson, the Life Aquatic. 
AL: Exactly, I'm going for a pun on that, do you think that would work? 
DM: I don't know if it's well known enough.

David was right (possibly on both counts) - the sound quality was such that almost none of it is really suitable for a podcast, and the recording has sat on my phone for the better part of a year. 

But with ABACBS 2016 just five weeks away and abstract submissions due this Friday, 30th of September (http://www.abacbs.org/conference), it seemed as good a time as any to go through it to see if there was anything worth salvaging. I've transcribed the best into this article. I was, and hopefully you will be too, pleasantly surprised by the genuine enthusiasm that came through for the ABACBS conference, despite the late hour and the ample amount of hospitality being enjoyed at the time of recording. 

Although it was the inaugural conference under the umbrella of the newly formed society, it followed directly from the great success of the Australian Bioinformatics Conference (ABiC) the year before. With the new name, and newish society, I started the podcast by asking a simple question: 

What does ABACBS mean to you?

Names have been suppressed to protect the innocent (and those who made some terrible jokes). Responses covered three basic themes: jokes about the ABACBS acronym, Professor David Lovell and his entertaining role as host of the lightning talks, and a genuine appreciation and enthusiasm for the bioinformatics and computational biology community in Australia:
  • "You know it's uh.. just a conference where you've got come and make things count...you know, ABACBS... get. it... like an abacus and you count things?" [Mild laughter in background].
  • I get to see David Lovell once a year.
  • I get to meet other bioinformaticians and hang out with them which is always fun. 
  • Ummmm......bioinformatics....David Lovell being silly...I dunno.
  • Well I thought it's going to be about different tools and just learning tools and just using tools, and bioinformaticy tools, all about tools you know?
  • It means the state of affairs of bioinformatics in Australia.
  • Well firstly, COMBINE beforehand, great, lots of good free food, then ABACBS, also lots of great free food, uh, some great poster presentations, some great tips and you get to hang out with gods of bioinformatics.
  • It's a great opportunity to see my past colleagues, my present colleagues, who knows about the future!
  • Well obviously it's a place where Australian bioinformaticians come together, and for me personally it's really important that it's accessible to students.
  • What I've enjoyed the most is the chance to meet people I've wanted to meet for quite a while.
  • ABiC last year was fantastic, and I don't think we could beat it, so at best we've tried to emulate it in a different place.
  • It's a coming together of bioinformatics communities, and that community is so essential because it's a really open and collaborative environment.
  • It means everything. 
  • Well, it's abacus with a SNP. 
  • So it's really about bringing the bioinformatics community together, exchanging ideas, providing feedback and support to each other, and just developing the community. 
  • It signifies a community of people who share a common love of bioinformatics and computational biology, good spirit, good conversations, intellectual generosity, bring on 2016 I say!

The full transcript (which should never see the light of day!) contained more gold, and I might be able to put together a shortened version if I ever get time (read: never), so instead I've produced this word cloud to summarise what ABACBS means to those in attendance last year:

[Word cloud of the responses]

That's it for episode 0 of the podcast, and in Brisbane in November, the Life Bioinformatic will be back for episode 1... this time with a better microphone!


Monday, 15 February 2016

I'm retiring (from #sideprojects (for a while (probably)))


Despite the excessive parentheses, this post is meant to be a definitive statement that I can refer back to whenever something interesting (non-research) comes up in the next year or so.

With the long preparation and eventual publication of Ten Simple Rules for a Bioinformatics Journal Club, ABACBS due for an AGM soon where a new student representative will be elected, and an ever-decreasing amount of time left on my PhD scholarship, it's an apt time for me to retire from the fun and rewarding world of #sideprojects.

Last year was a particularly busy year for #sideprojects, continuing some degree of involvement in various groups (such as BGSA, ABACBS, Parkville Bioinformatics Journal Club, and of course COMBINE), co-chairing the EMBL Australia PhD Symposium (EAPS15), and teaching a number of Software Carpentry and Data Carpentry workshops.

Many have asked how I can keep up all my research and still do a plethora of these community-centric things.

The simple answer of course is that I don't.

I've felt, and in discussions with others have formed the opinion, that there is always a balance: at the end of a research degree, conventional wisdom says you need some degree of extra-curricular activity to avoid being one-dimensional at the completion of your degree. Having those 'other' things on your resume is valued, but without any novel research to complement it, you risk being one-dimensional the other way.

So to reduce this risk, it is time for a Sweeping Public Statement: I'm out.

I'm retired from #sideprojects. I keep telling people this, in the hopes that the repetition and public accountability will make it true.

Another piece of conventional wisdom is that learning to say "No" to opportunities is an important skill for a researcher, and so far this year I've been keeping track of the great opportunities that I just don't have time for. By noting them down, I've found it's a great way to imagine the time I could have lost to them!

So from now until the end of my PhD, I'll do less not-PhD, and more PhD. That there will be the occasional relapse, I do not doubt*, but the intention is very much to say "No", especially to myself.

With regards to COMBINE (the student sub-committee of ABACBS, and the ISCB Regional Student Group for Australia) this is actually a very easy decision, because of the safe hands it is in. The great work of the current president (@hdashnow) and the rest of the COMBINE committee (past and present: Westa Domanava, Scott Ritchie, Thomas Coudrat, Jane Hawkey, Zoe Dyson, Tim Rice, Kian Ho, Ben Goudey, Karin Klotzbuecher and many more) over the last few years means that COMBINE looks set to make great strides in its goals of world domination, I mean, a national network of students and ECRs, and I feel I can stop active involvement with full confidence that things will continue.

It's also true that what started out as attempts to build up a network of peers quickly developed into simply friends, and I'm sure those friendships will continue. I really enjoy all these activities, and post-PhD I look forward to a nice balance of research, teaching and community involvement, but for now, it's time to go to the mattresses on research.

No more #sideproject ideas or trips to teach SWC/DC**.

People in bioinformatics and computational biology often know me for my other activities rather than my research. It's time to change that.



* Because, of course, who are we kidding? Plus I'm sure a few more of these will get written in idle moments. 

** I enjoy teaching SWC and DC, and the time commitment can be relatively minor once you're familiar with the materials. I don't intend to teach any more this year, though if a combination of travel to a conference and an adjacent workshop came up I might consider it, and there are a few workshops in planning that I'll help get off the ground for COMBINE.***

*** Yes, this is weaseling out of the entire point of this post, but, see *. 

Monday, 27 April 2015

Software Carpentry Feedback


Feedback! Feedback! Feedback!

The first Software Carpentry (SWC) workshop run solo by COMBINE was held recently. We were quite pleased with how it went, and the feedback seems to agree.

Just how much agreement there is between our opinion of how it went and the participant feedback could be a matter for debate, if it weren't for the excessive amount of feedback that we gathered.

This SWC workshop was a direct result of Bill Mills' recent trip to Australia. He taught a face-to-face instructor training course, and members of COMBINE from across Australia became Software Carpentry instructors. It was a little bit meta, learning how to be a SWC instructor during a SWC-style workshop, and sure enough, post-it notes and the usual pedagogical techniques of short exercises and participant interaction were on show.

One thing that was new to me was a secondary use for the post-it notes: asking for feedback on both things that went well during the day, and things that could be improved. Re-reading the Lessons Learned article by Greg Wilson afterwards, a similar usage is mentioned as "minute cards", with post-its collated during breaks to gauge what has been learnt or found confusing.

We decided to use the same technique that Bill had, asking for both kinds of feedback at the end of each day, which led to some interesting outcomes.

It's raining feedback


At the end of day 1, we had approximately 60 notes, two post-it notes per person. We quickly looked at them as a team at the end of the day, and devised a few corrective actions where appropriate.

I decided to look at them in detail that evening. I'd opened the workshop with the Unix shell, and so had some clue that these post-it notes would likely contain comments on my own teaching style. They did.

There were two main issues:
  1. I learnt that my habit of constantly using clear to clean up my demonstration terminal was counterproductive; workshop participants quickly got lost. When I paused, a blank screen behind me did nothing to reinforce points, or let students ponder what I had typed in the terminal.
  2. I'd also had the brilliant idea before I taught the class to alter my PS1 environment variable to match the SWC course materials, so my command prompt was simply:
    $
    
In my head it made sense. Don't confuse the students, Andrew; make sure the prompt matches the materials, Andrew.

Of course, the unintended consequence of this was that when I changed directory (as I frequently did during the session) and the participants missed it (or I had cleared my screen by then!), there was little indication I'd done so unless I was conscious of it and threw in a pwd.

Rookie mistakes. These were two obvious items I can address next time: avoid using clear until there is a transition in the material (or even sneakily alias it to something like echo "----"), and think of a useful PS1 that shows the current directory or similar.
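
In bash, a minimal sketch of both fixes might look something like this (untested scribbles destined for my ~/.bashrc, rather than anything official):

# Show the working directory in the prompt, while staying close to the SWC materials
PS1='\w $ '

# Make clear leave a visible break in the scrollback instead of wiping the screen
alias clear='echo "----------------------------------------"'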

Otherwise, I was pleased with the feedback (both the good and the improvements), and was happy with the outcome of my first official experience teaching a Software Carpentry module.

I wasn't entirely sure what we would actually do with the notes from the eight instructors who shared duties over the three days, apart from quick debriefs. However, on a whim that evening I transcribed them all, in part because I've been trying to improve my typing to be > 2 fingers, and I figured they'd have to be typed in at some point.

It didn't take too long.
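
The transcribed file ended up as a simple CSV. The layout below is a reconstruction: the column names match the code used later in this post, but the example rows are invented:

$ head -n 3 swc_feedback_day1.csv
Day,Coded,Comment
1,teaching,really liked the live coding
1,helpers,helpers were patient and attentive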

Feedback Clouds!

Part way through day 2, which was on R, I was sitting in the corner acting as a helper and browsing through the now CSV-formatted file of comments.

I thought about the feedback, and how I'd found it valuable. It would be good to show the students that we actually read it, and valued it. I didn't want to just display the comments verbatim, or select just the good ones.

The quick solution to this was to use a word cloud!

Now I'm sure there are opinions on word clouds, but I'd been wanting to try one on some kind of toy data for a while; a simple word cloud could show the feedback in a de-identified manner, and show that we had valued it. Since I was in RStudio following the lesson along, I searched quickly: the 'wordcloud' package by Ian Fellows seemed to fit the bill.

The instructors for that day were up to reading in data from a CSV, and as I looked at the R code an idea slowly crept up on me.

Looking up at the screen, the R code being discussed was incidentally loading a CSV file at the time, and then it hit me: at the end of the day, I could show a word cloud of yesterday's feedback in RStudio, using concepts that had been taught that day.

As a demonstration and way to reinforce a few concepts (e.g. using libraries, when you might want strings to be factors) I think it was a success.

In R:


library(tm)
library(wordcloud)

feedback <- read.table(
  file="swc_feedback_day1.csv",
  header=TRUE, sep=","
)

feedback$Day <- as.factor(feedback$Day)
feedback$Comment <- as.character(feedback$Comment)

wordcloud(feedback$Comment, colors = brewer.pal(6, "Dark2"), random.order = FALSE)

[Word cloud of the day 1 comments, with "the" featuring prominently]

The wordcloud package is quite simple to use. A couple of extra lines to remove some stop words (such as "the" featuring prominently above):


feedback$Comment <- tolower(x = feedback$Comment)
feedback_cleaned <- removeWords(feedback$Comment, words = stopwords())
wordcloud(feedback_cleaned, colors = brewer.pal(6, "Dark2"), random.order = FALSE)

[Word cloud of the day 1 comments with stop words removed]

I'd also (roughly) coded the comments into various categories, so I could demonstrate how functions can be used, and show just the feedback related to certain codes, such as for the helpers around the room:

sub_wordcloud <- function(data, category){
  # Keep only the comments coded with the given category, then plot their cloud
  sub_feedback <- data[data$Coded == category, ]
  sub_feedback_cleaned <- removeWords(sub_feedback$Comment, words = stopwords())
  wordcloud(sub_feedback_cleaned, colors = brewer.pal(6, "Dark2"), random.order = TRUE)
}

sub_wordcloud(feedback, "helpers")

[Word cloud of the "helpers" comments]

Going through this code at the end of the day was a chance to demonstrate the concepts of the day, and gave some time for the participants to write the next lot of feedback on that day's post-it notes. The helpers around the room were indeed excellent, wonderful, patient and attentive.

I think having helpers around the room is the not-so-secret weapon of Software Carpentry.

When it rains....

After the fun of putting together the R script, I was prepared to show it all again on the last day of the workshop (Python) with more data. Showing the Unix/Git session feedback with an R script had been useful for demonstrating the R concepts, but running the same script again would add nothing to the lesson. If it was in Python, though... then it would show the same outcome in the two different languages taught, and reinforce the Python concepts covered that day.

This seemed like such a good idea that I typed in all the post-it notes again!

I use Python more than R, and know the Software Carpentry materials better as well, so it was even easier (some library installation issues aside) to get ready for the end of the workshop.

We start by loading the feedback CSV file, explaining that I chose Pandas rather than NumPy, which students had seen earlier in the day, as Pandas is well suited to mixed data types, whereas NumPy is geared towards numerical data.

In Python:

import pandas as pd
feedback = pd.read_csv("swc_feedback_day12.csv",delimiter=',')
comments = ' '.join(feedback["Comment"])

Once the feedback is in the comments variable, all that is needed is something to make the word cloud. The virtues of libraries can be reiterated here, and it's relatively easy thanks to the wordcloud library from Andreas Mueller (NB: install the most recent version from GitHub).

So we can run the code, modified from the wordcloud examples, in IPython Notebook/Jupyter, and after a few seconds we can see that, after the first two days of the workshop, the students thought it was:
%matplotlib inline
from matplotlib import pyplot as plt
from wordcloud import WordCloud, STOPWORDS

font = "/Library/Fonts/Arial Narrow.ttf"
(w, h) = (800, 480)
stopwords = STOPWORDS.copy()

swc_wordcloud = WordCloud(font_path=font, stopwords=stopwords, width=w, height=h).generate(comments)

# Plot the generated image, hiding the axes.
plt.imshow(swc_wordcloud)
plt.axis("off")
plt.show()


[Word cloud of the combined feedback from days 1 and 2]

Good.

Done.

Well, that would have been it... except that after being wowed by R and Python on days 2 and 3, I wanted to ensure that the power of the Unix shell was not lost on the workshop participants!

Too hard, you might think, but it's a good opportunity to reinforce the Unix philosophy of small programs that do one thing well.

A word cloud really needs two things: a list of words and values (frequencies/importance), and something to create the cloud image itself.

sort and uniq do all the heavy lifting for the first part, and a blog post from Jwalanta Shrestha (@jwalanta) provides an ingenious awk script for the second.

cat swc_feedback_day12.csv |                 # the transcribed feedback
cut -d"," -f 3 |                             # keep just the comment column
tr " " "\n" |                                # one word per line
tr '[:upper:]' '[:lower:]' |                 # lowercase everything
sed 's/[^a-z ]//g' |                         # strip punctuation
awk '{ if (length($1) >= 4) print $1 }' |    # drop short words
sort | uniq -ci | sort -n |                  # count the unique words
./make_cloud > cloud.svg                     # Jwalanta's awk script draws the cloud


After the rain

After three days everyone was getting tired, and another light moment to end the day seemed to work.

It was a nice way to end the workshop, and some quiet time, which gave participants a chance to complete both a full feedback form online and post-it notes for that day.

"Asking for too much feedback" was suggested as a valid response, and though one or two took us up on it, there was still valuable insight into how the final day had progressed. 

Even though it was too late to show directly to the workshop participants, having got this far, data entry for the final day was easy.

Here are all the notes:

[Image: all the post-it notes]

and the associated cloud:

[Word cloud of all the workshop feedback]

In retrospect, at a minimum, entering the feedback day by day was a good chance to see the progression of the comments, and easier than keeping track of all those post-it notes.

The word clouds were almost like a mini-capstone, with a simple purpose executed in multiple languages with knowledge learnt during the workshop.

I think it was a good technique, and would be interested if others also found it so. The code for the above is on GitHub, should other Software Carpentry workshops wish to use and expand on it.

It felt like a fun, but useful bit of code to show at the end of the day, something that was both trivial and interesting.

The only effort is in some data entry, and if it's split amongst instructors, should not be too great a burden.

I think it's worth it.

Friday, 2 May 2014

Getting Organised

The more time you spend developing your perfect organisation system, the better it has to be to make up for all the time you spent developing it; spend a week building it, and it needs to save you at least a week just to break even. This problem is reminiscent of the xkcd on the worth of improving routine tasks:

http://xkcd.com/1205/

Diminishing returns aside, as I find myself in the early stages of my PhD, the tension between working, worrying about how much I am working, and deciding to reinvent my entire organisational system is.... rising.

It's easy to blame your organisational system when you are feeling stressed and unproductive - and I choose to do that. Acknowledging that you are taking the easy option is half the battle. First, some background.

Being a dry lab scientist (in training), I've spent some time thinking about how to keep an effective bioinformatics equivalent of the lab notebook. My goal has been a place to record experiments, ideas, outcomes and future directions, in a way that works with a general system for keeping track of day-to-day tasks and activities.

For the last few months I've been using Evernote. I've gone to and from Evernote a few times, but have again returned to it due to three simple uses that work for me:
- Combines paper-based and computer note-taking
- Clients are available for almost any platform I find myself on
- Syncing is Somebody Else's Problem.

I'd tried TiddlyWiki previously, but the limited number of clients I could use it on, and keeping the syncing up to date, proved difficult in practice. If I was on a remote server I'd open text files and add notes to them. If I was away from a computer I'd find the various apps a little fiddly to use and kept resorting to paper, then promising myself I'd transcribe that paper later. All this really meant was that I had another task to do before I could even get to doing my actual tasks.

Frustration with this setup coincided with two separate features I came across for Evernote: Geeknote, an open source command line tool, and Evernote Moleskine notebooks.

I'll write more about Geeknote and using it from the command line in another post. Briefly though, the ability to type todo "Upgrade server to latest Ubuntu" in the middle of a remote command line session on a server, and be confident that it will be captured along with all your other tasks and notes, has been a real benefit.
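
That todo command is just a thin wrapper around Geeknote; a minimal sketch (the function name, notebook and tag are my own conventions rather than anything Geeknote provides):

# Hypothetical wrapper: capture a task into Evernote from any shell session
todo() {
    geeknote create --title "$1" --content "$1" --notebook "@Inbox" --tags "todo"
}

todo "Upgrade server to latest Ubuntu"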

The Evernote notebook has also solved a problem I've had whenever I've been tempted to get a Moleskine notebook before - they are too nice to write in! Rather than being a place for notes, they'd be a book I'd carry around but hesitate to write in, for fear of marking the expensive pages with something less than profound.

The Evernote Moleskine works around this by bundling in three months of premium subscription to Evernote. A key feature of this is offline access to notebooks, which comes in handy on planes or when wireless is unreliable. I treat the Moleskine Evernote notebook as paper I need to get through, rather than something to avoid spoiling. It's been great, though I still need to get a handle on scanning in the pages on a regular basis.

However, this system hasn't been completely useful. It's approaching a 'capture everything' system a la GTD, but as a way to plan tasks ahead, with a distinct to-do list and so on, it remains sub-par.

The addition of reminders to Evernote made me think about how to use Evernote as a real GTD single collection point, but keeping track of the reminders doesn't work with so many notebooks in my account, and the mix of scanned paper and computer tasks together meant that I never really had a to-do list.

That's about where I was up to as of last week. Geeknote didn't support reminders, and I had one of those 'how hard can it be' moments. Fortunately, a day or two of using my limited Python skills resulted in my first pull request to an open source project.

It is working. Sort of. My Geeknote extensions seem stable, and the habit of adding notes into Evernote is almost a natural instinct. But I still don't review it; notes go to my @Inbox (a GTD holdover) to be neglected.

But I feel that, like with so many things, you have to use what works for you. This setup is working for me better than anything else, and we can't let the perfect be the enemy of the good. So today, in an attempt to get something done, I'm making the setup a little bit neater.

After putting together all of my rough sketches of how I think I should get organised, I managed to merge all the notes and notebooks into some rough categories, as notebooks and notebook stacks, for getting me through the PhD:

[Screenshot of my Evernote notebooks and notebook stacks]

These generally reflect how I view most activities, and how I'm spending my day:

  • @Inbox - the capture everything notebook. Moving things out of this notebook (current size 370 notes) is one of my biggest weaknesses. 
  • @Someday - A catch-all for ideas that are, perhaps, before their time. 'Develop my own aligner' goes here. 
  • Infrastructure - Servers, scripts, tools and environments for getting bioinformatics done. There can be some overlap with research (for example, reviewing available bioinformatics workflows for a paper) but the essential criterion for infrastructure activities is that they are designed to help the research get done.
  • Networks - This is devoted to any activities that are about networks: professional organisations, informal journal clubs, student groups (local and international) and any other way that I'm trying to interact with the scientific community outside of (published) research. 
  • PhD - see below. 
  • Planning - This goes towards having an organising notebook - somewhere to plan how I plan, and ensure that I review where I'm going and why I'm doing it. Another problem at the moment is planning my week ahead. I hope to use this area to help get into the routine of stepping back and being mindful of upcoming tasks, rather than only going to the squeaky wheel. 
  • Research - Finally, but equally important, is the actual research that my PhD (and beyond) is based on. 
The PhD notebook stack is really concentrated on 'getting the Dr on the door', a phrase a mentor of mine uses, and is in the spirit of the advice in 'How to Write a Better Thesis' that the thesis is an examination. This folder is all about the experiences and deliverables required to get that PhD, while the Research folder is where the content is created/modified/discarded/revisited. Keeping track of hurdles, requirements and personal goals - as well as writing (e.v.e.r.y d.a.y) - is a distinct responsibility from doing the actual PhD research, and I think it deserves its own mindset.


There is still a lot to be done with my organisational setup. Having defined the broad areas of how I work, I still need to go through the 370 notes in my inbox, and then actually complete some of the tasks they include. I also need to make sure that I review things regularly.

I haven't done all that much today on the system - renamed a few notebooks and moved some notes around. But I feel better about it already. Acknowledging the shortcomings has helped me see it as a work in progress, and a way to move it one step closer to the goal of a good enough (if not perfect) system to get me through my PhD (and beyond).


Saturday, 29 March 2014

From Research Assistant to PhD student

This post has been a couple of months coming (cue freak-out that the PhD is already two months in...) but was inevitably delayed. I had planned a very big post about my decision to start a PhD, what I enjoyed about working as a research assistant, and what I was looking forward to with my reactivated student status.

But, the post was going to have to be perfect, and I'd have to capture everything about the transition so that I could revisit it later on. It was going to be the post that signalled I had started my PhD, but the delay meant that I kept running into friends who were surprised I hadn't told anyone! As a temporary solution, I posted the news and a question to the local uni bioinformatics group:
So, friends/colleagues/peers/etc, it has come to my attention that perhaps I haven't brought to everyone else's attention that I've started my PhD. So consider this post me telling you. Any advice from the collective intelligence of the Bioinformatics Graduate Student Association?
If you read enough blogs and books about starting and finishing a PhD, 'write every day' is a recurring piece of advice, and it was one of the more frequent responses to my question from my friends. I suppose it's time to take the hint. 

[Image: The most obvious difference between being a Research Assistant (left) and PhD student (right) so far is the size of coffee I can afford.]
This post therefore counts as writing, even though it isn't a large amount, and even though it isn't exactly profound, it's done. It's not the perfect welcome-to-PhD post I'd imagined, but I managed to include the amusing picture about KeepCups that I wanted to, and as I'm about to hit publish, it's off my to-do list and I'm thinking about the next thing I'm going to write. Which I guess is the point.  

Tuesday, 22 October 2013

Well that went quickly...

What seems like only last week, but was almost two months ago, was a post on what I intended to do given that I needed Bpipe working under SLURM. 

Long story short - things went well: https://code.google.com/p/bpipe/source/browse/ReleaseNotes.txt

I'd intended to document the whole process as I went along, but as is often the case, necessity dictates that getting the job done means sometimes neglecting to take the time to record how you did it.

Not that I am too regretful. Diving in headfirst has been a great refresher in version control and a chance to try Git, as well as some rudimentary environments using Vagrant and Puppet.

I don't think that the saying 'Show, don't tell' really applies to blog posts, but in order to get another post out the door, I'll risk it.

My steps in getting a Bpipe SLURM implementation were going to involve (the show):
  1. Vagrant (and Chef/Puppet)
  2. Git
  3. Bpipe & SLURM
The tell is:
  1. Vagrant, Puppet and some nasty shell scripts to have a reproducible dev environment for Bpipe:  https://github.com/lonsbio/bpipe-vagrant/tree/master
  2. See 1 and 3.
  3. Using 1, clone and develop a version of Bpipe that (sort of) works under SLURM! https://code.google.com/r/andrewlonsdale-bpipe-dev/
There is still work to do - it is preliminary support only - including adding MPI and SMP support for Bpipe under SLURM. This will likely require more work on the SLURM cluster Vagrant setup (https://github.com/lonsbio/slurm-cluster-vagrant).
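
For anyone who wants to poke at it, the dev environment workflow is roughly as follows (a sketch, assuming Vagrant and Git are already installed):

# Clone the Vagrant/Puppet dev environment and bring up the VM
git clone https://github.com/lonsbio/bpipe-vagrant
cd bpipe-vagrant
vagrant up     # provision with Puppet (and those nasty shell scripts)
vagrant ssh    # build and test Bpipe against SLURM inside the VM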

I won't promise to document these next steps, and then forget to as I go headfirst into doing them.

However with any luck there will be some more 'tell' soon.

Sunday, 18 August 2013

Biting the Bullet(s) - Bpipe, Git and Vagrant (and Chef/Puppet)

I'm a big fan of Bpipe. Why I like it can wait for another post, and the homepage makes a good case. I'd relied on it in a Torque/PBS environment that recently moved to SLURM, which Bpipe does not currently support. Self-interest and open source code meant that if it was going to get added, I could at least get the ball rolling by contributing (or attempting to).

I had dabbled in some source code changes to Bpipe previously, on a VM that I've not fired up in a few months. I don't quite recall how I installed it, the various path changes, or the issues I hit. The code changes I made are not under source control - they were just sitting there, and I never submitted them.

I know enough to know that's not the way to do it. But there are lots of things I know I should have set up and be using, including source control on everything and a reproducible development environment for contributing to a project such as Bpipe.

I've started on them; there are rough notes for all, but a well-documented, coherent process for them... <insert excuse here>.

But my reliance on Bpipe for quickly reusing components, calling parallel tasks and, overall, using supercomputing resources with minimal effort has forced my hand. It is clear that:

  1. I need to have Bpipe support SLURM.
  2. I need to set up a new development environment to work on Bpipe.  
  3. I need a way to track the code changes I make, and share them with the community. 
  4. I need to get to work!
I've been wanting to try Vagrant (http://www.vagrantup.com/) for a while, as well as try out some provisioning software like Chef and Puppet. My GitHub account has been set up but dormant for a while. All that has been lacking is my will to set aside some time to do it.

With any luck, the following list will turn into links to the posts I make as I progress through each of these steps.
  1. Vagrant (and Chef/Puppet)
  2. Git
  3. Bpipe & SLURM