Yi Tang Data Scientist with Emacs

Use Emacs's Org-mode to Effectively Manage Small Projects


Org-mode is a great knowledge-management tool, and it has also helped me increase my personal effectiveness. Recently I have been exploring org-mode for managing small projects in a business environment, where collaboration happens occasionally between me and the project team members.

In this post I summarise my workflow for organising, managing, and monitoring a project. The implementation of this workflow revolves around collaboration. I have been practising this workflow for a while and can see my growth in planning and managing skills.


I use a broad definition of project: as long as a task requires a series of sub-tasks to be done, it is a project. Normally I categorise any task related to a project into one of three groups:

Project Tasks
    the major tasks that must be done in order to deliver the project product;
Admin Tasks
    administrative or miscellaneous tasks that keep the project going, like sending out an invoice;
Notes
    anything that is important to the project and therefore worth keeping a record of, like meeting notes or decisions that impact the project's progress.

Each category has a corresponding top-level section or heading. Once this outline is set up, it is very convenient to view content under these categories, regardless of what I am working on: reading emails, coding, or writing a report. Org-mode can scan all the .org files in a directory and create a tree structure, with the file names as the roots and the headings as the nodes.

An intuitive way to locate any node is to start from the beginning; the process is the same as finding a section in a textbook. It can be summarised as:

  1. first, find the right book by its name,
  2. then find the right part,
  3. then narrow down to the right section,

and continue until I reach the section I am interested in. A more pleasant way is to use the fuzzy matching supported by the Helm package: I can narrow down the selection with fragments of node names. For example, as the image below shows, to locate the headline for this article among 40 org files, I only need to search "small pro": only three headlines contain "small" ("small changes", "small talk", and "small project"), and "pro" narrows it down to the unique headline.

It saves me a lot of time remembering where I saved a note, or wandering around the files looking for something. I have only explained a little of what Helm can do; if you want to try it out, you can find my configuration here, and I recommend a good tutorial if you want to know more.



Figure 1: Test image

We usually run a couple of projects at the same time. Also, creating a new task or note is easy: org-capture creates a temporary node, which by default is saved as a subtree in refile.org, or I can refile the headline directly to the project using the locating mechanism above.

These two features are the most enjoyable to use; they keep me from wandering through multiple directories trying to find the right files, and therefore increase my productivity. Never underestimate how long you can spend looking for one file.


Projects usually come with hard deadlines for product delivery. Setting and changing deadlines in org-mode is pleasant with org-deadline (C-c C-d).

It brings up a mini-calendar buffer (shown below). I can use shift+left and shift+right to move backward and forward by a day, or shift+up and shift+down to move between weeks, and hit RET to select a deadline. Apart from navigating, I can also type the exact date directly, like "2015-07-25", and hit RET.


Once the deadline is set, it will show up in that day's agenda. I don't want to suddenly realise on the day that a deadline has arrived, so it makes sense to have an early-warning period that shows a task when it is due in a few days. This behaviour is governed by the org-deadline-warning-days variable. In my Emacs configuration I set it to 30 days, which gives me plenty of time to do the task.
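In an Emacs configuration this is a one-liner (30 days is my setting, not the default):

```elisp
;; Warn about upcoming deadlines 30 days in advance in the agenda
(setq org-deadline-warning-days 30)
```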

I also set deadlines for sub-tasks, since it is quite easy to do in org-mode. But coming up with realistic deadlines is difficult: to me, a deadline must give enough time to do the task properly; to the PM, it must fit the overall project plan and resources. Both sides are likely to have different opinions on how long it takes to implement a new feature with documentation. Estimating is an important skill to have: for me, it reflects my understanding of the problem and my own technical capability; for the manager, it is part of their project plan.

My initial estimate may be far from the actual effort, especially when the problem domain is new to me or I haven't done similar tasks before. The more I do, the better I get at estimating. At this stage I practise this skill seriously, and I like to have someone more experienced review my estimates.

To make this easy for them, I present an overall view of the project timeline, which clearly shows the periods allocated to specific tasks; org-timeline generates a time-sorted view of all the tasks. The most recent feedback I received is that I tend to overlook the time spent on documentation and tests: someone with more than 10 years in software development says these two tasks together usually take about three times as long as the actual coding.

The timeline view also provides a benchmark for progress, and I check it frequently to make sure I am on track. It also gives the PM a reference for swapping tasks if something becomes urgent.


In addition to the early-warning system that prevents sudden surprises, org-mode provides another way of monitoring a project in terms of resources: the actual time I spend on it. This feature is quite useful when I am given a loose deadline but a limited budget, say 150 hours.

Since the sub-tasks are mostly defined at an early stage, whenever I start one I clock in first with org-clock-in. The clock stops once I manually clock out, clock in to another task, or complete the task (mark it as DONE). Each clock entry records the start time, end time, and duration.

Multiple clock entries accumulate, and their durations can be added up to tell me exactly how much time I spent on each task, and aggregated across the whole project, with one single command, org-clock-report (C-c C-x C-r).

Table 1: Clock summary at [2015-06-14 Sun 11:17]

| Headline                                            |  Time | Effort |
|-----------------------------------------------------+-------+--------|
| Total time                                          | 10:41 |        |
| TODO Use Emacs's org-mode to Manage a Small Project | 10:41 |        |
|   TODO Tasks                                        |  1:45 |        |
|     DONE add example for org-refile                 |  0:35 |   0:30 |
|     NEXT add example for org-clock-report           |  0:13 |   0:15 |
|     NEXT proof read                                 |  0:11 |   0:15 |
|     NEXT proof read - 2                             |  0:46 |   1:00 |

It is normal to underestimate the complexity of a task and spend too much time resolving it; usually I can catch up at a later stage. However, if I feel the overall progress has been affected, I need to request more resources from the PM, and the quote I give is the extra hours needed relative to my initial estimate. That's a quick reaction.

Also, the clock-report table tells me the difference between my effort estimate and the actual time I spent on each task.

Control the Plotting Order in ggplot2


The above two plots show the same data (included below). If you were going to present one to summarise your findings, which would you choose? It is very likely you would pick the right one, because

  1. the linearly increasing bars are pleasant to look at,
  2. it is easier to compare the categories: the ones on the right have higher values than the ones on the left, and
  3. the categories with the lowest and highest values are clearly shown.

In this article I try to explain how to specify the plotting order in ggplot2 so you can arrange categories however you want, and to encourage R starters to use ggplot2.

Creating a bar plot in R is dead easy. Take this dataset as an example:

mode count
ssh-mode 2361
fundamental-mode 4626
git-commit-mode 4869
mu4e-compose-mode 4964
emacs-lisp-mode 6205
shell-mode 10046
minibuffer-inactive-mode 12624
inferior-ess-mode 25774
ess-mode 47115
org-mode 78195

To get the plot on the right, reorder the table by count (it has already been done), then

with(df, barplot(count, names.arg = mode)) 

will do the job. That's simple and easy: it plots exactly what you provide. This is completely different from the ggplot() paradigm, which does a lot of computation behind the scenes.

ggplot(df, aes(mode, count)) + geom_bar(stat = "identity")

will give you the first plot: the categories are in alphabetical order. To get the pleasant increasing order that depends on count (or any other variable), or even a manually specified order, you have to explicitly change the levels of the factor.

df$mode.ordered <- factor(df$mode, levels = df$mode)

creates another variable, mode.ordered, which looks the same as mode, except that the underlying levels are different: they are set to the order of count. Running the same ggplot code again gives you the plot on the right. How does it work?

First, every factor in R is mapped into an integer, and the default mapping algorithm is

  1. sort the distinct values alphabetically,
  2. map the first level to 1, the second to 2, and so on (with ten levels here, the last maps to 10).

So emacs-lisp-mode is mapped to 1 and ssh-mode is mapped to 10.

What the reordering step does is sort the factor levels by count, so that ssh-mode is mapped to 1 and org-mode is mapped to 10, i.e. the factor order is set to the order of count.

How does this affect ggplot? I presume ggplot does the plotting in the order of the levels, or let's say on the integer space, i.e. it plots from 1 to 10 and then adds the label for each.
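A quick sanity check of that mapping in R: factor() sorts the levels alphabetically, and as.integer() exposes the underlying integer codes.

```r
modes <- c("ssh-mode", "org-mode", "emacs-lisp-mode")
f <- factor(modes)   # levels are sorted alphabetically
levels(f)            # "emacs-lisp-mode" "org-mode" "ssh-mode"
as.integer(f)        # 3 2 1, i.e. ssh-mode -> 3, emacs-lisp-mode -> 1
```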

In this example the base barplot function did the job. Usually we need to do extra data manipulation so that ggplot does what we want, in exchange for a plot that looks better and fits with the other plots. Time constraints aside, I would encourage people to stick with ggplot2 because, like many other things in life, once you understand it, it becomes easy. For example, it is actually very easy to specify the order manually in only two steps:

  • first, sort the whole data.frame and assign it to a variable,
  • then change the levels argument of factor() to whatever order you want.

To show a decreasing trend (the reverse of increasing), just use levels = rev(mode). How neat!
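Putting the two steps together, here is a sketch in R (using the df from above, with mode and count columns):

```r
library(ggplot2)

## Step 1: sort the whole data.frame by count
df <- df[order(df$count), ]

## Step 2: freeze that order into the factor levels
df$mode <- factor(df$mode, levels = df$mode)

## Increasing bars; for a decreasing trend use levels = rev(df$mode) above
ggplot(df, aes(mode, count)) + geom_bar(stat = "identity")
```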

RExercise - Analyse Your Exercise Data in R

RExercise is a by-product of the ActivityDashboard. It parses your exercise data in .GPX format, and for each workout it returns

location table
    a data.frame with the longitude, latitude, and elevation at each recording time;
summary table
    a one-row data.frame of summary statistics about the workout, including duration, distance, speed, etc.

It comes with a helper function, Parse_GPX_all, that does the batch processing and combines all the data.frames together, and also adds city and country to the summary table. Then you can see all your activities in one summary table, and use it to query both the location and summary tables. For example: how many miles did you run last year? In how many cities have you run? It is meant to make you feel great by showing you have achieved a lot.

Currently it parses data from RunKeeper and Strava perfectly. .GPX is a generic data format, so applying RExercise to data from other apps shouldn't be a problem. If you do, please feel free to contact me; I am extremely friendly to people who exercise (:D). Or send me a pull request if you have already figured it out.


Suppose you have these .GPX data files,


RExercise will give you a location table and a summary table as follows:

Table 1: A Summary Table

| id              | activity | date       | start.time | name       | duration (h) | distance (km) | speed (km/h) | elevation (m) | climb (m) |
|-----------------+----------+------------+------------+------------+--------------+---------------+--------------+---------------+-----------|
| 20150108-170830 | Run      | 2015-01-08 | 17:08:14   | Afternoon  |         0.13 |          0.74 |          5.4 |         109.0 |      11.1 |
| 20150109-171835 | Run      | 2015-01-09 | 17:18:14   | after work |         0.42 |          3.33 |          7.9 |         110.5 |      60.1 |
| 20150111-113750 | Run      | 2015-01-11 | 11:37:14   | Sunday     |         0.50 |          4.25 |          8.4 |         130.6 |     136.6 |
| 20150112-171906 | Run      | 2015-01-12 | 17:19:14   | after work |         0.51 |          4.08 |          7.9 |         110.4 |      88.6 |
Table 2: A Location Table

| lon       | lat       | ele   | time                |
|-----------+-----------+-------+---------------------|
| -2.019050 | 53.961909 | 108.4 | 2015-01-11 11:37:50 |
| -2.017989 | 53.961375 | 109.8 | 2015-01-11 11:38:27 |
| -2.018019 | 53.961427 | 109.8 | 2015-01-11 11:38:29 |
| -2.018004 | 53.961536 | 109.8 | 2015-01-11 11:38:30 |
| -2.018189 | 53.962276 | 110.4 | 2015-01-11 11:38:33 |
| -2.018141 | 53.962277 | 110.4 | 2015-01-11 11:38:34 |
| -2.018090 | 53.962276 | 110.4 | 2015-01-11 11:38:35 |


1. Install


2. Download GPX data

3. Set working directory and app

all.data <- Parse_GPX_all(data.dir = "~/ExerciseData/Strave/",
                          app = "Strava",
                          add.city = TRUE)

You should then have the two tables shown in the Demo section.

Group Emacs Search Functions using Hydra

I am a search guy: when I want to know something, I use search functionality to jump to where the keyword is; I don't scan the page with my eyes, which is too slow and straining.

Emacs provides powerful search functionality. For example, I use these commands very often (with their key-bindings):

  1. isearch (C-s): search for a string and move the cursor there;
  2. helm-swoop (C-F1): find all occurrences of a string, and pull the lines containing it into another buffer where I can edit and save;
  3. helm-multi-swoop (M-X): apply helm-swoop to multiple buffers; very handy when I want to know where a function is called across buffers;
  4. projectile-grep or helm-projectile-grep (C-c p s g): find which files in the current project contain a specific string; similar to helm-multi-swoop, but limited to the files in the project directory.

I love searching in Emacs, but the problem is having to remember all the key-bindings for the different tasks. Also, I sometimes forget what alternatives I have and go with the one I am most familiar with, which is often not the right one. I sometimes realise I have used isearch several times to do what ace-jump-word-mode could achieve in one go.

Org-mode Hydras incoming! gave me the idea of grouping all these functions together and pressing a single key to perform the different tasks, freeing my mind from remembering all the key-bindings. I can also write a few lines of text to remind myself when to do what, which potentially solves the second problem.

Here is my hydra implementation for searching:

(defhydra hydra-search (:color blue :hint nil)
  "
Current Buffer   : _i_search  helm-_s_woop  _a_ce-jump-word
Multiple Buffers : helm-multi-_S_woop
Project Directory: projectile-_g_rep  helm-projectile-_G_rep
"
  ("i" isearch-forward)
  ("s" helm-swoop)
  ("a" ace-jump-word-mode)
  ("S" helm-multi-swoop)
  ("g" projectile-grep)
  ("G" helm-projectile-grep))

(global-set-key [f4] 'hydra-search/body)

So next time, when I want to search for something, I just press F4 and it brings up all the choices I have; I don't need to worry about the key-bindings or which command to use. That's cool!

I am looking forward to simplifying my Emacs workflow using the hydra package. The key challenge is to identify the logical similarities among tasks and then group them together accordingly; for hydra-search, it is "search for something, somewhere".

A Workflow for Using Git to Track SVN Repository

Version control is a complex subject, and the ideas of branching and the different types of merging are hard to understand. I understand merely the basics of Git, yet it already makes my life a lot easier: I am managing about 10 repositories at the moment without much effort.

But my colleagues use SVN as the central storage for scripts. Switching to SVN is not a problem in itself; I would just need a few weeks to transfer my knowledge and start using it. But I am reluctant to learn something basic all over again and hold duplicated knowledge; also, I use GitHub and Bitbucket, which are Git based. Yet sticking to Git alone would make it impossible to work with my colleagues.

Then I found that the Git developers have already made the effort to bridge Git and other version control systems, like SVN. git svn allows me to use Git commands for staging, cherry-picking, pulling, etc., and then upload to the remote SVN repository with a single command. I really like the idea of transferring skills from one system to another without any cost; it convinces me that Git is great and that I can continue to use Magit in Emacs!

Here are the basic steps and commands for this workflow:

  1. Create a folder: mkdir ProjRepo
  2. Create an empty Git repository inside it: git init
  3. Add the following to .git/config, changing the URL to the right repository:

[svn-remote "svn"]
    url = https://your.svn.repo
    fetch = :refs/remotes/git-svn

  4. Pull from the central SVN repository into this folder: git svn fetch svn
  5. Switch to the SVN remote branch: git checkout -b svn git-svn
  6. Modify or add files.
  7. Use git add and git commit to snapshot local changes.
  8. When needed, update the local repository from SVN: git svn rebase
  9. Finally, upload local changes to the central SVN repository: git svn dcommit
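The steps above can be sketched as a shell session. The setup part needs only plain Git; the synchronisation commands (commented out here) additionally need git-svn installed and a reachable SVN server, and the URL is the same placeholder used in the config snippet:

```shell
# Create the working folder and an empty Git repository
mkdir ProjRepo && cd ProjRepo
git init -q

# Teach Git about the SVN remote (equivalent to editing .git/config by hand)
git config svn-remote.svn.url https://your.svn.repo
git config svn-remote.svn.fetch :refs/remotes/git-svn

# The remaining steps need git-svn and a real SVN server:
# git svn fetch svn              # pull SVN history
# git checkout -b svn git-svn    # local branch tracking the SVN remote
# ...edit files, then: git add . && git commit -m "message"
# git svn rebase                 # replay local commits on top of upstream
# git svn dcommit                # push local commits to SVN
```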

See the official manual, 8.1 Git and Other Systems - Git and Subversion, and the git-svn documentation for more details.

If you have any questions or comments, please post them below. If you liked this post, you can share it with your followers or follow me on Twitter!