howto

Using Askgit

SQL is so nice

tags
git
sql
tools

askgit provides a sql interface to your git repository. Let's install it and see what we can figure out about the repo.

Installing

Following the instructions on the website, we can build the go binary with:

1
go get -v -tags=sqlite_vtable github.com/augmentable-dev/askgit

This will download the package, the dependencies, and compile everything into an executable. If you can't figure out where it's installed, check that you have GOPATH set correctly, and it will end up in $GOPATH/bin. And make sure that it's in your path by checking which askgit.

Get a test repository

I'm going to look at the ruby-git repo so you can follow along.

1
2
3
cd /tmp
git clone https://github.com/ruby-git/ruby-git
cd ruby-git

Commits

The basic unit that git works on is a commit, which is a series of file changes applied to a parent, or multiple parents in the case of a merge. Let's do some basic analysis of what's in a repo, how long it's been worked on, and who has been working on it.

How old is the repo?

Let's find the earliest timestamps in the repo. For most workflows these dates will be similar, but if the project shares patches over email (like linux or postgres or anything on sr.ht) the author_when is when the code was written (according to the author) and the committer_when is when it was added to the repo.

1
2
3
4
5
6
askgit "select min(author_when), min(committer_when) from commits"
+---------------------------+---------------------------+
|     MIN(AUTHOR WHEN)      |    MIN(COMMITTER WHEN)    |
+---------------------------+---------------------------+
| 2007-11-07T12:54:26-08:00 | 2007-11-07T12:54:26-08:00 |
+---------------------------+---------------------------+

First commit

And here's who did it:

1
2
3
4
5
6
askgit "select message, author_name, author_email, author_when from commits order by committer_when asc limit 1"
+-------------------------------+--------------+-------------------------------------------+---------------------------+
|            MESSAGE            | AUTHOR NAME  |               AUTHOR EMAIL                |        AUTHOR WHEN        |
+-------------------------------+--------------+-------------------------------------------+---------------------------+
| beginning of Ruby/Git project | scott Chacon | schacon@agadorsparticus.corp.reactrix.com | 2007-11-07T12:54:26-08:00 |
+-------------------------------+--------------+-------------------------------------------+---------------------------+

Biggest commit

Which commits added the most code?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
askgit "select summary, author_name, author_when, additions from commits order by additions desc limit 10"
+--------------------------------+---------------+---------------------------+-----------+
|            SUMMARY             |  AUTHOR NAME  |        AUTHOR WHEN        | ADDITIONS |
+--------------------------------+---------------+---------------------------+-----------+
| added documentation and a      | scott Chacon  | 2007-11-11T17:59:18-08:00 |     12707 |
| license file                   |               |                           |           |
| updated a bunch of the         | scott Chacon  | 2007-11-13T07:36:50-08:00 |      2223 |
| documentation                  |               |                           |           |
| updated the docs               | scott Chacon  | 2007-11-16T13:24:30-08:00 |      2109 |
| updated docs                   | scott Chacon  | 2007-11-19T07:34:25-08:00 |      1448 |
| added push, changed some docs, | scott Chacon  | 2007-11-12T10:55:39-08:00 |      1193 |
| merged README and EXAMPLES,    |               |                           |           |
| fixed the Rake tasks to build  |               |                           |           |
| a proper gem                   |               |                           |           |
| Implementation and tests       | James Couball | 2019-12-11T10:04:03-08:00 |       763 |
| required to ensure that        |               |                           |           |
| command output encoding does   |               |                           |           |
| not raise an exception (#405)  |               |                           |           |
| added files from the           | scott Chacon  | 2007-11-20T12:07:46-08:00 |       749 |
| gitrb project, which seems     |               |                           |           |
| abandoned, but which is great  |               |                           |           |
| code                           |               |                           |           |
| adding my testing git          | scott Chacon  | 2007-11-11T17:11:19-08:00 |       711 |
| directory                      |               |                           |           |
| added branches, more log       | scott Chacon  | 2007-11-09T13:11:22-08:00 |       522 |
| stuff, better tests, changed   |               |                           |           |
| the log api a bit added        |               |                           |           |
| tests for Git::Lib, started    |               |                           |           |
| Git::Diff development          |               |                           |           |
| added some testing files and   | scott Chacon  | 2007-11-07T17:19:09-08:00 |       486 |
| got some real tests to run     |               |                           |           |
+--------------------------------+---------------+---------------------------+-----------+

We can see that these are at all the beginning of the project.

Biggest cleanup

Which commit removed the most code?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
askgit "select summary, author_name, author_when, deletions from commits order by deletions desc limit 10"
+--------------------------------+--------------------+---------------------------+-----------+
|            SUMMARY             |    AUTHOR NAME     |        AUTHOR WHEN        | DELETIONS |
+--------------------------------+--------------------+---------------------------+-----------+
| remove doc from revision       | Jonathan Rudenberg | 2008-12-24T08:08:19+08:00 |     14822 |
| control                        |                    |                           |           |
| updated docs                   | scott Chacon       | 2007-11-19T07:34:25-08:00 |      1311 |
| updated the docs               | scott Chacon       | 2007-11-16T13:24:30-08:00 |      1141 |
| updated a bunch of the         | scott Chacon       | 2007-11-13T07:36:50-08:00 |      1061 |
| documentation                  |                    |                           |           |
| reverted the pure ruby code    | Scott Chacon       | 2008-03-08T18:14:15-08:00 |      1050 |
| to system calls and split the  |                    |                           |           |
| pure ruby to a new library     |                    |                           |           |
| added push, changed some docs, | scott Chacon       | 2007-11-12T10:55:39-08:00 |       976 |
| merged README and EXAMPLES,    |                    |                           |           |
| fixed the Rake tasks to build  |                    |                           |           |
| a proper gem                   |                    |                           |           |
| Cleaning up unused files/tasks | robertodecurnex    | 2014-02-03T18:14:29-03:00 |       750 |
| have the pure ruby bindings    | scott Chacon       | 2007-11-20T13:24:44-08:00 |       393 |
| working to some degree         |                    |                           |           |
| add markdown styling to readme | schneems           | 2011-10-19T16:30:26-05:00 |       240 |
| added documentation and a      | scott Chacon       | 2007-11-11T17:59:18-08:00 |       179 |
| license file                   |                    |                           |           |
+--------------------------------+--------------------+---------------------------+-----------+

Again most of these are at the beginning of the project, through there is a big 2014 cleanup that is probably a refactor of some kind.

How many commits do we have?

We've all seen this query before:

1
2
3
4
5
6
askgit "select count(*) from commits"
+----------+
| COUNT(*) |
+----------+
|      402 |
+----------+

How about most frequent multi-committer?

Lets group the commits by author_name and sort by the most committers. It looks like somewhere down the line scott Chacon changed to Scott Chacon so he's counted twice. Still, it looks like he's been doing a lot.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
askgit "select count(*) as count, author_name from commits group by author_name having count > 1 order by count desc"
+-------+--------------------+
| COUNT |    AUTHOR NAME     |
+-------+--------------------+
|    81 | scott Chacon       |
|    78 | robertodecurnex    |
|    77 | Roberto Decurnex   |
|    20 | Scott Chacon       |
|    20 | Joshua Nichols     |
|     9 | Per Lundberg       |
|     9 | James Couball      |
|     8 | Daniel Mendler     |
|     5 | Vern Burton        |
|     5 | Kelly Stannard     |
|     4 | elliottcable       |
|     4 | Josh Nichols       |
|     4 | James Rosen        |
|     3 | Yuichi Tateno      |
|     3 | Michael Mallete    |
|     3 | Jorge Bernal       |
|     2 | Yuya.Nishida       |
|     2 | Yuta Harima        |
|     2 | TIT                |
|     2 | Joshua Liebowitz   |
|     2 | Jonathan Rudenberg |
|     2 | Joe Moore          |
|     2 | Gianni Chiappetta  |
|     2 | Eric Goodwin       |
|     2 | Cameron Walsh      |
+-------+--------------------+

Commits by month

We can look at the activity of the project over time using the sqlite date functions. We can see that there's a lot of activity in the beginning and it trails off.

1
2
3
4
cd /tmp/ruby-git
askgit "select strftime(\"%Y-%m\", author_when) as 'month', 
       count(*) as total_commits, sum(additions), sum(deletions)
       from commits group by strftime(\"%Y-%me\", author_when) order by strftime(\"%Y-%m\", author_when);"
+---------+---------------+----------------+----------------+
|  MONTH  | TOTAL COMMITS | SUM(ADDITIONS) | SUM(DELETIONS) |
+---------+---------------+----------------+----------------+
| 2007-11 |            78 |          27597 |           6204 |
| 2008-01 |             3 |              5 |              5 |
| 2008-03 |            12 |            279 |           1101 |
| 2008-05 |            10 |            180 |             34 |
| 2008-07 |             4 |             33 |              6 |
| 2008-08 |             1 |              2 |              2 |
| 2008-10 |             1 |              1 |              1 |
| 2008-11 |             1 |              1 |              1 |
| 2008-12 |             6 |            180 |          15039 |
| 2009-02 |            16 |            202 |            216 |
| 2009-03 |             2 |              9 |              2 |
| 2009-04 |             1 |              1 |              1 |
| 2009-05 |             2 |            115 |             16 |
| 2009-08 |            13 |            168 |             97 |
| 2009-09 |             4 |             10 |              6 |
| 2009-10 |             6 |             11 |             11 |
| 2011-08 |             1 |              1 |              1 |
| 2011-09 |             1 |              0 |              0 |
| 2011-10 |             1 |            251 |            240 |
| 2011-12 |             1 |             30 |              1 |
| 2012-03 |             1 |              1 |              0 |
| 2012-04 |             1 |              1 |              1 |
| 2013-01 |             1 |              6 |              4 |
| 2013-04 |            45 |            704 |            388 |
| 2013-05 |             3 |              8 |              5 |
| 2013-06 |             1 |              9 |              0 |
| 2013-08 |            17 |            184 |             41 |
| 2013-09 |             1 |              8 |              8 |
| 2013-12 |             2 |              4 |              4 |
| 2014-01 |            26 |            254 |             96 |
| 2014-02 |             5 |              8 |            753 |
| 2014-05 |             1 |              2 |              2 |
| 2014-06 |             4 |             14 |              2 |
| 2014-07 |             6 |             44 |             35 |
| 2014-08 |             4 |              4 |              4 |
| 2014-09 |             5 |            109 |             54 |
| 2014-10 |             1 |             20 |              0 |
| 2014-11 |             4 |             10 |              4 |
| 2014-12 |            11 |            244 |             60 |
| 2015-01 |            36 |            446 |            216 |
| 2015-10 |             1 |              1 |              0 |
| 2016-02 |            12 |            216 |            196 |
| 2018-01 |             2 |             40 |             42 |
| 2018-02 |             1 |             17 |             17 |
| 2018-03 |             4 |            233 |             64 |
| 2018-04 |             1 |              8 |              4 |
| 2018-05 |             4 |             81 |             35 |
| 2018-06 |             5 |            192 |             19 |
| 2018-07 |             4 |             37 |              9 |
| 2018-08 |             5 |             64 |             15 |
| 2018-10 |             1 |            169 |              0 |
| 2019-02 |             1 |             65 |             45 |
| 2019-09 |             1 |              8 |              2 |
| 2019-12 |             1 |            763 |             17 |
| 2020-01 |             6 |            172 |             56 |
| 2020-02 |             5 |             15 |             10 |
| 2020-04 |             9 |             63 |              9 |
+---------+---------------+----------------+----------------+

Looking at this you can see that nothing happened in 2010 and 2017, a whole bunch of stuff was removed on 2008-11, and really there's only a fairly small amount of work done after the initial flurry of implementation.

Tags

Tags can represent releases. Presumably the project is at a point where you want to mark it in some way, and we'll use that to look at the life cycle of what's been going on.

Get a list of tags

Here we can see a list of tags for the project.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
askgit "select name, tagger_name, target_type from tags"

+-----------------------+------------------------------+------------------------------------------+
|         NAME          |         TAGGER NAME          |               TARGET TYPE                |
+-----------------------+------------------------------+------------------------------------------+
| refs/tags/1.0.3       | NULL                         | NULL                                     |
| refs/tags/1.0.5       | NULL                         | NULL                                     |
| refs/tags/1.0.5.1     | schacon@gmail.com            | ae106e2a3569e5ea874852c613ed060d8e232109 |
| refs/tags/v1.0.7      | schacon@gmail.com            | 1adc5b8136c2cd6c694629947e1dbc49c8bffe6a |
| refs/tags/v1.2.0      | NULL                         | NULL                                     |
| refs/tags/v1.2.1      | NULL                         | NULL                                     |
| refs/tags/v1.2.10     | decurnex.roberto@gmail.com   | 6f48447656c68317842aaa39a959dbe69681e38d |
| refs/tags/v1.2.2      | NULL                         | NULL                                     |
| refs/tags/v1.2.3      | NULL                         | NULL                                     |
| refs/tags/v1.2.4      | NULL                         | NULL                                     |
| refs/tags/v1.2.5      | NULL                         | NULL                                     |
| refs/tags/v1.2.6      | decurnex.roberto@gmail.com   | 6db4fdcad851eeaff6382a9eb6748455c3818c3e |
| refs/tags/v1.2.7      | roberto.decurnex@avature.net | 219c729b24b1dc868986f821655912612919ac21 |
| refs/tags/v1.2.8      | roberto.decurnex@avature.net | 8425a6b6169eaeda0afe2e16b0e4210ed433caf2 |
| refs/tags/v1.2.9      | decurnex.roberto@gmail.com   | 99b11561a64740285872f4c0a8b72e3e69308ed2 |
| refs/tags/v1.2.9.1    | decurnex.roberto@gmail.com   | cc6d6efc1b7eaf63609c4c06969b0f839dc80095 |
| refs/tags/v1.3.0      | decurnex.roberto@gmail.com   | a223fcf873bd99658cd1d0bdee1818adc7e3e92c |
| refs/tags/v1.4.0      | per.lundberg@ecraft.com      | c8d10125d0cf172684e2da776b79edf90baebc52 |
| refs/tags/v1.5.0      | per.lundberg@ecraft.com      | fd2642a626d20594a25010526f10ff99431c2cca |
| refs/tags/v1.6.0      | NULL                         | NULL                                     |
| refs/tags/v1.6.0.pre1 | jcouball@yahoo.com           | b9c7c5c51f935e547ab15a134e66c65ea0d9ab76 |
| refs/tags/v1.7.0      | couballj@verizonmedia.com    | 4bef5abbba073c77b4d0ccc1ffcd0ed7d48be5d4 |
+-----------------------+------------------------------+------------------------------------------+

When target_type is NULL it means that it's a lightweight tag and doesn't have a commit associated with it. Most of the time people will have a commit, and since a commit has a date it makes it easier for us to associate with.

Finding out who did the actual tagging and when

Here we can find out who did the tagging and what the message was.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
askgit "select name, commits.id, commits.message, author_name, author_email, author_when from tags, commits where tags.target_type = commits.id order by author_when"
+--------------------+------------------------------------------+--------------------------------+------------------+------------------------------+---------------------------+
|        NAME        |                    ID                    |            MESSAGE             |   AUTHOR NAME    |         AUTHOR EMAIL         |        AUTHOR WHEN        |
+--------------------+------------------------------------------+--------------------------------+------------------+------------------------------+---------------------------+
| refs/tags/1.0.5.1  | ae106e2a3569e5ea874852c613ed060d8e232109 | bug fixes                      | Scott Chacon     | schacon@gmail.com            | 2008-03-23T13:32:52-07:00 |
| refs/tags/v1.0.7   | 1adc5b8136c2cd6c694629947e1dbc49c8bffe6a | updated README with new        | Scott Chacon     | schacon@gmail.com            | 2008-05-27T12:59:54-07:00 |
|                    |                                          | install instructions           |                  |                              |                           |
| refs/tags/v1.2.6   | 6db4fdcad851eeaff6382a9eb6748455c3818c3e | Version 1.2.6 changelog        | robertodecurnex  | decurnex.roberto@gmail.com   | 2013-08-17T04:23:53-03:00 |
| refs/tags/v1.2.7   | 219c729b24b1dc868986f821655912612919ac21 | Moving version to 1.2.7        | robertodecurnex  | roberto.decurnex@avature.net | 2014-06-09T17:06:42-03:00 |
| refs/tags/v1.2.8   | 8425a6b6169eaeda0afe2e16b0e4210ed433caf2 | Removing jruby-head from       | Roberto Decurnex | roberto.decurnex@avature.net | 2014-07-31T16:57:57-03:00 |
|                    |                                          | .travis.yml                    |                  |                              |                           |
| refs/tags/v1.2.9   | 99b11561a64740285872f4c0a8b72e3e69308ed2 | Updating CHANGELOG             | Roberto Decurnex | decurnex.roberto@gmail.com   | 2015-01-12T16:51:41-03:00 |
| refs/tags/v1.2.9.1 | cc6d6efc1b7eaf63609c4c06969b0f839dc80095 | Updating version (1.2.9.1)     | Roberto Decurnex | decurnex.roberto@gmail.com   | 2015-01-15T14:15:46-03:00 |
| refs/tags/v1.2.10  | 6f48447656c68317842aaa39a959dbe69681e38d | Adding before install to       | Roberto Decurnex | decurnex.roberto@gmail.com   | 2016-02-25T16:16:06-03:00 |
|                    |                                          | travis. (update bundler)       |                  |                              |                           |
| refs/tags/v1.3.0   | a223fcf873bd99658cd1d0bdee1818adc7e3e92c | Bump version 1.3.0 Dropping    | Roberto Decurnex | decurnex.roberto@gmail.com   | 2016-02-25T19:22:08-03:00 |
|                    |                                          | ruby 1.8.x support             |                  |                              |                           |
| refs/tags/v1.4.0   | c8d10125d0cf172684e2da776b79edf90baebc52 | Release 1.4.0                  | Per Lundberg     | per.lundberg@ecraft.com      | 2018-05-16T09:49:34+03:00 |
| refs/tags/v1.5.0   | fd2642a626d20594a25010526f10ff99431c2cca | Release v1.5.0                 | Per Lundberg     | per.lundberg@ecraft.com      | 2018-08-10T09:57:26+02:00 |
| refs/tags/v1.7.0   | 4bef5abbba073c77b4d0ccc1ffcd0ed7d48be5d4 | Release v1.7.0                 | James Couball    | couballj@verizonmedia.com    | 2020-04-25T14:40:51-07:00 |
+--------------------+------------------------------------------+--------------------------------+------------------+------------------------------+---------------------------+

Looking at the dates of this, we can see that Scott Chacon started the project in 2008, Roberto Decurnex took over from 2013-2016, Per Lundberg in 2018, and finally James Couball did a release in 2020.

Looking at the notes it seems like it's the world around ruby-git that is changing (Travis, ruby very support, etc.).

Figuring out start and end dates

We can ask all of the questions above based upon the timestamps of when a tag was created. This will give us an idea of what's going on in the project during that time. Let's create a simple CSV file that will make it easier to ask these questions.

1
2
cd /tmp/ruby-git
askgit "select name, commits.id, author_when from tags, commits where tags.target_type = commits.id order by author_when" --format csv > tags.csv

Then we can make this a little simpler to loop over, where we have a start time and end time for each of the tags. We start with the initial commit as the start time, and then use the previous tag time going forward.

1
2
3
4
5
6
7
PREV_DATE=$(askgit "select min(author_when) as first from commits" --format csv |grep -v first)
while IFS=, read -r tag id date; do
  if [[ "$tag" != "name" ]]; then
    echo ${tag},${id},${PREV_DATE},${date}
    PREV_DATE=${date}
  fi
done < /tmp/ruby-git/tags.csv > /tmp/ruby-git/start_end.csv

Tag stats

We can loop over this file and pull out some stats for what went into each tag.

1
2
3
4
5
6
7
cd /tmp/ruby-git

while IFS=, read -r tag id start end; do
echo $tag
askgit "select count(*) as count from commits where author_when > '${start}' and author_when <= '${end}'" --format json | jq .count
askgit "select count(*) as count, author_name from commits where author_when > '${start}' and author_when <= '${end}' group by author_name order by count desc"
done < start_end.csv
refs/tags/1.0.5.1
0
+-------+-------------+
| COUNT | AUTHOR NAME |
+-------+-------------+
+-------+-------------+
refs/tags/v1.0.7
10
+-------+-----------------+
| COUNT |   AUTHOR NAME   |
+-------+-----------------+
|     6 | Scott Chacon    |
|     3 | Jorge Bernal    |
|     1 | Michael Siebert |
+-------+-----------------+
refs/tags/v1.2.6
128
+-------+--------------------+
| COUNT |    AUTHOR NAME     |
+-------+--------------------+
|    54 | robertodecurnex    |
|    20 | Joshua Nichols     |
|     8 | Daniel Mendler     |
|     5 | Scott Chacon       |
|     4 | elliottcable       |
|     4 | Roberto Decurnex   |
|     4 | Josh Nichols       |
|     4 | James Rosen        |
|     3 | Yuichi Tateno      |
|     3 | Michael Mallete    |
|     2 | Jonathan Rudenberg |
|     2 | Joe Moore          |
|     2 | Gianni Chiappetta  |
|     1 | schneems           |
|     1 | schacon            |
|     1 | devyn              |
|     1 | Zachery Hostens    |
|     1 | TJ Biddle          |
|     1 | TIT                |
|     1 | Steve Schwartz     |
|     1 | Nathan Fixler      |
|     1 | Michael Hale       |
|     1 | Ken Pratt          |
|     1 | Gabriel Cardoso    |
|     1 | Chad Rosen         |
|     1 | Bryan Larsen       |
+-------+--------------------+
refs/tags/v1.2.7
39
+-------+------------------+
| COUNT |   AUTHOR NAME    |
+-------+------------------+
|    22 | robertodecurnex  |
|     9 | Roberto Decurnex |
|     2 | Cameron Walsh    |
|     1 | OHTAKE Tomohiro  |
|     1 | Guy Hughes       |
|     1 | David Underwood  |
|     1 | Ben Sedat        |
|     1 | Arnvald          |
|     1 | Andy Schrage     |
+-------+------------------+
refs/tags/v1.2.8
8
+-------+------------------+
| COUNT |   AUTHOR NAME    |
+-------+------------------+
|     3 | Roberto Decurnex |
|     2 | robertodecurnex  |
|     2 | Yuya.Nishida     |
|     1 | Xavier Calland   |
+-------+------------------+
refs/tags/v1.2.9
56
+-------+------------------------+
| COUNT |      AUTHOR NAME       |
+-------+------------------------+
|    46 | Roberto Decurnex       |
|     4 | Kelly Stannard         |
|     1 | mwrock                 |
|     1 | Harald Sitter          |
|     1 | Francis Luong (Franco) |
|     1 | Dave Lane              |
|     1 | Brian Stevens          |
|     1 | Adam Hooper            |
+-------+------------------------+
refs/tags/v1.2.9.1
4
+-------+------------------+
| COUNT |   AUTHOR NAME    |
+-------+------------------+
|     3 | Roberto Decurnex |
|     1 | Eric Tsiliacos   |
+-------+------------------+
refs/tags/v1.2.10
12
+-------+------------------+
| COUNT |   AUTHOR NAME    |
+-------+------------------+
|    10 | Roberto Decurnex |
|     1 | David Vartanian  |
|     1 | Christian Paling |
+-------+------------------+
refs/tags/v1.3.0
2
+-------+------------------+
| COUNT |   AUTHOR NAME    |
+-------+------------------+
|     2 | Roberto Decurnex |
+-------+------------------+
refs/tags/v1.4.0
12
+-------+-------------------------+
| COUNT |       AUTHOR NAME       |
+-------+-------------------------+
|     6 | Per Lundberg            |
|     1 | Vern Burton             |
|     1 | Tom Potts               |
|     1 | Richard Vodden          |
|     1 | Rafael Reggiani Manzo   |
|     1 | Guilherme Maluf Balzana |
|     1 | Alexander Maslov        |
+-------+-------------------------+
refs/tags/v1.5.0
12
+-------+---------------------+
| COUNT |     AUTHOR NAME     |
+-------+---------------------+
|     3 | Vern Burton         |
|     2 | Per Lundberg        |
|     2 | Joshua Liebowitz    |
|     1 | Taichiro Suzuki     |
|     1 | Matias Garcia Isaia |
|     1 | Kody                |
|     1 | Kelly Stannard      |
|     1 | Denis Defreyne      |
+-------+---------------------+
refs/tags/v1.7.0
26
+-------+--------------------+
| COUNT |    AUTHOR NAME     |
+-------+--------------------+
|     9 | James Couball      |
|     2 | Yuta Harima        |
|     1 | cyclotron3k        |
|     1 | Vern Burton        |
|     1 | TIT                |
|     1 | Stephen Paul Weber |
|     1 | Salim Afiune       |
|     1 | Per Lundberg       |
|     1 | Michael Camilleri  |
|     1 | Marcel Hoyer       |
|     1 | Jonas Müller       |
|     1 | James Bunch        |
|     1 | Harald             |
|     1 | Evgenii Pecherkin  |
|     1 | Antonio Terceiro   |
|     1 | Alex Mayer         |
|     1 | Agora Security     |
+-------+--------------------+

Files

We can also look at the files inside of the commits. askgit does have a file table, but that shows the repository state at as specific commit, and thus is less useful for actually tracking changes that happened in a commit, or between commits.

But there's no reason why we can't use git diff-tree to find the information.

Files modified with a specific commit

We can see which files were changed in the latest commit:

1
2
cd /tmp/ruby-git
git diff-tree --no-commit-id -r --name-only HEAD^1
CHANGELOG.md
lib/git/version.rb

Which, at the time of this writing, is the same as:

1
2
cd /tmp/ruby-git
git diff-tree --no-commit-id -r --name-only 2eb64a4147b75ea8776828f2b8a6b50427dec06c
CHANGELOG.md
lib/git/version.rb

Files modified by a tag

We can use the .. way of specifying commit ranges, so let's look to see which files have changed between v1.5.0 and v1.7.0.

1
2
cd /tmp/ruby-git
git diff-tree --no-commit-id -r --name-only v1.5.0..v1.7.0
.github/stale.yml
.jrubyrc
.travis.yml
CHANGELOG.md
CONTRIBUTING.md
MAINTAINERS.md
PULL_REQUEST_TEMPLATE.md
README.md
RELEASING.md
git.gemspec
lib/git.rb
lib/git/base/factory.rb
lib/git/branch.rb
lib/git/config.rb
lib/git/diff.rb
lib/git/lib.rb
lib/git/version.rb
tests/files/encoding/dot_git/COMMIT_EDITMSG
tests/files/encoding/dot_git/HEAD
tests/files/encoding/dot_git/config
tests/files/encoding/dot_git/description
tests/files/encoding/dot_git/hooks/applypatch-msg.sample
tests/files/encoding/dot_git/hooks/commit-msg.sample
tests/files/encoding/dot_git/hooks/fsmonitor-watchman.sample
tests/files/encoding/dot_git/hooks/post-update.sample
tests/files/encoding/dot_git/hooks/pre-applypatch.sample
tests/files/encoding/dot_git/hooks/pre-commit.sample
tests/files/encoding/dot_git/hooks/pre-push.sample
tests/files/encoding/dot_git/hooks/pre-rebase.sample
tests/files/encoding/dot_git/hooks/pre-receive.sample
tests/files/encoding/dot_git/hooks/prepare-commit-msg.sample
tests/files/encoding/dot_git/hooks/update.sample
tests/files/encoding/dot_git/index
tests/files/encoding/dot_git/info/exclude
tests/files/encoding/dot_git/logs/HEAD
tests/files/encoding/dot_git/logs/refs/heads/master
tests/files/encoding/dot_git/objects/20/aefc8947d5bf08710afabe7712a1d6040ed5bd
tests/files/encoding/dot_git/objects/54/82c9609dd461acafcc859279490acfdea01f00
tests/files/encoding/dot_git/objects/87/d9aa884f84c67ac2185530f0b84d5eebda3eca
tests/files/encoding/dot_git/objects/91/59312af5dd77ca1fac174a3b965a806451b5c6
tests/files/encoding/dot_git/objects/cf/921422e5382afe0c90a772a2cb37867839ae64
tests/files/encoding/dot_git/objects/d4/fc598fff13f7bd681ceb38afafcae631ab3e50
tests/files/encoding/dot_git/refs/heads/master
tests/files/encoding/test1.txt
tests/files/encoding/test2.txt
tests/files/working/dot_git/config
tests/test_helper.rb
tests/units/test_config.rb
tests/units/test_diff_non_default_encoding.rb
tests/units/test_lib.rb
tests/units/test_logger.rb
tests/units/test_merge_base.rb
tests/units/test_thread_safety.rb
tests/units/test_thread_safty.rb

File history

It can be useful to see when a file was added. In this case again, askgit isn't necessarily the right way to go about it, since git log is much faster.

Here we can show the history of a specific file, PULL_REQUEST_TEMPLATE.md. We'll pass in some formatting options so that we could parse this later, and the --reverse field so that it's in chronological order. The pretty format that we are using is:

%aIAuthor date in iso8601
%anAuthor Name
%hShort commit id
%sSummary string
1
2
cd /tmp/ruby-git
git log --reverse --pretty='format:%aI %h %an %s' PULL_REQUEST_TEMPLATE.md
2018-06-20T06:58:17-05:00 4aa6d1f Vern Burton Creates CONTRIBUTOR and MAINTAINERS files (#353)
2020-02-04T03:31:12+09:00 e0a385f Yuta Harima Fix broken link in a PR template (#444)
2020-02-05T00:20:01+09:00 5c89ef6 Yuta Harima fix broken link in a PR template again (#446)

So we can see that Vern Burton did some GitHub related things in mid 2018, and Yuta Harima did some maintenance on it in Feb of 2020.

Thoughts

Static analysis of code repositories can add some interesting insights into how a project is maintained, in terms of who is working on it, and whether the original maintainers left or moved on to other things.

For example, we could look at who has been making changes to part of a system and, based upon who has been commiting changes to those files and how recently they've contributed to the project, understand if the knowledge of how it works is still with the project, or if the knowledge has decayed. git blame – terribly named, btw, I think it should be more like git gratitude – shows who last touched a specific line of code, but they may have moved on to other things. But if people have touched the surrounding lines or even the file itself, you can believe that the knowledge has been refreshed within the project as a whole.

Another thing that would be interesting to look at is churn on files, and churn on sets of files – which files have been changing, and which files change together. When commits change lots of things together, does this tell us something about the complexity of the project? Are we more likely to find bugs in "big" commits? Are files that have been unchanged in a long time more polished and complete, but ones that are frequently changing immature and need to be more tested?

Previously

labnotes

Getting Websters

More glorious language

tags
emacs
howto
language

Next

howto

gitlog in sqlite

sql is great

tags
git
sqlite