Improving on Easy GIT

by havoc

I previously mentioned Easy GIT, which greatly improves git, in large
part by hiding man pages and command line options packed with
unimportant implementation detail, while adding examples and options
that relate to workflow.

Since then I’ve been using Easy GIT with other people and a central
repository, moving up the learning curve a bit, and I’ve started to
find some stuff that still doesn’t work for me.

(I guess I’ll say eg and git interchangeably, since in many cases
enhancements could go in either project.)

There should be a way to globally see what is outstanding

For my workflow, with a central repository and a small team (which is
all my projects ever, whether D-Bus or Metacity or LiTL), any
local-only changes or local-only branches are temporary.

In fact my standard procedure on a branch that will last a few days is
to push to the server pretty much every hour or so, so I have a
backup. Easy GIT (or git) makes this easier in some ways, since it’s
easy to create my own branch on the server.

No way I’m keeping a few days or weeks of work only on my local
drive. I’ve watched a few too many other people do that and regret it.

Here’s what happens, though. As the day goes on I end up with a
half-dozen branches, and some commits on master too, in various stages
of patch review, some approved for merge to master and some not. For
most of these branches I probably intended to push them to the server,
but for some really small quick-fixes, perhaps not.

Now say I want to power down, or go to bed for the night, or switch
from my home computer to my work computer; what I want to do is say
“sync to server” – just back it all up! I don’t want stuff only on my
local drive. If I go from home to work or vice versa, I want
everything available on the server so I have it.

Two ideas:

  • eg sync origin should be possible. “Just put everything on the
    server unless I’ve explicitly marked it local-only.”
  • eg outstanding origin should be a command that describes all
    differences between local and remote repositories, so if
    something is not synced, I can quickly find it.

This morning I set out to push all my patches that had not been
pushed. Problem one: I couldn’t figure out what these patches were.
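A rough sketch of how the “outstanding” idea can be approximated with stock git commands. The function name “outstanding” is made up for illustration and is not an eg or git command; the sketch assumes the remote is already configured.

```shell
# Hypothetical helper: list commits that exist on some local branch
# but on no branch cached for the given remote.
outstanding() {
    remote="${1:-origin}"
    # Refresh the offline cache so the comparison reflects the server.
    git fetch --quiet "$remote" || return 1
    # --branches: all local branches; --not --remotes=<remote>: exclude
    # everything reachable from refs/remotes/<remote>/*.
    git log --oneline --branches --not --remotes="$remote"
}
```

Empty output means every local commit is also on the server. The sketch knows nothing about branches that are meant to stay local-only, so those show up too.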

Remote tracking branches: implementation detail

Remote tracking branches are confusing, and I think could simply be an
implementation detail. I care about remote repositories (“remotes”); I
care about branches that are on remotes; I care about having an
offline cache of branches that are on remotes; but I do not care that
the offline cache happens to be implemented as a branch. And I do not
ever, ever, ever want to write to the remote tracking branch.

How does one write to a remote tracking branch? I’m not sure, to be
honest. But today, for a second time, I discovered I had a remote
tracking branch that was somehow not the same as the branch on the
server it was supposed to be tracking. My only guess is that this
results from typing “push origin/master” instead of “push origin”, or
the like. But I have no idea, really, how this could happen, or why I
would want it to happen. Worse, I haven’t been able to figure out how
to fix it, short of a fresh clone.
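One way to check whether the offline cache has drifted from the server is to compare the two commit ids directly. This is a minimal sketch, not a diagnosis of the problem described above; the function name “check_tracking” is invented here.

```shell
# Hypothetical helper: compare the cached copy of a remote branch
# (refs/remotes/<remote>/<branch>) with what the server reports.
check_tracking() {
    remote="$1"; branch="$2"
    cached=$(git rev-parse --verify --quiet "refs/remotes/$remote/$branch")
    # ls-remote asks the server itself; each output line is "<sha><TAB><ref>".
    actual=$(git ls-remote "$remote" "refs/heads/$branch" | cut -f1)
    if [ "$cached" = "$actual" ]; then
        echo "in sync"
    else
        echo "stale: cache=$cached server=$actual"
        return 1
    fi
}
```

If the two differ, a plain “git fetch origin” normally rewrites the cache, because the default fetch refspec (+refs/heads/*:refs/remotes/origin/*) starts with a plus sign, which forces the update.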

This is only a small symptom of the problem, though. I think the big
picture is that for purposes of command line syntax, “origin/master”
should mean “branch master on the origin remote.” If an operation
should be done offline (as everything except writes and fetches should
be), then behind the scenes it would use the remote tracking
branch. If an operation is a write, then it should go to the remote
branch instead of the tracking branch.

I don’t need to know that “origin/master” and “branch master
on origin” are different. I think it’s clear in all contexts which one
I mean, because git already separates network operations from
local-only operations, and because it is never correct to modify the
remote tracking branch (except to pull in new stuff from the remote
branch, of course).

On every pull, the system should verify that the remote tracking
branch (aka the offline cache) is exactly the same as the remote
branch, and make it be the same if it isn’t. And “push --branch master
origin” simply should not be different from “push origin/master” –
that’s crazy.

Whether and where to push/pull: property of the branch, not of the
push/pull operation

Getting back to the idea of “eg sync”: at any given time, I’m planning
to either never push a branch, or always push a branch, or not push it
for a while and then only push it when I explicitly decide to; but
whatever the plan, it’s not something that changes every hour. I want
to say “keep this branch in sync with server”; or “don’t send this to
the server ever”; or “don’t sync this for now, I’ll re-enable sync later.”

If branches were tagged with whether to push them or not, and to which
server branch, I could globally “eg sync” the entire repository.

I guess git lets you push a branch to multiple different remote
branches. Seems like an obscure feature that I can’t imagine
using. For me it would be fine if, for each remote branch I want
changes on, I had to create a local branch, attach it to that remote
branch, make the changes on this dedicated local branch, and push.
In the normal case, I would have a local branch for all remote
branches already anyway.

But, if there are people who love pushing to lots of remote branches
from one local branch, they can just set all branches as “never sync”
and then they can push individual branches by hand. The rest of us
should be able to sync all shared branches at once, while
still having local-only branches if we want.

Easy GIT has --all-branches and --matching-branches, but these are IMO
the wrong workarounds. --all-branches forces you to push stuff that may
be a throwaway local branch or “on hold” temporarily. --matching-branches
doesn’t push new branches and may also push a branch you wanted to keep
on hold. What’s needed is that branches know where they go; I
shouldn’t have to push with a special “wildcard” option to do the
normal thing, which is to sync all branches marked shared and not
sync any branches I intend to be local-only.
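The “branches know where they go” idea could be prototyped with a per-branch setting. In this sketch the config key branch.NAME.sync and its values (always, never, hold) are assumptions made up for illustration; git stores such a key happily but attaches no meaning to it. Unmarked branches are treated as “always”, following the “unless I’ve explicitly marked it local-only” rule from earlier.

```shell
# Hypothetical "sync" command: push every local branch whose
# (invented) branch.<name>.sync setting is "always" or unset.
sync_marked() {
    remote="${1:-origin}"
    git for-each-ref --format='%(refname:short)' refs/heads |
    while read -r branch; do
        policy=$(git config --get "branch.$branch.sync" || echo always)
        if [ "$policy" = "always" ]; then
            echo "syncing $branch to $remote"
            # </dev/null keeps the transport (e.g. ssh) from eating the loop's input.
            git push --quiet "$remote" "$branch" </dev/null ||
                echo "push of $branch failed" >&2
        else
            echo "skipping $branch ($policy)"
        fi
    done
}
```

A branch would be put on hold with “git config branch.wip.sync hold” (wip being an example branch name) and re-enabled by setting the value back to always.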

Feedback: tell me what’s going on!

Now that Easy GIT fixed the docs, I think the number-one UI deficiency
in git is that it has no feedback; it does not explain what it’s doing
when it’s doing it. Sometimes it’s totally silent;
sometimes it has a bunch of babble about “objects” and “packs” that
means nothing to me; neither of those is good.

This steepens the learning curve, since you can’t watch what commands do.

Maybe worse, it makes the source control system “feel bad.” For me,
the purpose of a source control system is to make it so I can never
lose any history or data; when every command feels like it did
something mysterious I’m not sure I understand, I don’t have a sense
of security.

Commands should output things like: “downloading changes from remote
server ‘origin’ on remote branch ‘master’”; “merging branch
origin/master onto branch master”; “3 new changes applied to
master”. For each command, I should get feedback on any network
transfers; all branches that were involved; and all commits that were
created or merged.
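As an illustration of the kind of feedback described above, here is a sketch of a wrapper that narrates a pull. The function name “narrated_pull” is invented; it assumes the current branch has a same-named branch on the remote, and it is a fetch followed by a merge rather than a replacement for git pull.

```shell
# Hypothetical wrapper: fetch, report how many commits are new, merge.
narrated_pull() {
    remote="${1:-origin}"
    branch=$(git symbolic-ref --short HEAD) || return 1
    echo "downloading changes from remote server '$remote'"
    git fetch --quiet "$remote" || return 1
    # Commits on the remote branch that the local branch does not have yet.
    new=$(git rev-list --count "HEAD..$remote/$branch")
    echo "merging branch $remote/$branch onto branch $branch"
    git merge --quiet "$remote/$branch" || return 1
    echo "$new new changes applied to $branch"
}
```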

“eg branch” should show more than only branch names to help orient
me. I would like to know if the branch is synced and if so to which
remote branch, for example.
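Some of this information is already available from git itself, just not by default. A small sketch, with the function name “branch_status” made up for illustration, that lists each local branch with the remote branch it is attached to and how far ahead or behind it is:

```shell
# Hypothetical helper: one line per local branch, e.g.
#   master -> origin/master [ahead 1]
# Branches with no attached remote branch show nothing after the arrow.
branch_status() {
    git for-each-ref \
        --format='%(refname:short) -> %(upstream:short) %(upstream:track)' \
        refs/heads
}
```

The ahead/behind figures are computed against the offline cache, so they are only as fresh as the last fetch. “git branch -vv” prints similar information in git’s own layout.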

ChangeLog workflow is wrong

For a detailed ChangeLog, I want to write the ChangeLog entry as I
develop the code, using ‘C-x 4 a’ in Emacs, ideally.

The problem is that when I go to commit, that’s not when I want to
write the log. I prefer to write it either as I go, or just before
commit as part of self-reviewing the patch – I read the patch while
doing ‘C-x 4 a’ to document each part. That’s the value of having a
ChangeLog file that exists always, and isn’t just an open editor at
commit time.

However, if you have a ChangeLog file git barfs all over every
merge. git should be smarter. Merging ChangeLog conflicts is not
exactly a computationally intractable problem. But there’s an even
better solution maybe.

Every time I switch to a branch, git could create an empty file called
ChangeLog; then when I commit, it could pre-fill the editor with the
contents of that file and reset the file to empty. Magic!

The problem is not that ChangeLog disrupts git merges. The problem is
that git does not support the nice format and workflow of ChangeLog.
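The pre-fill idea can be approximated today with git’s prepare-commit-msg hook, which git runs before opening the editor and which receives the path of the commit message file as its first argument. The sketch below is one possible shape, not a tested workflow: the scratch file name ChangeLog.pending and the function name are assumptions, and the scratch file would need to be kept out of version control.

```shell
# Hypothetical scratch file holding notes written while developing.
PENDING="${PENDING:-ChangeLog.pending}"

# Prepend the pending notes to the commit message file, then empty them.
prefill_message() {
    msgfile="$1"
    # Nothing to do when there are no pending notes.
    [ -s "$PENDING" ] || return 0
    # Notes go first, git's own template text after them.
    cat "$PENDING" "$msgfile" > "$msgfile.tmp" && mv "$msgfile.tmp" "$msgfile"
    : > "$PENDING"
}
```

To try it, the function would go into an executable .git/hooks/prepare-commit-msg script whose last line calls prefill_message "$1". One known weakness: if the commit is aborted in the editor, the notes have already been cleared from the scratch file.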

Use EMAIL and GECOS

A minor thing, but if you just start using git, it puts garbage in the
Author field. Every other program uses the EMAIL environment variable
and your UNIX account information. That is a good default. If people
want to override it via config option, then let them, but don’t
require configuration to get started.
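Until the defaults are friendlier, the same effect can be scripted once per machine. This sketch assumes a system where “getent passwd” is available (typical on Linux, not guaranteed elsewhere); both function names are invented for illustration.

```shell
# Extract the full name (first GECOS subfield) from a passwd line on stdin.
gecos_name() {
    cut -d: -f5 | cut -d, -f1
}

# Hypothetical one-time setup: fill in git's identity from EMAIL and
# the account's GECOS field, but only where nothing is configured yet.
seed_git_identity() {
    if [ -z "$(git config --global user.email)" ] && [ -n "$EMAIL" ]; then
        git config --global user.email "$EMAIL"
    fi
    if [ -z "$(git config --global user.name)" ]; then
        name=$(getent passwd "$(id -un)" | gecos_name)
        if [ -n "$name" ]; then
            git config --global user.name "$name"
        fi
    fi
}
```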

Easier way to see what a branch does

If you want to review a branch to see if it should be merged, the
syntax is the magic triple dots: git diff master...mybranch

This is weird, arcane, hard to discover… and something I need to do
all the time.

I’m not sure what the right solution is. Maybe just docs, or maybe it
should be an option to diff instead of the funky triple-dots.
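What the triple dots do is easier to see when spelled out: the diff is taken from the point where the two branches last shared a commit up to the tip of the branch under review. A small sketch, with the function name “branch_diff” made up for illustration:

```shell
# Show what "topic" changed since it forked from "base".
# Equivalent to: git diff base...topic
branch_diff() {
    base="$1"; topic="$2"
    # merge-base finds the most recent commit the two branches share.
    git diff "$(git merge-base "$base" "$topic")" "$topic"
}
```

Called as “branch_diff master mybranch”, it leaves out anything that landed on master after the branch was created, which is exactly what a review needs.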

Deleting a remote branch

I think to delete a remote branch you have to do eg push :branchname,
another strange and surprising syntax.

eg branch -d remotename/branchname should work,
IMO. (Again, writes to a remote branch should modify the server-side
branch, not the remote tracking branch.)
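The colon syntax makes a little more sense once it is read as “source:destination” with an empty source. A minimal wrapper, with the function name “delete_remote_branch” invented for illustration:

```shell
# Delete a branch on the server (not just the local cache of it).
delete_remote_branch() {
    remote="$1"; branch="$2"
    # Nothing before the colon means "push nothing into this ref",
    # which the server treats as a request to delete it.
    git push --quiet "$remote" ":refs/heads/$branch"
}
```

Later versions of git also accept “git push origin --delete branchname”, which is at least discoverable.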

Can the central repository be “messed up”?

With Subversion, I think it’s basically impossible for someone with
access to the central repo to accidentally make a change that
can’t be reverted. Sure they can log on to the server and delete stuff
from the shell, but with Subversion commands, I can’t do anything that
won’t show up in the history.

I can’t tell whether this is true with git. Throughout the docs there
are options like “--force” and “--hard” and warnings about how using
the command can screw you. I don’t know how many of these warnings
apply to central, remote repositories, but I worry about it. Remember,
I don’t understand the git docs, and hope I never have to try.

An example from man git-push:

--force

Usually, the command refuses to update a remote ref that is not an
ancestor of the local ref used to overwrite it. This flag disables
the check. This can cause the remote repository to lose commits; use
it with care.

Wait – can cause the remote repository to lose commits???!!! This is
not what I’m looking for in a source control system. It’s the main
thing a source control system is supposed to be preventing!

Accidents worry me a lot more than malicious people or mysterious
cosmic rays. Especially when something as absurdly hard to
use as git is involved!

It also bugs me that I can accidentally do things that, while
theoretically recoverable, are still very hard to recover from. For
example, somehow having changes on remote tracking branches that are
not on the server. (To beat that dead horse a bit more.)
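For the --force worry specifically, git has two server-side settings that make the central repository refuse history-losing pushes, even forced ones. A sketch of applying them; the function name “protect_central_repo” is made up, and the argument is assumed to be the path of the bare repository on the server.

```shell
# Hypothetical setup step, run on the server against the bare repository.
protect_central_repo() {
    repo="$1"
    # Refuse pushes that would discard commits, even with --force.
    git --git-dir="$repo" config receive.denyNonFastForwards true
    # Refuse pushes that delete a branch or tag.
    git --git-dir="$repo" config receive.denyDeletes true
}
```

This covers pushes only; someone with shell access to the server can still damage the repository directly, as with any system.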

Overview of branch relationships

If you want to understand the branch structure of a project, your best
bet is gitk, and gitk is not a good bet. I do not understand the gitk
display at all.

There’s probably some simple info the command line could report that
would be very helpful, such as which branches have changes that are
not on master, which branches were ever merged into a given
branch, or which branch a branch was originally branched from. Perhaps
some of this should be in the “git branch” output by default.
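Two of these questions can already be answered from the command line, though not by default. A sketch that combines them; the function name “branch_overview” is invented for illustration.

```shell
# Hypothetical summary of how local branches relate to a base branch.
branch_overview() {
    base="${1:-master}"
    echo "Branches with changes not on $base:"
    git branch --no-merged "$base"
    echo "Branches fully merged into $base:"
    git branch --merged "$base"
}
```

The third question, which branch a branch was originally created from, is harder: git does not record it, and the closest substitute is “git merge-base”, which reports the most recent commit two branches share.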

Conclusion

So much work to do.

(This post was originally found at http://log.ometer.com/2008-04.html#29)

My Twitter account is @havocp.