Advanced Topics

Reducing memory usage with RevWalk

The revision walk interface and the RevWalk and RevCommit classes are designed to be light-weight. However, when used with any repository of considerable size they may still require a lot of memory. This section provides hints on what you can do to reduce memory when walking the revision graph.

Restrict the walked revision graph

Try to walk only the amount of the graph you actually need to walk. That is, if you are looking for the commits in refs/heads/master not yet in refs/remotes/origin/master, make sure you markStart() for refs/heads/master and markUninteresting() refs/remotes/origin/master. The RevWalk traversal will only parse the commits necessary for it to answer you, and will try to avoid looking back further in history. That reduces the size of the internal object map, and thus reduces overall memory usage.

RevWalk walk = new RevWalk(repository);
ObjectId from = repository.resolve("refs/heads/master");
ObjectId to = repository.resolve("refs/remotes/origin/master");
walk.markStart(walk.parseCommit(from));
walk.markUninteresting(walk.parseCommit(to));
// ...

Discard the body of a commit

There is a setRetainBody(false) method you can use to discard the body of a commit if you don't need the author, committer or message information during the traversal. Examples of when you don't need this data is when you are only using the RevWalk to compute the merge base between branches, or to perform a task you would have used `git rev-list` with its default formatting for.

RevWalk walk = new RevWalk(repository);
walk.setRetainBody(false);
// ...

If you do need the body, consider extracting the data you need and then calling dispose() on the RevCommit, assuming you only need the data once and can then discard it. If you need to hang onto the data, you may find that JGit's internal representation uses less overall memory than if you held onto it yourself, especially if you want the full message. This is because JGit uses a byte[] internally to store the message in UTF-8. Java String storage would be bigger using UTF-16, assuming the message is mostly US-ASCII data.

RevWalk walk = new RevWalk(repository);
// more setup
Set<String> authorEmails = new HashSet<String>();
for (RevCommit commit : walk) {
	// extract the commit fields you need, for example:
	authorEmails.add(commit.getAuthorIdent().getEmailAddress());
 	commit.dispose();
}

Subclassing RevWalk and RevCommit

If you need to attach additional data to a commit, consider subclassing both RevWalk and RevCommit, and using the createCommit() method in RevWalk to consruct an instance of your RevCommit subclass. Put the additional data as fields in your RevCommit subclass, so that you don't need to use an auxiliary HashMap to translate from RevCommit or ObjectId to your additional data fields.

public class ReviewedRevision extends RevCommit {
	private final Date reviewDate;
	private ReviewedRevision(AnyObjectId id, Date reviewDate) {
		super(id);
		this.reviewDate = reviewDate;
	}
	public List<String> getReviewedBy() {
		return getFooterLines("Reviewed-by");
	}
	public Date getReviewDate() {
		return reviewDate;
	}
	public static class Walk extends RevWalk {
		public Walk(Repository repo) {
			super(repo);
		}
		@Override
		protected RevCommit createCommit(AnyObjectId id) {
			return new ReviewedRevision(id, getReviewDate(id));
		}
		private Date getReviewDate(AnyObjectId id) {
			// ...
		}
	}
}

Cleaning up after a revision walk

A RevWalk cannot shrink its internal object map. If you have just done a huge traversal of say all history of the repository, that will load everything into the object map, and it cannot be released. If you don't need this data in the near future, it may be a good idea to throw away the RevWalk and allocate a new one for your next traversal. That will let the GC reclaim everything and make it available for another use. On the other hand, reusing an existing object map is much faster than building a new one from scratch. So you need to balance the reclaiming of memory against the user's desire to perform fast updates of an existing repository view.

RevWalk walk = new RevWalk(repository);
// ...
for (RevCommit commit : walk) {
	// ...
}
walk.dispose();