Week 2 [Mon, Aug 18th] - Topics


Guidance for the item(s) below:

Given that this is a first course in SE, tradition demands that we start by defining the subject. However, let's not spend a lot of time going through lengthy, formal definitions of SE. Instead, let's look at an extract from the very first chapter of a very famous SE book, with the aim of providing some inspiration, but also an appreciation of the challenges ahead.

[W2.1] SE: Intro

W2.1a

Software Engineering → Introduction → Pros and cons

Software engineering: "Software Engineering is the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software" -- IEEE Standard Glossary of Software Engineering Terminology

The following description of the Joys of the Programming Craft was taken (and emphasis added) from Chapter 1 of the famous book The Mythical Man-Month, by Frederick P. Brooks.

Why is programming fun? What delights may its practitioner expect as his reward?

First is the sheer joy of making things. As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design. I think this delight must be an image of God's delight in making things, a delight shown in the distinctness and newness of each leaf and each snowflake.

Second is the pleasure of making things that are useful to other people. Deep within, we want others to use our work and to find it helpful. In this respect the programming system is not essentially different from the child's first clay pencil holder "for Daddy's office."

Third is the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning. The programmed computer has all the fascination of the pinball machine or the jukebox mechanism, carried to the ultimate.

Fourth is the joy of always learning, which springs from the nonrepeating nature of the task. In one way or another the problem is ever new, and its solver learns something: sometimes practical, sometimes theoretical, and sometimes both.

Finally, there is the delight of working in such a tractable medium. The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by the exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures....

Yet the program construct, unlike the poet's words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

Programming then is fun because it gratifies creative longings built deep within us and delights sensibilities we have in common with all men.

Not all is delight, however, and knowing the inherent woes makes it easier to bear them when they appear.

First, one must perform perfectly. The computer resembles the magic of legend in this respect, too. If one character, one pause, of the incantation is not strictly in proper form, the magic doesn't work. Human beings are not accustomed to being perfect, and few areas of human activity demand it. Adjusting to the requirement for perfection is, I think, the most difficult part of learning to program.

Next, other people set one's objectives, provide one's resources, and furnish one's information. One rarely controls the circumstances of his work, or even its goal. In management terms, one's authority is not sufficient for his responsibility. It seems that in all fields, however, the jobs where things get done never have formal authority commensurate with responsibility. In practice, actual (as opposed to formal) authority is acquired from the very momentum of accomplishment.

The dependence upon others has a particular case that is especially painful for the system programmer. He depends upon other people's programs. These are often maldesigned, poorly implemented, incompletely delivered (no source code or test cases), and poorly documented. So he must spend hours studying and fixing things that in an ideal world would be complete, available, and usable.

The next woe is that designing grand concepts is fun; finding nitty little bugs is just work. With any creative activity come dreary hours of tedious, painstaking labor, and programming is no exception.

Next, one finds that debugging has a linear convergence, or worse, where one somehow expects a quadratic sort of approach to the end. So testing drags on and on, the last difficult bugs taking more time to find than the first.

The last woe, and sometimes the last straw, is that the product over which one has labored so long appears to be obsolete upon (or before) completion. Already colleagues and competitors are in hot pursuit of new and better ideas. Already the displacement of one's thought-child is not only conceived, but scheduled.

This always seems worse than it really is. The new and better product is generally not available when one completes his own; it is only talked about. It, too, will require months of development. The real tiger is never a match for the paper one, unless actual use is wanted. Then the virtues of reality have a satisfaction all their own.

Of course the technological base on which one builds is always advancing. As soon as one freezes a design, it becomes obsolete in terms of its concepts. But implementation of real products demands phasing and quantizing. The obsolescence of an implementation must be measured against other existing implementations, not against unrealized concepts. The challenge and the mission are to find real solutions to real problems on actual schedules with available resources.

This then is programming, both a tar pit in which many efforts have floundered and a creative activity with joys and woes all its own. For many, the joys far outweigh the woes....

Exercises:

SE vs Civil Engineering

Software vs Bridges

Coding as a manufacturing activity

List pros and cons of SE

Which one of these is not included in Brooks’s list of ‘Woes of the Craft’?

Guidance for the item(s) below:

Now, let's switch our focus to the project management aspect of SE.

Broadly speaking, there are two approaches to doing a software project. Those two approaches are also highly relevant to the way this course is run, and how it is different from most SE courses elsewhere.

Let's learn about those two approaches early so that we can better understand how this course works.

[W2.2] SDLC Process Models: Basics


W2.2a

Project Management → SDLC Process Models → Introduction → What

Software development goes through different stages such as requirements, analysis, design, implementation and testing. These stages are collectively known as the software development lifecycle (SDLC). There are several approaches, known as software development lifecycle models (also called software process models), that describe different ways to go through the SDLC. Each process model prescribes a 'roadmap' for the software developers to manage the development effort. The roadmap describes the aims of the development stages, the outcome of each stage, and the workflow i.e., the relationship between stages.

W2.2b

Project Management → SDLC Process Models → Introduction → Sequential models

The sequential model, also called the waterfall model, views software development as a linear process, in which the project is seen as progressing through the development stages. The name waterfall stems from how the model is drawn to look like a waterfall (see below).

When one stage of the process is completed, it produces some artifacts to be used in the next stage. For example, the requirements stage produces a comprehensive list of requirements, to be used in the design phase.

A strict sequential model project moves only in the forward direction i.e., each stage is completed before starting the next. For example, once the requirements stage is over, there is no provision for revising the requirements later.

This model can work well for a project that produces software to solve a well-understood problem, in which case the requirements can remain stable and the effort can be estimated accurately. Furthermore, as each stage has a well-defined outcome, it is easy to track the progress of the project: one can gauge progress simply by checking which stage the project is in.

However, real-world projects often tackle problems that are not well-understood at the beginning, making them unsuitable for this model. For example, target users of a software product may not be able to state their requirements accurately at the start of the project, if they have not used a similar product before.

W2.2c

Project Management → SDLC Process Models → Introduction → Iterative models

The iterative model advocates producing the software by going through several iterations. Each of the iterations could potentially go through all the stages of the SDLC, from requirements gathering to deployment.

Each iteration produces a new version of the product, building upon the version produced in the previous iteration. Feedback from each iteration is factored into the subsequent iterations. For example, if an implementation task took longer than expected, the effort estimate for similar tasks in future iterations can be adjusted accordingly. Similarly, if a feature introduced in the current iteration was not well-received by target users, it can be removed or tweaked in the next iteration.

The iterative model can follow a breadth-first or a depth-first approach.

In the breadth-first approach, an iteration evolves all major components and all functionality areas in parallel i.e., most features and most components will be updated in each iteration, producing a working product at the end of each iteration.
In the depth-first approach, an iteration focuses on fleshing out only some components or some functionality area. Accordingly, early depth-first iterations might not produce a working product.

Taking a Minesweeper game as an example,

breadth-first iterations will deliver a fully playable version early. These early versions may have primitive functionality, for example, a rudimentary text-based UI, fixed board size, limited minefield layouts, etc. These functionalities (and corresponding components) will then be improved in later releases.
an early depth-first iteration could deliver the full user interface (UI) but with no game logic at all. Alternatively, an early iteration could focus on just the logic for generating initial layouts of the minefield. Neither will be a playable version of the game but both can be used to collect early feedback (about the UI, and the initial minefield layouts, respectively) which can then be used to guide later iterations.

A project can be done as a mixture of breadth-first and depth-first iterations i.e., an iteration can contain some breadth-first work as well as some depth-first work, or, some iterations can be breadth-first while others are depth-first.

Follow up notes for the item(s) above:

Scanning a TLDR version of a topic: As mentioned in the 'Using this Website' page, the more important layer of information is given in bold text. Hence, you can quickly scan the essential points of a topic by reading the bold text only (this could be useful when you want to quickly recap a previous topic, or to get an idea of what a topic covers without reading all the details).

Guidance for the item(s) below:

Next, let's resume our Git Learning Trail, covering a few more tours. The first two focus on working with GitHub, while the other two focus on getting more out of the Git revision history.

[W2.3] RCS: Getting Started with GitHub

W2.3a

Git Learning Trail → Tour 2: Backing up a Repo on the Cloud

Tour 2: Backing up a Repo on the Cloud

Target Usage: To back up a Git repository on a cloud-based Git service such as GitHub.

Motivation: One (of several) benefits of maintaining a copy of a repo on a cloud server: it acts as a safety net (e.g., against the folder becoming inaccessible due to a hardware fault).

Lesson plan:

To back up your Git repo on the cloud, you’ll need to use a remote repository service, such as GitHub.

T2L1. Remote Repositories covers that part.

To use GitHub, you need to sign up for an account, and configure related tools/settings first.

T2L2. Preparing to use GitHub covers that part.

The first step of backing up a local repo on GitHub: create an empty repository on GitHub.

T2L3. Creating a Repo on GitHub covers that part.

The second step of backing up a local repo on GitHub: link the local repo with the remote repo on GitHub.

T2L4. Linking a Local Repo With a Remote Repo covers that part.

The third step of backing up a local repo on GitHub: push a copy of the local repo to the remote repo.

T2L5. Updating the Remote Repo covers that part.

Git allows you to specify which files should be omitted from revision control.

T2L6. Omitting Files from Revision Control covers that part.

T2L1. Remote Repositories

To back up your Git repo on the cloud, you’ll need to use a remote repository service, such as GitHub.

This lesson covers that part.

A repo you have on your computer is called a local repo. A remote repo is a repo hosted on a remote computer and allows remote access. Some use cases for remote repositories:

as a backup of your local repo
as an intermediary repo to work on the same files from multiple computers
for sharing the revision history of a codebase among team members of a multi-person project

It is possible to set up a Git remote repo on your own server, but an easier option is to use a remote repo hosting service such as GitHub.

T2L2. Preparing to use GitHub

To use GitHub, you need to sign up for an account, and configure related tools/settings first.

This lesson covers that part.

GitHub is a web-based service that hosts Git repositories and adds collaboration features on top of Git. Two other similar platforms are GitLab and Bitbucket. While Git manages version control locally, such platforms provide additional features such as shared access to repositories, issue tracking, code reviews, and permission controls. They are widely used in software development projects, for both open-source software (OSS) and closed-source software projects.

On GitHub, a Git repo can be put in one of two spaces:

A GitHub user account represents an individual user. It is created when you sign up for GitHub and includes a username, profile page, and personal settings. With a user account, you can create your own repositories, contribute to others’ projects, and manage collaboration settings for any repositories you own.
A GitHub organisation (org for short) is a shared account used by a group such as a team, company, or open-source project. Organisations can own repositories and manage access to them through teams, roles, and permissions. Organisations are especially useful when managing repositories with shared ownership or when working at scale.

Every GitHub user must have a user account, even if they primarily work within an organisation.

PREPARATION: Create a GitHub account

Create a personal GitHub account as described in GitHub Docs → Creating an account on GitHub, if you don't have one yet.

Choose a sensible GitHub username as you are likely to use it for years to come in professional contexts e.g., in job applications.

[Optional, but recommended] Set up your GitHub profile, as explained in GitHub Docs → Setting up your profile.

Before you can interact with GitHub from your local Git client, you need to set up authentication. In the past, you could simply enter your GitHub username and password, but GitHub no longer accepts passwords for Git operations. Instead, you’ll use a more secure method — such as a Personal Access Token (PAT) or SSH keys — to prove your identity.

A Personal Access Token (PAT) is essentially a long, random string that acts like a password, but it can be scoped to specific permissions (e.g., read-only or full access) and revoked at any time. This makes it more secure and flexible than a traditional password.

Git supports two main protocols for communicating with GitHub: HTTPS and SSH.

With HTTPS, you connect over the web and authenticate using your GitHub username and a Personal Access Token.
With SSH, you connect using a cryptographic key pair you generate on your machine. Once you add your public key to your GitHub account, GitHub recognises your machine and lets you authenticate without typing anything further.

PREPARATION: Set up authentication with GitHub

Set up your computer's GitHub authentication, as described in the se-edu guide Setting up GitHub Authentication.

GitHub associates a commit with a user based on the email address in the commit metadata. When you push a commit, GitHub checks if the email matches a verified email on a GitHub account. If it does, the commit is shown as authored by that user. If the email doesn’t match any account, the commit is still accepted but won’t be linked to any profile.

GitHub provides a no-reply email (e.g., 12345678+username@users.noreply.github.com) that you can use as your Git user.email to hide your real email while still associating commits with your GitHub account.

PREPARATION: [Optional] Configure user.email to use the no-reply email from GitHub

If you prefer not to include your real email address in commits, you can do the following:

Find your no-reply email provided by GitHub: Navigate to the email settings of your GitHub account and select the option to Keep my email address private. The no-reply address will then be displayed, typically in the format ID+USERNAME@users.noreply.github.com.

Update your user.email with that email address e.g.,

git config --global user.email "12345678+username@users.noreply.github.com"
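To see how user.email ends up in the commit metadata without touching your global settings, here is a minimal sketch that works in a throwaway repo and sets the identity with repo-local scope (i.e., without --global). The name John Doe and the no-reply address are placeholders; substitute your own. The last command prints the author name and email recorded in the commit, which is what GitHub would later use to link the commit to your account.

```shell
set -eu

# Throwaway repo in a temp folder, so your real repos and global config are untouched
repo=$(mktemp -d)
cd "$repo"
git init -q

# Repo-local identity (placeholder values; use your own no-reply address)
git config user.name "John Doe"
git config user.email "12345678+username@users.noreply.github.com"

# Shows the email Git will record as the author of new commits in this repo
git config user.email

echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Add readme.txt"

# Author name and email, as stored in the commit metadata
git log -1 --format='%an <%ae>'
```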

GitHub offers its own clients to make working with GitHub more convenient.

The GitHub Desktop app provides a GUI for performing GitHub operations from your desktop, without needing to visit the GitHub web UI.
The GitHub CLI (gh) brings GitHub-specific commands to your terminal, letting you perform operations on GitHub from your command line.

If you are using Git-Mastery exercises (strongly recommended), you need to install and configure GitHub CLI, as the Git-Mastery exercises involving GitHub depend on it.

PREPARATION: Set up GitHub CLI

Windows

1. Download and run the installer from the GitHub CLI releases page. This is the file named GitHub CLI {version} windows {chip variant} installer.

macOS

1. Install GitHub CLI using Homebrew:

brew install gh

Linux

1. Install GitHub CLI, as explained in the GitHub CLI Linux installation guide for your distribution.

2. Authenticate with your GitHub account:

gh auth login

When prompted, choose the protocol (i.e., HTTPS or SSH) you used previously to set up your GitHub authentication.

3. Give GitHub CLI permission to delete repos in your account, as this is required for some of the Git-Mastery exercises.

gh auth refresh -s delete_repo

4. Verify the setup by checking the authentication status of GitHub CLI with your GitHub account.

gh auth status

You should see confirmation that you’re logged in.

5. Verify that GitHub and GitHub CLI are set up for Git-Mastery:

gitmastery check github

6. [Optional, Recommended] Ask Git-Mastery to switch on the 'progress sync' feature.

# cd into the gitmastery-exercises folder first
gitmastery progress sync on

What happens when you switch on the Git-Mastery 'progress sync' feature?

Your Git-Mastery exercises progress will be backed up to your GitHub account. If you wipe out your local progress data by mistake, the remote copy will still be preserved.
Git-Mastery will create a repo in your GitHub account, to back up your progress data. This repo will be publicly visible.

T2L3. Creating a Repo on GitHub

The first step of backing up a local repo on GitHub: create an empty repository on GitHub.

This lesson covers that part.

You can create a remote repository based on an existing local repository, to serve as a remote copy of your local repo. For example, suppose you created a local repo and worked with it for a while, but now you want to upload it onto GitHub. The first step is to create an empty repository on GitHub.

HANDS-ON: Creating an empty remote repo

1 Log in to your GitHub account and choose to create a new repo.

2 In the next screen, provide a name for your repo. Refer to the screenshot below for guidance on how to provide the required information.

Click the Create repository button to create the new repository.

If you enable any of the three Add _____ options shown above, GitHub will not only create a repo, but will also initialise it with some initial content. That is not what we want here. To create an empty remote repo, keep those options disabled.

3 Note the URL of the repo. It will be of the form
https://github.com/{your_user_name}/{repo_name}.git.
e.g., https://github.com/johndoe/foobar.git (note the .git at the end)

done!

EXERCISE: remote-control

T2L4. Linking a Local Repo With a Remote Repo

The second step of backing up a local repo on GitHub: link the local repo with the remote repo on GitHub.

This lesson covers that part.

A Git remote is a reference to a repository hosted elsewhere, usually on a server like GitHub, GitLab, or Bitbucket. It allows your local Git repo to communicate with another remote copy — for example, to upload locally-created commits that are missing in the remote copy.

By adding a remote, you are giving the local repo the details of a remote repo it can communicate with, for example, where the repo exists and what name to use to refer to the remote.

The URL you use to connect to a remote repo depends on the protocol — HTTPS or SSH:

HTTPS URLs use the standard web protocol and start with https://github.com/ (for GitHub users). e.g.,

https://github.com/username/repo-name.git

SSH URLs use the secure shell protocol and start with git@github.com:. e.g.,

git@github.com:username/repo-name.git

A Git repo can have multiple remotes. You simply need to specify different names for each remote (e.g., upstream, central, production, other-backup ...).
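As a minimal sketch of the points above, the commands below register two remotes in a throwaway repo, one with an HTTPS URL and one with an SSH URL. The johndoe repos and the remote name backup are hypothetical. Adding a remote only records its name and URL in the local repo; nothing is contacted, so the URLs need not exist for this to run.

```shell
set -eu

# Throwaway local repo to experiment with
repo=$(mktemp -d)
cd "$repo"
git init -q

# Register two remotes under different names (hypothetical URLs)
git remote add origin https://github.com/johndoe/things.git      # HTTPS URL
git remote add backup git@github.com:johndoe/things-backup.git   # SSH URL

# Each remote is listed twice: once for fetch, once for push
git remote -v
```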

HANDS-ON: Add a remote to a repo

Add the empty remote repo you created on GitHub as a remote of a local repo you have.

CLI

1 In a terminal, navigate to the folder containing the local repo things you created earlier.

2 As a sanity check, list the current remotes using the git remote -v command. No output is expected if there are no remotes yet.

3 Add a new remote repo using the git remote add <remote-name> <remote-url> command.

Format of the <remote-url>:

https://github.com/<owner>/<repo>.git  # using HTTPS
git@github.com:<owner>/<repo>.git  # using SSH

The full command:

git remote add origin https://github.com/JohnDoe/things.git  # using HTTPS
git remote add origin git@github.com:JohnDoe/things.git  # using SSH

4 List the remotes again to verify the new remote was added.

git remote -v

⤷

origin  https://github.com/JohnDoe/things.git (fetch)
origin  https://github.com/JohnDoe/things.git (push)

The same remote will be listed twice, to show that you can do two operations (fetch and push) using this remote. You can ignore that for now. The important thing is the remote you added is being listed.

Sourcetree

1 Open the local repo in Sourcetree.

2 Open the dialog for adding a remote, as follows:

Choose Repository → Repository Settings menu option.
Choose Repository → Repository Settings... → Choose Remotes tab.

3 Add a new remote to the repo with the following values.

Remote name: the name you want to assign to the remote repo i.e., origin

URL/path: the URL of your remote repo

https://github.com/<owner>/<repo>.git  # using HTTPS
git@github.com:<owner>/<repo>.git  # using SSH

e.g.,

https://github.com/JohnDoe/things.git  # using HTTPS
git@github.com:JohnDoe/things.git  # using SSH

Username: your GitHub username

4 Verify the remote was added by going to Repository → Repository Settings again.

5 Add another remote, to verify that a repo can have multiple remotes. You can use any name (e.g., backup) and any URL for this.

done!

To find the URL of a repo on GitHub, you can click on the Code button:

EXERCISE: link-me

DETOUR: Managing Details of a Remote

To change the URL of a remote (e.g., origin), use git remote set-url <remote-name> <new-url> e.g.,

git remote set-url origin https://github.com/user/repo.git

To rename a remote, use git remote rename <old-name> <new-name> e.g.,

git remote rename origin upstream

To delete a remote from your Git repository, use git remote remove <remote-name> e.g.,

git remote remove origin

To check the current remotes and their URLs, use:

git remote -v
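The commands in this detour can be tried safely in a throwaway repo. The sketch below switches origin from an HTTPS URL to an SSH URL, renames it, and finally removes it. The user/repo URLs are placeholders, and git remote get-url (which prints the URL of a single remote) is used, in addition to the commands above, to check the result of each step.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git remote add origin https://github.com/user/repo.git

# Switch origin from HTTPS to SSH
git remote set-url origin git@github.com:user/repo.git
git remote get-url origin      # prints git@github.com:user/repo.git

# Rename origin to upstream
git remote rename origin upstream
git remote                     # prints upstream

# Delete the remote
git remote remove upstream
git remote                     # prints nothing, as no remotes are left
```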

T2L5. Updating the Remote Repo

The third step of backing up a local repo on GitHub: push a copy of the local repo to the remote repo.

This lesson covers that part.

You can push content of one repository to another, usually from your local repo to a remote repo. Pushing transfers recorded Git history (such as past commits), but it does not transfer uncommitted changes (staged or unstaged) or untracked files.

To push, you need to have write access to the remote repo.
Pushing is performed one branch at a time; you must specify which branch you want to push.

You can configure Git to track a pairing between a local branch and a remote branch, so in future you can push from the same local branch to the corresponding remote branch without needing to specify them again. For example, you can set your local master branch to track the master branch on the remote repo origin i.e., local master branch will track the branch origin/master.

C3 master←HEAD origin/master

In the revision graph above, you see a new type of ref (origin/master). This is a remote-tracking branch ref that represents the state of a corresponding branch in a remote repository (if you previously set up the branch to 'track' a remote branch). In this example, the master branch in the remote origin is also at the commit C3 (which means you have not created new commits after you pushed to the remote).

If you now create a new commit C4, the state of the revision graph will be as follows:

C4 master←HEAD

C3 origin/master

Explanation: When you create C4, the current branch master moves to C4, and HEAD moves along with it. However, the master branch in the remote origin remains at C3 (because you have not pushed C4 yet). That is, the remote-tracking branch origin/master is one commit behind the local branch master (or, the local branch is one commit ahead). The origin/master ref will move to C4 only after you push your local branch to the remote again.
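To watch origin/master lag behind and then catch up without needing a GitHub account, the sketch below uses a local bare repo (a repo without a working folder) as a stand-in for the empty GitHub repo. The folder names, file contents, and the Demo User identity are made up for illustration. The command git rev-list --count origin/master..master counts the commits on master that origin/master does not have yet, i.e., how far the local branch is ahead.

```shell
set -eu

work=$(mktemp -d)
git init -q --bare "$work/remote.git"    # stand-in for the empty GitHub repo
git init -q "$work/things"
cd "$work/things"
git symbolic-ref HEAD refs/heads/master  # make sure the local branch is named master
git config user.name "Demo User"         # repo-local identity, for this demo only
git config user.email "demo@example.com"

echo "Apples" > fruits.txt
git add fruits.txt
git commit -q -m "Add fruits.txt"

git remote add origin "$work/remote.git"
git push -q -u origin master             # push, and set master to track origin/master

echo "Elderberries" >> fruits.txt
git commit -q -am "Update fruits list"
echo "ahead by: $(git rev-list --count origin/master..master)"   # prints: ahead by: 1

git push -q                              # tracking lets Git work out the remote and branch
echo "ahead by: $(git rev-list --count origin/master..master)"   # prints: ahead by: 0
```

Using a local bare repo keeps the sketch self-contained; with a real GitHub repo, only the URL passed to git remote add would differ.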

HANDS-ON: Pushing a local repo to an empty remote repo

Preparation Use a local repo that is connected to an empty remote repo e.g., the things repo from previous hands-on practicals.

1 Push the master branch to the remote. Also instruct Git to track this branch pair.

CLI

Use the git push -u <remote-name> <local-branch-name> command to push the commits to a remote repository.

git push -u origin master

Explanation:

push: the Git sub-command that pushes the current local repo content to a remote repo
origin: name of the remote
master: branch to push
-u (or --set-upstream): the flag that tells Git to record that this local master branch is tracking the origin/master branch

Sourcetree

Click the Push button on the buttons ribbon at the top.

In the next dialog, ensure the settings are as follows, ensure the Track option is selected, and click the Push button on the dialog.

2 Observe the remote-tracking branch origin/master is now pointing at the same commit as the master branch.

CLI

Use the git log --oneline --graph command to see the revision graph.

* f761ea6 (HEAD -> master, origin/master) Add colours.txt, shapes.txt
* 2bedace Add figs to fruits.txt
* d5f91de Add fruits.txt

Sourcetree

Click History to see the revision graph.

In some versions of Sourcetree, the HEAD ref may not be shown -- it is implied that the HEAD ref is pointing to the same commit as the currently active branch ref.
If the remote-tracking branch ref (e.g., origin/master) is not showing up, you may need to enable the Show Remote Branches option.

done!

The push command can be used repeatedly to send further updates to another repo e.g., to update the remote with commits you created since you pushed the first time.

HANDS-ON: Pushing to send further updates to a repo

Target Add a commit to the same local repo, and push it to the remote repo.

1 Commit some changes in your local repo. Example:

echo "Elderberries" >> fruits.txt
git commit -am "Update fruits list"

CLI

Use the git commit command to create commits, as you did before.

Optionally, you can run the git status command, which should confirm that your local branch is 'ahead' by one commit (i.e., the local branch has commits that are not present in the corresponding branch in the remote repo).

git status

⤷

On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

You can also use the git log --oneline --graph command to see where the branch refs are. Note how the remote-tracking branch origin/master is one commit behind the local master.

* e60deae (HEAD -> master) Update fruits list
* f761ea6 (origin/master) Add colours.txt, shapes.txt
* 2bedace Add figs to fruits.txt
* d5f91de Add fruits.txt

Sourcetree

Create commits as you did before.

Before pushing the new commit, Sourcetree will indicate that your local branch is 'ahead' by one commit (i.e., the local branch has one new commit that is not in the corresponding branch in the remote repo).

2 Push the new commits to the remote repo on GitHub.

CLI

To push the newer commit(s) in the current branch master to the remote origin, you can use any of the following commands:

git push origin master
git push origin
→ Git will assume you are pushing the current branch (e.g., master) even if you don't specify it.
git push
→ Git will assume you are pushing the current branch (e.g., master). Due to tracking you've set up earlier, Git will assume that you want to push it to the matching branch on origin.

After pushing, the revision graph should look something like the following (note how both local and remote-tracking branch refs are pointing to the same commit again).

* e60deae (HEAD -> master, origin/master) Update fruits list
* f761ea6 Add colours.txt, shapes.txt
* 2bedace Add figs to fruits.txt
* d5f91de Add fruits.txt

Sourcetree

To push, click the Push button on the top buttons ribbon, ensure the settings are as follows in the next dialog, and click the Push button on the dialog.

After pushing the new commit to the remote, the remote-tracking branch ref should move to the new commit:

done!

Note that, unless the target repo is empty, you can push between two repos only if they share a history (e.g., one was created by copying the other).
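The sketch below illustrates this under simplified assumptions: a local bare repo stands in for the remote, and alpha and beta are hypothetical repos created independently, so they share no history. A hypothetical helper, make_repo, creates each one with a single commit. Pushing alpha into the empty remote succeeds, but Git then rejects beta's attempt to push its unrelated master onto the remote's master.

```shell
set -eu

tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"     # stand-in for a remote repo

# Hypothetical helper: creates a new repo with one commit on master
make_repo() {
  git init -q "$tmp/$1"
  git -C "$tmp/$1" symbolic-ref HEAD refs/heads/master
  echo "$1" > "$tmp/$1/file.txt"
  git -C "$tmp/$1" add file.txt
  git -C "$tmp/$1" -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q -m "First commit in $1"
}

make_repo alpha
make_repo beta     # created separately, so it shares no history with alpha

# Works: the remote is empty, so there is no existing history to conflict with
git -C "$tmp/alpha" push -q "$tmp/remote.git" master

# Rejected: the remote's master now holds alpha's history, which is unrelated to beta's
if git -C "$tmp/beta" push -q "$tmp/remote.git" master 2>/dev/null; then
  echo "beta: push accepted"
else
  echo "beta: push rejected"
fi
```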

EXERCISE: push-over

DETOUR: Pushing to Multiple Repos

You can push to any number of repos, as long as the target repos and your repo have a shared history.

Add the target repo's URL as a remote while giving it a suitable name (e.g., upstream, central, production, backup ...), if you haven't done so already.
Push to the target repo -- remember to select the correct target repo when you do.

CLI

e.g., git push backup master
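A minimal sketch of these steps, assuming two local bare repos as stand-ins for the GitHub repo and a second backup repo (all folder names, remote names, and file contents here are made up for illustration). The same branch is pushed to each target in turn, simply by naming a different remote in each git push command.

```shell
set -eu

tmp=$(mktemp -d)
git init -q --bare "$tmp/github-copy.git"   # stand-in for the GitHub repo
git init -q --bare "$tmp/backup-copy.git"   # stand-in for a second, backup repo

# A local repo with one commit on master
git init -q "$tmp/things"
cd "$tmp/things"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Apples" > fruits.txt
git add fruits.txt
git commit -q -m "Add fruits.txt"

# One remote per target repo
git remote add origin "$tmp/github-copy.git"
git remote add backup "$tmp/backup-copy.git"

# Push the same branch to each target
git push -q origin master
git push -q backup master
```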

Sourcetree

T2L6. Omitting Files from Revision Control

Git allows you to specify which files should be omitted from revision control.

This lesson covers that part.

You can specify which files Git should ignore from revision control. While you can always omit files from revision control simply by not staging them, having an 'ignore-list' is more convenient, especially if the working folder contains files that are not suitable for revision control (e.g., temporary log files) or files you want to avoid including in a commit by accident (e.g., files containing confidential information).

A repo-specific ignore-list of files can be specified in a .gitignore file, stored in the root of the repo folder.

The .gitignore file itself can be either revision controlled or ignored.

To version control it (the more common choice – which allows you to track how the .gitignore file changes over time), simply commit it as you would commit any other file.
To ignore it, simply add its name to the .gitignore file itself.

The .gitignore file supports file patterns e.g., adding temp/*.tmp to the .gitignore file prevents Git from tracking any .tmp files in the temp directory.

SIDEBAR: .gitignore File Syntax

Blank lines: Ignored and can be used for spacing.

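Comments: Begin with # (lines starting with # are ignored). A # starts a comment only at the beginning of a line; the # annotations shown after the patterns below are explanations for the reader and should not be copied into an actual .gitignore file.

# This is a comment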

Write the name or pattern of files/directories to ignore.

log.txt          # Ignores a file named log.txt

Wildcards:

* matches any number of characters, except / (i.e., for matching a string within a single directory level):

abc/*.tmp     # Ignores all .tmp files in abc directory

** matches any number of characters (including /)

**/foo.tmp    # Ignores all foo.tmp files in any directory

? matches a single character

config?.yml   # Ignores config1.yml, configA.yml, etc.

[abc] matches a single character (a, b, or c)

file[123].txt # Ignores file1.txt, file2.txt, file3.txt

Directories:

Add a trailing / to match directories.

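logs/         # Ignores any directory named logs, along with everything inside it

Patterns without a / (other than a trailing one) match at any directory level below the .gitignore location.

*.bak         # Ignores all .bak files anywhere

Patterns with a / at the start or in the middle are relative to the .gitignore location.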

/secret.txt   # Only ignores secret.txt in the root directory

Negation: Use ! at the start of a line to not ignore something.

*.log           # Ignores all .log files
!important.log  # Except important.log

Example:

.gitignore
# Ignore all log files
*.log

# Ignore node_modules folder
node_modules/

# Don’t ignore main.log
!main.log
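One way to confirm what a .gitignore file actually ignores is git check-ignore, which exits with status 0 if a given path is ignored and 1 if it is not. The minimal sketch below writes the example .gitignore above into a throwaway repo, adds one abc/*.tmp rule to show that * does not cross a /, and checks a few illustrative paths.

```shell
# A minimal sketch: verify .gitignore rules with 'git check-ignore'.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# The example .gitignore from above, plus an 'abc/*.tmp' rule.
# Comments sit on their own lines: a '#' after a pattern is NOT a comment.
cat > .gitignore <<'EOF'
# Ignore all log files
*.log

# Ignore node_modules folder
node_modules/

# Don't ignore main.log
!main.log

# Ignore .tmp files directly inside abc/ only
abc/*.tmp
EOF

# Create some untracked files to test the rules against
mkdir -p node_modules abc/sub
touch debug.log main.log node_modules/lib.js abc/a.tmp abc/sub/b.tmp

# Expected: debug.log, node_modules/lib.js, abc/a.tmp are ignored;
# main.log (negated) and abc/sub/b.tmp (one level too deep) are not.
for path in debug.log main.log node_modules/lib.js abc/a.tmp abc/sub/b.tmp; do
  if git check-ignore -q "$path"; then
    echo "ignored:     $path"
  else
    echo "not ignored: $path"
  fi
done
```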

HANDS-ON: Adding a file to the ignore-list

1 Add a file into your repo's working folder that you presumably do not want to revision-control e.g., a file named temp.txt. Observe how Git has detected the new file.
Add a few other files with .tmp extension.

2 Configure Git to ignore those files:

CLI

Create a file named .gitignore in the working directory root and add the text temp.txt into it.

echo "temp.txt" >> .gitignore

⤷

.gitignore

temp.txt

Observe how temp.txt is no longer detected as 'untracked' by running the git status command (but now it will detect the .gitignore file as 'untracked').

Update the .gitignore file as follows:

.gitignore

temp.txt
*.tmp

Observe how .tmp files are no longer detected as 'untracked' by running the git status command.

Sourcetree

The file should be currently listed under Unstaged files. Right-click it and choose Ignore.... Choose Ignore exact filename(s) and click OK.
Also take note of other options available e.g., Ignore all files with this extension etc. They may be useful in future.

Note how temp.txt is no longer listed under Unstaged files. Observe that a file named .gitignore has been created in the working directory root and has the following line in it. This new file is now listed under Unstaged files.

.gitignore

temp.txt

Right-click on any of the .tmp files you added, and choose Ignore... as you did previously. This time, choose the option Ignore files with this extension.

Note how .tmp files are no longer shown as unstaged files, and the .gitignore file has been updated as given below:

.gitignore

temp.txt
*.tmp

3 Optionally, stage and commit the .gitignore file.

done!

Files recommended to be omitted from version control

Binary files generated when building your project e.g., *.class, *.jar, *.exe
Reasons:
1. No need to version control these files as they can be generated again from the source code.
2. Revision control systems are optimized for tracking text-based files, not binary files.
Temporary files e.g., log files generated while testing the product
Local files i.e., files specific to your own computer e.g., local settings of your IDE (.idea/)
Sensitive content i.e., files containing sensitive/personal information e.g., credential files, personal identification data (especially if there is a possibility of those files getting leaked via the revision control system).

EXERCISE: ignoring-somethings

DETOUR: Ignoring Previously-Tracked Files

Adding a file to the .gitignore file is not enough if the file was already being tracked by Git in previous commits. In such cases, you need to do both of the following:

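Untrack the file (i.e., remove it from the staging area so that Git stops tracking it from the next commit onwards, while keeping the file in your working directory), using the git rm --cached <file(s)> command.

git rm --cached data/ic.txt

Add it to the .gitignore file, as usual. Then commit both changes (the removal and the updated .gitignore).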
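The minimal sketch below walks through this in a throwaway repo. It shows that adding an already-tracked file to .gitignore changes nothing on its own, and that git rm --cached plus a commit untracks the file while leaving it on disk. The file names and contents are illustrative.

```shell
# A minimal sketch: stop tracking a file that was committed before it was ignored.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir data
printf 'dummy-id-001\n' > data/ic.txt
printf 'hello\n' > notes.txt
git add .
git commit -q -m "Add notes and data"   # data/ic.txt is now tracked

# Ignoring an already-tracked file has no effect on its own
printf 'data/ic.txt\n' > .gitignore
printf 'dummy-id-002\n' > data/ic.txt
git status --short                       # data/ic.txt still shows up as modified

# Untrack it (the file stays on disk), then commit the removal and the .gitignore
git rm -q --cached data/ic.txt
git add .gitignore
git commit -q -m "Stop tracking data/ic.txt"

git ls-files                             # lists .gitignore and notes.txt only
git status --short                       # prints nothing: data/ic.txt is now ignored
```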

At this point: You should now be able to create a copy of your repo on GitHub, and keep it updated as you add more commits to your local repo. If something goes wrong with your local repo (e.g., disk crash), you can now recover the repo using the remote repo (this tour did not cover how exactly you can do that -- it will be covered in a future tour).

What's next: Tour 3: Working Off a Remote Repo

W2.3b

Git Learning Trail → Tour 3: Working Off a Remote Repo

Tour 3: Working Off a Remote Repo

Target Usage: To work with an existing remote repository.

Motivation: Often, you will need to start with an existing remote repository. In such cases, you may have to create your own copies of that repository, and keep those copies updated when more changes appear in the remote repository.

Lesson plan:

GitHub allows you to create a remote copy of another remote repo, called forking.

T3L1. Duplicating a Remote Repo on the Cloud covers that part.

The next step is to create a local copy of the remote repo, by cloning the remote repo.

T3L2. Creating a Local Copy of a Repo covers that part.

When there are new changes in the remote, you need to pull those changes down to your local repo.

T3L3. Downloading Data Into a Local Repo covers that part.

T3L1. Duplicating a Remote Repo on the Cloud

GitHub allows you to create a remote copy of another remote repo, called forking.

This lesson covers that part.

A fork is a copy of a remote repository created on the same hosting service such as GitHub, GitLab, or Bitbucket. On GitHub, you can fork a repository from another user or organisation into your own space (i.e., your user account or an organisation you have sufficient access to). Forking is particularly useful if you want to experiment with a repo but don’t have write permissions to the original -- you can fork it and work on your own remote copy without affecting the original repository.

HANDS-ON: Forking a repo on GitHub

Preparation Create a GitHub account if you don't have one yet.

1 Go to the GitHub repo you want to fork e.g., samplerepo-things

2 Click on the Fork button in the top-right corner. In the next step,

choose to fork to your own account or to another GitHub organization that you are an admin of.
un-tick the Copy the master branch only option, so that you get copies of other branches (if any) in the repo.

done!

Forking is not a Git feature, but a feature provided by hosted Git services like GitHub, GitLab, or Bitbucket.

GitHub does not allow you to fork the same repo more than once to the same destination. If you want to re-fork, you need to delete the previous fork.

EXERCISE: fork-repo

T3L2. Creating a Local Copy of a Repo

The next step is to create a local copy of the remote repo, by cloning the remote repo.

This lesson covers that part.

You can clone a repository to create a full copy of it on your computer. This copy includes the entire revision history, branches, and files of the original, so it behaves just like the original repository. For example, you can clone a repository from a hosting service like GitHub to your computer, giving you a complete local version to work with.

Cloning a repo automatically creates a remote named origin which points to the repo you cloned from.

The repo you cloned from is often referred to as the upstream repo.

HANDS-ON: Cloning a remote repo

1 Clone the remote repo to your computer. For example, you can clone the samplerepo-things repo, or the fork you created from it in a previous lesson.

Note that the URL of the GitHub project is different from the URL you need to clone a repo in that GitHub project. e.g.

https://github.com/se-edu/samplerepo-things  # GitHub project URL
https://github.com/se-edu/samplerepo-things.git # the repo URL

CLI

You can use the git clone <repository-url> [directory-name] command to clone a repo.

<repository-url>: The URL of the remote repository you want to copy.
[directory-name] (optional): The name of the folder where you want the repository to be cloned. If you omit this, Git will create a folder with the same name as the repository.

git clone https://github.com/se-edu/samplerepo-things.git  # if using HTTPS
git clone git@github.com:se-edu/samplerepo-things.git  # if using SSH

git clone https://github.com/foo/bar.git my-bar-copy  # also specifies a dir to use

For exact steps for cloning a repo from GitHub, refer to this GitHub document.

Sourcetree

Windows

File → Clone / New ... and provide the URL of the repo and the destination directory.

Mac

File → New ... → Choose as shown below → Provide the URL of the repo and the destination directory in the next dialog.

2 Verify the clone has a remote named origin pointing to the upstream repo.

CLI

Use the git remote -v command that you learned earlier.

Sourcetree

Choose Repository → Repository Settings menu option.

done!
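The minimal sketch below performs the same clone-and-verify steps offline. A local repo stands in for the GitHub repo (an assumption; with GitHub you would pass the repo URL to git clone). The folder names things and my-things are illustrative.

```shell
# A minimal sketch: clone a repo and confirm the clone has an 'origin' remote.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q "$base/things"          # hypothetical stand-in for samplerepo-things
cd "$base/things"
printf 'apples\n' > fruits.txt
git add fruits.txt
git commit -q -m "Add fruits.txt"

# git clone <repository-url> [directory-name]
git clone -q "$base/things" "$base/my-things"
cd "$base/my-things"

git remote -v      # origin (fetch) and origin (push), both pointing at $base/things
git log --oneline  # the full history came along with the clone
```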

EXERCISE: clone-repo

T3L3. Downloading Data Into a Local Repo

When there are new changes in the remote, you need to pull those changes down to your local repo.

This lesson covers that part.

There are two steps to bringing over changes from a remote repository into a local repository: fetch and merge.

Fetch is the act of downloading the latest changes from the remote repository, but without applying them to your current branch yet. It updates metadata in your repo so that it knows what has changed in the remote repo, but your own local branch remains untouched.
Merge is what you do after fetching, to actually incorporate the fetched changes into your local branch. It combines your local branch with the changes from the corresponding branch from the remote repo.
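To see the two steps separately, the minimal sketch below uses a local repo as a stand-in for the remote (an assumption, to keep it offline). After git fetch, only the remote-tracking ref origin/master moves; the local master moves only when you merge. The file name and commit messages are illustrative.

```shell
# A minimal sketch: 'git fetch' updates origin/master only; 'git merge' then moves master.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q "$base/remote"
cd "$base/remote"
git symbolic-ref HEAD refs/heads/master
printf 'rent\n' > expenses.txt
git add expenses.txt
git commit -q -m "Add expenses.txt"

git clone -q "$base/remote" "$base/local"

# Two more commits appear in the remote after the clone was made
printf 'food\n' >> expenses.txt
git commit -q -am "Add food"
printf 'travel\n' >> expenses.txt
git commit -q -am "Add travel"

cd "$base/local"
before=$(git rev-parse master)

git fetch -q origin          # downloads the two commits and moves origin/master
git status                   # reports the branch is behind 'origin/master' by 2 commits
git rev-parse master         # still the same commit as $before

git merge -q origin/master   # fast-forwards master to origin/master
git log --oneline --decorate
```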

HANDS-ON: Fetch and merge from a remote

1 Clone the repo se-edu/samplerepo-finances. It has 3 commits. Your clone now has a remote origin pointing to the remote repo you cloned from.

2 Change the remote origin to point to samplerepo-finances-2. This remote repo is a copy of the one you cloned, but it has two extra commits.

CLI

git remote set-url origin https://github.com/se-edu/samplerepo-finances-2.git

Sourcetree

Go to Repository → Repository settings ... to update remotes.

3 Verify the local repo is unaware of the extra commits in the remote.

CLI

git status

⤷

On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

Sourcetree

The revision graph should look like the below:

If it looks like the below, it is possible that Sourcetree is auto-fetching data from the repo periodically.

4 Fetch from the new remote.

CLI

Use the git fetch <remote> command to fetch changes from a remote. If the <remote> is not specified, the default remote origin will be used.

git fetch origin

⤷

remote: Enumerating objects: 8, done.
... # more output ...
   afbe966..cc6a151  master     -> origin/master
 * [new tag]         beta       -> beta

Sourcetree

Click on the Fetch button on the top menu:

5 Verify the fetch worked i.e., the local repo is now aware of the two missing commits. Also observe how the local branch ref of the master branch, the staging area, and the working directory remain unchanged after the fetch.

CLI

Use the git status command to confirm the repo now knows that it is behind the remote repo.

git status

⤷

On branch master
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)

nothing to commit, working tree clean

Sourcetree

Now, the revision graph should look something like the below. Note how the origin/master ref is now two commits ahead of the master ref.

6 Merge the fetched changes.

CLI

Use the git merge <remote-tracking-branch> command to merge the fetched changes. Check the status and the revision graph to verify that the branch tip has now moved by two more commits.

git merge origin/master

git status
git log --oneline --decorate

Sourcetree

To merge the fetched changes, right-click on the latest commit on the origin/master branch and choose Merge.

In the next dialog, choose as follows:

The final result should be something like the below (same as the repo state before we started this hands-on practical):

Note that merging the fetched changes can get complicated if there are multiple branches or the commits in the local repo conflict with commits in the remote repo. We will address them when we learn more about Git branches, in a later lesson.

done!

Pull is a shortcut that combines fetch and merge: it fetches the latest changes from the remote and immediately merges them into your current branch. In practice, Git users typically use pull instead of fetch-then-merge.

pull = fetch + merge
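The minimal sketch below shows a single git pull doing the work of the fetch and merge steps. A local repo stands in for the remote (an assumption, to keep it offline), and the names are illustrative. Because the clone's master tracks origin/master, a plain git pull behaves like git pull origin master here.

```shell
# A minimal sketch: 'git pull' performs the fetch and the merge in one step.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q "$base/remote"
cd "$base/remote"
git symbolic-ref HEAD refs/heads/master
printf 'rent\n' > expenses.txt
git add expenses.txt
git commit -q -m "Add expenses.txt"

git clone -q "$base/remote" "$base/local"

printf 'food\n' >> expenses.txt   # a new commit appears in the remote
git commit -q -am "Add food"

cd "$base/local"
git pull -q                       # fetch + merge from the tracked branch (origin/master)
git log --oneline --decorate
```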

HANDS-ON: Pull from a remote

1 Similar to the previous hands-on practical, clone the repo se-edu/samplerepo-finances (to a new location).
Change the remote origin to point to samplerepo-finances-2.

2 Pull the newer commits from the remote, instead of a fetch-then-merge.

CLI

Use the git pull <remote> <branch> command to pull changes.

git pull origin master

The following works too. If the <remote> and <branch> are not specified, Git will pull into the current branch from the remote branch it is tracking.

git pull

Sourcetree

Click on the Pull button on the top menu:

In the next dialog, choose as follows:

3 Verify the outcome is the same as the fetch + merge steps you did in the previous hands-on practical.

done!

You can pull from any number of remote repos, provided the repos involved have a shared history. This can be useful when the upstream repo you forked from has some new commits that you wish to bring over to your copies of the repo (i.e., your fork and your local repo).

HANDS-ON: Sync your repos with the upstream repo

Preparation Fork se-edu/samplerepo-finances to your GitHub account.
Clone your fork to your computer.
Now, let's pretend that there are some new commits in upstream repo that you would like to bring over to your fork, and your local repo. Here are the steps:

1 Add the upstream repo se-edu/samplerepo-finances as remote named upstream in your local repo.

Adding remotes was covered in Lesson T2L4. Linking a Local Repo With a Remote Repo

2 Pull from the upstream repo. If there are new commits (in this case, there will be none), those will come over to your local repo. For example:

git pull upstream master

3 Push to your fork. Any new commits you pulled from the upstream repo will now appear in your fork as well. For example:

git push origin master

The method given above is the more 'standard' method of synchronising a fork with the upstream repo. In addition, platforms such as GitHub can provide other ways (example: GitHub's Sync fork feature).

4 For good measure, let's pull from another repo.

Add the upstream repo se-edu/samplerepo-finances-2 as remote named other-upstream in your local repo.
Pull from it to your local repo; this will bring some new commits.
Now, you can push those new commits to your fork.

git remote add other-upstream https://github.com/se-edu/samplerepo-finances-2.git
git pull other-upstream master
git push origin master

done!
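The fork-sync workflow above can be rehearsed offline with the minimal sketch below. A local repo plays the upstream repo, a bare copy of it plays your GitHub fork, and a clone of that bare copy is your local repo. All three are stand-ins (an assumption), and the names are illustrative.

```shell
# A minimal sketch: sync a fork with its upstream via the local repo.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)

# Upstream repo with one commit (stand-in for se-edu/samplerepo-finances)
git init -q "$base/upstream"
cd "$base/upstream"
git symbolic-ref HEAD refs/heads/master
printf 'rent\n' > expenses.txt
git add expenses.txt
git commit -q -m "Add expenses.txt"

# A bare copy of upstream (stand-in for your fork on GitHub)
git clone -q --bare "$base/upstream" "$base/fork.git"

# Your local repo, cloned from the fork (so origin = the fork)
git clone -q "$base/fork.git" "$base/local"

# Meanwhile, a new commit lands in the upstream repo
printf 'food\n' >> expenses.txt
git commit -q -am "Add food"

cd "$base/local"
git remote add upstream "$base/upstream"
git pull -q upstream master   # bring the new commit into the local repo
git push -q origin master     # and pass it on to the fork
```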

EXERCISE: fetch-and-pull

DETOUR: Pulling from Multiple Remotes

You can pull from any number of repos, provided the repos involved have a shared history.

Add the GitHub repo URL as a remote while giving a suitable name (e.g., upstream, central, production, backup ...), if you haven't done so already.
Pull (or fetch) from the remote repo -- remember to select the correct remote repo when you do.

CLI

e.g., git pull backup master

Sourcetree

Similar to before, but remember to choose the intended remote to pull from.

At this point: Now you can create your own remote and local copies of any repo on GitHub, and update your copy when there are new changes in the upstream repo.

What's next: Tour 4: Using the Revision History of a Repo

[W2.4] RCS: Doing More with Revision History

W2.4a

Git Learning Trail → Tour 4: Using the Revision History of a Repo

Tour 4: Using the Revision History of a Repo

Target Usage: To make use of the revision history stored by Git.

Motivation: Having put in effort to record the revision history of the working folder, it only makes sense that we use the revision history to our benefit. For example, to be able to answer questions such as "What did I change in this file since last Monday?"

Lesson plan:

It is useful to be able to see what changes were included in a specific commit.

T4L1. Examining a Commit covers that part.

When working with many commits, it helps to tag specific commits with custom names so they’re easier to refer to later.

T4L2. Tagging Commits covers that part.

Git can tell you the net effect of changes between two points of history.

T4L3. Comparing Points of History covers that part.

Another useful feature of revision control is to be able to view the working directory as it was at a specific point in history, by checking out a commit created at that point.

T4L4. Traversing to a Specific Commit covers that part.

Git can also reset the revision history to a specific point so that you can start over from that point.

T4L5. Rewriting History to Start Over covers that part.

Git can add a new commit to reverse the changes done in a specific past commit, called reverting a commit.

T4L6. Reverting a Specific Commit covers that part.

T4L1. Examining a Commit

It is useful to be able to see what changes were included in a specific commit.

This lesson covers that part.

When you examine a commit, normally what you see is the 'changes made since the previous commit'. This does not mean that a Git commit contains only the changes made since the previous commit. As you recall, a Git commit contains a full snapshot of the working directory. However, tools used to examine commits typically show only the changes, as that is the more informative part.

Git shows changes included in a commit by dynamically calculating the difference between the snapshots stored in the target commit and the parent commit. This is because Git commits store snapshots of the working directory, not changes themselves.

Although each commit represents a copy of the entire working directory, Git uses space efficiently in two main ways:

Reuse of unchanged data: If a file hasn’t changed since a previous commit, the commit simply points to the already stored version of that file instead of making another copy. This means only new or changed files take up extra space, while unchanged files are reused.
Compression: Git also compresses all the files and data it stores, using the zlib compression library. So, even the objects that are stored (whether reused or new) take up less disk space because they are saved in a compressed format.
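You can observe the reuse of unchanged data directly. The <commit>:<path> syntax names the stored version of a file in a given commit, and git rev-parse prints that object's ID. In the minimal sketch below, the unchanged colours.txt has the same ID in both commits, while the changed fruits.txt gets a new one. The file names are illustrative.

```shell
# A minimal sketch: unchanged files are stored once and reused across commits.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'apples\n' > fruits.txt
printf 'red\n' > colours.txt
git add .
git commit -q -m "Add fruits and colours"

printf 'bananas\n' >> fruits.txt   # change only fruits.txt
git commit -q -am "Add bananas"

git rev-parse HEAD~1:colours.txt HEAD:colours.txt   # same ID twice: one stored copy, reused
git rev-parse HEAD~1:fruits.txt HEAD:fruits.txt     # two different IDs: a new version was stored

git cat-file -p HEAD:fruits.txt    # the stored object is the full file content, not a diff
```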

To address a specific commit, you can use its SHA (e.g., e60deaeb2964bf2ebc907b7416efc890c9d4914b). In fact, just the first few characters of the SHA are enough (e.g., e60deae), provided the partial SHA is long enough to uniquely identify the commit (i.e., only one commit has that partial SHA).
Naturally, a commit can also be addressed using any ref pointing to it (e.g., HEAD, master).
Another related technique is to use the <ref>~<n> notation (e.g., HEAD~1) to address the commit that is n commits prior to the commit pointed to by <ref>, i.e., "start with the commit pointed to by <ref> and go back n commits".
A related alternative notation is HEAD~, HEAD~~, HEAD~~~, ... to mean HEAD~1, HEAD~2, HEAD~3 etc.

C3 (master ← HEAD): This commit can be addressed as HEAD or master

C2: Can be addressed as HEAD~1 or master~1 or HEAD~ or master~

C1: Can be addressed as HEAD~2 or master~2
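The minimal sketch below builds a three-commit history like the one above and resolves each name with git rev-parse, which prints the SHA each name refers to. The commit messages C1 to C3 mirror the diagram; the file is illustrative.

```shell
# A minimal sketch: different ways of naming the same commits.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

for n in 1 2 3; do
  printf 'line %s\n' "$n" >> list.txt
  git add list.txt
  git commit -q -m "C$n"
done

# Each command prints the same SHA once per name given
git rev-parse HEAD master                     # C3
git rev-parse HEAD~1 master~1 HEAD~ master~   # C2
git rev-parse HEAD~2 master~2 HEAD~~          # C1

git log -1 --format=%s HEAD~2                 # prints: C1
```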

Git uses the diff format to show file changes in a commit. The diff format was originally developed for Unix. It was later extended with headers and metadata to show changes between file versions and commits. Here is an example diff showing the changes to a file.

diff --git a/fruits.txt b/fruits.txt
index 7d0a594..f84d1c9 100644
--- a/fruits.txt
+++ b/fruits.txt
@@ -1,6 +1,5 @@
-apples
+apples, apricots
 bananas
 cherries
 dragon fruits
-elderberries
 figs
@@ -20,2 +19,3 @@
 oranges
+pears
 raisins
diff --git a/colours.txt b/colours.txt
new file mode 100644
index 0000000..55c8449
--- /dev/null
+++ b/colours.txt
@@ -0,0 +1 @@
+a file for colours

A Git diff can consist of multiple file diffs, one for each changed file. Each file diff can contain one or more hunks, i.e., localised groups of changes within the file, each including the lines added, the lines removed, and some unchanged lines shown for context.

Given below is how the above diff is divided into its components:

All changes in the commit:

File diff for fruits.txt:

diff --git a/fruits.txt b/fruits.txt
index 7d0a594..f84d1c9 100644
--- a/fruits.txt
+++ b/fruits.txt

Hunk 1:

@@ -1,6 +1,5 @@
-apples
+apples, apricots
 bananas
 cherries
 dragon fruits
-elderberries
 figs

Hunk 2:

@@ -20,2 +19,3 @@
 oranges
+pears
 raisins

File diff for colours.txt:

diff --git a/colours.txt b/colours.txt
new file mode 100644
index 0000000..55c8449
--- /dev/null
+++ b/colours.txt

Hunk 1:

@@ -0,0 +1 @@
+a file for colours

Here is an explanation of the diff:

Part of Diff	Explanation
`diff --git a/fruits.txt b/fruits.txt`	The diff header, indicating that it is comparing the file `fruits.txt` between two versions: the old (`a/`) and new (`b/`).
`index 7d0a594..f84d1c9 100644`	Shows the (abbreviated) hashes of the file's contents before and after the change, and the file mode (`100` means a regular file, `644` are file permission indicators).
`--- a/fruits.txt` `+++ b/fruits.txt`	Marks the old version of the file (`a/fruits.txt`) and the new version of the file (`b/fruits.txt`).
`@@ -1,6 +1,5 @@`	This hunk header shows that lines 1–6 (i.e., starting at line `1`, spanning `6` lines) in the old file were compared with lines 1–5 (starting at line `1`, spanning `5` lines) in the new file.
`-apples` `+apples, apricots`	Removed line `apples` and added line `apples, apricots`.
`bananas` `cherries` `dragon fruits`	Unchanged lines, shown for context.
`-elderberries`	Removed line: `elderberries`.
`figs`	Unchanged line, shown for context.
`@@ -20,2 +19,3 @@`	Hunk header showing that lines 20–21 in the old file were compared with lines 19–21 in the new file. The new file's line numbers are one lower because hunk 1 removed two lines but added only one.
`oranges` `+pears` `raisins`	Unchanged line. Added line: `pears`. Unchanged line.
`diff --git a/colours.txt b/colours.txt`	The usual diff header, indicating that Git is comparing two versions of the file `colours.txt`: one before and one after the change.
`new file mode 100644`	This is a new file being added. `100644` means it’s a normal, non-executable file with standard read/write permissions.
`index 0000000..55c8449`	The usual SHA hashes for the two versions of the file. `0000000` indicates the file did not exist before.
`--- /dev/null` `+++ b/colours.txt`	Refers to the "old" version of the file (`/dev/null` means it didn’t exist before), and the new version.
`@@ -0,0 +1 @@`	Hunk header, saying: “0 lines in the old file were replaced with 1 line in the new file, starting at line 1.”
`+a file for colours`	Added line

Points to note:

+ indicates a line being added.
- indicates a line being deleted.
Editing a line is seen as deleting the original line and adding the new line.

HANDS-ON: View specific commits

Target View contents of specific commits in a repo.

Preparation You can use any repo that has commits e.g., the things repo.

CLI

1 Locate the commits to view, using the revision graph.

git log --oneline --decorate

⤷

 e60deae (HEAD -> master, origin/master) Update fruits list
 f761ea6 Add colours.txt, shapes.txt
 2bedace Add figs to fruits.txt
 d5f91de Add fruits.txt

2 Use the git show command to view specific commits.

git show  # shows the latest commit

⤷

commit e60deaeb2964bf2ebc907b7416efc890c9d4914b (HEAD -> master, origin/master)
Author: damithc <...@...>
Date:   Sat Jun ...

    Update fruits list

diff --git a/fruits.txt b/fruits.txt
index 7d0a594..6d502c3 100644
--- a/fruits.txt
+++ b/fruits.txt
@@ -1,6 +1,6 @@
-apples
+apples, apricots
 bananas
+blueberries
 cherries
 dragon fruits
-elderberries
 figs

To view the parent commit of the latest commit, you can use any of these commands:

git show HEAD~1
git show master~1
git show f761ea6  # first few characters of the SHA
git show f761ea6.....  # run git log to find the full SHA and specify the full SHA

To view the commit that is two commits before the latest commit, you can use git show HEAD~2 etc.

Sourcetree

Click on the commit. The remaining panels (indicated in the image below) will be populated with the details of the commit.

done!

PRO-TIP: Use Git Aliases to Work Faster

The Git alias feature allows you to create custom shortcuts for frequently used Git commands. This saves time and reduces typing, especially for long or complex commands. Once an alias is defined, you can use the alias just like any other Git command e.g., use git lodg as an alias for git log --oneline --decorate --graph.

To define a global git alias, you can use the git config --global alias.<alias> "<command>" command. e.g.,

git config --global alias.lodg "log --oneline --graph --decorate"
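To try an alias without touching your global Git settings, the minimal sketch below defines the same alias at repo level, using git config without --global, inside a throwaway repo. The repo-level scope is an assumption made for safety; apart from that, the command is the same as the one above.

```shell
# A minimal sketch: define and use a Git alias (repo-level, so global config is untouched).
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'apples\n' > fruits.txt
git add fruits.txt
git commit -q -m "Add fruits.txt"

# Stored in this repo's .git/config instead of your global config
git config alias.lodg "log --oneline --graph --decorate"

git config --get alias.lodg   # prints: log --oneline --graph --decorate
git lodg                      # runs: git log --oneline --graph --decorate
```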

You can also create shell-level aliases using your shell configuration (e.g., .bashrc, .zshrc) to make even shorter aliases. This lets you create shortcuts for any command, including Git commands, and even combine them with other tools. e.g., instead of the Git alias git lodg, you can define a shorter shell-level alias glodg.

Windows + Git-Bash

1. Locate your .bash_profile file (likely to be at C:\Users\<YourName>\.bash_profile; if it doesn’t exist, create it).

Windows + WSL (Ubuntu or other Linux distro)

1. Locate your shell's config file e.g., .bashrc or .zshrc (likely to be in your ~ folder)

MacOS | Linux

1. Locate your shell's config file e.g., .bashrc or .zshrc (likely to be in your ~ folder)

Oh-My-Zsh for Zsh terminal supports a Git plugin that adds a wide array of Git command aliases to your terminal.

2. Add aliases to that file:

alias gs='git status'
alias glodg='git log --oneline --graph --decorate'

3. Apply changes by running the command source ~/.zshrc or source ~/.bash_profile or source ~/.bashrc, depending on which file you put the aliases in.

T4L2. Tagging Commits

When working with many commits, it helps to tag specific commits with custom names so they’re easier to refer to later.

This lesson covers that part.

Git lets you tag commits with names, making them easy to reference later. This is useful when you want to mark specific commits -- such as releases or key milestones (e.g., v1.0 or v2.1). Using tags to refer to commits is much more convenient than using SHA hashes. In the diagram below, v1.0 and interim are tags.

C3 (master ← HEAD, interim): Update list

C2 (v1.0): Populate list

C1: Add empty list

A tag stays fixed to the commit. Unlike branch refs or HEAD, tags do not move automatically as new commits are made. As you see below, after adding a new commit, tags stay in the previous commits while master←HEAD has moved to the new commit.

C4 (master ← HEAD): Trim the list

C3 (interim): Update list

C2 (v1.0): Populate list

C1: Add empty list

Git supports two kinds of tags:

A lightweight tag is just a ref that points directly to a commit, like a branch that doesn’t move.
An annotated tag is a full Git object that stores a reference to a commit along with metadata such as the tagger’s name, date, and a message.

Annotated tags are generally preferred for versioning and public releases, while lightweight tags are often used for less formal purposes, such as marking a commit for your own reference.
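The minimal sketch below recreates the history from the diagrams above in a throwaway repo, with an annotated v1.0 tag and a lightweight interim tag. git cat-file -t shows the difference between the two kinds: a lightweight tag resolves straight to a commit, while an annotated tag is a separate tag object. The last commit also shows that tags stay put while master moves. The file contents and the tag message are illustrative.

```shell
# A minimal sketch: lightweight vs annotated tags, and tags staying fixed.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

: > list.txt                                # create an empty file
git add list.txt
git commit -q -m "Add empty list"

printf 'cake\ndonuts\n' > list.txt
git commit -q -am "Populate list"
git tag -a v1.0 -m "Version 1.0"            # annotated: a tag object with tagger, date, message

printf 'cake\ndonuts\neclairs\n' > list.txt
git commit -q -am "Update list"
git tag interim                             # lightweight: a plain ref to the commit

printf 'cake\n' > list.txt
git commit -q -am "Trim the list"           # master moves to this commit; the tags do not

git cat-file -t interim   # prints: commit
git cat-file -t v1.0      # prints: tag
git log --oneline --decorate
```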

HANDS-ON: Adding tags

Target Add a few tags to a repository.

Preparation Fork and clone the samplerepo-preferences. Use the cloned repo on your computer for the following steps.

CLI

1 Add a lightweight tag to the current commit as v1.0:

git tag v1.0

2 Verify the tag was added. To view tags:

git tag

⤷

v1.0

To view tags in the context of the revision graph:

git log --oneline --decorate

⤷

507bb74 (HEAD -> master, tag: v1.0, origin/master, origin/HEAD) Add donuts
de97f08 Add cake
5e6733a Add bananas
3398df7 Add food.txt

3 Use the tag to refer to the commit e.g., git show v1.0 should show the changes in the tagged commit.

4 Add an annotated tag to an earlier commit. The example below adds a tag v0.9 to the commit HEAD~2 with the message First beta release. The -a switch tells Git this is an annotated tag.

git tag -a v0.9 HEAD~2 -m "First beta release"

5 Check the new annotated tag. While both types of tags appear similarly in the revision graph, the show command on an annotated tag will show the details of the tag and the details of the commit it points to.

git show v0.9

⤷

tag v0.9
Tagger: ... <...@...>
Date:   Sun Jun ...

First beta release

commit ....999087124af... (tag: v0.9)
Author: ... <...@...>
Date:   Sat Jun ...

    Add figs to fruits.txt

diff --git a/fruits.txt b/fruits.txt
index a8a0a01..7d0a594 100644
# rest of the diff goes here

Sourcetree

Right-click on the commit (in the graphical revision graph) you want to tag and choose Tag….

Specify the tag name e.g., v1.0 and click Add Tag.

Configure tag properties in the next dialog and press Add. For example, you can choose whether to make it a lightweight tag or an annotated tag (default).

Tags will appear as labels in the revision graph, as seen below. To see the details of an annotated tag, you need to use the menu indicated in the screenshot.

done!

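If you need to change what a tag points to, you must delete the old one and create a new tag with the same name. This is because tags are designed to be fixed references to a specific commit; unlike branch refs, they are never moved automatically. (The git tag -f <tag-name> <commit> command does this delete-and-recreate in a single step, by replacing the existing tag.)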

HANDS-ON: Deleting/moving tags

Preparation Continue with the same repo you used for the previous hands-on practical.

Move the v1.0 tag to the commit HEAD~1, by deleting it first and creating it again at the destination commit.

CLI

Delete the previous v1.0 tag by using the -d option. Add it again to the other commit, as before.

git tag -d v1.0
git tag v1.0 HEAD~1

Sourcetree

The same dialog used to add a tag can be used to delete and even move a tag. Note that the 'moving' here translates to deleting and re-adding behind the scenes.

done!
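The minimal sketch below repeats the delete-and-recreate steps in a throwaway repo and confirms where v1.0 ends up. The commit messages only loosely follow the samplerepo-preferences history shown earlier; the file and its contents are illustrative.

```shell
# A minimal sketch: "move" a tag by deleting it and re-creating it elsewhere.
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
for item in bananas cake donuts; do
  printf '%s\n' "$item" >> food.txt
  git add food.txt
  git commit -q -m "Add $item"
done

git tag v1.0                  # initially on HEAD ("Add donuts")

git tag -d v1.0               # delete the old tag...
git tag v1.0 HEAD~1           # ...and re-create it on HEAD~1 ("Add cake")

git log -1 --format=%s v1.0   # prints: Add cake
```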

Tags are different from commit messages, in purpose and in form. A commit message is a description of the commit that is part of the commit itself. A tag is a short name for a commit, which you can use to address a commit.

Pushing commits to a remote does not push tags automatically. You need to push tags specifically.

HANDS-ON: Pushing tags to a remote

Target Push tags you created earlier to the remote.

Preparation Continue with the same repo you used for the previous hands-on practical.

You can visit your remote's tags page on GitHub at https://github.com/{USER}/{REPO}/tags (e.g., https://github.com/johndoe/samplerepo-preferences/tags) to verify the tag is present there.

Note how GitHub assumes these tags are meant as releases, and automatically provides zip and tar.gz archives of the repo (as at that tag).

CLI

1 Push a specific tag in the local repo to the remote (e.g., v1.0) using the git push <remote> <tag-name> command.

git push origin v1.0

In addition to verifying the tag's presence via GitHub, you can also use the following command to list the tags presently in the remote.

git ls-remote --tags origin

2 Delete a tag in the remote, using the git push --delete <remote> <tag-name> command.

git push --delete origin v1.0

3 Push all tags to the remote repo, using the git push <remote> --tags command.

git push origin --tags

Sourcetree

To push a specific tag, use the following menu:

To push all tags, you can tick the Push all tags option when pushing commits:

done!

T4L3. Comparing Points of History

Git can tell you the net effect of changes between two points of history.

This lesson covers that part.

Git's diff feature can show you what changed between two points in the revision history. Given below are some use cases.

Usage 1: Examining changes in the working directory
Example use case: To verify the next commit will include exactly what you intend it to include.

HANDS-ON: Examining staged and unstaged changes

Preparation For this, you can use the things repo you created earlier. If you don't have it, you can clone a copy of a similar repo given here.

1 Make some changes to the working directory. Stage some (but not all) of them. For example, you can run the following commands.

echo -e "blue\nred\ngreen" >> colours.txt
git add .  # a shortcut to stage all changes
echo "no shapes added yet" >> shapes.txt

2 Examine the staged and unstaged changes.

CLI

The git diff command shows unstaged changes in the working directory (tracked files only). The output of the diff command is a diff view (introduced in this lesson).

git diff

⤷

diff --git a/shapes.txt b/shapes.txt
index 5c2644b..949c676 100644
--- a/shapes.txt
+++ b/shapes.txt
@@ -1 +1,2 @@
 a file for shapes
+no shapes added yet

The git diff --staged command shows the staged changes (same as git diff --cached).

git diff --staged

Sourcetree

Click on the Uncommitted changes entry at the top of the revision graph. The staged and unstaged changes will appear in separate panels; click on a file to see its diff.

done!
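The following sketch makes the staged/unstaged distinction easy to check from a script. It uses a throwaway repo with made-up contents, and adds git diff HEAD (a standard form of the command, not covered above) to show both kinds of changes relative to the last commit. The --name-only option lists just the affected file names instead of the full diff view.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "a file for colours" > colours.txt
echo "a file for shapes" > shapes.txt
git add . && git commit -qm "Add colours.txt, shapes.txt"

echo "blue" >> colours.txt && git add colours.txt   # a staged change
echo "circle" >> shapes.txt                         # an unstaged change

git diff --name-only            # unstaged only: shapes.txt
git diff --staged --name-only   # staged only: colours.txt
git diff HEAD --name-only       # both, compared against the last commit
```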

Usage 2: Comparing two commits at different points of the revision graph
Example use case: Suppose you’re trying to improve the performance of a piece of software by experimenting with different code tweaks. You commit after each change (as you should). After several commits, you now want to review the overall effect of all those changes on the code.

HANDS-ON: Comparing two commits

Target Compare two commits in a repo.

Preparation You can use any repo with multiple commits e.g., the things repo.

CLI

You can use the git diff <commit1> <commit2> command for this.

You may use any valid way to refer to commits (e.g., SHA, tag, HEAD~n etc.).
You may also use the .. notation to specify the commit range, e.g., 0023cdd..fcd6199, HEAD~2..HEAD

git diff v0.9 HEAD

⤷

diff --git a/colours.txt b/colours.txt
new file mode 100644
index 0000000..55c8449
--- /dev/null
+++ b/colours.txt
@@ -0,0 +1 @@
+a file for colours
# rest of the diff ...

Swap the commit order in the command and see what happens.

git diff HEAD v0.9

⤷

diff --git a/colours.txt b/colours.txt
deleted file mode 100644
index 55c8449..0000000
--- a/colours.txt
+++ /dev/null
@@ -1 +0,0 @@
-a file for colours
# rest of the diff ...

As you can see, the diff is directional, i.e., diff <commit1> <commit2> shows the changes you need to make to go from <commit1> to <commit2>. If you swap <commit1> and <commit2>, the output changes accordingly, e.g., lines previously shown as 'added' will now be shown as 'deleted'.
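The directionality can also be seen in numbers. In this sketch (a throwaway repo that loosely mimics the things repo; file contents are made up), git diff --numstat prints added and deleted line counts per file, and swapping the two commits swaps those counts.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "apple" > fruits.txt && git add . && git commit -qm "Add fruits.txt"
git tag v0.9
echo "a file for colours" > colours.txt && git add . && git commit -qm "Add colours.txt"

git diff --numstat v0.9 HEAD    # 1 line added, 0 deleted in colours.txt
git diff --numstat HEAD v0.9    # 0 added, 1 deleted: the same change in reverse
```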

Sourcetree

Select the two commits: Click on one commit, and Ctrl-Click (or Cmd-Click) on the second commit. The changes between the two selected commits will appear in the other panels, as shown below:

The same method can be used to compare the current state of the working directory (which might have uncommitted changes) to a point in the history.

done!

Usage 3: Examining changes to a specific file
Example use case: Similar to other use cases but when you are interested in a specific file only.

HANDS-ON: Examining changes to a specific file

Target Examine the changes done to a file between two different points in the version history (including the working directory).

Preparation Use any repo with multiple commits e.g. the things repo.

CLI

Append -- path/to/file to any of the previous diff commands to narrow the output to a specific file. Some examples:

git diff -- fruits.txt               # unstaged changes to fruits.txt
git diff --staged -- src/main.java   # staged changes to src/main.java
git diff HEAD~2..HEAD -- fruits.txt  # changes to fruits.txt between commits
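A quick way to confirm the narrowing effect is to compare the file lists with and without the path. This sketch uses a throwaway repo in which one commit changes two files (names and contents are illustrative only).

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "apple" > fruits.txt && echo "red" > colours.txt
git add . && git commit -qm "Add fruits.txt, colours.txt"
echo "banana" >> fruits.txt && echo "blue" >> colours.txt
git commit -qam "Update both files"

git diff --name-only HEAD~1 HEAD                 # colours.txt and fruits.txt
git diff --name-only HEAD~1 HEAD -- fruits.txt   # only fruits.txt
```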

Sourcetree

Sourcetree UI shows changes to one file at a time by default; just click on the file to view changes to that file. To view changes to multiple files, Ctrl-Click (or Cmd-Click) on multiple files to select them.

done!

T4L4. Traversing to a Specific Commit

Another useful feature of revision control is to be able to view the working directory as it was at a specific point in history, by checking out a commit created at that point.

This lesson covers that part.

Suppose you added a new feature to a software product, and while testing it, you noticed that another feature added two commits ago doesn’t handle a certain edge case correctly. Now you’re wondering: did the new feature break the old one, or was it already broken? Can you go back to the moment you committed the old feature and test it in isolation, and come back to the present after you found the answer? With Git, you can.

To view the working directory at a specific point in history, you can check out a commit created at that point.

When you check out a commit, Git:

Updates your working directory to match the snapshot in that commit, overwriting current files as needed.
Moves the HEAD ref to that commit, marking it as the current state you’re viewing.

C3 master←HEAD

→
[check out commit C2...]

C3 master

C2←HEAD detached head!

Checking out a specific commit puts you in a "detached HEAD" state: i.e., the HEAD no longer points to a branch, but directly to a commit (see the above diagram for an example). This isn't a problem by itself, but any commits you make in this state can be lost, unless certain follow-up actions are taken. It is perfectly fine to be in a detached state if you are only examining the state of the working directory at that commit.

To get out of a "detached HEAD" state, you can simply check out a branch, which "re-attaches" HEAD to the branch you checked out.

C3 master

C2←HEAD detached head!

→
[check out master...]

C3 master←HEAD head re-attached!
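The sketch below walks through the two diagrams above in a throwaway repo. git symbolic-ref -q HEAD fails when HEAD is detached, which makes the state easy to detect. The lesson does not say which follow-up actions keep commits made in this state; creating a branch (here, a hypothetical branch named experiment) before leaving is one common way, shown as an assumption.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch 'master', whatever Git's default is
git config user.name "Demo User" && git config user.email "demo@example.com"

echo v1 > a.txt && git add a.txt && git commit -qm "C1"
echo v2 >> a.txt && git commit -qam "C2"
echo v3 >> a.txt && git commit -qam "C3"

git checkout -q HEAD~1                    # detach HEAD at C2
git symbolic-ref -q HEAD || echo "detached HEAD at $(git log -1 --format=%s)"

echo experiment >> a.txt && git commit -qam "Experiment"   # a commit made while detached
git branch experiment                     # give it a branch so it stays reachable

git checkout -q master                    # re-attach HEAD to master (C3)
git symbolic-ref --short HEAD             # prints: master
```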

HANDS-ON: Checking out some commits

Target Check out a few commits in a local repo, while examining the working directory to verify that it matches the state when you created the corresponding commit.

Preparation Use any repo with commits e.g., the things repo

CLI

1 Examine the revision tree, to get your bearings first.

git log --oneline --decorate

Reminder: You can use aliases to reduce typing Git commands.

⤷

e60deae (HEAD -> master, origin/master) Update fruits list
f761ea6 (tag: v1.0) Add colours.txt, shapes.txt
2bedace (tag: v0.9) Add figs to fruits.txt
d5f91de Add fruits.txt

2 Use the git checkout <commit-identifier> command to check out a commit other than the one HEAD currently points to. You can use any of the following methods:

git checkout v1.0: checks out the commit tagged v1.0
git checkout 0023cdd: checks out the commit with the hash 0023cdd
git checkout HEAD~2: checks out the commit 2 commits behind the most recent commit.

git checkout HEAD~2

⤷

Note: switching to 'HEAD~2'.

You are in 'detached HEAD' state.
# rest of the warning about the detached head ...

HEAD is now at 2bedace Add figs to fruits.txt

3 Verify HEAD and the working directory have updated as expected.

HEAD should now be pointing at the target commit
The working directory should match the state it was in at that commit (e.g., files added after that commit, such as shapes.txt, should not be in the folder).

git log --oneline --decorate

⤷

2bedace (HEAD, tag: v0.9) Add figs to fruits.txt
d5f91de Add fruits.txt

HEAD is indeed pointing at the target commit.

But note how the output does not show commits you added after the checked-out commit.

The --all switch tells git log to show commits from all refs, not just those reachable from the current HEAD. This includes commits from other branches, tags, and remotes.

git log --oneline --decorate --all

⤷

e60deae (origin/master, master) Update fruits list
f761ea6 (tag: v1.0) Add colours.txt, shapes.txt
2bedace (HEAD, tag: v0.9) Add figs to fruits.txt
d5f91de Add fruits.txt

4 Go back to the latest commit by checking out the master branch again.

git checkout master

Sourcetree

In the revision graph, double-click the commit you want to check out, or right-click on that commit and choose Checkout....

Click OK to the warning about ‘detached HEAD’ (similar to below).

The specified commit is now loaded onto the working folder, as indicated by the HEAD label.

To go back to the latest commit on the master branch, double-click the master branch.

If you check out a commit that comes before the commit in which you added a certain file (e.g., temp.txt) to the .gitignore file, and if the .gitignore file is version controlled as well, Git will now show that file under ‘unstaged modifications’ because, at that commit, Git hasn’t been told to ignore it yet.

done!

If there are uncommitted changes in the working directory, Git proceeds with a checkout only if it can preserve those changes.

Example 1: There is a new file in the working directory that is not committed yet.
→ Git will proceed with the checkout and will keep the uncommitted file as well.
Example 2: There is an uncommitted change to a file that conflicts with the version of that file in the commit you wish to check out.
→ Git will abort the checkout, and the repo will remain in the current commit.
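Both examples can be reproduced in a throwaway repo, as in this sketch (file names and contents are made up). The untracked notes.txt does not block the checkout, whereas an uncommitted edit to fruits.txt, a file that differs between the two commits, makes Git refuse and leaves HEAD where it was.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "apple" > fruits.txt && git add . && git commit -qm "C1"
echo "banana" >> fruits.txt && git commit -qam "C2"

# Example 1: an untracked file does not block the checkout, and it is kept
echo "draft" > notes.txt
git checkout -q HEAD~1 && ls notes.txt
git checkout -q -                 # return to the branch we came from

# Example 2: an uncommitted edit to a file that differs between C1 and C2
echo "cherry" >> fruits.txt
before=$(git rev-parse HEAD)
if git checkout -q HEAD~1 2>/dev/null; then
  echo "checkout proceeded"
else
  echo "checkout aborted; HEAD is still at $(git log -1 --format=%s)"
fi
```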

The Git stash feature temporarily sets aside uncommitted changes you’ve made (in your working directory and staging area), without committing them. This is useful when you’re in the middle of some work, but need to switch to another state (e.g., checkout a previous commit), and your current changes are not yet ready to be committed or discarded. You can later reapply the stashed changes when you’re ready to resume that work.

DETOUR: Stashing Uncommitted Changes Temporarily

For basic usage, you can use the following two commands:

git stash: Stash staged and unstaged changes (of tracked files only; add the -u option to also stash untracked files).
git stash pop: Reapply the latest stashed changes and remove them from the stash list.
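Here is a minimal sketch of the stash-checkout-pop round trip described above, in a throwaway repo with made-up contents: the work-in-progress line is set aside, an older commit is checked out, and the line comes back after returning to master.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch 'master', whatever Git's default is
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "apple" > fruits.txt && git add . && git commit -qm "C1"
echo "banana" >> fruits.txt && git commit -qam "C2"

echo "cherry" >> fruits.txt   # work in progress, not ready to commit
git stash                     # set it aside; the working directory now matches C2
git checkout -q HEAD~1        # the checkout is no longer blocked
cat fruits.txt                # prints: apple
git checkout -q master
git stash pop                 # reapply the change and drop it from the stash list
tail -n 1 fruits.txt          # prints: cherry
```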

RESOURCES

A more detailed explanation of stashing: https://www.atlassian.com/git/tutorials/saving-changes/git-stash

DETOUR: Dealing with Uncommitted Conflicting Changes at a Checkout

To proceed with a checkout when there are conflicting uncommitted changes in the working directory, there are several options:

Commit the changes.
Stash the changes.
Discard the changes.

T4L5. Rewriting History to Start Over

Git can also reset the revision history to a specific point so that you can start over from that point.

This lesson covers that part.

Suppose you realise your last few commits have gone in the wrong direction, and you want to go back to an earlier commit and continue from there — as if the “bad” commits never happened. Git’s reset feature can help you do that.

Git reset moves the tip of the current branch to a specific commit, optionally adjusting your staged and unstaged changes to match. This effectively rewrites the branch's history by discarding any commits that came after that point.

Resetting is different from the checkout feature:

Reset: Lets you start over from a past state. It rewrites history by moving the branch ref to a new location.
Checkout: Lets you explore a past state without rewriting history. It just moves the HEAD ref.

C3 master←HEAD (original tip of the branch)

→
[reset to C2...]

C3 commit no longer in the master branch!

C2 master←HEAD (the new tip)

There are three types of resets: soft, mixed, hard. All three move the branch pointer to a new commit, but they vary based on what happens to the staging area and the working directory.

soft reset: Moves the cumulative changes from the discarded commits into the staging area, waiting to be committed again. Any staged and unstaged changes that existed before the reset will remain untouched.
mixed reset: Cumulative changes from the discarded commits, and any existing staged changes, are moved into the working directory.
hard reset: All staged and unstaged changes are discarded. Both the working directory and the staging area are aligned with the target commit (as if no changes were done after that commit).
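To compare the three types side by side, this sketch repeatedly discards one 'bad' commit from a throwaway repo and reports where its change ends up. The helper function show and the commit names are made up for illustration; git diff --cached lists staged changes and plain git diff lists unstaged ones.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "base" > a.txt && git add . && git commit -qm "C1"
echo "bad" >> a.txt && git commit -qam "B1"

show() {  # hypothetical helper: print the branch tip plus staged/unstaged files
  echo "$1: tip='$(git log -1 --format=%s)' staged=[$(git diff --cached --name-only)] unstaged=[$(git diff --name-only)]"
}

git reset -q --soft HEAD~1;  show soft    # B1's change is now staged
git commit -qm "B1 again"                 # recreate the bad commit for the next round

git reset -q --mixed HEAD~1; show mixed   # B1's change is now unstaged
git commit -qam "B1 again"

git reset -q --hard HEAD~1;  show hard    # B1's change is gone entirely
```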

HANDS-ON: Resetting to past commits

Preparation First, set the stage as follows (e.g., in the things repo):
i) Add four commits that are supposedly 'bad' commits.
ii) Do a 'bad' change to one file and stage it.
iii) Do a 'bad' change to another file, but don't stage it.

B4 master←HEAD Add incorrect.txt

B3 Incorrectly update fruits.txt

B2 Incorrectly update shapes.txt

B1 Incorrectly update colours.txt

C4 Update fruits list

The following commands can be used to add commits B1-B4:

echo "bad colour" >> colours.txt
git add colours.txt
git commit -m "Incorrectly update colours.txt"

echo "bad shape" >> shapes.txt
git add shapes.txt
git commit -m "Incorrectly update shapes.txt"

echo "bad fruit" >> fruits.txt
git add fruits.txt
git commit -m "Incorrectly update fruits.txt"

echo "bad line" >> incorrect.txt
git add incorrect.txt
git commit -m "Add incorrect.txt"

echo "another bad colour" >> colours.txt
git add colours.txt

echo "another bad shape" >> shapes.txt

Now we have some 'bad' commits and some 'bad' changes in both the staging area and the working directory. Let's use the reset feature to get rid of all of them, but do it in three steps so that you can learn all three types of resets.

1 Do a soft reset to B2 (i.e., discard last two commits). Verify,

the master branch is now pointing at B2, and,
the changes that were in the discarded commits (i.e., B3 and B4) are now in the staging area.

CLI

Use the git reset --soft <commit> command to do a soft reset.

git reset --soft HEAD~2

You can run the following commands to verify the current status of the repo is as expected.

git status                    # check overall status
git log --oneline --decorate  # check the branch tip
git diff                      # check unstaged changes
git diff --staged             # check staged changes

Sourcetree

Right-click on the commit that you want to reset to, and choose Reset <branch-name> to this commit option.

In the next dialog, choose Soft - keep all local changes.

2 Do a mixed reset to commit B1. Verify,

the master branch is now pointing at B1.
the staging area is empty.
the accumulated changes from all three discarded commits (including those from the previous soft reset) are now appearing as unstaged changes in the working directory.
Note how incorrect.txt appears as an 'untracked' file -- this is because unstaging a change of type 'add file' results in an untracked file.

CLI

Use the git reset --mixed <commit> command to do a mixed reset. The --mixed flag is the default, and can be omitted.

git reset HEAD~1

Verify the repo status, as before.

Sourcetree

Similar to the previous reset, but choose the Mixed - keep working copy but reset index option in the reset dialog.

3 Do a hard reset to commit C4. Verify,

the master branch is now pointing at C4 i.e., all 'bad' commits are gone.
the staging area is empty.
there are no unstaged changes (except for the untracked file incorrect.txt -- Git leaves untracked files alone, as untracked files are not meant to be under Git's control).

CLI

Use the git reset --hard <commit> command.

git reset --hard HEAD~1

Verify the repo status, as before.

Sourcetree

Similar to the previous reset, but choose the Hard - discard all working copy changes option.

done!

Rewriting history can cause your local repo to diverge from its remote counterpart. For example, if you discard earlier commits and create new ones in their place, and you’ve already pushed the original commits to a remote repository, your local branch history will no longer match the corresponding remote branch. Git refers to this as a diverged history.

To protect the integrity of the remote, Git will reject attempts to push a diverged branch using a normal push. If you want to overwrite the remote history with your local version, you must perform a force push.

HANDS-ON: Force-push commits

Preparation Choose a local-remote repo pair under your control e.g., the things repo from Tour 2: Backing up a Repo on the Cloud.

1 Rewrite the last commit: Reset the current branch back by one commit, and add a new commit.
For example, you can use the following commands.

git reset --hard HEAD~1
echo "water" >> drinks.txt
git add .
git commit -m "Add drinks.txt"

2 Observe how the local branch is diverged.

git log --oneline --graph --all

⤷

* fc1d04e (HEAD -> master) Add drinks.txt
| * e60deae (upstream/master, origin/master) Update fruits list
|/
* f761ea6 (tag: v1.0) Add colours.txt, shapes.txt
* 2bedace (tag: v0.9) Add figs to fruits.txt
* d5f91de Add fruits.txt

3 Attempt to push to the remote. Observe Git rejects the push.

git push origin master

⤷

To https://github.com/.../things.git
 ! [rejected]        master -> master (non-fast-forward)
error: failed to push some refs to 'https://github.com/.../things.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. If you want to integrate the remote changes,
hint: ...

4 Do a force-push.

You can use the --force (or -f) flag to force push.

git push -f origin master

A safer alternative to --force is --force-with-lease, which overwrites the remote branch only if it hasn’t changed since you last fetched it (i.e., only if the remote doesn't have recent changes that you are unaware of):

git push --force-with-lease origin master
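The following sketch shows the 'lease' protecting someone else's work. A local bare repo plays the remote, and two clones belong to hypothetical teammates alice and bob. Bob pushes a commit that alice has not fetched; when alice rewrites her last commit and pushes with --force-with-lease, the push is rejected because her origin/master no longer matches the remote.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"

git init -q "$work/alice" && cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice" && git config user.email "alice@example.com"
git remote add origin "$work/remote.git"
echo one > a.txt && git add a.txt && git commit -qm "C1"
echo two >> a.txt && git commit -qam "C2"
git push -q -u origin master

# Bob pushes a new commit that alice has not fetched yet
git clone -q -b master "$work/remote.git" "$work/bob" && cd "$work/bob"
git config user.name "Bob" && git config user.email "bob@example.com"
echo three >> a.txt && git commit -qam "C3 by bob"
git push -q origin master

# Alice rewrites her last commit and tries a lease-protected force push
cd "$work/alice"
git reset -q --hard HEAD~1
echo "two, revised" >> a.txt && git commit -qam "C2 revised"
if git push -q --force-with-lease origin master 2>/dev/null; then
  echo "push accepted"
else
  echo "push rejected: the remote has commits alice has not fetched"
fi
```

A plain --force at this point would have replaced bob's commit on the remote without any warning.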

done!

DETOUR: Resetting Uncommitted Changes

At times, you might need to get rid of uncommitted changes so that you have a fresh start to the next commit.

To get rid of uncommitted changes, you can reset the repo to the last commit (i.e., HEAD):

CLI

The command git reset (without specifying a commit) defaults to git reset HEAD.

git reset: moves any staged changes to working directory (i.e., unstage).
git reset --hard: get rid of any staged and unstaged changes.

Sourcetree

Related DETOUR: Updating the Last Commit

Git allows you to amend the most recent commit. This is useful when you realise there’s something you’d like to change — e.g., fix a typo in the commit message, or to exclude some unintended change from the commit.

That aspect is covered in the tour Updating the Last Commit given in the lesson T5L3. Reorganising Commits.

DETOUR: Undoing/Deleting Recent Commits

How do you undo or delete the last few commits if you realise they were incorrect, unnecessary, or done too soon?

Undoing or deleting recent n commits is easily accomplished with Git's reset feature.

To delete recent n commits and discard those changes entirely, do a hard reset to the commit HEAD~n e.g.,

git reset --hard HEAD~3

To undo recent n commits, but keep changes staged, do a soft reset to the commit HEAD~n e.g.,

git reset --soft HEAD~3

To undo recent n commits, and move changes to the working directory, do a mixed reset to the commit HEAD~n e.g.,

git reset --mixed HEAD~3

To do the above for the most recent commit only, use HEAD~1 (or just HEAD~).

DETOUR: Resetting a Remote-Tracking Branch Ref

Suppose you moved back the current branch ref by two commits, as follows:

git reset --hard HEAD~2

C4 master←HEAD origin/master

→

C4 origin/master

C2 master←HEAD

If you now wish to move back the remote-tracking branch ref by two commits, so that the local repo 'forgets' that it previously pushed two more commits to the remote, you can do:

git update-ref refs/remotes/origin/master HEAD

C4 origin/master

C2 master←HEAD

→

C2 master←HEAD origin/master

The git update-ref refs/remotes/origin/master HEAD command resets the remote-tracking branch ref origin/master to point at the current HEAD.
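This sketch reproduces the two diagrams above with a local bare repo as the remote (all names are illustrative). Note that update-ref changes only the local repo's remote-tracking ref; the remote itself still has all four commits.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local" && cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User" && git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

for n in 1 2 3 4; do
  echo "$n" >> a.txt && git add a.txt && git commit -qm "C$n"
done
git push -q -u origin master           # origin/master now points at C4

git reset -q --hard HEAD~2             # master moves back to C2; origin/master stays at C4
git status -sb | head -n 1             # reports master as behind origin/master

git update-ref refs/remotes/origin/master HEAD
git status -sb | head -n 1             # no longer reports master as behind
```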

update-ref is an example of what are known as Git plumbing commands -- lower-level commands used by Git internally. In contrast, day-to-day Git commands (such as commit, log, push etc.) are known as porcelain commands (as in, in bathrooms we see the porcelain parts but not the plumbing parts that operate below the surface to make everything work).

T4L6. Reverting a Specific Commit

Git can add a new commit to reverse the changes done in a specific past commit, called reverting a commit.

This lesson covers that part.

When a past commit introduced a bug or an unwanted change, but you do not want to modify that commit — because rewriting history can cause problems if others have already based work on it — you can instead revert that commit.

Reverting creates a new commit that cancels out the changes of the earlier one i.e., Git computes the opposite of the changes introduced by that commit — essentially a reverse diff — and applies it as a new commit on top of the current branch. This way, the problematic changes are reversed while preserving the full history, including the "bad" commit and the "fix".

→
[revert C2]

R2 This commit is the reverse of C2
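As a compact, non-interactive sketch of reverting, the block below uses a shortened version of the pioneers repo. The --no-edit option of git revert accepts the proposed commit message instead of opening an editor. Reverting the commit that added neo.txt deletes the file again, while the history grows by one commit instead of losing one.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "hacked the matrix" > neo.txt && git add . && git commit -qm "Add Neo"
echo "father of theoretical computing" > alan-turing.txt && git add . && git commit -qm "Add Turing"

git revert --no-edit HEAD~1   # new commit that reverses "Add Neo"
git log --oneline             # three commits: the original two plus the revert
ls                            # prints: alan-turing.txt
```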

HANDS-ON: Revert a commit

Preparation Run the following commands to create a repo with a few commits:

mkdir pioneers
cd pioneers
git init

echo "hacked the matrix" >> neo.txt
git add .
git commit -m "Add Neo"

echo "father of theoretical computing" >> alan-turing.txt
git add .
git commit -m "Add Turing"

echo "created COBOL, compiler pioneer" >> grace-hopper.txt
git add .
git commit -m "Add Hopper"

C3←HEAD Add Hopper

C2 Add Turing

C1 Add Neo

1 Revert the commit Add Neo.

CLI

You can use the git revert <commit> command to revert a commit. In this case, we want to revert the commit that is two commits behind the HEAD.

git revert HEAD~2

What happens next:

Git prepares a new commit which reverses the target commit
Git opens your default text editor containing a proposed commit message. You can edit it, or accept the proposed text.
Once you close the editor, Git will create the new commit.

Sourcetree

In the revision graph, right-click on the commit you want to revert, and choose Reverse commit...

done!

A revert can result in a conflict, if the new changes done to reverse the previous commit conflict with the changes done in other more recent commits. Then, you need to resolve the conflict before the revert operation can proceed. Conflict resolution is covered in a later topic.

At this point: You should now be able to use a repository’s revision history to understand how the working directory evolved over time -- and use that insight to inform your work.
How useful this history is depends greatly on how well it was constructed (for example, how frequently and meaningfully commits were made). We’ll explore that in a later tour.

What's next: Tour 5: Fine-Tuning the Revision History

W2.4b


Tour 5: Fine-Tuning the Revision History

Target Usage: To maintain a clean and meaningful revision history.

Motivation: The usefulness of the revision history increases if it consists of well-crafted and well-documented commits.

Lesson plan:

To create well-crafted commits, you need to know how to control which precise changes go into a commit.

T5L1. Controlling What Goes Into a Commit covers that part.

Detailed and well-written commit messages can increase the value of Git revision history.

T5L2. Writing Good Commit Messages covers that part.

When the revision history gets 'messy', Git has a way to 'tidy up' the recent commits.

T5L3. Reorganising Commits covers that part.

T5L1. Controlling What Goes Into a Commit

To create well-crafted commits, you need to know how to control which precise changes go into a commit.

This lesson covers that part.

Crafting a commit involves two aspects:

What changes to include in it: deciding what changes belong together in a single commit — this is about commit granularity, ensuring each commit represents a meaningful, self-contained change.
How to include those changes: carefully staging just those changes — this is about using Git’s tools to precisely control what ends up in the commit.

SIDEBAR: Guidelines on what to include in a commit

A good commit represents a single, logical unit of change — something that can be described clearly in one sentence. For example, fixing a specific bug, adding a specific feature, or refactoring a specific function. If each commit tells a clear story about why the change was made and what it achieves, your repository history becomes a valuable narrative of the project’s development. Here are some (non-exhaustive) guidelines:

No more than one change per commit: Avoid lumping unrelated changes into one commit, as this makes the history harder to understand, review, or revert (if each commit contains one standalone change, that change can be reversed by deleting or reverting that specific commit, without affecting any other changes).
Make the commit standalone: Don’t split a single logical change across multiple commits unnecessarily, as this can clutter the history and make it harder to follow the evolution of an idea or feature.
Small enough to review easily, but large enough to stand on its own: For example, fixing the same typo in five files can be one commit — splitting it into five separate commits is excessive. Conversely, implementing a big feature may be too much for one commit — instead, break it down into a series of commits, each containing a meaningful yet standalone step towards the final goal.

Git can let you choose not just which files, but which specific changes within those files, to include in a commit. Most Git tools — including the command line and many GUIs — let you interactively select which "hunks" or even individual lines of a file to stage. This allows you to separate unrelated changes and avoid committing unnecessary edits. If you make multiple changes in the same file, you can selectively stage only the parts that belong to the current logical change.

This level of control is particularly useful when:

You noticed and fixed a small, unrelated issue while working on something else.
You experimented with multiple approaches in the same file and now want to commit only the final, clean solution.
You want your commit history to clearly separate concerns, even when the edits touch the same files.

HANDS-ON: Stage changes selectively

Preparation You can use any repo for this.

1 Make several changes to some tracked files. Change multiple files, and also change multiple locations in the same file.

2 Stage some changes in some files while keeping other changes in the same files unstaged.

CLI

As you know, you can use git add <filename> to stage changes to an entire file.

To select which hunks to stage, you can use the git add -p command instead (-p stands for 'by patch'):

git add -p

This command will take you to an interactive mode in which Git shows each hunk in turn and asks whether to stage it (e.g., y to stage it, n to skip it, s to split it into smaller hunks, q to quit; ? lists all options).

Sourcetree

To stage a hunk, you can click the Stage button above the hunk in question:

To stage specific lines, select the lines first before clicking the `Stage` button above the hunk in question:

Unstaging can be done similarly:

Most git operations can be done faster through the CLI than equivalent Git GUI clients, once you are familiar enough with the CLI commands.

However, selective staging is one exception where a good GUI can do better than the CLI, if you need to do many fine-grained staging operations (e.g., frequently staging only parts of hunks).

done!

T5L2. Writing Good Commit Messages

Detailed and well-written commit messages can increase the value of Git revision history.

This lesson covers that part.

Every commit you make in Git also includes a commit message that explains the change. While one-line messages are fine for small or obvious changes, as your revision history grows, good commit messages become an important source of information — for example, to understand the rationale behind a specific change made in the past.

A commit message is meant to explain the intent behind the changes, not just what was changed. The code (or diff) already shows what changed. Well-written commit messages make collaboration, code reviews, debugging, and future maintenance easier by helping you and others quickly understand the project’s history without digging into the code of every commit.

A complete commit message can include a short summary line (the subject) followed by a more detailed body if needed. The subject line should be a concise description of the change, while the body can elaborate on the context, rationale, side effects, or other details if the change is more complex.

A commit message has the following structure (note how the subject and the body are separated by a blank line):

Subject line
<blank line>
Body

# lines starting with '#' are ignored (they will not be included in the commit message)

Here is an example commit message:

Find command: make matching case-insensitive

Find command is case-sensitive.

A case-insensitive find is more user-friendly because users cannot be
expected to remember the exact case of the keywords.

Let's,
* update the search algorithm to use case-insensitive matching
* add a script to migrate stress tests to the new format
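If you prefer not to open an editor, git commit also accepts several -m options, each becoming a separate paragraph, so the first -m gives the subject and the rest form the body. This sketch commits part of the example message above in a throwaway repo (find.txt is a made-up file) and then reads the subject and body back with the %s and %b placeholders of git log.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "find" > find.txt && git add find.txt
git commit -q -m "Find command: make matching case-insensitive" \
  -m "Find command is case-sensitive." \
  -m "A case-insensitive find is more user-friendly because users cannot be
expected to remember the exact case of the keywords."

git log -1 --format=%s   # the subject line only
git log -1 --format=%b   # the body (everything after the blank line)
```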

HANDS-ON: Add a full commit message

Do some changes to a repo you have.
Commit the changes while writing a full commit message (i.e., subject + body).

CLI

When you are ready to commit, use the git commit command (without specifying a commit message).

git commit

This will open your default text editor (like Vim, Nano, or VS Code). Write the commit message inside the editor.

Save and close the editor to create the commit.

Sourcetree

You can write your full commit message in the textbox you have been using to write commit messages already.

done!

Following a style guide makes your commit messages more consistent and fit-for-purpose. Many teams adopt established guidelines. These style guides typically contain common conventions that Git users follow when writing commit messages. For example:

Keep the subject line (the first line) short: around 50 characters is a common target, and 72 characters is a common upper limit.
Write the subject in the imperative mood (e.g., Fix typo in README rather than Fixed typo or Fixes typo).
Leave a blank line between the subject and the body, if you include a body.
Wrap the body at around 72 characters per line for readability.

PRO-TIP: Configure Git to use your preferred text editor

Git will use the default text editor when it needs you to write a commit message. However, Git can be configured to use a different text editor of your choice.

You can use the following command to set Git's default text editor:

git config --global core.editor "<editor command>"

Some examples for <editor command>

Editor	Command to use
Vim (default)	`vim`
Nano	`nano`
VS Code	`code --wait` (e.g., `git config --global core.editor "code --wait"`). For this to work, your computer should already be configured to launch VS Code using the `code` command. See here to find out how (refer to the 'Launching from command line' section).
Sublime Text	`subl -n -w`
Atom	`atom --wait`
Notepad++	`notepad++.exe` (Windows only)
Notepad	`notepad` (Windows built-in)

Why use --wait or -w? The command that launches a graphical editor (like VS Code or Sublime) usually returns immediately, leaving the editor running as a separate process. Without --wait, Git sees the command finish and assumes editing is done before you have actually written the message. --wait makes the command, and therefore Git, wait until you close the file in the editor.

RESOURCES

Recommended style guide for Git commit messages: Git Conventions @se-edu/guides

T5L3. Reorganising Commits

When the revision history gets 'messy', Git has a way to 'tidy up' the recent commits.

This lesson covers that part.

Git has a powerful tool called interactive rebasing which lets you review and reorganise your recent commits. With it, you can reword commit messages, change their order, delete commits, combine several commits into one (squash), or split a commit into smaller pieces. This feature is useful for tidying up a commit history that has become messy — for example, when some commits are out of order, poorly described, or include changes that would be clearer if split up or combined.

HANDS-ON: Tidy-up commits

Preparation: Run the following commands to create a sample repo that we'll be using for this hands-on practical:

mkdir samplerepo-sitcom
cd samplerepo-sitcom
git init

echo "Aspiring actress" >> Penny.txt
git add .
git commit -m "C1: Add Penny.txt"

echo "Scientist" >> Sheldon.txt
git add .
git commit -m "C3: Add Sheldon.txt"

echo "Comic book store owner" >> Stuart.txt
git add .
git commit -m "C2: Add Stuart.txt"

echo "Engineer" >> Stuart.txt
git commit -am "X: Incorrectly update Stuart.txt"

echo "Engineer" >> Howard.txt
git add .
git commit -m "C4: Adddd Howard.txt"

Target: Here are the commits that should be in the created repo, and how each commit needs to be 'tidied up'.

C4: Adddd Howard.txt -- Fix typo in the commit message Adddd → Add.
X: Incorrectly update Stuart.txt -- Drop this commit.
C2: Add Stuart.txt -- Swap this commit with the one below.
C3: Add Sheldon.txt -- Swap this commit with the one above.
C1: Add Penny.txt -- No change required.

1 Start the interactive rebasing.

CLI

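To start the interactive rebase, use the git rebase -i <start-commit> command, where -i stands for 'interactive' and <start-commit> is the commit just before the earliest commit you want to modify. In this case, we want to modify the last four commits, so the start commit is HEAD~4 (i.e., four commits before HEAD, which is C1).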

git rebase -i HEAD~4

⤷

pick 97a8c4a C3: Add Sheldon.txt
pick 60bd28d C2: Add Stuart.txt
pick 8b9a36f X: Incorrectly update Stuart.txt
pick 8ab6941 C4: Adddd Howard.txt

# Rebase ee04afe..8ab6941 onto ee04afe (4 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup [-C | -c] <commit> = like "squash" but keep only the previous
#                    commit's log message, unless -C is used, in which case
#                    keep only this commit's message; -c is same as -C but
#                    opens the editor
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
#         create a merge commit using the original merge commit's
#         message (or the oneline, if no original merge commit was
#         specified); use -c <commit> to reword the commit message
# u, update-ref <ref> = track a placeholder for the <ref> to be updated
#                       to this position in the new commits. The <ref> is
#                       updated at the end of the rebase
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#

The command will take you to the text editor, which will present you with a wall of text similar to the above. It has two parts:

At the top, the list of commits to be rebased, oldest first, each preceded by the action to take on it. By default, every commit gets the action pick (i.e., 'use this commit in the result').
At the bottom, instructions on how to edit those lines.

2 Edit the commit list to specify the rebase actions, as follows:

pick 60bd28d C2: Add Stuart.txt
pick 97a8c4a C3: Add Sheldon.txt
drop 8b9a36f X: Incorrectly update Stuart.txt
reword 8ab6941 C4: Adddd Howard.txt

3 Once you save your edits and exit the text editor, Git will perform the rebase based on the actions you specified, from top to bottom.

At some steps, Git will pause the rebase and ask for your input. In this case, it will ask you to specify the new commit message when it processes the following line:

reword 8ab6941 C4: Adddd Howard.txt
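Conceptually, Git works through the edited list one line at a time, from top to bottom. The toy Java model below sketches that idea only, assuming a simplified line format of `<action> <sha> <message>` and supporting just `pick`, `reword`, and `drop`. Real Git replays each commit's changes and creates brand-new commits (with new SHAs); this model merely tracks the resulting commit messages, oldest first. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// A toy model, NOT Git's implementation: it only tracks commit messages,
// whereas real Git re-applies each commit's changes and creates new commits.
public class RebaseTodoModel {

    /**
     * Applies todo lines of the form "<action> <sha> <message>" from top to bottom
     * and returns the messages of the resulting commits, oldest first.
     * newMessages maps a SHA to the message to use for a 'reword' line.
     */
    public static List<String> apply(List<String> todo, Map<String, String> newMessages) {
        List<String> result = new ArrayList<>();
        for (String line : todo) {
            String[] parts = line.split(" ", 3); // action, SHA, original message
            String action = parts[0];
            String sha = parts[1];
            String message = parts[2];
            if (action.equals("pick")) {
                result.add(message); // keep the commit as-is
            } else if (action.equals("reword")) {
                result.add(newMessages.getOrDefault(sha, message)); // keep it, with a new message
            } else if (action.equals("drop")) {
                // leave the commit out of the result
            } else {
                throw new IllegalArgumentException("Action not modelled here: " + action);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // An edited todo list like the one above: the order of lines becomes the new order
        List<String> todo = List.of(
                "pick 60bd28d C2: Add Stuart.txt",
                "pick 97a8c4a C3: Add Sheldon.txt",
                "drop 8b9a36f X: Incorrectly update Stuart.txt",
                "reword 8ab6941 C4: Adddd Howard.txt");
        System.out.println(apply(todo, Map.of("8ab6941", "C4: Add Howard.txt")));
        // prints [C2: Add Stuart.txt, C3: Add Sheldon.txt, C4: Add Howard.txt]
    }
}
```

Note that the model lists messages oldest first, whereas git log shows the newest commit first.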

Sourcetree

1 To go to the interactive rebase mode, right-click the parent commit of the earliest commit you want to reorganise (in this case, it is C1: Add Penny.txt) and choose Rebase children of <SHA> interactively...

2 To indicate what action you want to perform on each commit, select the commit in the list and click the button for that action.

3 To execute the rebase, after indicating the action for all commits, click OK.

The final result should be something like the following, 'tidied up' exactly as we wanted:

* 727d877 C4: Add Howard.txt
* 764fc29 C3: Add Sheldon.txt
* 08a965a C2: Add Stuart.txt
* 6436598 C1: Add Penny.txt

done!

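Rebasing rewrites history: the rebased commits are replaced by new commits with different SHAs. Hence, it is not recommended to rebase commits you have already shared with others, as their copies of those commits will no longer match yours.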

DETOUR: Updating the Last Commit

Updating the commit message

CLI

To replace the commit message with a new single-line message, use the git commit --amend -m "<new commit message>" command.

git commit --amend -m "Fix bug that froze the GUI"

To change the entire commit message (not just the subject), run the git commit --amend command, which will open the text editor for you to edit the commit message. The commit will be updated when you close the text editor.

Sourcetree

Click on the Commit button on the top menu. In the region that you use to enter the commit message, switch to the 'Amend last commit' mode.

This will populate the text box with the previous commit message. Amend it as you wish, and click the `Commit` button to update the commit.

Updating changes in the commit

While there are multiple ways to do this, one method that works universally is to do a 'soft reset' of the last commit (e.g., git reset --soft HEAD~1), which undoes the commit but keeps its changes staged; then update the staging area as you wish, and commit again.

'Updating' a commit does not really update that commit -- it simply creates a new commit with the new data. The original commit remains and is 'left behind' in the repo, and will be garbage-collected after a while if it is not referenced by anything else.

At this point: You should now be able to create more meaningful commits from the start, and also refine them further after they’ve been created.

What's next: Tour 6: Branching Locally

Guidance for the item(s) below:

As you are likely to be using an IDE for the iP, let's learn at least enough about IDEs to get you started using one.

🤔 In case you are puzzled by the sudden change of topic, it's because we take an iterative approach to covering topics.

[W2.5] IDEs: Basic Features

W2.5a

Implementation → IDEs → What

Professional software engineers often write code using Integrated Development Environments (IDEs). IDEs support most development-related work within the same tool (hence, the term integrated).

An IDE generally consists of:

A source code editor that includes features such as syntax coloring, auto-completion, easy code navigation, error highlighting, and code-snippet generation.
A compiler and/or an interpreter (together with other build automation support) that facilitates the compilation/linking/running/deployment of a program.
A debugger that allows the developer to execute the program one step at a time to observe the run-time behavior in order to locate bugs.
Other tools that aid various aspects of coding, e.g., support for automated testing, drag-and-drop construction of UI components, version management support, simulation of the target runtime platform, modeling support, AI-assisted coding help, collaborative coding with others.

Examples of popular IDEs:

Java: Eclipse, IntelliJ IDEA, NetBeans
C#, C++: Visual Studio
Swift: Xcode
Python: PyCharm
Multiple languages: VS Code

Some web-based IDEs have appeared in recent times too, e.g., Amazon's Cloud9 IDE.

Some experienced developers, in particular those with a UNIX background, prefer lightweight yet powerful text editors with scripting capabilities (e.g., Emacs) over heavier IDEs.

Exercises:

Which of these are features available in IDEs?

W2.5b

Tools → IDEs → Project setup

Refer to these se-edu guides:

IntelliJ IDEA: Setting up
VS Code: Refer to the first few tutorials given here.

Guidance for the item(s) below:

As you start adding features to your project iteratively, you'll need a way to detect if the new code breaks the existing code. Next, let's learn a rather simple way to do that using a certain type of testing (we'll be learning more sophisticated methods in later weeks).

This also means we are now switching focus from the implementation aspect to the testing aspect of SE.

[W2.6] Automated Testing of Text UIs

W2.6a

Quality Assurance → Testing → Introduction → What


Testing: Operating a system or component under specified conditions, observing or recording the results, and making an evaluation of some aspect of the system or component. -- source: IEEE

When testing, you execute a set of test cases. A test case specifies how to perform a test. At a minimum, it specifies the input to the software under test (SUT) and the expected behavior.

Example: A minimal test case for testing a browser:

Input – Start the browser using a blank page (vertical scrollbar disabled). Then, load longfile.html located in the test data folder.
Expected behavior – The scrollbar should be automatically enabled upon loading longfile.html.


Test cases can be determined based on the specification, on a review of similar existing systems, or on the past behavior of the SUT.

For each test case you should do the following:

Feed the input to the SUT
Observe the actual output
Compare actual output with the expected output

A test case failure is a mismatch between the expected behavior and the actual behavior. A failure indicates a potential defect (or a bug) -- we say 'potential' because the error could be in the test case itself.

Example: In the browser example above, a test case failure is implied if the scrollbar remains disabled after loading longfile.html. The defect/bug causing that failure could be an uninitialized variable.
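The feed/observe/compare steps above can be sketched in Java as follows. Here, `isLeapYear` is a hypothetical SUT with a deliberate defect (it ignores the rule that century years are leap years only if divisible by 400), and a test case is simply an input paired with the expected output. All names are made up for this illustration.

```java
import java.util.List;

// Illustration only: isLeapYear is a hypothetical SUT with a deliberate defect,
// used to show how a test case failure surfaces.
public class LeapYearTestCases {

    // SUT (hypothetical, buggy): the correct rule also needs the year % 100 and year % 400 checks
    static boolean isLeapYear(int year) {
        return year % 4 == 0;
    }

    // A test case: the input to the SUT and the expected output
    record TestCase(int input, boolean expected) {}

    /** Runs one test case and returns true if it passes. */
    static boolean run(TestCase tc) {
        boolean actual = isLeapYear(tc.input());  // feed the input and observe the actual output
        boolean passed = actual == tc.expected(); // compare actual output with expected output
        if (!passed) {
            // A failure indicates a *potential* defect: the SUT or the test case may be wrong
            System.out.println("FAIL: isLeapYear(" + tc.input() + ") expected "
                    + tc.expected() + " but was " + actual);
        }
        return passed;
    }

    public static void main(String[] args) {
        List<TestCase> testCases = List.of(
                new TestCase(2024, true),
                new TestCase(2023, false),
                new TestCase(1900, false)); // 1900 is divisible by 100 but not by 400
        for (TestCase tc : testCases) {
            run(tc);
        }
    }
}
```

Running `main` prints a failure only for the 1900 test case. The failure alone does not say where the defect is; both the SUT and the test case need to be checked.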


W2.6b

Quality Assurance → Testing → Regression Testing → What


When you modify a system, the modification may result in some unintended and undesirable effects on the system. Such an effect is called a regression.

Regression testing is the re-testing of the software to detect regressions. The typical way to detect regressions is retesting all related components, even if they had been tested before.

Regression testing is more effective when it is done frequently, after each small change. However, doing so can be prohibitively expensive if testing is done manually. Hence, regression testing is more practical when it is automated.
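As a hypothetical illustration, the Java sketch below keeps all test cases written so far and re-runs the whole set against each version of a leap-year check. Version 2 stands for a later modification that looks harmless but breaks behavior that used to work; only re-running the old test cases reveals this. Both versions and the class name are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical example: two versions of a leap-year check and one regression suite.
public class RegressionSuite {

    // Version 1 (hypothetical): the full Gregorian rule
    static final IntPredicate V1 = y -> (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;

    // Version 2 (hypothetical): a later "simplification" that drops the 400-year rule
    static final IntPredicate V2 = y -> y % 4 == 0 && y % 100 != 0;

    // Every test case written so far, kept so that all of them can be re-run after each change
    static final int[] INPUTS = {2024, 2023, 1900, 2000};
    static final boolean[] EXPECTED = {true, false, false, true};

    /** Re-runs the whole suite against a version of the SUT; returns the failing inputs. */
    static List<Integer> failingInputs(IntPredicate sut) {
        List<Integer> failures = new ArrayList<>();
        for (int i = 0; i < INPUTS.length; i++) {
            if (sut.test(INPUTS[i]) != EXPECTED[i]) {
                failures.add(INPUTS[i]);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("V1 failures: " + failingInputs(V1)); // prints V1 failures: []
        System.out.println("V2 failures: " + failingInputs(V2)); // prints V2 failures: [2000]
    }
}
```

Because the suite runs with no human involvement, it is cheap to re-run after every small change, which is what makes frequent regression testing practical.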

Exercises:

Regression Testing definition: T/F?

W2.6c

Quality Assurance → Testing → Test Automation → What

An automated test case can be run programmatically, and its result (pass or fail) is also determined programmatically. Compared to manual testing, automated testing reduces the effort required to run tests repeatedly and increases the precision of testing (because manual testing is susceptible to human error).

Resources:

[Quora post] What is the best way to avoid bugs
[Quora post] Is automated testing relevant to startups?

W2.6d

Quality Assurance → Testing → Test Automation → Automated testing of CLI applications


A simple way to semi-automate testing of a CLI (Command Line Interface) app is by using input/output re-direction. Here are the high-level steps:

First, you feed the app with a sequence of test inputs that is stored in a file while redirecting the output to another file.
Next, you compare the actual output file with another file containing the expected output.
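Redirection works without modifying the app because a typical CLI app reads commands from standard input and writes responses to standard output, and the shell decides what those are connected to. The Java sketch below is a hypothetical, heavily simplified stand-in for such an app (named AddressBook to match the example that follows). Its commands `add <name>`, `list`, and `bye` are assumptions for illustration, not those of any real AddressBook app.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// A hypothetical, heavily simplified CLI app. Commands "add <name>", "list" and "bye"
// are assumptions made for this illustration.
public class AddressBook {

    /** Reads commands line by line from 'in' and writes responses to 'out'. */
    static void run(Scanner in, PrintStream out) {
        List<String> names = new ArrayList<>();
        while (in.hasNextLine()) {
            String command = in.nextLine().trim();
            if (command.startsWith("add ")) {
                String name = command.substring(4);
                names.add(name);
                out.println("Added: " + name);
            } else if (command.equals("list")) {
                for (int i = 0; i < names.size(); i++) {
                    out.println((i + 1) + ". " + names.get(i));
                }
            } else if (command.equals("bye")) {
                out.println("Bye!");
                return;
            } else {
                out.println("Unknown command: " + command);
            }
        }
    }

    public static void main(String[] args) {
        // System.in and System.out are whatever the shell connects them to, so
        // "java AddressBook < input.txt > output.txt" needs no code changes.
        run(new Scanner(System.in), System.out);
    }
}
```

Passing the input and output streams into `run` (rather than using `System.in`/`System.out` directly inside it) is a design choice that also lets test code call the same logic with in-memory input and output.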

Let's assume you are testing a CLI app called AddressBook. Here are the detailed steps:

Store the test input in the text file input.txt.

Example input.txt
Store the output you expect from the SUT in another text file expected.txt.

Example expected.txt
Run the program as given below. This redirects the contents of input.txt to AddressBook as its input and, similarly, redirects the output of AddressBook to a text file output.txt. Note that this does not require any changes to the AddressBook code.
```
java AddressBook < input.txt > output.txt
```
- The way to run a CLI program differs based on the language.
  e.g., in Python, assuming the code is in the AddressBook.py file, use the command
  python AddressBook.py < input.txt > output.txt
- If you are using Windows, run the app from a Command Prompt window (i.e., cmd.exe), not a PowerShell window, because PowerShell does not support the < operator for input redirection.

Finally, compare output.txt with expected.txt. This can be done using a utility such as Windows' FC (i.e., File Compare) command, Unix's diff command, or a GUI tool such as WinMerge.

FC output.txt expected.txt
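If you prefer to script the comparison step in Java itself, a minimal sketch could look like the following. It is much simpler than FC or diff: it reports only the first differing line (or a line-count mismatch). The class name `CompareOutput` is an assumption. Because `Files.readAllLines` strips line terminators, differences between `\n` and `\r\n` line endings are ignored, which can help when expected.txt was created on a different OS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// A minimal sketch of the comparison step; it reports only the first difference found.
public class CompareOutput {

    /** Returns an empty Optional if both files have identical lines, else the first difference. */
    static Optional<String> firstDifference(Path actual, Path expected) throws IOException {
        // readAllLines drops line terminators, so \n vs \r\n endings do not matter
        List<String> actualLines = Files.readAllLines(actual);
        List<String> expectedLines = Files.readAllLines(expected);
        int common = Math.min(actualLines.size(), expectedLines.size());
        for (int i = 0; i < common; i++) {
            if (!actualLines.get(i).equals(expectedLines.get(i))) {
                return Optional.of("Line " + (i + 1) + ": expected [" + expectedLines.get(i)
                        + "] but got [" + actualLines.get(i) + "]");
            }
        }
        if (actualLines.size() != expectedLines.size()) {
            return Optional.of("Expected " + expectedLines.size() + " lines but got " + actualLines.size());
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Optional<String> difference = firstDifference(Path.of("output.txt"), Path.of("expected.txt"));
        if (difference.isPresent()) {
            System.out.println("Test failed. " + difference.get());
            System.exit(1); // a non-zero exit status lets a calling script detect the failure
        }
        System.out.println("Test passed");
    }
}
```

Run it in the same folder as the two files, after running the app with redirection.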

Note that the above technique is only suitable when testing CLI apps, and only if the exact output can be predetermined. If the output varies from one run to the other (e.g., it contains a time stamp), this technique will not work. In those cases, you need more sophisticated ways of automating tests.

Follow up notes for the item(s) above:

Congrats! You've made it to the end of this week's topics. It may feel like a lot right now, but since we got an early start, this stuff will be second nature to you by the time you are done with the semester. 😃