Cluster Alignment With a Teacher For Unsupervised Domain Adaptation


Deep networks have achieved remarkable performance on computer vision tasks such as image recognition and object detection. In most cases, however, a classifier trained on one dataset does not generalize well to a new dataset; this is known as the data shift problem. To tackle it, we turn to domain adaptation: training the model so that it generalizes well in both the source and the target domain.

Proposed Method

In this paper, the authors introduce a regularization approach for unsupervised domain adaptation (UDA) called Cluster Alignment with a Teacher (CAT for short). It has three main learning objectives.

First, CAT minimizes the supervised classification loss on the source domain. It borrows the teacher-student paradigm from semi-supervised learning: a teacher model is built on the source domain and provides pseudo labels for the unlabeled target data. The class-conditional distributions in the source domain are known because its samples are fully labeled, but those of the target domain are not, due to the lack of labels. The teacher model therefore acts as a labeling function: it propagates labels to the unlabeled target data, from which the corresponding class-conditional distributions can be estimated.
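As a rough illustration of the labeling step, here is a minimal NumPy sketch. The `pseudo_label` helper and the confidence threshold are assumptions for illustration; the paper constructs its teacher differently (from ensembled predictions), but the idea of turning teacher probabilities into (possibly rejected) pseudo labels is the same.

```python
import numpy as np

def pseudo_label(teacher_probs, threshold=0.95):
    """Turn teacher class probabilities into pseudo labels.

    Target samples whose max teacher probability falls below `threshold`
    are marked -1 (i.e., given no pseudo label). Hypothetical helper for
    illustration only.
    """
    labels = teacher_probs.argmax(axis=1)
    confident = teacher_probs.max(axis=1) >= threshold
    labels[~confident] = -1  # low-confidence samples get no pseudo label
    return labels

probs = np.array([[0.98, 0.01, 0.01],   # confident -> pseudo label 0
                  [0.40, 0.35, 0.25]])  # uncertain -> rejected (-1)
print(pseudo_label(probs))  # [ 0 -1]
```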

Second, given the pseudo labels on the target domain, CAT can apply a class-conditional clustering loss that encourages features from the same class to concentrate together and pushes features from different classes far apart.

Third, CAT aligns the clusters that correspond to the same class but come from different domains (source and target) in the feature space, so that the classes can be separated with a large margin. This delivers a domain-invariant feature space with enhanced discriminative power.

Let me walk through the notation conventions here. X_s denotes the set of labeled source samples with labels Y_s, while X_t denotes the set of unlabeled target samples.

The supervised classification loss is defined as:

$$\mathcal{L}_y(X_s, Y_s) = \frac{1}{|X_s|} \sum_{(x_i, y_i) \in (X_s, Y_s)} \ell\big(h(x_i), y_i\big),$$

where $\ell$ can be the cross-entropy loss and $h$ is the classifier, optimized w.r.t. the labeled source data. Minimizing this supervised loss yields the model from which the teacher is derived.
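Concretely, with cross-entropy as $\ell$, the supervised term is just the average negative log-probability of the true class. A minimal NumPy sketch (a real implementation would use a framework's built-in loss):

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over labeled source samples (minimal sketch)."""
    # log-softmax with the usual max-shift for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[4.0, 0.0],   # strongly predicts class 0
                   [0.0, 3.0]])  # strongly predicts class 1
labels = np.array([0, 1])
loss = cross_entropy(logits, labels)  # small: confident, correct predictions
```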

In order to achieve discriminative learning and class-conditional distribution alignment between source and target domain, they proposed discriminative clustering loss, which forms clusters on the features that belong to the same class and discriminates those of different classes. The class-conditional structure can be shaped to be more discriminative by optimizing this clustering loss. It modifies the structure in the representation space gradually.

The discriminative clustering loss is defined as:

$$\mathcal{L}_d(X, Y) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \Big[ \delta_{ij}\, d(f_i, f_j) + (1 - \delta_{ij}) \max\big(0,\; m - d(f_i, f_j)\big) \Big],$$

where $f_i$ is the feature of sample $x_i$, $\delta_{ij} = 1$ if $x_i$ and $x_j$ share the same (pseudo) label and $0$ otherwise, $d$ is a distance between features, and $m$ is a pre-defined margin.
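The pull-together/push-apart behavior is easy to see in code. The sketch below assumes squared Euclidean distance for $d$ and a margin of 3.0; both are illustrative choices, not necessarily the paper's exact settings.

```python
import numpy as np

def clustering_loss(features, labels, margin=3.0):
    """Discriminative clustering loss: pull same-class features together,
    push different-class features at least `margin` apart (sketch)."""
    # pairwise squared Euclidean distances, shape (N, N)
    diff = features[:, None, :] - features[None, :, :]
    d = (diff ** 2).sum(-1)
    same = (labels[:, None] == labels[None, :]).astype(float)  # delta_ij
    loss = same * d + (1.0 - same) * np.maximum(0.0, margin - d)
    return loss.mean()

feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
labs = np.array([0, 0, 1])
# two tight class-0 points, class 1 far beyond the margin -> near-zero loss
print(clustering_loss(feats, labs))
```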

Besides, cluster-based alignment loss was also proposed to align the clusters corresponding to the same class in different domains. The geometrical alignment of the clusters from two domains has to be adapted in order to learn the domain-invariant features better and adjust the target feature space to be classifiable.

The cluster-based alignment loss is defined as:

$$\mathcal{L}_a(X_s, X_t) = \frac{1}{K} \sum_{k=1}^{K} \big\lVert \lambda_k^s - \lambda_k^t \big\rVert_2^2,$$

where $\lambda_k^s$ and $\lambda_k^t$ are the class-$k$ cluster centers (mean features) in the source and target domains, and $K$ is the number of classes.
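In other words, for each class the loss measures how far apart the two domains' cluster centers are. A minimal NumPy sketch (in practice the centers would be estimated per mini-batch, with target labels coming from the teacher):

```python
import numpy as np

def alignment_loss(src_feats, src_labels, tgt_feats, tgt_labels, num_classes):
    """Cluster alignment loss: mean squared distance between the per-class
    feature means (cluster centers) of the source and target domains."""
    total = 0.0
    for k in range(num_classes):
        lam_s = src_feats[src_labels == k].mean(axis=0)  # source center of class k
        lam_t = tgt_feats[tgt_labels == k].mean(axis=0)  # target center of class k
        total += ((lam_s - lam_t) ** 2).sum()
    return total / num_classes

src = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
src_y = np.array([0, 0, 1])
tgt = np.array([[1.0, 0.0], [10.0, 10.0]])
tgt_y = np.array([0, 1])  # pseudo labels from the teacher
print(alignment_loss(src, src_y, tgt, tgt_y, num_classes=2))  # 0.0 (centers coincide)
```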

Combining the losses above, CAT optimizes the classifier as follows:

$$\min_{\theta}\; \mathcal{L}_y(X_s, Y_s) + \alpha \Big[ \mathcal{L}_d(X_s, Y_s) + \mathcal{L}_d(X_t, \hat{Y}_t) + \mathcal{L}_a(X_s, X_t) \Big],$$

where $\hat{Y}_t$ are the teacher's pseudo labels on the target data and $\alpha$ is a trade-off hyperparameter.
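Putting the pieces together, the overall objective is a straightforward weighted sum; the sketch below just shows how the four terms combine, with `alpha` as the trade-off hyperparameter:

```python
def cat_objective(l_sup, l_clu_src, l_clu_tgt, l_align, alpha=1.0):
    """Overall CAT objective: supervised loss plus the alpha-weighted
    clustering (source + target) and alignment terms."""
    return l_sup + alpha * (l_clu_src + l_clu_tgt + l_align)

# With alpha = 0 the objective reduces to plain source-only training.
assert cat_objective(1.5, 0.2, 0.3, 0.4, alpha=0.0) == 1.5
```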
Figure 1: The framework of CAT (taken from the original paper)





