Classifiers

class text_classification.classifier.base.BaseClassifier[source]

Bases: abc.ABC

Base class that all classifier classes should inherit from to ensure uniformity.

abstract evaluate(preprocessor, evaluate_test=True, evaluate_dev=False)[source]

Should make predictions on the preprocessor’s test and/or dev set and print out evaluation metrics.

Parameters
  • preprocessor (BasePreprocessor) – Preprocessor containing dev/test samples.

  • evaluate_test (bool) – Whether to evaluate on the test set.

  • evaluate_dev (bool) – Whether to evaluate on the dev set.

classmethod load(filename)[source]

Loads a previously saved classifier from a binary file.

Parameters

filename (str) – Name of the binary file that the classifier should be loaded from.

Returns

Classifier instance.

abstract predict(preprocessor, predict_train=False, predict_test=True, predict_dev=False)[source]

Should add a field prediction, containing the predicted label, to each of the preprocessor’s instances.

Parameters
  • preprocessor (BasePreprocessor) – Preprocessor containing the samples to make predictions on.

  • predict_train (bool) – Whether to make predictions on the train set.

  • predict_test (bool) – Whether to make predictions on the test set.

  • predict_dev (bool) – Whether to make predictions on the dev set.

save(filename)[source]

Saves the current classifier instance in binary format.

Parameters

filename (str) – Name of the file where the classifier should be saved.
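The reference only states that save and load use a binary file. As a minimal sketch of how such a pair could work, assuming pickle as the format (the source does not confirm this), with ToyClassifier as a hypothetical stand-in rather than a class from text_classification:

```python
import os
import pickle
import tempfile


class ToyClassifier:
    """Hypothetical stand-in for a BaseClassifier subclass."""

    def __init__(self, averages=None):
        self.averages = averages or {}

    def save(self, filename):
        # Assumption: the instance state is serialized with pickle.
        with open(filename, "wb") as f:
            pickle.dump(self.__dict__, f)

    @classmethod
    def load(cls, filename):
        # A classmethod, so it is called on the class, not on an instance.
        with open(filename, "rb") as f:
            state = pickle.load(f)
        instance = cls.__new__(cls)
        instance.__dict__.update(state)
        return instance


path = os.path.join(tempfile.mkdtemp(), "classifier.bin")
ToyClassifier({"pos": [1.0, 0.0]}).save(path)
restored = ToyClassifier.load(path)
```

Because load is a classmethod, a saved classifier can be restored without constructing an instance first.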

abstract train(preprocessor)[source]

Should train the classifier on the preprocessor’s train set.

Parameters

preprocessor (BasePreprocessor) – Preprocessor instance that contains the train set to train the classifier on.

Returns

BaseClassifier
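To illustrate the contract described above, here is a sketch of a subclass that implements the three abstract methods. Everything in it is an assumption made for illustration: BaseClassifier is redefined locally as a stand-in, MajorityClassifier is hypothetical, and the preprocessor is assumed to expose train_set, test_set and dev_set as lists of dicts, which this reference does not specify.

```python
from abc import ABC, abstractmethod
from collections import Counter
from types import SimpleNamespace


class BaseClassifier(ABC):
    """Local stand-in that mirrors the abstract methods documented above."""

    @abstractmethod
    def train(self, preprocessor):
        ...

    @abstractmethod
    def predict(self, preprocessor, predict_train=False, predict_test=True, predict_dev=False):
        ...

    @abstractmethod
    def evaluate(self, preprocessor, evaluate_test=True, evaluate_dev=False):
        ...


class MajorityClassifier(BaseClassifier):
    """Hypothetical subclass: always predicts the most frequent train label."""

    def train(self, preprocessor):
        labels = [inst["label"] for inst in preprocessor.train_set]
        self.majority = Counter(labels).most_common(1)[0][0]
        return self  # returning the classifier allows call chaining

    def predict(self, preprocessor, predict_train=False, predict_test=True, predict_dev=False):
        selected = [
            (predict_train, preprocessor.train_set),
            (predict_test, preprocessor.test_set),
            (predict_dev, preprocessor.dev_set),
        ]
        for wanted, instances in selected:
            if wanted:
                for inst in instances:
                    inst["prediction"] = self.majority  # added in place

    def evaluate(self, preprocessor, evaluate_test=True, evaluate_dev=False):
        selected = [
            (evaluate_test, "test", preprocessor.test_set),
            (evaluate_dev, "dev", preprocessor.dev_set),
        ]
        for wanted, name, instances in selected:
            if wanted:
                correct = sum(inst["prediction"] == inst["label"] for inst in instances)
                print(f"{name} accuracy: {correct / len(instances):.2f}")


# Assumption: a preprocessor exposes train_set/test_set/dev_set lists of dicts.
prep = SimpleNamespace(
    train_set=[{"label": "a"}, {"label": "a"}, {"label": "b"}],
    test_set=[{"label": "a"}, {"label": "b"}],
    dev_set=[],
)
clf = MajorityClassifier().train(prep)
clf.predict(prep)
clf.evaluate(prep)
```

Instantiating the abstract base directly, or a subclass that omits one of the three methods, raises TypeError, which is how the base class enforces a uniform interface.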

class text_classification.classifier.class_average.ClassAverageClassifier[source]

Bases: text_classification.classifier.base.BaseClassifier

A classifier that computes the average feature vector for each class and predicts, for a given instance, the class whose average feature vector is most similar to the instance’s feature vector.

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

evaluate(preprocessor, evaluate_test=True, evaluate_dev=False)[source]

Evaluates the current model on the preprocessor’s test and/or dev set and prints a classification report containing accuracy, precision, recall and F1-scores.

Parameters
  • preprocessor (BasePreprocessor) – Preprocessor containing dev/test samples.

  • evaluate_test (bool) – Whether to evaluate on the test set.

  • evaluate_dev (bool) – Whether to evaluate on the dev set.

classmethod load_average_feature_vectors(filename, delimiter='\t', label_col='label')[source]

Loads trained average vectors from a CSV-file and instantiates a ClassAverageClassifier from them.

Parameters
  • filename (str) – File where average vectors are saved.

  • delimiter (str) – Delimiter used in CSV-file.

  • label_col (str) – Name of label column.

Returns

ClassAverageClassifier instance.

predict(preprocessor, predict_train=False, predict_test=True, predict_dev=False)[source]

Makes predictions for the samples inside preprocessor in-place, i.e. a key ‘prediction’ containing the predicted label is added to each instance. Instances must first be featurized with the same Featurizer that was used for the training instances.

Parameters
  • preprocessor (BasePreprocessor) – Preprocessor containing the samples to make predictions on.

  • predict_train (bool) – Whether to make predictions on the train set.

  • predict_test (bool) – Whether to make predictions on the test set.

  • predict_dev (bool) – Whether to make predictions on the dev set.

predict_from_dicts(dicts)[source]

Makes predictions on a list of dictionaries. Each dictionary must contain the key ‘feature_vector’ holding the instance’s feature vector.

Parameters

dicts (List[dict]) – List of dicts, where each dict represents an instance.

Returns

Updated list of dictionaries.
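The prediction step described above can be sketched as follows. This is not the library's implementation: the reference does not say which similarity measure is used, so cosine similarity is an assumption, and the class averages are passed in explicitly here, whereas the real method takes only dicts and reads the trained averages from the classifier.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_from_dicts(dicts, class_averages):
    """Add a 'prediction' key to each dict in place and return the same list."""
    for instance in dicts:
        vector = instance["feature_vector"]
        # Pick the label whose average vector is most similar to this instance.
        instance["prediction"] = max(
            class_averages,
            key=lambda label: cosine_similarity(vector, class_averages[label]),
        )
    return dicts


# Made-up averages and instances, for illustration only.
class_averages = {"sports": [0.9, 0.1], "politics": [0.1, 0.9]}
instances = [{"feature_vector": [1.0, 0.0]}, {"feature_vector": [0.2, 0.7]}]
predicted = predict_from_dicts(instances, class_averages)
```

The returned list is the same object that was passed in, so callers can use either the return value or the original list.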

save_average_feature_vectors(filename, delimiter='\t', label_col='label')[source]

Saves the trained average vectors to a CSV-file.

Parameters
  • filename (str) – File where average vectors should be saved.

  • delimiter (str) – Delimiter used in CSV-file.

  • label_col (str) – Name of label column.
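As a rough sketch of what a round trip through such a file could look like, the functions below write one row per class with the label in the label_col column. The file layout (a header row of feature names, one column per feature) and the function names are assumptions, not the library's documented format.

```python
import csv
import os
import tempfile


def save_average_vectors(averages, feature_names, filename, delimiter="\t", label_col="label"):
    """Write one row per class: the label followed by its average feature values."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=delimiter)
        writer.writerow([label_col] + list(feature_names))
        for label, vector in averages.items():
            writer.writerow([label] + list(vector))


def load_average_vectors(filename, delimiter="\t", label_col="label"):
    """Read the file back into (averages, feature_names)."""
    with open(filename, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        feature_names = [name for name in reader.fieldnames if name != label_col]
        averages = {
            row[label_col]: [float(row[name]) for name in feature_names]
            for row in reader
        }
    return averages, feature_names


path = os.path.join(tempfile.mkdtemp(), "averages.tsv")
save_average_vectors({"pos": [0.5, 0.25], "neg": [0.0, 1.0]}, ["f1", "f2"], path)
averages, feature_names = load_average_vectors(path)
```

Both functions must be called with the same delimiter and label_col, otherwise the label column will not be found when loading.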

train(preprocessor)[source]

Computes the average feature vector for each class in the preprocessor’s train set.

Parameters

preprocessor (BasePreprocessor) – Preprocessor instance that contains a train set and has already been featurized, i.e. each train instance should contain the keys “feature_vector”, “feature_names” and “label”.

Returns

ClassAverageClassifier
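The training step amounts to an element-wise mean per label. A minimal sketch, assuming each train instance is a dict with the keys named above; the function name and the sample data are made up for illustration and are not part of the library:

```python
from collections import defaultdict


def average_feature_vectors(train_instances):
    """Return {label: element-wise mean of that label's feature vectors}."""
    sums = {}
    counts = defaultdict(int)
    for instance in train_instances:
        label = instance["label"]
        vector = instance["feature_vector"]
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        # Accumulate the running sum for this label, one component at a time.
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


train = [
    {"label": "pos", "feature_vector": [1.0, 0.0], "feature_names": ["good", "bad"]},
    {"label": "pos", "feature_vector": [3.0, 1.0], "feature_names": ["good", "bad"]},
    {"label": "neg", "feature_vector": [0.0, 2.0], "feature_names": ["good", "bad"]},
]
averages = average_feature_vectors(train)
```

Here the two “pos” vectors average to [2.0, 0.5], and the single “neg” vector is its own average.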