Classifiers
-
class
text_classification.classifier.base.
BaseClassifier
Bases:
abc.ABC
Base class that all classifier classes should inherit from to ensure uniformity.
-
abstract
evaluate
(preprocessor, evaluate_test=True, evaluate_dev=False) Should make predictions on the preprocessor’s test and/or dev set and print out evaluation metrics.
- Parameters
preprocessor (BasePreprocessor) – Preprocessor containing dev/test samples.
evaluate_test (bool) – Whether to evaluate on the test set.
evaluate_dev (bool) – Whether to evaluate on the dev set.
-
classmethod
load
(filename) Loads a previously saved classifier from a binary file.
- Parameters
filename (str) – Name of the binary file that the classifier should be loaded from.
- Returns
Classifier instance.
-
abstract
predict
(preprocessor, predict_train=False, predict_test=True, predict_dev=False) Should add the field
prediction
to each of the preprocessor’s instances, containing the predicted label.
- Parameters
preprocessor (BasePreprocessor) – Preprocessor containing the samples to make predictions on.
predict_train (bool) – Whether to make predictions on the train set.
predict_test (bool) – Whether to make predictions on the test set.
predict_dev (bool) – Whether to make predictions on the dev set.
-
save
(filename) Saves the current classifier instance in binary format.
- Parameters
filename (str) – Name of the file where the classifier should be saved.
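The save/load round trip described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the source only says the file is "binary", so the use of `pickle`, the stand-in class `PicklableClassifier`, and its `averages` attribute are all assumptions made for the example.

```python
import os
import pickle
import tempfile


class PicklableClassifier:
    """Hypothetical stand-in; NOT the library's BaseClassifier."""

    def __init__(self, averages=None):
        self.averages = averages or {}

    def save(self, filename):
        # Assumption: persist only the attribute dictionary, in binary form.
        with open(filename, "wb") as f:
            pickle.dump(self.__dict__, f)

    @classmethod
    def load(cls, filename):
        # Rebuild an instance without calling __init__, then restore its state.
        with open(filename, "rb") as f:
            state = pickle.load(f)
        obj = cls.__new__(cls)
        obj.__dict__.update(state)
        return obj


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "classifier.bin")
    PicklableClassifier({"sports": [2.0, 1.0]}).save(path)
    restored = PicklableClassifier.load(path)
```

Because `load` is a classmethod, a subclass that inherits it gets back an instance of its own type. As with any pickle-based format, files should only be loaded from trusted sources.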
-
abstract
train
(preprocessor) Should train the classifier on the preprocessor’s train set.
- Parameters
preprocessor (BasePreprocessor) – Preprocessor instance that contains the train set to train the classifier on.
- Returns
BaseClassifier
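To show how the three abstract methods fit together, here is a minimal sketch of a subclass. It does not import the library: `SketchBaseClassifier` is a stand-in that only mirrors the documented signatures, and `ToyPreprocessor` with its `train`, `test` and `dev` attributes is hypothetical, since the real BasePreprocessor's attributes are not described here. The majority-label strategy is chosen purely for brevity.

```python
from abc import ABC, abstractmethod
from collections import Counter


class SketchBaseClassifier(ABC):
    """Stand-in mirroring the documented abstract methods (not the library class)."""

    @abstractmethod
    def train(self, preprocessor): ...

    @abstractmethod
    def predict(self, preprocessor, predict_train=False, predict_test=True, predict_dev=False): ...

    @abstractmethod
    def evaluate(self, preprocessor, evaluate_test=True, evaluate_dev=False): ...


class ToyPreprocessor:
    """Hypothetical container; the real BasePreprocessor may look different."""

    def __init__(self, train, test, dev):
        self.train, self.test, self.dev = train, test, dev


class MajorityClassifier(SketchBaseClassifier):
    def train(self, preprocessor):
        # Remember the most frequent label of the train set.
        labels = Counter(inst["label"] for inst in preprocessor.train)
        self.majority = labels.most_common(1)[0][0]
        return self  # train() is documented to return the classifier

    def predict(self, preprocessor, predict_train=False, predict_test=True, predict_dev=False):
        selected = [
            (predict_train, preprocessor.train),
            (predict_test, preprocessor.test),
            (predict_dev, preprocessor.dev),
        ]
        for flag, instances in selected:
            if flag:
                for inst in instances:
                    inst["prediction"] = self.majority  # added in place

    def evaluate(self, preprocessor, evaluate_test=True, evaluate_dev=False):
        self.predict(preprocessor, predict_test=evaluate_test, predict_dev=evaluate_dev)
        splits = [("test", evaluate_test, preprocessor.test), ("dev", evaluate_dev, preprocessor.dev)]
        for name, flag, instances in splits:
            if flag and instances:
                correct = sum(inst["prediction"] == inst["label"] for inst in instances)
                print(f"{name} accuracy: {correct / len(instances):.2f}")


prep = ToyPreprocessor(
    train=[{"label": "a"}, {"label": "a"}, {"label": "b"}],
    test=[{"label": "a"}, {"label": "b"}],
    dev=[],
)
clf = MajorityClassifier().train(prep)
clf.evaluate(prep)
```

Returning `self` from `train` allows the chained call in the last lines, which matches the documented return value of a classifier instance.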
-
abstract
-
class
text_classification.classifier.class_average.
ClassAverageClassifier
Bases:
text_classification.classifier.base.BaseClassifier
A classifier that computes the average feature vector for each class and predicts, for a given instance, the class whose average feature vector is most similar to that instance’s feature vector.
-
evaluate
(preprocessor, evaluate_test=True, evaluate_dev=False) Evaluates the current model on the preprocessor’s test and/or dev set and prints a classification report containing accuracy, precision, recall and F1 scores.
- Parameters
preprocessor (BasePreprocessor) – Preprocessor containing dev/test samples.
evaluate_test (bool) – Whether to evaluate on the test set.
evaluate_dev (bool) – Whether to evaluate on the dev set.
-
classmethod
load_average_feature_vectors
(filename, delimiter='\t', label_col='label') Loads trained average vectors from a CSV file and instantiates a ClassAverageClassifier.
- Parameters
filename (str) – File where average vectors are saved.
delimiter (str) – Delimiter used in CSV-file.
label_col (str) – Name of label column.
- Returns
ClassAverageClassifier instance.
-
predict
(preprocessor, predict_train=False, predict_test=True, predict_dev=False) Makes predictions for the samples inside the preprocessor in place, i.e. a key ‘prediction’ containing the predicted label is added to each instance. Instances must be featurized beforehand with the same Featurizer that was used for the training instances.
- Parameters
preprocessor (BasePreprocessor) – Preprocessor containing the samples to make predictions on.
predict_train (bool) – Whether to make predictions on the train set.
predict_test (bool) – Whether to make predictions on the test set.
predict_dev (bool) – Whether to make predictions on the dev set.
-
predict_from_dicts
(dicts) Makes predictions on a list of dictionaries. Each dictionary must contain the key ‘feature_vector’ holding the instance’s feature vector.
- Parameters
dicts (List[dict]) – List of dicts, where each dict represents an instance.
- Returns
Updated list of dictionaries.
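Prediction against per-class averages might look like the sketch below. The source says only that the "most similar" average vector wins, so cosine similarity is an assumption here, and `predict_with_averages` is a hypothetical free function that takes the averages explicitly rather than the library's method.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_with_averages(dicts, averages):
    """Add a 'prediction' key to every dict, in place, and return the list."""
    for inst in dicts:
        vec = inst["feature_vector"]
        # Pick the label whose average vector is most similar to this instance.
        inst["prediction"] = max(
            averages, key=lambda label: cosine_similarity(vec, averages[label])
        )
    return dicts


averages = {"sports": [2.0, 1.0], "politics": [0.0, 4.0]}
instances = [{"feature_vector": [1.0, 0.2]}, {"feature_vector": [0.1, 3.0]}]
predicted = predict_with_averages(instances, averages)
```

The returned list is the same object that was passed in, so callers can use either the return value or the original list.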
-
save_average_feature_vectors
(filename, delimiter='\t', label_col='label') Saves the trained average vectors to a CSV file.
- Parameters
filename (str) – File where average vectors should be saved.
delimiter (str) – Delimiter used in CSV-file.
label_col (str) – Name of label column.
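One plausible file layout for the average vectors is a delimited table with one row per class, a label column, and one column per feature. That layout is an assumption, as the source does not specify it; `save_averages` and `load_averages` are hypothetical helpers written with the standard `csv` module and reuse only the documented defaults for `delimiter` and `label_col`.

```python
import csv
import os
import tempfile


def save_averages(averages, feature_names, filename, delimiter="\t", label_col="label"):
    """Write one row per class: the label followed by its average feature values."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[label_col] + feature_names, delimiter=delimiter)
        writer.writeheader()
        for label, vector in averages.items():
            row = {label_col: label}
            row.update(zip(feature_names, vector))
            writer.writerow(row)


def load_averages(filename, delimiter="\t", label_col="label"):
    """Read the table back into ({label: vector}, feature_names)."""
    with open(filename, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        feature_names = [name for name in reader.fieldnames if name != label_col]
        averages = {
            row[label_col]: [float(row[name]) for name in feature_names] for row in reader
        }
    return averages, feature_names


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "averages.tsv")
    save_averages({"sports": [2.0, 1.0], "politics": [0.0, 4.0]}, ["goal", "vote"], path)
    loaded, names = load_averages(path)
```

A text format like this keeps the trained averages inspectable and editable by hand, unlike the binary file written by save.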
-
train
(preprocessor) Computes the average feature vector for each class in the preprocessor’s train set.
- Parameters
preprocessor (BasePreprocessor) – Preprocessor instance that contains a train set and has already been featurized, i.e. each train instance should contain the keys “feature_vector”, “feature_names” and “label”.
- Returns
ClassAverageClassifier
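The training step can be sketched as a per-class mean over the train instances. This is an illustrative sketch rather than the library's code: `average_feature_vectors` is a hypothetical helper, and it assumes the train set is available as a list of dicts carrying the documented "feature_vector" and "label" keys, with all vectors of equal length.

```python
from collections import defaultdict


def average_feature_vectors(instances):
    """Return {label: element-wise mean of that label's feature vectors}."""
    sums = {}
    counts = defaultdict(int)
    for inst in instances:
        label, vector = inst["label"], inst["feature_vector"]
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        # Accumulate the element-wise sum for this class.
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total] for label, total in sums.items()
    }


train_set = [
    {"feature_vector": [1.0, 0.0], "label": "sports"},
    {"feature_vector": [3.0, 2.0], "label": "sports"},
    {"feature_vector": [0.0, 4.0], "label": "politics"},
]
class_averages = average_feature_vectors(train_set)
```

For this toy train set the "sports" average is [2.0, 1.0] and the "politics" average is [0.0, 4.0].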
-