Classifiers

class aisploit.classifiers.MarkdownInjectionClassifier

Bases: BaseTextClassifier[List[Any]]

A text classifier to detect Markdown injection in input text.

score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[List[Any]]

Score the input and return a Score object.

Args:

input (Input): The input to be scored. references (List[Input], optional): List of reference inputs. Defaults to None. metadata (Dict[str, Any], optional): Additional metadata for scoring. Defaults to {}.

Returns:

Score[T]: A Score object representing the score of the input.

class aisploit.classifiers.PythonPackageHallucinationClassifier(python_version: str = '3.12')

Bases: BaseTextClassifier[List[str]]

A text classifier that identifies hallucinated Python package names in code.

python_version: str
score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[List[str]]

Scores the input based on the presence of hallucinated Python package names.

Args:

input (str): The input text to analyze.

Returns:

Score[List[str]]: A score object containing information about the analysis results.

tags: List[str]
class aisploit.classifiers.RegexClassifier(*, pattern: Pattern, flag_matches=True)

Bases: BaseTextClassifier[bool]

A text classifier based on regular expressions.

score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[bool]

Score the input based on the regular expression pattern.

Args:

input (str): The input text to be scored.

Returns:

Score[bool]: A Score object representing the result of scoring.

class aisploit.classifiers.RepeatedTokenClassifier

Bases: BaseTextClassifier[str]

score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[str]

Score the input and return a Score object.

Args:

input (Input): The input to be scored. references (List[Input], optional): List of reference inputs. Defaults to None. metadata (Dict[str, Any], optional): Additional metadata for scoring. Defaults to {}.

Returns:

Score[T]: A Score object representing the score of the input.

class aisploit.classifiers.SelfSimilarityClassifier(*, embeddings: ~langchain_core.embeddings.embeddings.Embeddings = <factory>, threshold: float = 0.7, aggregation: ~typing.Literal['mean', 'min'] = 'mean')

Bases: BaseTextClassifier[Dict[str, Any]]

A text classifier based on self-similarity using cosine similarity scores.

aggregation: Literal['mean', 'min']
embeddings: Embeddings
score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[Dict[str, Any]]

Score the input text based on its self-similarity to reference texts.

Args:

input (str): The input text to be scored. references (List[str], optional): List of reference texts. Defaults to None.

Raises:

ValueError: If references is None or if the number of references is not at least 1.

Returns:

Score[Dict[Any]]: A Score object representing the self-similarity score of the input.

tags: List[str]
threshold: float
class aisploit.classifiers.SubstringClassifier(*, substring: str, ignore_case=True, flag_matches=True)

Bases: RegexClassifier

A text classifier based on substring matching.

class aisploit.classifiers.TextTokenClassifier(token: str)

Bases: BaseTextClassifier[bool]

score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[bool]

Score the input and return a Score object.

Args:

input (Input): The input to be scored. references (List[Input], optional): List of reference inputs. Defaults to None. metadata (Dict[str, Any], optional): Additional metadata for scoring. Defaults to {}.

Returns:

Score[T]: A Score object representing the score of the input.

token: str
class aisploit.classifiers.amazon.ComprehendPIIClassifier(session: ~boto3.session.Session = <factory>, region_name: str = 'us-east-1', *, language: str = 'en', threshold: float = 0.7, filter_func: ~typing.Callable[[str, dict], bool] | None = None)

Bases: BaseComprehendClassifier[List[Any]]

A classifier that uses Amazon Comprehend to detect personally identifiable information (PII).

filter_func: Callable[[str, dict], bool] | None
language: str
score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[List[Any]]

Score the input for PII using Amazon Comprehend.

Args:

input (str): The input text to be scored. _references: List of reference inputs (ignored).

Returns:

Score[List[Any]]: A Score object representing the PII entities found in the input.

tags: List[str]
threshold: float
class aisploit.classifiers.amazon.ComprehendToxicityClassifier(session: ~boto3.session.Session = <factory>, region_name: str = 'us-east-1', language: str = 'en', threshold: float = 0.7)

Bases: BaseComprehendClassifier[Dict[str, Any]]

A classifier that uses Amazon Comprehend to detect toxicity in text.

language: str
score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[Dict[str, Any]]

Score the input for toxicity using Amazon Comprehend.

Args:

input (str): The input text to be scored. _references: List of reference inputs (ignored).

Returns:

Score[Dict[str, Any]]: A Score object representing the toxicity score of the input.

tags: List[str]
threshold: float
class aisploit.classifiers.huggingface.BertScoreClassifier(threshold: float = 0.8, model_type: str = 'distilbert-base-uncased')

Bases: BaseTextClassifier[Dict[str, Any]]

A classifier that computes BERTScore for text inputs.

bertscore: EvaluationModule
model_type: str
score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[Dict[str, Any]]

Score the input using BERTScore computed by the evaluate module.

Args:

input (str): The input text to be scored. references (List[str], optional): List of reference texts. Defaults to None.

Raises:

ValueError: If references is None or if the number of references is not equal to 1.

Returns:

Score[Dict[str, Any]]: A Score object representing the BERTScore of the input.

threshold: float
class aisploit.classifiers.huggingface.BleuClassifier(threshold: float = 0.2)

Bases: BaseTextClassifier[Dict[str, Any]]

A classifier that computes BLEU score for text inputs.

bleu: EvaluationModule
score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[Dict[str, Any]]

Score the input using BLEU score computed by the evaluate module.

Args:

input (str): The input text to be scored. references (List[str], optional): List of reference texts. Defaults to None.

Raises:

ValueError: If the number of references is not equal to 1.

Returns:

Score[Dict[str, Any]]: A Score object representing the BLEU score of the input.

threshold: float
class aisploit.classifiers.huggingface.PipelinePromptInjectionClassifier(*, model_name: str = 'laiyer/deberta-v3-base-prompt-injection', injection_label: str = 'INJECTION', threshold: float = 0.5)

Bases: BaseTextClassifier[float]

score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[float]

Score the input and return a Score object.

Args:

input (Input): The input to be scored. references (List[Input], optional): List of reference inputs. Defaults to None. metadata (Dict[str, Any], optional): Additional metadata for scoring. Defaults to {}.

Returns:

Score[T]: A Score object representing the score of the input.

class aisploit.classifiers.openai.ModerationClassifier(*, api_key: str | None = None)

Bases: BaseTextClassifier[Moderation]

A classifier that uses the OpenAI Moderations API for scoring.

score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[Moderation]

Score the input using the OpenAI Moderations API.

Args:

input (str): The input text to be scored. _: List of references (ignored).

Returns:

Score[Moderation]: A Score object representing the moderation score of the input.

class aisploit.classifiers.presidio.PresidioAnalyserClassifier(*, language: str = 'en', entities: ~typing.List[str] | None = None, threshold: float = 0.7, additional_recognizers: ~typing.List[~presidio_analyzer.entity_recognizer.EntityRecognizer] = <factory>, filter_func: ~typing.Callable[[str, ~presidio_analyzer.recognizer_result.RecognizerResult], bool] | None = None)

Bases: BaseTextClassifier[List[RecognizerResult]]

A text classifier using the Presidio Analyzer for detecting Personally Identifiable Information (PII).

additional_recognizers: List[EntityRecognizer]
entities: List[str] | None
filter_func: Callable[[str, RecognizerResult], bool] | None
language: str
score(input: str, references: List[str] | None = None, metadata: Dict[str, Any] | None = None) Score[List[RecognizerResult]]

Score the input text for Personally Identifiable Information (PII) entities.

Args:

input (str): The input text to be scored. _references: List[str], optional): Ignored parameter. Defaults to None.

Returns:

Score[List[RecognizerResult]]: A Score object representing the results of PII detection.

tags: List[str]
threshold: float