The Detector Engine

Query structure

Detectors are composed of rules, or queries, that are compiled into an efficient detector and run by the Spectral engine against files.

Each query is a group of patterns, called a pattern_group, and is hierarchical (a pattern group can contain more pattern groups, and so on).

A pattern group is a collection of patterns with an aggregate relation (you might notice some similarities with the theory behind Datalog).

    pattern_group:
      aggregate: or | and | append
      patterns:
      - pattern: "(?:key|token|secret|password|pwd|passwd)=(.*)" # assignment
        pattern_type: single
      - pattern: "hello" # literal word
        pattern_type: multi

`pattern_group`

Aggregating patterns will:

and - bail out on the first mismatch
or - try any of the matches
append - try any of the matches, collect matches from those that matched
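
To make the three aggregates concrete, here is a minimal Python sketch of how they could behave over a flat list of patterns. It is an illustration only, not Spectral's implementation: the function name `evaluate_group` and its return shape are hypothetical, and Python's `re.search` stands in for the engine's own regex matcher.

```python
import re

def evaluate_group(aggregate, patterns, text):
    """Return (matched, collected_matches) for a flat list of regex patterns."""
    collected = []
    for pattern in patterns:
        found = re.search(pattern, text)
        if aggregate == "and":
            if found is None:
                return False, []                  # bail out on the first mismatch
            collected.append(found.group(0))
        elif aggregate == "or":
            if found is not None:
                return True, [found.group(0)]     # the first hit is enough
        elif aggregate == "append":
            if found is not None:
                collected.append(found.group(0))  # keep going, collect every hit
        else:
            raise ValueError(f"unknown aggregate: {aggregate}")
    if aggregate == "or":
        return False, []                          # no pattern matched
    return bool(collected), collected
```

Nested pattern groups are left out to keep the sketch short; a hierarchical version would call `evaluate_group` recursively for each child group.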

`pattern`

Each pattern is of a type (pattern_type):

single - match once per file
multi - match many times per file

Both types accept a performant regex that traverses binary as well as text content.

Prematch testers

A prematch tester is a test that runs before the more costly matching and detection logic is applied. For example, it might make sense to "bail out" of detection if a file is binary, if it is of class documentation, or if it is too small.

`test_content_prematch`

This meta-tester uses our content classification and inference engine. It is a collection of testers that are useful when deciding whether a certain file is worth a deep dive.

By testing for content you can:

Filter out an unexpected binary file
Ensure a non-empty file goes through for further detection
Run only on classes of files that Spectral has classified by the nature of their content

| Content class | Example |
|---|---|
| `code/infra` | Ruby, Python, etc. |
| `data/infra` | SQL, JSON, etc. |
| `binary` | Binary files |
| `docs` | Markdown, Text, etc. |
| `tests` | Unit tests, other test code |
| `examples` | Example code, demo code and others |
| `vendor` | 3rd party code sitting in `node_modules` and others |
| `files` | A general file class not fitting a single class |

Usage

pattern: ".*"
test_content_prematch:
    binary: false
    minlen: 20
    maxlen: 2000
    content_classes:
    - code/infra # our own classification engine results
    # content_classes_not:
    # - Code # the inverse of content_classes
    content_types:
    - Python # a programming language *name* (if you want to match by extension, there are ways for that too)
    # content_types_not:
    # - Ruby # the inverse of content_types

Test positive

<a Python file, size at 20-2000 bytes>

Test negative

<an SQL file, or a small file, or a binary file, etc.>
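
The idea of a cheap gate in front of costly detection can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Spectral's classifier: `content_prematch` is a hypothetical helper, a NUL byte is used as a crude stand-in for binary detection, the length is measured in bytes, and both bounds are treated as inclusive. Content classes and content types are not modeled.

```python
def content_prematch(data: bytes, binary: bool = False,
                     minlen: int = 20, maxlen: int = 2000) -> bool:
    """Cheap checks to run before any costly matching (illustrative only)."""
    looks_binary = b"\x00" in data      # crude heuristic, an assumption
    if looks_binary != binary:
        return False                    # wrong kind of file, bail out
    return minlen <= len(data) <= maxlen
```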

`test_regex_prematch`

You can test for a specific pre-match structure before Spectral deep dives into further matching.

By testing for Regex prematch you can:

Make sure a certain file structure exists before applying further testing, such as variable assignments
Verify that a certain 'sentinel' word exists in a large file by applying a cheap word lookup, before applying a more costly matching

Usage

pattern: "pass:(.*)"
test_regex_prematch:
    - on: 0 # on full text
      pattern: "aws\\.amazon\\.com"

Test positive

<large documentation file>
Here is how to connect to our database
1. Log into AWS console (console.aws.amazon.com)
2. Use following details:
DB pass: shazam123

Test negative

<Big file, not containing any mention of AWS detail>
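
The sentinel idea can be sketched in Python as follows. This is only an illustration of the control flow (cheap lookup first, costly match second); the `detect` function is hypothetical and Python's `re` module stands in for the Spectral engine.

```python
import re

SENTINEL = re.compile(r"aws\.amazon\.com")  # cheap prematch on the full text
PATTERN = re.compile(r"pass:(.*)")          # the more costly main pattern

def detect(text: str) -> list:
    """Run the main pattern only if the sentinel appears somewhere in the text."""
    if SENTINEL.search(text) is None:
        return []                           # bail out early
    return [m.group(1).strip() for m in PATTERN.finditer(text)]
```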

Content testers

`test_fingerprints`

Spectral can create one-way fingerprints for you to use when you need to detect pieces of information you can't reveal.

By using test_fingerprints you can:

Detect credit cards
Find classified or private domains or hosts

First you need to generate your fingerprint. All generation is done locally on your machine using a secure and salted one-way hash:

$ $HOME/.spectral/spectral fingerprint --text <your private text>
< fingerprint >

Then copy the resulting fingerprint.

Usage

pattern: "host=([a-zA-Z0-9_.-]+)"
test_fingerprints:
  - on: 1
    with: "<your fingerprint>"
    is: true

Note that by specifying a character class and narrowing it down, we give useful information to attackers looking to brute-force private information. Always make sure that your character classes and secrets are wide enough.

Test positive

<private host>

Test negative

<any other text>

`test_from_env`

Sometimes you want to grab secrets from your ENV rather than encode them as fingerprints, and still search for them in your code. Spectral supports fetching them from your ENV and relaying them to the detector.

By using test_from_env you can:

Detect secrets that you already have in your environment (local machine or CI) without exposing them
Find secrets that you don't want to expose in a persistent way

To test, make sure to export something first:

SOME_SECRET_VAR=shazam $HOME/.spectral/spectral scan --nosend

This will work with the following:

Usage

pattern: "host=(.*)"
test_from_env:
  - on: 1
    with: "SOME_SECRET_VAR"
    is: true

Test positive

shazam

Test negative

foobar
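
Conceptually, the tester compares a captured value against whatever the named environment variable holds at scan time. A minimal Python sketch of that comparison is shown below; `matches_env_secret` is a hypothetical helper, and the constant-time comparison is a choice made for this sketch, not something the Spectral documentation specifies.

```python
import hmac
import os

def matches_env_secret(candidate: str, var_name: str) -> bool:
    """True if the candidate equals the value of the given environment variable."""
    secret = os.environ.get(var_name)
    if not secret:
        return False  # variable unset or empty: nothing to compare against
    return hmac.compare_digest(candidate.encode(), secret.encode())
```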

`test_luhn`

A Luhn test. The Luhn algorithm is a checksum used for credit card numbers and for several national identification numbers, such as Canada's Social Insurance Number and Israel's ID number.

By testing for Luhn you can:

Ensure a number is a valid credit card number
Verify that a matched string passes as a valid identification number, which helps separate real ones from fake or "test" ones

Usage

pattern: "account=([0-9]+)"
test_luhn:
  - on: 1
    is: true

Test positive

79927398713

Test negative

79927398710
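
For reference, the Luhn checksum itself is short enough to sketch in Python. This is the standard algorithm, written here as a standalone helper (`luhn_valid` is a name chosen for this example, not a Spectral API); it reproduces the positive and negative cases above.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not number.isdigit():
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```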

References

Luhn algorithm at Wikipedia

`test_number`

Available from: v1.4.2

Test for a generic representation of a number.

By testing for numbers:

You can rule out a plain numeric value where a password or a token is expected

Usage

pattern: "key=(.*)"
test_number:
  - on: 1
    is: false

Test positive

Note that because the tester returns false and the rule specifies is: false, test_number actually gives a positive outcome.

key=<random token>

Test negative

key=0.1234
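
The interplay between the tester result and `is: false` can be sketched in Python. The `is_number` helper is hypothetical and uses Python's `float` parser as a stand-in for "a generic representation of a number"; which notations Spectral itself accepts is not specified here.

```python
def is_number(text: str) -> bool:
    """True if the text parses as a number (Python float syntax, an assumption)."""
    try:
        float(text.strip())
        return True
    except ValueError:
        return False

def outcome(tester_result: bool, expected: bool) -> bool:
    """A rule passes when the tester result equals the value given in `is:`."""
    return tester_result == expected
```

With `is: false`, `outcome(is_number("0.1234"), False)` fails the rule, while a random token passes it.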

`test_base64`, `test_base64bin`

Verify that a piece of text is base64-encoded text or base64-encoded binary data. Supports all common variants of encoding (URL safe and others).

By testing for base64 you can:

Ensure that a match is base64 and fail fast in a sequence of tests when you're looking for a token
Validate that a string is in fact base64 encoded given you suspect that it may contain sensitive information

Usage

pattern: "account_encoded='([[:alnum:]/+]+[=]{0,2})'"
test_base64:
  - on: 1
    is: true

Test positive

account_encoded='eyAiYWNjb3VudCI6ICJzZWNyZXQtbnVtYmVyIiB9'

Test negative

account_encoded='replace_me'

The binary variant will first decode the base64 encoded string, and then test whether it is binary or not:

pattern: "account_encoded='([0-9]+)'"
test_base64bin:
  - on: 1
    is: true
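
The two checks can be sketched in Python as follows. This is a simplified illustration and not Spectral's implementation: it only accepts the standard base64 alphabet (no URL-safe variant), and "binary" is approximated as "contains a NUL byte or is not valid UTF-8". Both choices, and the helper names, are assumptions made for the sketch.

```python
import base64
import binascii

def decode_base64(text: str):
    """Return the decoded bytes, or None if text is not strict standard base64."""
    try:
        return base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return None

def is_base64(text: str) -> bool:
    return decode_base64(text) is not None

def is_base64_binary(text: str) -> bool:
    """Decode first, then test whether the decoded payload looks binary."""
    data = decode_base64(text)
    if data is None:
        return False
    if b"\x00" in data:
        return True
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False
```

Note that many short alphanumeric strings are technically valid base64, so a check like this works best as one step in a sequence of testers rather than on its own.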

`test_binary`

Since Spectral detectors are binary-aware, you have the option to test for binary matches in any capturing expression you need.

By testing for binary data you can:

Flag and avoid false matches that contain non-text data

Usage

pattern: "token=(.*)"
test_binary:
  - on: 1
    is: false

Test positive

Note that with is: false, a positive outcome is a candidate NOT containing binary data:

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

<BINARY DATA>token=<BINARY_DATA>

`test_maxlen`, `test_minlen`

Test that the content size meets a minimum or stays within a maximum.

By testing for content size you can:

Fail fast on very short strings or very large content, and skip the match
Validate that, on top of the various structural captures you've made, you end up with a reasonably sized match

Usage

pattern: "account_encoded='(.*)'"
test_minlen:
  - on: 1
    score: 2

Test positive

account_encoded='eyAiYWNjb3VudCI6ICJzZWNyZXQtbnVtYmVyIiB9'

Test negative

account_encoded='XX'

In the same way, we can use maxlen:

pattern: "account_encoded='(.*)'"
test_maxlen:
  - on: 1
    score: 2000

Structural testers

`test_jwt`

A JWT test. A JWT (JSON Web Token) is an Internet proposed standard for creating data with optional signature and/or optional encryption whose payload holds JSON that asserts some number of claims, often used for service-to-service authentication.

By testing for JWT you can:

Make sure the key structure fits a standard JWT
Verify that a certain JWT is semantically valid (header is valid)

Usage

pattern: "token=(\\S+)"
test_jwt:
  - on: 1
    is: true

Test positive

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJsb2dnZWRJbkFzIjoiYWRtaW4iLCJpYXQiOjE0MjI3Nzk2Mzh9.gzSraSYS8EXBxLN_oWnFSRgCzcmJmMjLiuyu5CSpyHI

Test negative

bad_token
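
A structural JWT check of the kind described above can be sketched in Python. This is an illustration under stated assumptions, not Spectral's tester: it only handles the three-part compact form, it treats a header as valid when it is a JSON object carrying an `alg` field, and it never verifies the signature. The function names are hypothetical.

```python
import base64
import binascii
import json

def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)

def looks_like_jwt(token: str) -> bool:
    """True if the token has three dot-separated parts and a JSON header with 'alg'."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    try:
        header = json.loads(_b64url_decode(parts[0]))
    except (binascii.Error, ValueError):
        return False
    return isinstance(header, dict) and "alg" in header
```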

References

JSON Web Token at Wikipedia

`test_uri`

A URI/URL parsing test. A given string is tested to be a valid URI.

By testing for URI you can:

Isolate URLs, which can often be sensitive, before applying more matching logic
Detect various kinds of authentication such as Bearer, Basic and more, given a URL request structure (e.g. curl'ing URLs)

Usage

pattern: "curl\\s.*(http.*)"
test_uri:
  - on: 1
    is: true

Test positive

curl -L -o https://dev.acme.corp/secure/credentials.json -H"Authorization: Bearer <token>"

Test negative

sh curl.sh arg1 arg2
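
As a rough idea of what "a valid URI" can mean in practice, here is a small Python sketch based on the standard library parser. It is an assumption-laden stand-in, not Spectral's parser: `is_uri` is a hypothetical helper, and it simply requires both a scheme and a network location.

```python
from urllib.parse import urlsplit

def is_uri(candidate: str) -> bool:
    """True if the candidate parses with both a scheme and a host part."""
    try:
        parts = urlsplit(candidate.strip())
    except ValueError:
        return False
    return bool(parts.scheme) and bool(parts.netloc)
```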

`test_tvar`

Test for various template variables, common in configuration and IaC files.

By testing for template variables you can:

Filter out legitimate configuration that was built with proper template variables instead of hardcoded secrets

Usage

pattern: "DB_PASS=(.*)"
test_tvar:
  - on: 1
    is: false

Test positive

Note that with is: false, a positive outcome is a candidate NOT containing a template variable:

DB_PASS=my-secret-password

Test negative

DB_PASS={{.Env.DBPass}}

`test_changeme`

Available from: v1.4.2

Test for various "changeme" values. As engineers, we sometimes indicate a value to be replaced by various commonly-known idioms such as fixme and XXX which we fondly call changeme.

By testing for changeme:

You can filter out mock values, or "TODO: replace this" values.
Use this combined with other testers to create a powerful detector.

Usage

pattern: "DB_PASS=(.*)"
test_changeme:
  - on: 1
    is: false

Test positive

Note that with is: false, a positive outcome is a candidate NOT containing a changeme value:

DB_PASS="<real password>"

Test negative

DB_PASS="XXX"

`test_assignment`

Available from: v1.4.2

Test for an assignment structure.

By testing for assignment:

You can set the scene for detectors which are only interested in one part of an assignment clause
Mix an expected assignment with another tester to create a more powerful detector

Usage

pattern: "DB_PASS(.*)"
test_assignment:
  - on: 0 # on the complete expression
    is: true
test_token:
  - on: 1
    is: true

Test positive

DB_PASS=<random token>

Test negative

DB_PASS, foo, bar

`test_uuid`

Test if a given string is a UUID. Supports all UUID types (v4, v1, etc.) and formats (with or without hyphens, and with or without a prefix).

By testing for UUID you can:

Ignore suspect strings that are randomly generated but in fact are IDs (database IDs or other).

Usage

pattern: "key=(.*)"
test_uuid:
  - on: 1
    is: false

Test positive

Note that with is: false, a positive outcome is a candidate NOT containing a UUID:

key=my-secret-key

Test negative

key=<UUID representing a DB table primary key>
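
A comparable check can be sketched with Python's standard `uuid` module, which also accepts hyphenated and non-hyphenated forms as well as the `urn:uuid:` prefix. The `is_uuid` helper is a name chosen for this example; it is not how Spectral implements the tester.

```python
import uuid

def is_uuid(candidate: str) -> bool:
    """True if the candidate parses as a UUID in any form the uuid module accepts."""
    try:
        uuid.UUID(candidate.strip())
        return True
    except ValueError:
        return False
```

With `is: false` in the rule, a detector would keep `my-secret-key` and drop values for which `is_uuid` returns true.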

`test_regex`, `test_regex_not`

A test_regex is a tester that can verify a structural form after a match has been located. With it, you can verify the match further.

By using test_regex you can:

Apply a clearer set of validations, readable and maintainable
Split verification into stages to make a specific use case explicit:
- Capture something vague (e.g. Bearer (.*))
- Run a semantic tester (e.g. test_token on the token part of the bearer)
- And only then run a structural tester (e.g. "it should look like a curl request.") with test_regex
Apply verification that is beyond the capabilities of a regex DFA (e.g. logic that would otherwise need aggressive backtracking can be achieved performantly by running two separate regexes and combining the results)

As an array based tester, an AND relation is created between elements, and short-circuiting (failing fast) is applied.

test_regex - all must apply, fail if one does not apply
test_regex_not - all must not apply, fail if one applies

Usage

pattern: "token=(.+)"
test_regex:
    - on: 1
      pattern: "([0-9].*){2}" # the value includes at least 2 digits
    - on: 1
      pattern: "([a-zA-Z].*){2}" # the value includes at least 2 letters

Test positive

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

token=env.get('token')

Usage (test_regex_not)

pattern: "token=(\\S+)"
test_regex_not:
    - on: 1
      pattern: "[$][a-zA-Z0-9_-]+" # the value includes a valid template variable
    - on: 1
      pattern: "(?i)(example|test|fake|1234|abcde|xxxx|foobar)" # the value includes a word or pattern indicating that this is just a token placeholder

Test positive

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

token=$my_token
token=48SfRa4idxxUVyPAejafXxwjkreyjEXAMPLE
token=testRa4idxxUVyPAejafXxwjkreyj8MoJkjV
token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
token=test1234abcdefoobarfake
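
The AND relation and the fail-fast behavior of both testers can be sketched in Python. This is an illustration only: `regex_all` and `regex_none` are hypothetical names, Python's `re` module stands in for the Spectral regex engine, and the sketch operates directly on the captured value rather than on an `on:` index.

```python
import re

def regex_all(value: str, patterns: list) -> bool:
    """test_regex semantics: every pattern must apply; stops at the first miss."""
    return all(re.search(p, value) for p in patterns)

def regex_none(value: str, patterns: list) -> bool:
    """test_regex_not semantics: no pattern may apply; stops at the first hit."""
    return not any(re.search(p, value) for p in patterns)

MUST_APPLY = [r"([0-9].*){2}", r"([a-zA-Z].*){2}"]
MUST_NOT_APPLY = [
    r"[$][a-zA-Z0-9_-]+",
    r"(?i)(example|test|fake|1234|abcde|xxxx|foobar)",
]
```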

Semantic testers

`test_cword`

Test for the percentage of "common words" in a given string. Based on a unique and massive tech-related common words dictionary model.

By testing for common words:

Rule out non-machine generated keys
Validate that a given match passes as a machine generated secret

Usage

pattern: "pass=(.*)"
test_cword:
  - on: 1
    from: 0.0 # defines a range of accepted percentage
    to: 0.2   # low percentage of common words (up to 20%)

Test positive

zx28821a{_)

Test negative

hello

`test_zx`

Test for password strength based on the popular zxcvbn library.

By testing for zx (abbreviated) you can:

Detect strong passwords amongst fake ones
Apply existing policies, if you're already familiar with zxcvbn and use it for enforcing password strength

Usage

pattern: "pass=(.*)"
test_zx:
  - on: 1
    score: 4.0 # same standard score scale (0-4) from zxcvbn

Test positive

zxHELLOyw{_)

Test negative

foobar

`test_pass`

Test for password strength (own model). Pick a threshold on a scale of 0-100.0. A password with strength > 80 is considered strong.

By using test_pass you can:

Detect strong passwords amongst fake ones, and fine-tune to your desire.

Usage

pattern: "pass=(.*)"
test_pass:
  - on: 1
    score: 80.0 # scale: 0-100

Test positive

zxHELLOyw{_)

Test negative

foobar

`test_token`

Test for tokens, keys, and machine-generated secrets (own model).

By using test_token you can:

Detect real tokens, keys, and secrets
Verify that a machine-generated token actually looks like a typical secret, judged by model attributes

Usage

    - pattern: "token=(.*)"
      pattern_type: multi 
      test_token:
      - on: 1
        score: 0.6  # True if the score is greater than 0.6
                    # max is 1, min is 0

Test positive

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

token=AnotherVariableOfClientData[0];

`test_entropy`

A normalized entropy test. We do not recommend this test as entropy is a metric not optimized for finding secrets and sensitive information. Spectral has much more advanced tests to use instead, and we still offer entropy just for those who rely on it because of their existing legacy infrastructure and policies.

Usage

    - pattern: "token=(.*)"
      pattern_type: multi 
      test_entropy:
      - on: 1
        score: 4.0  # True if the entropy of the value is greater than 4
                    # max is 5, min is 0

Test positive

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

token=G6q5oRa4idxxxxxxxxxxxxxwjkreyj8MoJkjV
token=FooBarFooBarFooBarFooBarFooBarFooBar
token=asdfdsafdsfasdfadsfadsfdasfasdfsdafsdf
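
For readers unfamiliar with the metric, character-level Shannon entropy can be sketched in Python as below. This is the textbook formula in bits per character; how Spectral normalizes its score onto the 0-5 scale mentioned above is not documented here, so treat the numbers from this sketch as indicative only.

```python
import math
from collections import Counter

def shannon_entropy(value: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Repetitive values such as `FooBarFooBar...` score low because only a handful of distinct characters carry the whole string.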

Testing your detector

To test, you can selectively include your new detectors by using --just-ids and/or --just-tags. With these you can use any of the common Spectral commands:

If you want to run your new rule on your entire GitHub org:

$HOME/.spectral/spectral github ... --just-ids PRV001

Alternatively, just to scan your current repo:

$HOME/.spectral/spectral run ... --just-tags acme-security

Submit your detector for review

Feel free to talk to us and send us your detector (please make sure to redact any sensitive information in the detector, if there is any). We'll help you build it and give you a free detector building session.

Updated about 2 years ago
