The Detector Engine

Query structure

Detectors are composed of rules, or queries, that are compiled into an efficient detector and run by the Spectral engine against files.

Each query is a group of patterns, called a pattern_group, and is hierarchical (a pattern group can contain more pattern groups, and so on).

A pattern group is a collection of patterns with an aggregate relation (you might notice some similarities with the theory behind Datalog).

pattern_group:
  aggregate: or | and | append
  patterns:
    - pattern: "(:?key|token|secret|password|pwd|passwd)=(.*)" # assignment
      pattern_type: single
    - pattern: "hello" # assignment
      pattern_type: multi

pattern_group

Aggregating patterns will:

  • and - bail out on the first mismatch
  • or - try any of the matches
  • append - try any of the matches, collect matches from those that matched
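
The three aggregates can be read as three ways of folding per-pattern results. The following is a minimal Python sketch of that reading, not Spectral's implementation: the function name `aggregate`, the list-of-strings return shape, and the choice to let `or` return the first pattern that matches are all assumptions made for illustration.

```python
import re
from typing import List

def aggregate(mode: str, patterns: List[str], text: str) -> List[str]:
    """Illustrative only: combine per-pattern matches under and / or / append."""
    collected: List[str] = []
    for pattern in patterns:
        found = [m.group(0) for m in re.finditer(pattern, text)]
        if mode == "and":
            if not found:
                return []            # bail out on the first mismatch
            collected.extend(found)
        elif mode == "or":
            if found:
                return found         # assumed: the first pattern that matches wins
        elif mode == "append":
            collected.extend(found)  # keep going, gather everything that matched
        else:
            raise ValueError(f"unknown aggregate: {mode}")
    return collected

text = "key=abc hello"
print(aggregate("and", [r"key=\w+", r"missing"], text))
print(aggregate("or", [r"missing", r"hello"], text))
print(aggregate("append", [r"key=\w+", r"hello"], text))
```

Under `and`, a single missing pattern empties the result; under `append`, the same input would still report whatever did match.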

pattern

Each pattern is of a type (pattern_type):

  • single - match once per file
  • multi - match many times per file

Both accept a performant regex that traverses binary as well as text content.

Prematch testers

A prematch tester is a test that runs before applying more costly matching and detection logic. For example, it might make sense to "bail out" of detection if a file is binary, if it is of class documentation, or if it is too small.


test_content_prematch

This meta-tester uses our content classification and inference engine. It is a collection of testers that are useful when deciding whether a certain file is worth a deep dive.

By testing for content you can:

  • Filter out an unexpected binary file
  • Ensure a non-empty file goes through for further detection
  • Run on classes of files that Spectral has classified by the nature of their content

Content class    Example
code/infra       Ruby, Python, etc.
data/infra       SQL, JSON, etc.
binary           Binary files
docs             Markdown, Text, etc.
tests            Unit tests, other test code
examples         Example code, demo code and others
vendor           3rd party code sitting in node_modules and others
files            A general file class not fitting a single class

Usage

pattern: ".*"
test_content_prematch:
  binary: false
  minlen: 20
  maxlen: 2000
  content_classes:
    - code/infra # our own classification engine results
  # content_classes_not:
  #   - Code # the inverse of content_classes
  content_types:
    - Python # a programming language *name* (if you want an extension, there are ways for that too)
  content_types_not:
    # - Ruby # the inverse of content_types

Test positive

<a Python file, size at 20-2000 bytes>

Test negative

<an SQL file, or a small file, or a binary file, etc.>
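
The size and binary parts of this gate are cheap to reason about. The sketch below shows the idea in Python under stated assumptions: the function name `passes_content_prematch` is hypothetical, and treating "contains a NUL byte" as "binary" is a common heuristic chosen for illustration, not Spectral's classification engine. Content classes and content types are left out because they depend on that engine.

```python
def passes_content_prematch(data: bytes, minlen: int = 20, maxlen: int = 2000,
                            allow_binary: bool = False) -> bool:
    """Cheap gate that runs before any regex work (illustrative only)."""
    if not (minlen <= len(data) <= maxlen):
        return False                      # too small or too large: skip the file
    looks_binary = b"\x00" in data        # assumed heuristic for binary content
    if looks_binary and not allow_binary:
        return False
    return True

print(passes_content_prematch(b"print('hello world, this is python')"))
print(passes_content_prematch(b"x = 1"))
```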

test_regex_prematch

You can test for a specific pre-match structure before Spectral deep dives into further matching.

By testing for Regex prematch you can:

  • Make sure a certain file structure exists before applying further testing, such as variable assignments
  • Verify that a certain 'sentinel' word exists in a large file by applying a cheap word lookup, before applying a more costly matching

Usage

pattern: "pass:(.*)"
test_regex_prematch:
  - on: 0 # on full text
    pattern: "aws\\.amazon\\.com"

Test positive

<large documentation file>
Here is how to connect to our database
1. Log into AWS console (console.aws.amazon.com)
2. Use following details:
DB pass: shazam123

Test negative

<Big file, not containing any mention of AWS detail>
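
The benefit of a regex prematch is ordering: a cheap sentinel lookup decides whether the costlier capturing pattern runs at all. A minimal Python sketch of that ordering, using the two patterns from the usage above; the function name `detect` and the stripped return values are illustrative assumptions.

```python
import re

SENTINEL = re.compile(r"aws\.amazon\.com")   # cheap prematch on the full text
DETAIL = re.compile(r"pass:(.*)")            # the costlier capturing pattern

def detect(text: str) -> list:
    """Run the capturing pattern only when the sentinel is present."""
    if not SENTINEL.search(text):
        return []                            # prematch failed: skip the expensive pass
    return [m.group(1).strip() for m in DETAIL.finditer(text)]

doc = "1. Log into AWS console (console.aws.amazon.com)\nDB pass: shazam123\n"
print(detect(doc))
print(detect("DB pass: shazam123\n"))
```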

Content testers


test_fingerprints

Spectral can create one-way fingerprints for you to use when you need to detect pieces of information you can't reveal.

By using test_fingerprints you can:

  • Detect credit cards
  • Find classified or private domains or hosts

First you need to generate your fingerprint. All generation is done locally on your machine using a secure and salted one-way hash:

$ $HOME/.spectral/spectral fingerprint --text <your private text>
< fingerprint >

Then copy the resulting fingerprint

Usage

pattern: "host=([a-zA-Z0-9_.-]+)"
test_fingerprints:
  - on: 1
    with: "<your fingerprint>"
    is: true

Note that by specifying a narrow character class, we hand useful information to attackers trying to brute-force the private value. Always make sure that your character classes and secrets are wide enough.

Test positive

<private host>

Test negative

<any other text>
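
To make the idea of a salted one-way fingerprint concrete, here is a small Python sketch. It is not the scheme that `spectral fingerprint` uses, which this page does not specify: PBKDF2-HMAC-SHA256, the iteration count, the salt handling, and both function names are assumptions picked only to illustrate "hash the candidate, compare with the stored fingerprint".

```python
import hashlib
import hmac

def fingerprint(text: str, salt: bytes) -> str:
    # Assumed stand-in: PBKDF2-HMAC-SHA256, not Spectral's actual algorithm.
    return hashlib.pbkdf2_hmac("sha256", text.encode("utf-8"), salt, 100_000).hex()

def matches_fingerprint(candidate: str, salt: bytes, expected: str) -> bool:
    # Constant-time comparison of the candidate's fingerprint with the stored one.
    return hmac.compare_digest(fingerprint(candidate, salt), expected)

salt = b"demo-salt"                       # hypothetical salt for the example
stored = fingerprint("db.internal.acme.corp", salt)
print(matches_fingerprint("db.internal.acme.corp", salt, stored))
print(matches_fingerprint("example.com", salt, stored))
```

The detector only ever holds `stored`, so the private host name itself never appears in the rule.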

test_from_env

Sometimes you want to take secrets from your ENV, rather than encode them as fingerprints, and still search for them in your code. Spectral supports fetching them from your ENV and relaying them to the detector.

By using test_from_env you can:

  • Detect secrets that you already have in your environment (local machine or CI) without exposing them
  • Find secrets that you don't want to expose in a persistent way

To test, make sure to export something first

SOME_SECRET_VAR=shazam $HOME/.spectral/spectral scan --nosend

This will work with the following:

Usage

pattern: "host=(.*)"
test_from_env:
  - on: 1
    with: "SOME_SECRET_VAR"
    is: true

Test positive

shazam

Test negative

foobar
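
Conceptually, the tester compares a captured group with the value of an environment variable at scan time. The Python sketch below assumes exact equality between the capture and the variable; whether Spectral compares by equality or containment is not stated here, and the function name `check_from_env` is hypothetical.

```python
import os
import re

def check_from_env(line: str, pattern: str, var_name: str, group: int = 1) -> bool:
    """True when the captured group equals the env var's value (assumed semantics)."""
    secret = os.environ.get(var_name)
    match = re.search(pattern, line)
    if secret is None or match is None:
        return False
    return match.group(group) == secret

os.environ["SOME_SECRET_VAR"] = "shazam"   # stands in for `export SOME_SECRET_VAR=shazam`
print(check_from_env("host=shazam", r"host=(.*)", "SOME_SECRET_VAR"))
print(check_from_env("host=foobar", r"host=(.*)", "SOME_SECRET_VAR"))
```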

test_luhn

A Luhn test. The Luhn algorithm is a checksum used for credit card numbers and for several national identification numbers, such as the Canadian SIN and the Israeli ID number.

By testing for Luhn you can:

  • Ensure a number is a valid credit card number
  • Verify that a matched string passes as a valid SSN-style identification number, which helps separate real ones from fake or "test" ones

Usage

pattern: "account=([0-9]+)"
test_luhn:
  - on: 1
    is: true

Test positive

79927398713

Test negative

79927398710
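
For readers unfamiliar with the checksum, this is the standard Luhn algorithm written out in Python, checked against the two values above. The function name `luhn_valid` is just a label for this sketch; it says nothing about how Spectral implements the tester.

```python
def luhn_valid(number: str) -> bool:
    """Standard Luhn check over a string of ASCII digits."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk right to left; double every second digit, starting with the one
    # immediately to the left of the check digit.
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9          # same as summing the two digits of the product
        total += digit
    return total % 10 == 0

print(luhn_valid("79927398713"))
print(luhn_valid("79927398710"))
```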

test_number

Available from: v1.4.2

Test for a generic representation of a number.

By testing for numbers:

  • You can rule out a value that is supposed to be a password or a token

Usage

pattern: "key=(.*)"
test_number:
  - on: 1
    is: false

Test positive

Note that because the tester returns false and is: false is set, test_number actually gives a positive outcome here.

key=<random token>

Test negative

key=0.1234
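
The interaction between the tester's raw result and `is:` can be sketched in a few lines of Python. This assumes "generic representation of a number" means anything Python's `float()` accepts, which is broader than what Spectral may recognize (it also accepts forms such as `1e5` and `inf`); both function names are hypothetical.

```python
def is_number(value: str) -> bool:
    """Raw tester result: does the value parse as a number? (assumed: float())"""
    try:
        float(value)
        return True
    except ValueError:
        return False

def passes(value: str, expected: bool) -> bool:
    """Apply the `is:` setting: the test passes when the raw result equals it."""
    return is_number(value) == expected

print(passes("48SfRa4idxxUVyPAejaf", expected=False))   # not a number, is: false
print(passes("0.1234", expected=False))                 # a number, is: false
```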

test_base64, test_base64bin

Verify that a piece of text is a base64 encoded text or binary encoded data. Supports all common variants of encoding (URL safe and others).

By testing for base64 you can:

  • Ensure that a match is base64 and fail fast in a sequence of tests when you're looking for a token
  • Validate that a string is in fact base64 encoded given you suspect that it may contain sensitive information

Usage

pattern: "account_encoded='([[:alnum:]/+]+[=]{0,2})'"
test_base64:
  - on: 1
    is: true

Test positive

account_encoded='eyAiYWNjb3VudCI6ICJzZWNyZXQtbnVtYmVyIiB9'

Test negative

account_encoded='replace_me'

The binary variant will first decode the base64 encoded string, and then test whether it is binary or not:

pattern: "account_encoded='([0-9]+)'"
test_base64bin:
  - on: 1
    is: true
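
The difference between the two variants can be shown with Python's standard library. This sketch handles only the standard base64 alphabet, and it treats "binary" as "does not decode as UTF-8"; both are simplifying assumptions, and the function names are hypothetical.

```python
import base64
import binascii

def is_base64(value: str) -> bool:
    """Strict check against the standard alphabet (URL-safe variant not handled here)."""
    if not value:
        return False
    try:
        base64.b64decode(value, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False

def is_base64_binary(value: str) -> bool:
    """Decode first, then test whether the payload is non-text (assumed: invalid UTF-8)."""
    if not is_base64(value):
        return False
    raw = base64.b64decode(value, validate=True)
    try:
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True

print(is_base64("eyAiYWNjb3VudCI6ICJzZWNyZXQtbnVtYmVyIiB9"))
print(is_base64("replace_me"))
print(is_base64_binary(base64.b64encode(b"\xff\xfe\x00\x01").decode("ascii")))
```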

test_binary

Since Spectral detectors are binary-aware, you have the option to test for binary matches in any capturing expression you need.

By testing for binary data you can:

  • Flag and avoid false matches that contain non-text data

Usage

pattern: "token=(.*)"
test_binary:
  - on: 1
    is: false

Test positive

<BINARY DATA>token=<BINARY_DATA>

Test negative

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

test_maxlen, test_minlen

Test content against a minimum or maximum size.

By testing for content size you can:

  • Fail fast on very short strings or very large content, and skip the match
  • Validate that, on top of the various structural captures that you've done, you end up with a reasonably sized match

Usage

pattern: "account_encoded='(.*)'"
test_minlen:
  - on: 1
    score: 2

Test positive

account_encoded='eyAiYWNjb3VudCI6ICJzZWNyZXQtbnVtYmVyIiB9'

Test negative

account_encoded='XX'

In the same way, we can use maxlen:

pattern: "account_encoded='(.*)'"
test_maxlen:
  - on: 1
    score: 2000

Structural testers

test_jwt

A JWT test. A JWT (JSON Web Token) is an Internet proposed standard for creating data with optional signature and/or optional encryption whose payload holds JSON that asserts some number of claims, often used for service-to-service authentication.

By testing for JWT you can:

  • Make sure the key structure fits a standard JWT
  • Verify that a certain JWT is semantically valid (header is valid)

Usage

pattern: "token=(\\S+)"
test_jwt:
  - on: 1
    is: true

Test positive

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJsb2dnZWRJbkFzIjoiYWRtaW4iLCJpYXQiOjE0MjI3Nzk2Mzh9.gzSraSYS8EXBxLN_oWnFSRgCzcmJmMjLiuyu5CSpyHI

Test negative

bad_token

test_uri

A URI/URL parsing test. A given string is tested to be a valid URI.

By testing for URI you can:

  • Isolate URLs, which can often be sensitive, before applying more matching logic
  • Detect various kinds of authentication such as Bearer, Basic and more, given a URL request structure (e.g. curl'ing URLs)

Usage

pattern: "curl\\s.*(http.*)"
test_uri:
  - on: 1
    is: true

Test positive

curl -L -o https://dev.acme.corp/secure/credentials.json -H"Authorization: Bearer <token>"

Test negative

sh curl.sh arg1 arg2
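
As a rough illustration of what "a valid URI" can mean in practice, the Python sketch below accepts a string only when it parses with both a scheme and a network location. That rule is an assumption for this example (it rejects scheme-only URIs such as `mailto:`), and the function name `is_uri` is hypothetical.

```python
from urllib.parse import urlparse

def is_uri(value: str) -> bool:
    """Assumed rule: must parse with a scheme and a host part."""
    try:
        parsed = urlparse(value)
    except ValueError:
        return False
    return bool(parsed.scheme) and bool(parsed.netloc)

print(is_uri("https://dev.acme.corp/secure/credentials.json"))
print(is_uri("curl.sh arg1 arg2"))
```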

test_tvar

Test for various template variables, common in configuration and IaC files.

By testing for template variables you can:

  • Filter out legitimate configuration that was built with proper template variables instead of hardcoded secrets

Usage

pattern: "DB_PASS=(.*)"
test_tvar:
  - on: 1
    is: false

Test positive

Note that with is: false, a positive outcome is a candidate NOT containing a template variable:

DB_PASS=my-secret-password

Test negative

DB_PASS={{.Env.DBPass}}
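
A template-variable check can be approximated with a small set of regexes. The Python sketch below recognizes only three common syntaxes (`{{ ... }}`, `${...}`, and `<%= ... %>`); that list is an assumption chosen for illustration and is certainly narrower than whatever Spectral recognizes.

```python
import re

# Assumed, non-exhaustive set of template syntaxes.
TEMPLATE_VAR = re.compile(r"\{\{.*?\}\}|\$\{[^}]+\}|<%=.*?%>")

def has_template_var(value: str) -> bool:
    return TEMPLATE_VAR.search(value) is not None

print(has_template_var("{{.Env.DBPass}}"))
print(has_template_var("my-secret-password"))
```

With `is: false`, only the second value would go on to be reported.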

test_changeme

Available from: v1.4.2

Test for various "changeme" values. As engineers, we sometimes indicate a value to be replaced by various commonly-known idioms such as fixme and XXX which we fondly call changeme.

By testing for changeme:

  • You can filter out mock values, or "TODO: replace this" values.
  • Use this combined with other testers to create a powerful detector.

Usage

pattern: "DB_PASS=(.*)"
test_changeme:
  - on: 1
    is: false

Test positive

Note that with is: false, a positive outcome is a candidate NOT containing a changeme value:

DB_PASS="<real password>"

Test negative

DB_PASS="XXX"

test_assignment

Available from: v1.4.2

Test for an assignment structure.

By testing for assignment:

  • You can set the scene for detectors which are only interested in one part of an assignment clause
  • Mix an expected assignment with another tester to create a more powerful detector

Usage

pattern: "DB_PASS(.*)"
test_assignment:
  - on: 0 # on the complete expression
    is: true
test_token:
  - on: 1
    is: true

Test positive

DB_PASS=<random token>

Test negative

DB_PASS, foo, bar

test_uuid

Test if a given string is a UUID. Supports all UUID types (v4, v1, etc.) and formats (with or without hyphens, and with or without a prefix).

By testing for UUID you can:

  • Ignore suspect strings that are randomly generated but in fact are IDs (database IDs or other).

Usage

pattern: "key=(.*)"
test_uuid:
  - on: 1
    is: false

Test positive

Note that with is: false, a positive outcome is a candidate NOT containing a UUID:

key=my-secret-key

Test negative

key=<UUID representing a DB table primary key>
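
Python's standard `uuid` module happens to accept the same range of formats (hyphenated, bare hex, braces, `urn:uuid:` prefix), so it makes a convenient stand-in for showing what this tester separates. The function name `is_uuid` is hypothetical, and the sample UUID is an arbitrary placeholder.

```python
import uuid

def is_uuid(value: str) -> bool:
    """True when the standard library can parse the value as a UUID."""
    try:
        uuid.UUID(value.strip())
        return True
    except ValueError:
        return False

print(is_uuid("123e4567-e89b-12d3-a456-426614174000"))
print(is_uuid("my-secret-key"))
```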

test_regex, test_regex_not

A test_regex is a tester that verifies a structural form after a match has been located. With it, you can verify the match further.

By using test_regex you can:

  • Apply a clearer set of validations, readable and maintainable
  • Split verification into stages to express a specific use case:
    • Capture something vague (e.g. Bearer (.*))
    • Run a semantic tester (e.g. test_token on the token part of the bearer)
    • And only then run a structural tester (e.g. "it should look like a curl request.") with test_regex
  • Apply verification that is beyond the capabilities of a regex DFA (e.g. logic that would otherwise need a state machine with more aggressive backtracking can be achieved performantly by running two separate regexes and combining the results)

As an array based tester, an AND relation is created between elements, and short-circuiting (failing fast) is applied.

  • test_regex - all must apply, fail if one does not apply
  • test_regex_not - all must not apply, fail if one applies

Usage

pattern: "token=(.+)"
test_regex:
  - on: 1
    pattern: "([0-9].*){2}" # the value includes at least 2 digits
  - on: 1
    pattern: "([a-zA-Z].*){2}" # the value includes at least 2 letters

Test positive

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

token=env.get('token')

Usage (test_regex_not)

pattern: "token=(\\S+)"
test_regex_not:
  - on: 1
    pattern: "[$][a-zA-Z0-9_-]+" # the value includes a valid template variable
  - on: 1
    pattern: "(?i)(example|test|fake|1234|abcde|xxxx|foobar)" # the value includes a word or pattern that suggests this is just a token placeholder

Test positive

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

token=$my_token
token=48SfRa4idxxUVyPAejafXxwjkreyjEXMAPLE
token=testRa4idxxUVyPAejafXxwjkreyj8MoJkjV
token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
token=test1234abcdefoobarfake
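
The AND relation with short-circuiting maps directly onto Python's `all()` and `any()`, which stop at the first deciding element. The sketch below applies the patterns from the two usages to the captured value (group 1); the function names are hypothetical, and the placeholder word list is written here as `example`.

```python
import re

def check_regex(value: str, patterns: list) -> bool:
    """test_regex: every pattern must apply; all() stops at the first miss."""
    return all(re.search(p, value) for p in patterns)

def check_regex_not(value: str, patterns: list) -> bool:
    """test_regex_not: no pattern may apply; any() stops at the first hit."""
    return not any(re.search(p, value) for p in patterns)

MUST = [r"([0-9].*){2}", r"([a-zA-Z].*){2}"]
MUST_NOT = [r"[$][a-zA-Z0-9_-]+", r"(?i)(example|test|fake|1234|abcde|xxxx|foobar)"]

print(check_regex("48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV", MUST))
print(check_regex("env.get('token')", MUST))
print(check_regex_not("48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV", MUST_NOT))
print(check_regex_not("$my_token", MUST_NOT))
```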

Semantic testers

test_cword

Test for the percentage of "common words" in a given string. Based on a unique and massive tech-related common words dictionary model.

By testing for common words:

  • Rule out non-machine generated keys
  • Validate that a given match passes as a machine generated secret

Usage

pattern: "pass=(.*)"
test_cword:
  - on: 1
    from: 0.0 # defines a range of accepted percentage
    to: 0.2 # low percentage of common words (up to 20%)

Test positive

zx28821a{_)

Test negative

hello
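
To show how a `from`/`to` range over a common-word percentage behaves, here is a toy Python version. The six-word dictionary and the "share of characters covered by dictionary words" measure are stand-ins invented for this sketch; Spectral's dictionary model is far larger and its exact measure is not described on this page.

```python
import re

# Hypothetical toy dictionary; the real model is a large tech-related word list.
COMMON_WORDS = {"hello", "password", "token", "secret", "admin", "test"}

def common_word_ratio(value: str) -> float:
    """Share of characters that belong to dictionary words (assumed measure)."""
    if not value:
        return 0.0
    covered = sum(len(chunk) for chunk in re.findall(r"[a-z]+", value.lower())
                  if chunk in COMMON_WORDS)
    return covered / len(value)

def check_cword(value: str, lo: float = 0.0, hi: float = 0.2) -> bool:
    return lo <= common_word_ratio(value) <= hi

print(check_cword("zx28821a{_)"))
print(check_cword("hello"))
```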

test_zx

Test for password strength based on the popular zxcvbn library.

By testing for zx (abbreviated) you can:

  • Detect strong passwords amongst fake ones
  • Apply existing policies, if you are already familiar with and using zxcvbn for enforcing password strength

Usage

pattern: "pass=(.*)"
test_zx:
  - on: 1
    score: 4.0 # same standard score scale (0-4) from zxcvbn

Test positive

zxHELLOyw{_)

Test negative

foobar

test_pass

Test for password strength (own model). Pick a threshold on a scale of 0-100.0. A password with strength > 80 is considered strong.

By using test_pass you can:

  • Detect strong passwords amongst fake ones, and fine-tune to your desire.

Usage

pattern: "pass=(.*)"
test_pass:
  - on: 1
    score: 80.0 # scale: 0-100

Test positive

zxHELLOyw{_)

Test negative

foobar

test_token

Test for tokens, keys, and machine-generated secrets (own model).

By using test_token you can:

  • Detect real tokens, keys, and secrets
  • Verify that a machine-generated token actually looks like a typical secret according to the model's attributes

Usage

- pattern: "token=(.*)"
  pattern_type: multi
  test_token:
    - on: 1
      score: 0.6 # True if the score is greater than 0.6
      # max is 1, min is 0

Test positive

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

token=AnotherVariableOfClientData[0];

test_entropy

A normalized entropy test. We do not recommend this test as entropy is a metric not optimized for finding secrets and sensitive information. Spectral has much more advanced tests to use instead, and we still offer entropy just for those who rely on it because of their existing legacy infrastructure and policies.

Usage

- pattern: "token=(.*)"
  pattern_type: multi
  test_entropy:
    - on: 1
      score: 4.0 # True if the entropy of the value is greater than 4
      # max is 5, min is 0

Test positive

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

token=G6q5oRa4idxxxxxxxxxxxxxwjkreyj8MoJkjV
token=FooBarFooBarFooBarFooBarFooBarFooBar
token=asdfdsafdsfasdfadsfadsfdasfasdfsdafsdf
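
For intuition about why the values above land on different sides of the threshold, the Python sketch below computes plain Shannon entropy in bits per character. Treating the tester's score as unnormalized Shannon entropy is an assumption; the page only says the test is a "normalized entropy" with a 0 to 5 range, so exact scores may differ.

```python
import math
from collections import Counter

def shannon_entropy(value: str) -> float:
    """Shannon entropy in bits per character (assumed stand-in for the tester's score)."""
    if not value:
        return 0.0
    n = len(value)
    return -sum((c / n) * math.log2(c / n) for c in Counter(value).values())

for token in ("48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV",
              "FooBarFooBarFooBarFooBarFooBarFooBar",
              "asdfdsafdsfasdfadsfadsfdasfasdfsdafsdf"):
    print(token, round(shannon_entropy(token), 2))
```

Repetition is what drives the score down: a value built from only four or five distinct characters cannot exceed a little over 2 bits per character, however long it is.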

Testing your detector

To test, you can selectively include your new detectors by using --just-ids and/or --just-tags. With these you can use any of the common Spectral commands:

If you want to run your new rule on your entire GitHub org:

$HOME/.spectral/spectral github ... --just-ids PRV001

Alternatively, just to scan your current repo:

$HOME/.spectral/spectral run ... --just-tags acme-security

Submit your detector for review

Feel free to talk to us and send us your detector (please make sure to redact any sensitive information in the detector, if any exists). We'll help you build it and give you a free detector building session.

