The Detector Engine
Query structure
Detectors are composed of rules, or queries, which are compiled into an efficient detector and run by the Spectral engine against files.
Each query is a group of patterns, called a pattern_group, and is hierarchical (a pattern group can contain more pattern groups, and so on).
A pattern group is a collection of patterns with an aggregate relation (you might notice some similarities with the theory behind Datalog).
pattern_group:
  aggregate: or | and | append
  patterns:
    - pattern: "(:?key|token|secret|password|pwd|passwd)=(.*)" # assignment
      pattern_type: single
    - pattern: "hello" # assignment
      pattern_type: multi
pattern_group
Aggregating patterns will:
- and - bail out on the first mismatch
- or - try any of the matches
- append - try any of the matches, and collect matches from those that matched
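The three aggregate relations can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not Spectral's implementation: the function name `aggregate` and the use of Python's `re` module are assumptions made for the example.

```python
import re


def aggregate(mode, patterns, text):
    """Illustrative sketch of and / or / append over a list of regexes."""
    found = []
    for pattern in patterns:
        match = re.search(pattern, text)
        if mode == "and":
            if match is None:
                return []  # bail out on the first mismatch
            found.append(match.group(0))
        elif mode == "or":
            if match is not None:
                return [match.group(0)]  # the first hit is enough
        elif mode == "append":
            if match is not None:
                found.append(match.group(0))  # keep only those that matched
        else:
            raise ValueError("unknown aggregate: " + mode)
    return found
```

With the text `key=abc`, `and` over `key=` and `nope` yields nothing, while `append` over the same patterns keeps the `key=` match.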
pattern
Each pattern is of a type (pattern_type):
- single - match once per file
- multi - match many times per file
Both accept a performant, binary and text traversing regex.
Prematch testers
A prematch tester is a test that runs before applying more costly matching and detection logic. For example, it might make sense to "bail out" of detection if a file is binary, if it is of class documentation, or if it is too small.
test_content_prematch
This meta-tester uses our content classification and inference engine. It is a collection of testers that are useful when deciding if a certain file is worth getting a deep dive into.
By testing for content you can:
- Filter out an unexpected binary file
- Ensure a non-empty file goes through for further detection
- Run only on classes of files that Spectral has classified by the nature of their content
Content class | Example |
---|---|
code/infra | Ruby, Python, etc. |
data/infra | SQL, JSON, etc. |
binary | Binary files |
docs | Markdown, Text, etc. |
tests | Unit tests, other test code |
examples | Example code, demo code and others |
vendor | 3rd party code sitting in node_modules and others |
files | A general file class not fitting a single class |
Usage
pattern: ".*"
test_content_prematch:
  binary: false
  minlen: 20
  maxlen: 2000
  content_classes:
    - code/infra # our own classification engine results
  # content_classes_not:
  #   - Code # the inverse of content_classes
  content_types:
    - Python # a programming language *name* (if you want to match by extension, there are ways to do that too)
  # content_types_not:
  #   - Ruby # the inverse of content_types
Test positive
<a Python file, size at 20-2000 bytes>
Test negative
<an SQL file, or a small file, or a binary file, etc.>
test_regex_prematch
You can test for a specific pre-match structure before Spectral deep dives into further matching.
By testing for Regex prematch you can:
- Make sure a certain file structure exists before applying further testing, such as variable assignments
- Verify that a certain 'sentinel' word exists in a large file by applying a cheap word lookup, before applying a more costly matching
Usage
pattern: "pass:(.*)"
test_regex_prematch:
  - on: 0 # on full text
    pattern: "aws\\.amazon\\.com"
Test positive
<large documentation file>
Here is how to connect to our database
1. Log into AWS console (console.aws.amazon.com)
2. Use following details:
DB pass: shazam123
Test negative
<Big file, not containing any mention of AWS detail>
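The idea of a cheap sentinel check guarding a costlier match can be sketched in Python as follows. This is a sketch under assumptions, not the engine's code: the names `SENTINEL`, `PATTERN` and `detect` are hypothetical, and Python's `re` stands in for the Spectral regex engine.

```python
import re

# Cheap sentinel lookup that runs on the full text first.
SENTINEL = re.compile(r"aws\.amazon\.com")
# The actual (potentially more costly) capturing pattern.
PATTERN = re.compile(r"pass:(.*)")


def detect(text):
    """Return captured values, but only if the sentinel is present."""
    if SENTINEL.search(text) is None:
        return []  # prematch failed: skip the main match entirely
    return [m.group(1).strip() for m in PATTERN.finditer(text)]
```

A file that never mentions the AWS console is dropped before the main pattern is tried at all.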
Content testers
test_fingerprints
Spectral can create one-way fingerprints for you to use when you need to detect pieces of information you can't reveal.
By using test_fingerprints you can:
- Detect credit cards
- Find classified or private domains or hosts
First you need to generate your fingerprint. All generation is done locally on your machine using a secure and salted one-way hash:
$ $HOME/.spectral/spectral fingerprint --text <your private text>
< fingerprint >
Then copy the resulting fingerprint
Usage
pattern: "host=([a-zA-Z0-9_.-]+)"
test_fingerprints:
  - on: 1
    with: "<your fingerprint>"
    is: true
Note that by specifying the character class and narrowing it down, we give some useful information to attackers looking to brute-force private information. Always make sure that your character classes and secrets are wide enough.
Test positive
<private host>
Test negative
<any other text>
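To make the idea of a salted one-way fingerprint concrete, here is a minimal Python sketch. It is not the format or algorithm that `spectral fingerprint` produces, which this page does not specify: PBKDF2-HMAC-SHA256, the `salt:digest` layout, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only


def make_fingerprint(text, salt=None):
    """Derive a salted one-way fingerprint (hypothetical 'salt:digest' hex format)."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", text.encode("utf-8"), salt, ITERATIONS)
    return salt.hex() + ":" + digest.hex()


def matches_fingerprint(candidate, fingerprint):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    salt_hex, digest_hex = fingerprint.split(":")
    digest = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), digest_hex)
```

The detector only ever stores the fingerprint, so the private host name itself never appears in the rule file.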
test_from_env
Sometimes you want to grab secrets from your ENV rather than encode them as fingerprints, and still search for them in your code. Spectral supports fetching those from your ENV and relaying them to the detector.
By using test_from_env you can:
- Detect secrets that you already have in your environment (local machine or CI) without exposing them
- Find secrets that you don't want to expose in a persistent way
To test, make sure the variable is set in the environment of the scan first:
SOME_SECRET_VAR=shazam $HOME/.spectral/spectral scan --nosend
This will work with the following:
Usage
pattern: "host=(.*)"
test_from_env:
  - on: 1
    with: "SOME_SECRET_VAR"
    is: true
Test positive
shazam
Test negative
foobar
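Conceptually, this tester compares a captured value against a value read from the environment at scan time. A minimal Python sketch of that comparison follows. The function name `matches_env` and the exact-equality rule are assumptions; the real tester may compare differently.

```python
import os


def matches_env(candidate, var_name):
    """True when the candidate equals the value of the named environment variable."""
    secret = os.environ.get(var_name)
    if not secret:
        return False  # unset or empty variable: nothing to compare against
    return candidate == secret
```

Because the secret lives only in the process environment, it is never written into the detector definition.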
test_luhn
A Luhn test. The Luhn algorithm is used for checksumming credit card numbers and several national identification numbers, such as the Canadian SIN and the Israeli ID number.
By testing for Luhn you can:
- Ensure a number is a valid credit card number
- Verify that a given string match passes as a valid ID number, which helps separate real ones from fake or "test" ones
Usage
pattern: "account=([0-9]+)"
test_luhn:
  - on: 1
    is: true
Test positive
79927398713
Test negative
79927398710
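For reference, the Luhn checksum itself is short enough to sketch in Python. This is the standard algorithm, written here for illustration. The function name `luhn_valid` is hypothetical and this is not Spectral's own code.

```python
def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

Applied to the two samples above, `79927398713` passes and `79927398710` does not, since only the final check digit differs.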
test_number
Available from: v1.4.2
Test for a generic representation of a number.
By testing for numbers:
- You can rule out a value that is supposed to be a password or a token
Usage
pattern: "key=(.*)"
test_number:
  - on: 1
    is: false
Test positive
Note that by returning false with is: false, test_number actually gives a positive outcome.
key=<random token>
Test negative
key=0.1234
test_base64, test_base64bin
Verify that a piece of text is a base64 encoded text or binary encoded data. Supports all common variants of encoding (URL safe and others).
By testing for base64 you can:
- Ensure that a match is base64 and fail fast in a sequence of tests when you're looking for a token
- Validate that a string is in fact base64 encoded given you suspect that it may contain sensitive information
Usage
pattern: "account_encoded='([[:alnum:]/+]+[=]{0,2})'"
test_base64:
  - on: 1
    is: true
Test positive
account_encoded='eyAiYWNjb3VudCI6ICJzZWNyZXQtbnVtYmVyIiB9'
Test negative
account_encoded='replace_me'
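A strict base64 check can be sketched in Python with the standard library. This is a minimal sketch that covers only the standard alphabet with padding. The URL-safe and other variants the tester supports are not handled here, and the function name `is_base64` is an assumption.

```python
import base64
import binascii


def is_base64(value):
    """True if value decodes as strict, padded, standard-alphabet base64."""
    if not value:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(value, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False
```

Note that a purely structural check like this also accepts any alphanumeric word whose length is a multiple of four, which is one reason to chain it with other testers.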
The binary variant will first decode the base64 encoded string, and then test whether it is binary or not:
pattern: "account_encoded='([0-9]+)'"
test_base64bin:
  - on: 1
    is: true
test_binary
Since Spectral detectors are binary-aware, you have the option to test for binary matches in any capturing expression you need.
By testing for binary data you can:
- Flag and avoid matches that are false and contain non-text
Usage
pattern: "token=(.*)"
test_binary:
  - on: 1
    is: false
Test positive
<BINARY DATA>token=<BINARY_DATA>
Test negative
token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV
test_maxlen, test_minlen
Test content against a minimum or maximum size.
By testing for content size you can:
- Fail fast on very short strings or very large content, and skip the match
- Validate that, on top of the various structural captures you've done, you end up with a reasonably sized match
Usage
pattern: "account_encoded='(.*)'"
test_minlen:
  - on: 1
    score: 2
Test positive
account_encoded='eyAiYWNjb3VudCI6ICJzZWNyZXQtbnVtYmVyIiB9'
Test negative
account_encoded='XX'
In the same way, we can use test_maxlen:
pattern: "account_encoded='(.*)'"
test_maxlen:
  - on: 1
    score: 2000
Structural testers
test_jwt
A JWT test. A JWT (JSON Web Token) is an Internet proposed standard for creating data with optional signature and/or optional encryption whose payload holds JSON that asserts some number of claims, often used for service-to-service authentication.
By testing for JWT you can:
- Make sure the key structure fits a standard JWT
- Verify that a certain JWT is semantically valid (header is valid)
Usage
pattern: "token=(\\S+)"
test_jwt:
  - on: 1
    is: true
Test positive
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJsb2dnZWRJbkFzIjoiYWRtaW4iLCJpYXQiOjE0MjI3Nzk2Mzh9.gzSraSYS8EXBxLN_oWnFSRgCzcmJmMjLiuyu5CSpyHI
Test negative
bad_token
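A structural JWT check can be sketched in Python using only the standard library. This sketch assumes the compact three-segment form and only inspects the header. It does not verify signatures, and the function names are hypothetical rather than part of Spectral.

```python
import base64
import json


def _b64url_decode(segment):
    """Decode an unpadded base64url segment."""
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)


def looks_like_jwt(token):
    """True if the token has three segments and a JSON header with an 'alg' field."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    try:
        header = json.loads(_b64url_decode(parts[0]))
    except ValueError:  # covers base64, UTF-8 and JSON decoding errors
        return False
    return isinstance(header, dict) and "alg" in header
```

The sample token above passes because its first segment decodes to a JSON header declaring `HS256`, while `bad_token` fails on the segment count alone.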
test_uri
A URI/URL parsing test. A given string is tested to be a valid URI.
By testing for URI you can:
- Isolate URLs, which can often be sensitive, before applying more matching logic
- Detect various kinds of authentication such as Bearer, Basic and more, given a URL request structure (e.g. curl'ing URLs)
Usage
pattern: "curl\\s.*(http.*)"
test_uri:
  - on: 1
    is: true
Test positive
curl -L -o https://dev.acme.corp/secure/credentials.json -H"Authorization: Bearer <token>"
Test negative
sh curl.sh arg1 arg2
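As a rough illustration of what "a valid URI" can mean, here is a Python sketch built on `urllib.parse`. It assumes a URI must carry both a scheme and a network location, which is stricter than the URI grammar and may differ from the rules the tester applies. The function name `is_uri` is hypothetical.

```python
from urllib.parse import urlsplit


def is_uri(value):
    """True if value parses with both a scheme and a network location."""
    try:
        parts = urlsplit(value)
    except ValueError:  # raised for some malformed inputs, e.g. bad IPv6 literals
        return False
    return bool(parts.scheme) and bool(parts.netloc)
```

A shell invocation such as `curl.sh arg1 arg2` has neither part, so it is rejected.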
test_tvar
Test for various template variables, common in configuration and IaC files.
By testing for template variables you can:
- Filter out legitimate configuration that was built with proper template variables instead of hardcoded secrets
Usage
pattern: "DB_PASS=(.*)"
test_tvar:
  - on: 1
    is: false
Test positive
Note is: false, so a positive outcome is a candidate NOT containing a template variable:
DB_PASS=my-secret-password
Test negative
DB_PASS={{.Env.DBPass}}
test_changeme
Available from: v1.4.2
Test for various "changeme" values. As engineers, we sometimes indicate a value to be replaced by various commonly-known idioms such as fixme and XXX, which we fondly call changeme.
By testing for changeme:
- You can filter out mock values, or "TODO: replace this" values.
- Use this combined with other testers to create a powerful detector.
Usage
pattern: "DB_PASS=(.*)"
test_changeme:
  - on: 1
    is: false
Test positive
Note is: false, so a positive outcome is a candidate NOT containing a changeme value:
DB_PASS="<real password>"
Test negative
DB_PASS="XXX"
test_assignment
Available from: v1.4.2
Test for an assignment structure.
By testing for assignment:
- You can set the scene for detectors which are only interested in one part of an assignment clause
- Mix an expected assignment with another tester to create a more powerful detector
Usage
pattern: "DB_PASS(.*)"
test_assignment:
  - on: 0 # on the complete expression
    is: true
test_token:
  - on: 1
    is: true
Test positive
DB_PASS=<random token>
Test negative
DB_PASS, foo, bar
test_uuid
Test if a given string is a UUID. Supports all UUID types (v4, v1, etc.) and formats (with or without hyphens, and with or without a prefix).
By testing for UUID you can:
- Ignore suspect strings that are randomly generated but in fact are IDs (database IDs or other).
Usage
pattern: "key=(.*)"
test_uuid:
  - on: 1
    is: false
Test positive
Note is: false, so a positive outcome is a candidate NOT containing a UUID:
key=my-secret-key
Test negative
key=<UUID representing a DB table primary key>
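The same kind of check can be approximated in Python with the standard `uuid` module, which also accepts hyphenless strings and the `urn:uuid:` prefix. This is only a sketch of the idea: the function name `is_uuid` is hypothetical, and the formats accepted by Spectral's tester may not be identical.

```python
import uuid


def is_uuid(value):
    """True if value parses as a UUID (hyphens, braces and urn:uuid: prefix allowed)."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False
```

Used with is: false semantics, a value such as `my-secret-key` stays a candidate, while a database primary key in UUID form is filtered out.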
test_regex, test_regex_not
test_regex is a tester that can verify a structural form after a match has been located. With it, you can verify the match further.
By using test_regex you can:
- Apply a clearer set of validations, readable and maintainable
- Split verification into stages to pronounce a specific use case:
  - Capture something vague (e.g. Bearer (.*))
  - Run a semantic tester (e.g. test_token on the token part of the bearer)
  - And only then run a structural tester (e.g. "it should look like a curl request.") with test_regex
- Apply verification that is beyond the capabilities of a regex DFA (e.g. a state machine with more aggressive but performant backtracking can first be achieved by running two separate ones and combining later)
As an array-based tester, an AND relation is created between elements, and short-circuiting (failing fast) is applied.
- test_regex - all must apply, fail if one does not apply
- test_regex_not - all must not apply, fail if one applies
Usage
pattern: "token=(.+)"
test_regex:
  - on: 1
    pattern: "([0-9].*){2}" # the value includes at least 2 digits
  - on: 1
    pattern: "([a-zA-Z].*){2}" # the value includes at least 2 letters
Test positive
token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV
Test negative
token=env.get('token')
Usage (test_regex_not)
pattern: "token=(\\S+)"
test_regex_not:
  - on: 1
    pattern: "[$][a-zA-Z0-9_-]+" # the value includes a valid template variable
  - on: 1
    pattern: "(?i)(example|test|fake|1234|abcde|xxxx|foobar)" # the value includes a word or pattern that indicates this is just a token placeholder
Test positive
token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV
Test negative
token=$my_token
token=48SfRa4idxxUVyPAejafXxwjkreyjEXAMPLE
token=testRa4idxxUVyPAejafXxwjkreyj8MoJkjV
token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
token=test1234abcdefoobarfake
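The AND relation and fail-fast behaviour of both testers can be sketched in Python. The helper names `all_match` and `none_match` are hypothetical, Python's `re` stands in for the Spectral regex engine, and the patterns are modeled on the usage examples above.

```python
import re


def all_match(value, patterns):
    """test_regex semantics: every pattern must apply; stops at the first miss."""
    return all(re.search(p, value) is not None for p in patterns)


def none_match(value, patterns):
    """test_regex_not semantics: no pattern may apply; stops at the first hit."""
    return not any(re.search(p, value) is not None for p in patterns)


MUST = [r"([0-9].*){2}", r"([a-zA-Z].*){2}"]
MUST_NOT = [r"[$][a-zA-Z0-9_-]+", r"(?i)(example|test|fake|1234|abcde|xxxx|foobar)"]
```

Both helpers iterate lazily through `all` and `any`, so later patterns are never evaluated once the outcome is decided.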
Semantic testers
test_cword
Test for the percentage of "common words" in a given string. Based on a unique and massive tech-related common words dictionary model.
By testing for common words:
- Rule out non-machine generated keys
- Validate that a given match passes as a machine generated secret
Usage
pattern: "pass=(.*)"
test_cword:
  - on: 1
    from: 0.0 # defines a range of accepted percentage
    to: 0.2 # low percentage of common words (up to 20%)
Test positive
zx28821a{_)
Test negative
hello
test_zx
Test for password strength based on the popular zxcvbn library.
By testing for zx (abbreviated) you can:
- Detect strong passwords amongst fake ones
- Apply existing policies if you're already familiar with zxcvbn and use it for enforcing password strength
Usage
pattern: "pass=(.*)"
test_zx:
  - on: 1
    score: 4.0 # same standard score scale (0-4) from zxcvbn
Test positive
zxHELLOyw{_)
Test negative
foobar
test_pass
Test for password strength (own model). Pick a threshold on a scale of 0-100.0. A password with strength > 80 is considered strong.
By using test_pass you can:
- Detect strong passwords amongst fake ones, and fine-tune to your desire.
Usage
pattern: "pass=(.*)"
test_pass:
  - on: 1
    score: 80.0 # scale: 0-100
Test positive
zxHELLOyw{_)
Test negative
foobar
test_token
Test for tokens, keys, and machine-generated secrets (own model).
By using test_token you can:
- Detect real tokens, keys, and secrets
- Verify that a machine-generated token actually looks like a typical secret according to the model's attributes
Usage
- pattern: "token=(.*)"
  pattern_type: multi
  test_token:
    - on: 1
      score: 0.6 # true if the score is greater than 0.6 (max is 1, min is 0)
Test positive
token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV
Test negative
token=AnotherVariableOfClientData[0];
test_entropy
A normalized entropy test. We do not recommend this test as entropy is a metric not optimized for finding secrets and sensitive information. Spectral has much more advanced tests to use instead, and we still offer entropy just for those who rely on it because of their existing legacy infrastructure and policies.
Usage
- pattern: "token=(.*)"
  pattern_type: multi
  test_entropy:
    - on: 1
      score: 4.0 # true if the entropy of the value is greater than 4 (max is 5, min is 0)
Test positive
token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV
Test negative
token=G6q5oRa4idxxxxxxxxxxxxxwjkreyj8MoJkjV
token=FooBarFooBarFooBarFooBarFooBarFooBar
token=asdfdsafdsfasdfadsfadsfdasfasdfsdafsdf
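For readers unfamiliar with the metric, plain Shannon entropy over characters can be computed as in the Python sketch below. This is the textbook formula in bits per character. How Spectral normalizes it onto the 0 to 5 scale mentioned above is not specified here, so the absolute numbers from this sketch should not be compared with the `score` threshold. The function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(value):
    """Shannon entropy of the character distribution, in bits per character."""
    if not value:
        return 0.0
    length = len(value)
    counts = Counter(value)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())
```

Repetitive strings such as `FooBarFooBar...` score much lower than the random-looking token, which is the signal this tester relies on.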
Testing your detector
To test, you can selectively include your new detectors by using --just-ids and/or --just-tags. With these you can use any of the common Spectral commands:
If you want to run your new rule on your entire GitHub org:
$HOME/.spectral/spectral github ... --just-ids PRV001
Alternatively, just to scan your current repo:
$HOME/.spectral/spectral run ... --just-tags acme-security
Submit your detector for review
Feel free to talk to us and send us your detector (please make sure to redact any sensitive information in the detector, if any exists). We'll help you build it and give you a free detector building session.