Building Detectors

Here's a basic "Hello world" detector:

rules:
  - id: HELO001
    pattern_group:
      aggregate: or
      patterns:
      - pattern: "hello (.*)"
        test_regex:
        - on: 1
          pattern: universe|world
      - pattern: "namaste (.*)"
        test_regex:
        - on: 1
          pattern: bramhaand|vishv

A detector combines one or more building blocks, such as test_regex above, with a workflow building block, such as pattern_group above. Together they express logic, patterns, and machine learning and statistical tests (test as in laboratory or statistics, rather than as in unit test).
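To make the flow concrete, here is a rough Python model of the "Hello world" detector. It is only a sketch of the logic as we read it from the rule (outer pattern, then a test_regex on capture group 1, combined with aggregate: or); the PATTERNS structure and the detect function are hypothetical names, not part of Spectral.

```python
import re

# Hypothetical model of the "Hello world" rule: each entry pairs an outer
# pattern with a test_regex applied to one of its capture groups.
PATTERNS = [
    {"pattern": r"hello (.*)", "on": 1, "test": r"universe|world"},
    {"pattern": r"namaste (.*)", "on": 1, "test": r"bramhaand|vishv"},
]


def detect(text):
    """Return True if any pattern matches and its test passes (aggregate: or)."""
    for entry in PATTERNS:
        match = re.search(entry["pattern"], text)
        if match and re.search(entry["test"], match.group(entry["on"])):
            return True
    return False


print(detect("hello world"))    # True
print(detect("namaste vishv"))  # True
print(detect("hello there"))    # False
```

The last call matches the outer pattern but fails the test on group 1, so nothing is reported.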

Spectral optimizes the following for you:

Extreme performance. Built with Rust's zero-overhead principles and low-level optimization.
Security. Spectral is built with safe-only code and sandboxes detectors.
Productivity. Spectral detectors are programming-language agnostic: there's no need to understand Java to scan Java, or any other programming language.
Rapid detector building. No compilation - code and run. Spectral will compile and optimize each detector automatically.
Declarative over programmatic. Declare what you want to find and Spectral will find it.
Automatic fingerprinting & tracking. Spectral will analyze each finding and automatically create a secure, irreversible, and trackable fingerprint.

Play by play

Detecting sensitive files

We want to build a detector that can detect sensitive files throughout our codebase. These may be private-knowledge files that your infrastructure team creates, or public-knowledge files, such as SSH-related files.

In fact, let's build a detector for SSH-related sensitive files.

$ cd my-project
$ $HOME/.spectral/spectral init

Create a file in: .spectral/rules/my-rules.yaml

rules:
  - id: SENS001
    applies_to:
      - "(?i).*_(rsa|dsa|ed25519|ecdsa)$"
    description: An SSH related sensitive file was found
    name: Sensitive SSH file
    recommendation_template: Please add sensitive files to your .gitignore
    severity: high
    tags:
      - base
      - sensitive-files
    pattern_group:
      aggregate: or
      patterns:
        - pattern: "."
          match_on_path: true
          pattern_type: single

Run a scan (use $HOME/.spectral/spectral run for interactive sessions, and $HOME/.spectral/spectral scan in your CI)

$HOME/.spectral/spectral run --nosend

We use --nosend to not send findings to SpectralOps.

Now you can drop a dummy file just to test things out:

echo 'x' > id_rsa

Run $HOME/.spectral/spectral run again and see your new detection.

What just happened?

1. We looked for the appropriate file names

rules:
  - id: SENS001
    applies_to:
    - "(?i).*_(rsa|dsa|ed25519|ecdsa)$"
    # applies_not_to:
    # - some-file
...

rules: - will indicate to Spectral that this is a detector rules file
SENS001 - this detector ID will appear in SpectralOps, so pick wisely :)
applies_to - will match against the full file path; you can also use applies_not_to in combination
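If you want to check which paths an applies_to expression selects before running a scan, a quick Python sketch can help. It assumes the expression behaves the same under Python's re module as it does in Spectral, and it assumes applies_not_to is also a regex that excludes matching paths; the paths and the fixtures/ exclusion are made up for illustration.

```python
import re

# The applies_to expression from the rule above.
APPLIES_TO = re.compile(r"(?i).*_(rsa|dsa|ed25519|ecdsa)$")
# Assumed exclusion expression, standing in for an applies_not_to entry.
APPLIES_NOT_TO = re.compile(r"fixtures/")


def applies(path):
    """True when the path is selected by applies_to and not excluded."""
    return bool(APPLIES_TO.search(path)) and not APPLIES_NOT_TO.search(path)


# Hypothetical paths, used only for illustration.
paths = ["id_rsa", "home/dev/.ssh/ID_ED25519", "id_rsa.pub", "tests/fixtures/id_rsa"]

selected = [p for p in paths if applies(p)]
print(selected)  # ['id_rsa', 'home/dev/.ssh/ID_ED25519']
```

Note that id_rsa.pub is not selected, because the expression is anchored at the end of the path with $.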

2. Filled in description and policies

    description: An SSH related sensitive file was found
    name: Sensitive SSH file
    recommendation_template: Please add sensitive files to your .gitignore
    severity: high
    tags:
      - base
      - sensitive-files
...

Picking a base tag will add it to the base ruleset of Spectral, which means it will be run by default with all other detectors.

3. Built our detector query patterns

    pattern_group:
      aggregate: or
      patterns:
      - pattern: "."
        match_on_path: true
        pattern_type: single

pattern_group - we're setting up a pattern group of 1 element with a logical OR relationship between elements. This means just one pattern. You can add more patterns down the road
match_on_path - rewires the engine to look at the file path as the tested content
pattern_type: single - apply a single attempt at matching (no multiple results in same file here)
pattern: "." - match any character on the path
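The interplay of these three settings can be sketched in Python. This is a guess at the behavior based on the descriptions above, not Spectral's engine: find_matches and its parameters are hypothetical, with "single" modeled as a first-match search and "multi" as collecting every match.

```python
import re


def find_matches(path, content, pattern, match_on_path=False, pattern_type="multi"):
    """Return matched strings, modelling match_on_path and pattern_type."""
    # match_on_path swaps the tested content from the file body to the path.
    subject = path if match_on_path else content
    if pattern_type == "single":
        first = re.search(pattern, subject)
        return [first.group(0)] if first else []
    return [m.group(0) for m in re.finditer(pattern, subject)]


# The file content is ignored; "." matches the first character of the path.
single = find_matches("id_rsa", "x\n", ".", match_on_path=True, pattern_type="single")
print(single)  # ['i']

# With "multi", every character of the path would count as a separate match.
multi = find_matches("id_rsa", "x\n", ".", match_on_path=True, pattern_type="multi")
print(len(multi))  # 6
```

Under these assumptions, single keeps the rule at one finding per sensitive file, which is what we want here since applies_to has already done the real filtering.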

Find a hardcoded JWT secret in your codebase

JWT (JSON Web Token) usage is growing, especially in architectures with many services and different ways to decentralize or centralize service-to-service authentication. Using JWTs in client-side apps is also growing.

We want to scan our entire repo for stray hardcoded JWTs in code and docs, whether they are production tokens or test tokens, and make sure nothing suspicious is left in our code.

$ cd my-project
$ $HOME/.spectral/spectral init

Create a file in: .spectral/rules/my-rules.yaml

rules:
  - id: SEC001
    description: A JWT (JSON Web Token) has been found to be hardcoded
    name: Sensitive JWT (JSON Web Token)
    recommendation_template: Please remove the hardcoded token, report it to SecOps for rotation, and fix with using .env 
    severity: high
    tags:
    - base
    - secrets
    pattern_group:
      aggregate: or
      patterns:
      - pattern: "=\\s+(.*)" # assignment
        pattern_type: multi
        test_jwt:
        - on: 1
          is: true

Run a scan (use $HOME/.spectral/spectral run for interactive sessions, and $HOME/.spectral/spectral scan in your CI)

$HOME/.spectral/spectral run --nosend

You can now create a dummy JWT token and re-run the scan to view your findings.

What just happened?

1. We filled in description and policies

    description: A JWT (JSON Web Token) has been found to be hardcoded
    name: Sensitive JWT (JSON Web Token)
    recommendation_template: Please remove the hardcoded token, report it to SecOps for rotation, and fix with using .env 
    severity: high
    tags:
      - base
      - secrets
...

Picking a base tag will add it to the base ruleset of Spectral, which means it will be run by default with all other detectors.

2. Wrote our detector query patterns

    pattern_group:
      aggregate: or
      patterns:
      - pattern: "=\\s+(.*)" # assignment
        pattern_type: multi
        test_jwt:
        - on: 1
          is: true

pattern_group - we're setting up a pattern group of 1 element with a logical OR relationship between elements. This means just one pattern. You can add more patterns down the road.
pattern: "=\\s+(.*)" - match an assignment: an equals sign, at least one whitespace character, and then capture what follows as group 1
pattern_type: multi - unlike single, allow multiple matches in the same file
test_jwt - test the first capture group (on: 1) and expect it to be a JWT (is: true)
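Here is a Python sketch of what the rule above does with a file: find every assignment, then test the captured value. How test_jwt works internally is not described here, so looks_like_jwt is an assumed stand-in that only checks for three dot-separated segments and a JSON header with an "alg" field. The dummy token is built in code and is not a real credential.

```python
import base64
import json
import re

ASSIGNMENT = re.compile(r"=\s+(.*)")  # same expression as the rule's pattern


def b64url(data):
    """Base64url-encode bytes without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def looks_like_jwt(value):
    """Assumed stand-in for test_jwt: three segments with a JSON header."""
    parts = value.strip().split(".")
    if len(parts) != 3:
        return False
    try:
        padded = parts[0] + "=" * (-len(parts[0]) % 4)
        header = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        return False
    return isinstance(header, dict) and "alg" in header


# Build a dummy token so that no real credential appears here.
dummy = ".".join([
    b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode()),
    b64url(json.dumps({"sub": "test"}).encode()),
    b64url(b"not-a-real-signature"),
])
source = f"timeout = 30\ntoken = {dummy}\n"

# "multi": look at every assignment in the file, keep those that pass the test.
findings = [m.group(1) for m in ASSIGNMENT.finditer(source) if looks_like_jwt(m.group(1))]
print(findings == [dummy])  # True
```

The timeout assignment matches the pattern too, but it fails the test, so only the token line is reported.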

Find an actual secret you know of

Sometimes you may want to ensure your codebase does not contain a well-known, organization-specific secret, such as:

Dev team credit card number
Private network domains
Internal server addresses
Vendor and customer secrets

But there's a challenge: we don't want this sensitive piece of text to appear anywhere, not even in the detector rule you're building. That's why we're going to use fingerprinting.

$ cd my-project
$ $HOME/.spectral/spectral init
$ $HOME/.spectral/spectral fingerprint --text sekr3t
[fingerprint text]

Create a file in: .spectral/rules/my-rules.yaml

rules:
  - id: PRIV001
    description: A private organization secret is found hardcoded in files
    name: Private secret
    recommendation_template: Please remove the hardcoded secret, report it to SecOps for rotation, and fix with using .env 
    severity: high
    tags:
    - base
    - secrets
    pattern_group:
      aggregate: or
      patterns:
      - pattern: "(?:key|token|secret|password|pwd|passwd)=(.*)" # assignment
        pattern_type: multi
        test_fingerprints:
        - on: 1
          fp: [fingerprint text]
          is: true

Run a scan (use $HOME/.spectral/spectral run for interactive sessions, and $HOME/.spectral/spectral scan in your CI)

$HOME/.spectral/spectral run --nosend

Spectral will now detect your private secret in a secure way - without verbatim specification of that secret anywhere.
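The general idea behind matching a secret without writing it down can be sketched in Python. Spectral's actual fingerprint algorithm is not described here, so this sketch assumes a plain SHA-256 digest as a stand-in; fingerprint and matches_fingerprint are hypothetical helpers, and the output of the real spectral fingerprint command may look nothing like this.

```python
import hashlib


def fingerprint(text):
    """Assumed stand-in: a SHA-256 hex digest, not Spectral's real algorithm."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Only the digest is kept in the rule; the secret itself is never stored there.
stored = fingerprint("sekr3t")


def matches_fingerprint(candidate, known):
    """Hash the candidate and compare digests instead of comparing plaintext."""
    return fingerprint(candidate) == known


print(matches_fingerprint("sekr3t", stored))  # True
print(matches_fingerprint("s3cret", stored))  # False
print("sekr3t" in stored)                     # False
```

A one-way digest like this lets a rule recognize the secret when it shows up in a capture group, while the rules file stays safe to commit.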

What just happened?

1. We created a fingerprint

$HOME/.spectral/spectral fingerprint --text sekr3t

2. Filled in description and policy

    description: A private organization secret is found hardcoded in files
    name: Private secret
    recommendation_template: Please remove the hardcoded secret, report it to SecOps for rotation, and fix with using .env
    severity: high
    tags:
    - base
    - secrets

3. Wrote our detector query patterns

    pattern_group:
      aggregate: or
      patterns:
      - pattern: "(?:key|token|secret|password|pwd|passwd)=(.*)" # assignment
        pattern_type: multi
        test_fingerprints:
        - on: 1
          fp: [fingerprint text]
          is: true

We have two groups, one of them non-capturing (?: .. ), which is why we keep our test on the first capture group (on: 1)
test_fingerprints - will try any of the listed fingerprints and if one matches will return a result. We expect it to be true with is: true.
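Group numbering is easy to get wrong, so it is worth checking with a small Python sketch. It assumes the expression numbers groups under Python's re module the same way it does in Spectral; the sample line is made up.

```python
import re

line = "password=sekr3t"

# Non-capturing group: the keyword is matched but not numbered,
# so the value is capture group 1.
non_capturing = re.search(r"(?:key|token|secret|password|pwd|passwd)=(.*)", line)
print(non_capturing.group(1))  # sekr3t

# Capturing group: the keyword becomes group 1 and the value moves to group 2.
capturing = re.search(r"(key|token|secret|password|pwd|passwd)=(.*)", line)
print(capturing.group(1), capturing.group(2))  # password sekr3t
```

If the keyword group were capturing, on: 1 would test the keyword rather than the value, and the fingerprint would never match.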
