feat: security contacts worker [CM-1297]#4283

Merged
mbani01 merged 21 commits into main from feat/security_contacts_worker on Jul 2, 2026

mbani01 commented Jun 30, 2026

This pull request introduces support for processing and storing security contacts for repositories, including database schema changes, new worker service setup, configuration, and supporting code. It also updates dependencies and environment files as needed. Below are the most important changes grouped by theme:

Security Contacts Feature Implementation

  • Adds a new database table security_contacts and several related columns to the repos table to store security contact information and metadata.
  • Implements the processSecurityContactsBatch activity and supporting extractor logic to fetch and process security contacts, including HTTP helpers and parsing utilities.
  • Adds a new security-contacts-worker service with Docker Compose configuration, build integration, and npm scripts for running and developing the worker.

Configuration and Environment

  • Adds new environment variables for configuring the security contacts worker (interval, user agent, concurrency, timeouts, batch size) in both composed and local environment files, and exposes them through a new getSecurityContactsConfig function.
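
To illustrate the kind of config getter described above, here is a minimal sketch. Only SECURITY_CONTACTS_USER_AGENT is named in this PR; every other variable name, the defaults, and the helpers `requireFrom` and `positiveIntFrom` are hypothetical placeholders, and the real getSecurityContactsConfig may look different.

```typescript
// Sketch only. SECURITY_CONTACTS_USER_AGENT comes from the PR; every other
// variable name and default below is a hypothetical placeholder.
type Env = Record<string, string | undefined>

interface SecurityContactsConfigSketch {
  userAgent: string
  concurrency: number
  timeoutMs: number
  batchSize: number
}

// Fail fast when a mandatory variable is missing or blank.
function requireFrom(env: Env, name: string): string {
  const value = env[name]
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required env var ${name}`)
  }
  return value
}

// Optional numeric variable: fall back when unset, reject anything that is
// not a positive integer instead of silently coercing it.
function positiveIntFrom(env: Env, name: string, fallback: number): number {
  const raw = env[name]
  if (raw === undefined || raw.trim() === '') return fallback
  const parsed = Number(raw)
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`Env var ${name} must be a positive integer, got "${raw}"`)
  }
  return parsed
}

function readSecurityContactsConfig(env: Env): SecurityContactsConfigSketch {
  return {
    userAgent: requireFrom(env, 'SECURITY_CONTACTS_USER_AGENT'),
    concurrency: positiveIntFrom(env, 'SECURITY_CONTACTS_CONCURRENCY', 4),
    timeoutMs: positiveIntFrom(env, 'SECURITY_CONTACTS_TIMEOUT_MS', 10000),
    batchSize: positiveIntFrom(env, 'SECURITY_CONTACTS_BATCH_SIZE', 100),
  }
}
```

Passing the environment in as a parameter (for example `readSecurityContactsConfig(process.env)` in the worker) keeps the getter easy to unit-test without mutating global state.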

Dependency and Codebase Maintenance

  • Updates and adds dependencies for js-yaml and its types in both the lockfile and package.json to support new parsing needs.
  • Refactors the InstallationPool class into its own file to be reused and removes its inline definition from the enricher loop.

These changes lay the groundwork for collecting, processing, and storing security contact information for repositories in a scalable and configurable way.


Note

Medium Risk
Large new worker with outbound fetches to GitHub, registries, and repo homepages; mitigations include SSRF allowlists and fail-safe writes, but correctness of stored contacts and GitHub rate limits still matter in production.

Overview
Adds a security-contacts-worker on the packages Temporal queue that periodically enriches GitHub repos tied to critical packages with vulnerability-reporting contacts and policy metadata.

Persistence: New security_contacts table (scored contacts with JSON provenance) plus repos columns for PVR flag, policy/reporting URLs, security.txt, and contacts_last_refreshed.

Ingestion pipeline: Daily cron (0 6 * * *) runs ingestSecurityContacts, which batches repos and continueAsNew until the sweep is empty. Per repo, tiered extractors pull from GitHub (SECURITY-INSIGHTS, PVR API, SECURITY_CONTACTS, repo files), homepage security.txt (with SSRF guards), SECURITY.md, and package registries (npm, PyPI, Maven, Cargo, NuGet, RubyGems, Composer). Results are merged, scored, capped at five contacts, and written in a transaction; failed extractors skip destructive updates and only advance contacts_last_refreshed.
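
The merge, cap, and fail-safe rule in the paragraph above can be sketched as a pure planning step. This is an illustration under assumptions, not the PR's processBatch or writeContacts code: the `Outcome` and `WritePlan` types, the case-insensitive dedupe key, and the name `planWrite` are all hypothetical.

```typescript
// Sketch only: types and names here are hypothetical, not the PR's.
interface ContactSketch {
  value: string
  score: number
}

// One entry per extractor that ran for a repo.
type Outcome =
  | { status: 'ok'; contacts: ContactSketch[] }
  | { status: 'failed'; reason: string }

type WritePlan =
  | { kind: 'replace'; contacts: ContactSketch[] } // transactional replace
  | { kind: 'touch-only' } // only advance contacts_last_refreshed

const MAX_CONTACTS = 5

function planWrite(outcomes: Outcome[]): WritePlan {
  // Any failed extractor means the picture is incomplete, so stored
  // contacts must not be replaced with a partial set.
  if (outcomes.some((o) => o.status === 'failed')) return { kind: 'touch-only' }

  // Dedupe case-insensitively, keeping the highest-scoring duplicate.
  const best = new Map<string, ContactSketch>()
  for (const outcome of outcomes) {
    if (outcome.status !== 'ok') continue
    for (const contact of outcome.contacts) {
      const key = contact.value.toLowerCase()
      const previous = best.get(key)
      if (previous === undefined || contact.score > previous.score) {
        best.set(key, contact)
      }
    }
  }

  // Highest score first, capped at five.
  const contacts = Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, MAX_CONTACTS)
  return { kind: 'replace', contacts }
}
```

Keeping the decision separate from the database write makes the "skip destructive updates" rule testable without a transaction.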

Infra: Docker Compose service, build list entry, SECURITY_CONTACTS_USER_AGENT env, js-yaml dependency. InstallationPool is extracted from the GitHub enricher and reused by a rate-limit-aware githubApiGet (installation round-robin + concurrency gate).
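
As a rough picture of rate-limit-aware installation round-robin, the sketch below rotates through installation IDs and skips any that are cooling down. It is not the PR's InstallationPool or githubApiGet: the class name, method names, and the `blockedUntil` bookkeeping are assumptions, and the concurrency gate mentioned above is left out.

```typescript
// Sketch only: a hypothetical pool, not the InstallationPool from this PR.
interface InstallationSlot {
  id: number
  blockedUntil: number // epoch ms; 0 means available
}

class RoundRobinInstallations {
  private cursor = 0
  private readonly slots: InstallationSlot[]

  constructor(ids: number[]) {
    if (ids.length === 0) throw new Error('pool needs at least one installation')
    this.slots = ids.map((id) => ({ id, blockedUntil: 0 }))
  }

  // Returns the next installation that is not cooling down, or undefined
  // when every installation is rate-limited at time `now`.
  next(now: number): number | undefined {
    for (let i = 0; i < this.slots.length; i++) {
      const index = (this.cursor + i) % this.slots.length
      const slot = this.slots[index]
      if (slot !== undefined && slot.blockedUntil <= now) {
        this.cursor = (index + 1) % this.slots.length
        return slot.id
      }
    }
    return undefined
  }

  // Record a rate-limit response so the installation is skipped until reset.
  markRateLimited(id: number, resetAt: number): void {
    const slot = this.slots.find((s) => s.id === id)
    if (slot !== undefined) slot.blockedUntil = resetAt
  }
}
```

A caller would typically pass `Date.now()` as `now`, and feed `markRateLimited` from the reset time reported by the API; taking the clock as a parameter keeps the rotation deterministic in tests.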

Reviewed by Cursor Bugbot for commit b4e91dc.

mbani01 added 10 commits June 29, 2026 13:17
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…CM-1243)

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…1243)

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
@mbani01 mbani01 self-assigned this Jun 30, 2026
Copilot AI review requested due to automatic review settings June 30, 2026 18:04
Comment thread services/apps/packages_worker/src/security-contacts/processBatch.ts

Copilot AI left a comment

Pull request overview

This PR adds a security-contacts ingestion pipeline to packages_worker. It introduces a daily Temporal cron workflow that selects stale GitHub repos tied to critical packages, fans each repo out across six extractor families (SECURITY-INSIGHTS, GitHub PVR, SECURITY_CONTACTS, security.txt, SECURITY.md, and package-registry manifests), reconciles and scores the discovered contacts, and transactionally persists the top results into a new security_contacts table plus new policy columns on repos. It also wires up a dedicated security-contacts-worker entrypoint (bin, schedule, Docker Compose, build list, env/config), extracts the shared InstallationPool out of the enricher for GitHub App token round-robin, and adds js-yaml for SECURITY-INSIGHTS parsing.

Changes:

  • New security-contacts module: extractors, scoring, reconciliation, batch processing, transactional write, schedule, workflow, and activity.
  • DB migration adding security_contacts and repos policy/refresh columns; config + env vars for the worker.
  • Supporting refactors/deps: InstallationPool extracted to its own file; js-yaml added; new worker entrypoint, npm scripts, Compose service, and build wiring.

Reviewed changes

Copilot reviewed 35 out of 37 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| .../security-contacts/processBatch.ts | Orchestrates batch fetch, per-repo extractor fan-out, PVR veto, reconcile, write; defines a local concurrency helper |
| .../security-contacts/score.ts | Pure scoring (tier/channel/freshness/corroboration) and confidence bands |
| .../security-contacts/reconcile.ts | Dedupe, identity-link handle→email, sort, cap to 5 contacts |
| .../security-contacts/writeContacts.ts | Transactional replace of contacts + repo policy column refresh |
| .../security-contacts/types.ts | Shared types for contacts, provenance, policies, extractors |
| .../security-contacts/extractors/* | HTTP helpers, PVR, security.txt/md, SECURITY-INSIGHTS, SECURITY_CONTACTS, registry fetchers |
| .../security-contacts/extractors/registry/* | Per-ecosystem manifest fetchers (npm/pypi/maven/cargo/nuget/rubygems/composer) + purl parser |
| .../security-contacts/{workflows,schedule,activities,githubToken}.ts | Temporal workflow, cron schedule, activity, cached GitHub token pool |
| .../bin/security-contacts-worker.ts | New worker entrypoint (init → schedule → start) |
| .../enricher/installationPool.ts & runEnrichmentLoop.ts | Extracts InstallationPool to its own file and removes the inline copy |
| .../src/{config,activities,workflows/index}.ts | Registers config getter, activity, and workflow |
| backend/src/osspckgs/migrations/V1782950400__security_contacts.sql | New table, indexes, and repos columns |
| backend/.env.dist.*, scripts/..., package.json, pnpm-lock.yaml | Env vars, Compose service, build list, npm scripts, js-yaml dependency |

Files not reviewed (1)
  • pnpm-lock.yaml: Generated file

Comment thread services/apps/packages_worker/src/security-contacts/processBatch.ts Outdated
Comment thread services/apps/packages_worker/src/security-contacts/score.ts
Comment thread services/apps/packages_worker/src/security-contacts/extractors/securityMd.ts Outdated
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copilot AI left a comment

Pull request overview

Copilot reviewed 35 out of 37 changed files in this pull request and generated 5 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Generated file

Comment thread services/apps/packages_worker/src/security-contacts/processBatch.ts Outdated
Comment thread services/apps/packages_worker/src/security-contacts/extractors/securityTxt.ts Outdated
Comment thread services/apps/packages_worker/src/security-contacts/reconcile.ts
Comment thread services/apps/packages_worker/src/security-contacts/processBatch.ts Outdated
Comment thread services/apps/packages_worker/src/security-contacts/score.ts
mbani01 added 2 commits July 1, 2026 12:46
…nstants

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Comment thread services/apps/packages_worker/src/security-contacts/processBatch.ts
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Copilot AI review requested due to automatic review settings July 1, 2026 12:34

Copilot AI left a comment

Pull request overview

Copilot reviewed 35 out of 37 changed files in this pull request and generated 3 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Generated file

Comment thread services/apps/packages_worker/src/security-contacts/processBatch.ts
Comment thread services/apps/packages_worker/src/security-contacts/score.ts
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Comment thread services/apps/packages_worker/src/security-contacts/githubToken.ts
@mbani01 mbani01 requested a review from themarolt July 1, 2026 14:11
Copilot AI review requested due to automatic review settings July 1, 2026 14:16

Copilot AI left a comment

Pull request overview

Copilot reviewed 35 out of 37 changed files in this pull request and generated 5 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Generated file

Comment thread services/apps/packages_worker/src/security-contacts/processBatch.ts
Comment thread services/apps/packages_worker/src/config.ts
Comment thread services/apps/packages_worker/src/security-contacts/score.ts
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Comment thread services/apps/packages_worker/src/security-contacts/processBatch.ts
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Copilot AI review requested due to automatic review settings July 1, 2026 14:50
Comment thread services/apps/packages_worker/src/security-contacts/processBatch.ts

Copilot AI left a comment

Pull request overview

Copilot reviewed 35 out of 37 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Generated file

Comment on lines +112 to +117
const declaredAt: string | undefined =
  typeof root.header?.['last-updated'] === 'string'
    ? root.header['last-updated']
    : typeof root.header?.['last-reviewed'] === 'string'
      ? root.header['last-reviewed']
      : undefined

Comment on lines +14 to +16
function isBlockedHost(h: string): boolean {
  return h === 'localhost' || h === '::1' || h === '0.0.0.0' || h.startsWith('127.')
}

Comment on lines +49 to +54
export function getSecurityContactsConfig() {
  return {
    // Sent on all registry calls; crates.io rejects requests without an identifying UA.
    userAgent: requireEnv('SECURITY_CONTACTS_USER_AGENT'),
  }
}

Comment on lines +94 to +97
export function scoreContact(
  contact: RawContact,
  now: Date = new Date(),
): { score: number; confidence: ConfidenceBand } {
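
The signature above takes `now` as a parameter with a default, which suggests scoring is meant to be deterministic under test. The sketch below shows how such a pure scorer could combine tier, freshness, and corroboration. All weights, thresholds, field names, and the name `scoreContactSketch` are invented for illustration; the channel component and the PR's actual RawContact shape are not modelled.

```typescript
// Sketch only: weights, thresholds, and field names are hypothetical.
type ConfidenceBandSketch = 'high' | 'medium' | 'low'
type Tier = 1 | 2 | 3

interface RawContactSketch {
  tier: Tier // 1 = most authoritative source
  declaredAt?: string // ISO date taken from the source document, if any
  sources: number // how many extractors reported this contact
}

const TIER_POINTS: Record<Tier, number> = { 1: 60, 2: 40, 3: 20 }
const DAY_MS = 24 * 60 * 60 * 1000

function freshnessPoints(declaredAt: string | undefined, now: Date): number {
  if (declaredAt === undefined) return 0
  const declared = Date.parse(declaredAt)
  if (Number.isNaN(declared)) return 0 // unparseable dates earn nothing
  const ageDays = (now.getTime() - declared) / DAY_MS
  if (ageDays < 0) return 0 // dates in the future are not trusted
  if (ageDays <= 365) return 20
  if (ageDays <= 730) return 10
  return 0
}

function scoreContactSketch(
  contact: RawContactSketch,
  now: Date = new Date(),
): { score: number; confidence: ConfidenceBandSketch } {
  // Each extra corroborating source adds 10 points, up to 20.
  const corroboration = Math.min(Math.max(contact.sources - 1, 0), 2) * 10
  const score =
    TIER_POINTS[contact.tier] + freshnessPoints(contact.declaredAt, now) + corroboration
  const confidence: ConfidenceBandSketch =
    score >= 70 ? 'high' : score >= 40 ? 'medium' : 'low'
  return { score, confidence }
}
```

Passing a fixed `now` in tests avoids results that drift as the declared date ages.
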
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
themarolt
themarolt previously approved these changes Jul 1, 2026
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Copilot AI review requested due to automatic review settings July 2, 2026 09:49

Copilot AI left a comment

Pull request overview

Copilot reviewed 35 out of 37 changed files in this pull request and generated 6 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Generated file

Comment on lines +140 to +144
for (const [key, value] of Object.entries(r.value.policies)) {
  if (!(policies as Record<string, unknown>)[key] && value != null) {
    ;(policies as Record<string, unknown>)[key] = value
  }
}

Comment on lines +47 to +51
for (const [key, value] of Object.entries(result.policies)) {
  if (!(policies as Record<string, unknown>)[key] && value != null) {
    ;(policies as Record<string, unknown>)[key] = value
  }
}
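
One thing worth noting about the two loops above: `!policies[key]` is a truthiness check, so a value of `false` already stored under a key counts as "unset" and a later result can overwrite it. Whether that matters depends on extractor ordering and on whether any extractor emits an explicit `false`. The sketch below shows a first-writer-wins merge that only treats `undefined` and `null` as unset. The `PoliciesSketch` shape is an assumption (`pvrEnabled` is mentioned in this PR, the other fields are hypothetical), as is the helper name.

```typescript
// Sketch only: field names other than pvrEnabled are hypothetical.
interface PoliciesSketch {
  pvrEnabled?: boolean
  policyUrl?: string
  reportingUrl?: string
}

// First writer wins: a key is filled only while it is still undefined/null.
// An explicit `false` counts as a real value and is never overwritten.
function mergePoliciesFirstWins(
  current: PoliciesSketch,
  incoming: PoliciesSketch,
): PoliciesSketch {
  const merged: Record<string, unknown> = { ...current }
  for (const [key, value] of Object.entries(incoming)) {
    if (value === undefined || value === null) continue
    const existing = merged[key]
    if (existing === undefined || existing === null) {
      merged[key] = value
    }
  }
  return merged as PoliciesSketch
}
```

With this variant, merging `{ pvrEnabled: false }` with a later `{ pvrEnabled: true }` keeps `false`, whereas the truthiness check would flip it to `true`.
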
Comment on lines +222 to +224
} catch {
  continue
}

Comment on lines +49 to +53
export function getSecurityContactsConfig() {
  return {
    // Sent on all registry calls; crates.io rejects requests without an identifying UA.
    userAgent: requireEnv('SECURITY_CONTACTS_USER_AGENT'),
  }

Comment on lines +12 to +16
// target.homepage is externally-sourced. Requiring https already blocks the classic SSRF target
// (cloud-metadata IMDS is http-only); we also reject obvious loopback/localhost.
function isBlockedHost(h: string): boolean {
  return h === 'localhost' || h === '::1' || h === '0.0.0.0' || h.startsWith('127.')
}
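
For comparison, a stricter literal-host check might also cover private and link-local ranges, bracketed IPv6 literals, and trailing dots. The sketch below is an illustration only, with hypothetical names (`parseIpv4Literal`, `isBlockedHostStrict`); it inspects the host string and does not resolve DNS, so a public name that resolves to a private address would still pass.

```typescript
// Sketch only: string-level checks, no DNS resolution. Names are hypothetical.
function parseIpv4Literal(host: string): number[] | null {
  const parts = host.split('.')
  if (parts.length !== 4) return null
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN))
  // NaN fails both comparisons, so malformed parts reject the whole literal.
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null
}

function isBlockedHostStrict(rawHost: string): boolean {
  // Normalise: lowercase, strip IPv6 brackets and a trailing root dot.
  const host = rawHost
    .toLowerCase()
    .replace(/^\[|\]$/g, '')
    .replace(/\.$/, '')
  if (host === '' || host === 'localhost' || host.endsWith('.localhost')) return true

  const v4 = parseIpv4Literal(host)
  if (v4 !== null) {
    const a = v4[0]
    const b = v4[1]
    return (
      a === 0 || // "this network", includes 0.0.0.0
      a === 10 || // private 10.0.0.0/8
      a === 127 || // loopback
      (a === 169 && b === 254) || // link-local, includes cloud metadata
      (a === 172 && b >= 16 && b <= 31) || // private 172.16.0.0/12
      (a === 192 && b === 168) // private 192.168.0.0/16
    )
  }

  if (host.includes(':')) {
    // IPv6 literal: unspecified, loopback, link-local, unique-local,
    // and IPv4-mapped addresses (blocked conservatively).
    return (
      host === '::' ||
      host === '::1' ||
      /^fe[89ab][0-9a-f]:/.test(host) ||
      /^f[cd][0-9a-f]{2}:/.test(host) ||
      host.startsWith('::ffff:')
    )
  }
  return false
}
```

The https requirement described in the source comment would still apply on top of this; the host check only narrows which literal targets are reachable.
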

Comment on lines +18 to +22
export function parseSecurityTxt(
  text: string,
  sourceUrl: string,
  fetchedAt: string,
): ExtractorResult {

    if (redirected.text) return parseSecurityInsights(redirected.text, redirect, fetchedAt)
  }

  return mapSecurityInsights(doc, path, fetchedAt)

Redirect failure maps pointer stub

High Severity

When SECURITY-INSIGHTS.yml declares an allowed project-si-source redirect but fetchText returns no body (404/410/422), the extractor still calls mapSecurityInsights on the local pointer document instead of trying the next path or skipping. A full successful refresh can then replace stored contacts with an empty or incomplete set parsed from the stub.
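
One way to address the finding above is to treat a declared but unfetchable redirect as "this path is unresolved" and move on, rather than mapping the pointer stub. The sketch below shows that control flow with a synchronous `load` callback standing in for the PR's asynchronous fetch; the `InsightsDocSketch` type, the `resolveInsights` name, and the second path in the usage example are hypothetical.

```typescript
// Sketch only: a synchronous stand-in for the extractor's fetch-and-map flow.
interface InsightsDocSketch {
  redirect?: string // set when the file only points at another location
  contacts: string[]
}

// Returns the first fully resolved document, or null when none resolves.
// A null result should be reported as an extractor failure or "unknown",
// so the write step does not replace stored contacts with an empty set.
function resolveInsights(
  paths: string[],
  load: (location: string) => InsightsDocSketch | null,
): InsightsDocSketch | null {
  for (const path of paths) {
    const doc = load(path)
    if (doc === null) continue
    if (doc.redirect === undefined) return doc
    const target = load(doc.redirect)
    // Follow one hop only; a missing or chained target means this path
    // cannot be trusted, so fall through to the next candidate path.
    if (target !== null && target.redirect === undefined) return target
  }
  return null
}
```

For example, with a store where the root file redirects to an unavailable URL and a second candidate path holds a full document, `resolveInsights` returns the second document instead of the stub.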

Reviewed by Cursor Bugbot for commit e8d95b3.

@mbani01 mbani01 merged commit 17c0bc9 into main Jul 2, 2026
11 checks passed
@mbani01 mbani01 deleted the feat/security_contacts_worker branch July 2, 2026 10:07

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit b4e91dc.

export const extractPvr: Extractor = async (target, deps) => {
  // GitHub's PVR endpoint rejects archived (and private) repos with 422; skip the call for
  // known-archived repos. Non-archived-yet-unknown repos still get the 422→unknown safety net.
  if (target.archived) return { contacts: [], policies: {} }

Archived repos keep stale PVR

Low Severity

Known-archived repos skip the PVR API and emit no pvrEnabled policy, while writeContacts keeps the previous pvr_enabled and reporting URL via COALESCE when PVR was not resolved this run, so archived repositories can still show PVR as enabled after refresh.
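
The interaction described above can be reproduced with a small model of the COALESCE-style write. The sketch assumes a tri-state result where `undefined` means "not resolved this run", and shows one possible fix: emitting an explicit `false` for known-archived repos. Whether archived repos should read as PVR-disabled is a product decision; the function names are hypothetical.

```typescript
// Sketch only: models COALESCE(new, old) for the pvr_enabled column.
type PvrResolution = boolean | undefined // undefined = not resolved this run

// Mirrors the write step: keep the stored value when nothing was resolved.
function nextPvrEnabled(previous: boolean | null, resolved: PvrResolution): boolean | null {
  return resolved === undefined ? previous : resolved
}

// Possible fix (an assumption): known-archived repos report an explicit
// false instead of "unresolved", so a stale true cannot survive the refresh.
function resolvePvrForRepo(archived: boolean, apiResult: PvrResolution): PvrResolution {
  return archived ? false : apiResult
}
```

With the current skip behaviour, `nextPvrEnabled(true, undefined)` stays `true` for an archived repo; routing through `resolvePvrForRepo(true, undefined)` first yields `false`.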

Reviewed by Cursor Bugbot for commit b4e91dc.

5 participants