feat: security contacts worker [CM-1297] #4283
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…CM-1243) Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…1243) Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Pull request overview
This PR adds a security-contacts ingestion pipeline to packages_worker. It introduces a daily Temporal cron workflow that selects stale GitHub repos tied to critical packages, fans each repo out across six extractor families (SECURITY-INSIGHTS, GitHub PVR, SECURITY_CONTACTS, security.txt, SECURITY.md, and package-registry manifests), reconciles and scores the discovered contacts, and transactionally persists the top results into a new security_contacts table plus new policy columns on repos. It also wires up a dedicated security-contacts-worker entrypoint (bin, schedule, Docker Compose, build list, env/config), extracts the shared InstallationPool out of the enricher for GitHub App token round-robin, and adds js-yaml for SECURITY-INSIGHTS parsing.
Changes:
- New security-contacts module: extractors, scoring, reconciliation, batch processing, transactional write, schedule, workflow, and activity.
- DB migration adding `security_contacts` and `repos` policy/refresh columns; config + env vars for the worker.
- Supporting refactors/deps: `InstallationPool` extracted to its own file; `js-yaml` added; new worker entrypoint, npm scripts, Compose service, and build wiring.
Reviewed changes
Copilot reviewed 35 out of 37 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `.../security-contacts/processBatch.ts` | Orchestrates batch fetch, per-repo extractor fan-out, PVR veto, reconcile, write; defines a local concurrency helper |
| `.../security-contacts/score.ts` | Pure scoring (tier/channel/freshness/corroboration) and confidence bands |
| `.../security-contacts/reconcile.ts` | Dedupe, identity-link handle→email, sort, cap to 5 contacts |
| `.../security-contacts/writeContacts.ts` | Transactional replace of contacts + repo policy column refresh |
| `.../security-contacts/types.ts` | Shared types for contacts, provenance, policies, extractors |
| `.../security-contacts/extractors/*` | HTTP helpers, PVR, security.txt/md, SECURITY-INSIGHTS, SECURITY_CONTACTS, registry fetchers |
| `.../security-contacts/extractors/registry/*` | Per-ecosystem manifest fetchers (npm/pypi/maven/cargo/nuget/rubygems/composer) + purl parser |
| `.../security-contacts/{workflows,schedule,activities,githubToken}.ts` | Temporal workflow, cron schedule, activity, cached GitHub token pool |
| `.../bin/security-contacts-worker.ts` | New worker entrypoint (init → schedule → start) |
| `.../enricher/installationPool.ts` & `runEnrichmentLoop.ts` | Extracts InstallationPool to its own file and removes the inline copy |
| `.../src/{config,activities,workflows/index}.ts` | Registers config getter, activity, and workflow |
| `backend/src/osspckgs/migrations/V1782950400__security_contacts.sql` | New table, indexes, and repos columns |
| `backend/.env.dist.*`, `scripts/...`, `package.json`, `pnpm-lock.yaml` | Env vars, Compose service, build list, npm scripts, js-yaml dependency |
Files not reviewed (1)
- pnpm-lock.yaml: Generated file
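The reconcile step summarized in the file table (dedupe, sort, cap to 5 contacts) could be sketched roughly as follows. This is not the PR's `reconcile.ts`: the `ScoredContact` shape, the `MAX_CONTACTS` constant, and the tie-break rule are assumptions made for illustration, and the handle→email identity-linking step is left out.

```typescript
// Hypothetical contact shape; the PR's real types live in types.ts and are not shown here.
type ScoredContact = {
  channel: 'email' | 'url' | 'handle'
  value: string
  score: number
}

// Assumed cap, taken from the "cap to 5 contacts" description.
const MAX_CONTACTS = 5

function reconcile(contacts: ScoredContact[]): ScoredContact[] {
  const best = new Map<string, ScoredContact>()
  for (const c of contacts) {
    // Emails and handles are compared case-insensitively; URLs keep their case
    // because paths can be case-sensitive.
    const normalized = c.channel === 'url' ? c.value.trim() : c.value.trim().toLowerCase()
    const key = `${c.channel}:${normalized}`
    const seen = best.get(key)
    // Keep the highest-scoring duplicate.
    if (!seen || c.score > seen.score) best.set(key, c)
  }
  return Array.from(best.values())
    // Highest score first; ties broken by value so the order is deterministic.
    .sort((a, b) => b.score - a.score || a.value.localeCompare(b.value))
    .slice(0, MAX_CONTACTS)
}
```

A deterministic tie-break keeps repeated runs over unchanged inputs from reshuffling the stored contacts.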
…nstants Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
    const declaredAt: string | undefined =
      typeof root.header?.['last-updated'] === 'string'
        ? root.header['last-updated']
        : typeof root.header?.['last-reviewed'] === 'string'
          ? root.header['last-reviewed']
          : undefined

    export function getSecurityContactsConfig() {
      return {
        // Sent on all registry calls; crates.io rejects requests without an identifying UA.
        userAgent: requireEnv('SECURITY_CONTACTS_USER_AGENT'),
      }
    }

    export function scoreContact(
      contact: RawContact,
      now: Date = new Date(),
    ): { score: number; confidence: ConfidenceBand } {
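
To make the `scoreContact` signature above concrete, here is a minimal sketch of a pure scorer combining the four factors named in the review summary (tier, channel, freshness, corroboration). The `RawContact` fields, the point weights, the two-year decay window, and the band thresholds are all assumptions for illustration; the PR's actual values are not shown on this page.

```typescript
// Hypothetical shapes and weights; none of these numbers come from the PR.
type ConfidenceBand = 'high' | 'medium' | 'low'

type RawContact = {
  tier: 1 | 2 | 3 // assumed: 1 is the most authoritative source
  channel: 'email' | 'url' | 'handle'
  declaredAt?: string // ISO date the source declares as last updated
  sources: string[] // sources that mention this contact
}

const TIER_POINTS: Record<RawContact['tier'], number> = { 1: 50, 2: 35, 3: 20 }
const CHANNEL_POINTS: Record<RawContact['channel'], number> = { email: 20, url: 15, handle: 10 }
const DAY_MS = 24 * 60 * 60 * 1000

function scoreContact(
  contact: RawContact,
  now: Date = new Date(),
): { score: number; confidence: ConfidenceBand } {
  let score = TIER_POINTS[contact.tier] + CHANNEL_POINTS[contact.channel]

  // Freshness: up to 15 points, decaying linearly to 0 over two years.
  // Missing or unparseable dates earn nothing.
  if (contact.declaredAt) {
    const ageDays = (now.getTime() - Date.parse(contact.declaredAt)) / DAY_MS
    if (Number.isFinite(ageDays)) {
      score += 15 * Math.max(0, 1 - Math.max(0, ageDays) / 730)
    }
  }

  // Corroboration: 5 points per additional distinct source, capped at 15.
  const extraSources = new Set(contact.sources).size - 1
  score += Math.min(15, Math.max(0, extraSources) * 5)

  const rounded = Math.round(score)
  const confidence: ConfidenceBand = rounded >= 75 ? 'high' : rounded >= 50 ? 'medium' : 'low'
  return { score: rounded, confidence }
}
```

Taking `now` as a parameter, as the signature in the diff does, keeps the function pure and lets tests pin the clock.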
    for (const [key, value] of Object.entries(r.value.policies)) {
      if (!(policies as Record<string, unknown>)[key] && value != null) {
        ;(policies as Record<string, unknown>)[key] = value
      }
    }

    for (const [key, value] of Object.entries(result.policies)) {
      if (!(policies as Record<string, unknown>)[key] && value != null) {
        ;(policies as Record<string, unknown>)[key] = value
      }
    }
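
One thing worth noting about the merge loops above: the `!policies[key]` guard treats an existing `false` as missing, so a later extractor could overwrite an explicit `false` set by an earlier one. Whether that is intended is not stated here. A first-writer-wins variant that preserves `false` might look like the sketch below; the `Policies` fields and the `mergePolicies` name are hypothetical.

```typescript
// Hypothetical policy shape; the PR's real type lives in types.ts.
type Policies = {
  pvrEnabled?: boolean
  policyUrl?: string
  reportingUrl?: string
}

// First writer wins: a key is only filled when it is still null or undefined,
// so an explicit `false` from an earlier extractor is kept.
function mergePolicies(target: Policies, incoming: Policies): Policies {
  const out: Record<string, unknown> = { ...target }
  for (const [key, value] of Object.entries(incoming)) {
    if (out[key] == null && value != null) out[key] = value
  }
  return out as Policies
}
```

Returning a new object instead of mutating `policies` in place also makes the merge easy to share between the two loops.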

    } catch {
      continue
    }

    // target.homepage is externally-sourced. Requiring https already blocks the classic SSRF target
    // (cloud-metadata IMDS is http-only); we also reject obvious loopback/localhost.
    function isBlockedHost(h: string): boolean {
      return h === 'localhost' || h === '::1' || h === '0.0.0.0' || h.startsWith('127.')
    }
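
How `h` is derived is not shown in this snippet. If it comes from a WHATWG `URL` object's `hostname`, an IPv6 literal is returned in brackets (`[::1]`), so a bare `'::1'` comparison would not match. The sketch below is one possible broader check, not the PR's code: it strips brackets and a trailing dot, and also rejects private and link-local IPv4 ranges. The set of ranges chosen is an assumption.

```typescript
// Sketch of a stricter literal-host check. It cannot catch DNS names that
// resolve to internal addresses; that needs a check at connection time.
function isBlockedHost(host: string): boolean {
  const h = host
    .trim()
    .toLowerCase()
    .replace(/^\[|\]$/g, '') // URL.hostname wraps IPv6 literals in brackets
    .replace(/\.$/, '') // tolerate a trailing root dot

  if (h === 'localhost' || h.endsWith('.localhost')) return true
  if (h === '::1' || h === '::') return true

  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(h)
  if (!m) return false
  const a = Number(m[1])
  const b = Number(m[2])
  return (
    a === 0 || // "this network", includes 0.0.0.0
    a === 127 || // loopback
    a === 10 || // private
    (a === 169 && b === 254) || // link-local, includes cloud metadata
    (a === 172 && b >= 16 && b <= 31) || // private
    (a === 192 && b === 168) // private
  )
}
```

This only inspects the literal host string. IPv4-mapped IPv6 addresses, unique-local IPv6 ranges, and redirects to internal hosts are not covered by the sketch.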

    export function parseSecurityTxt(
      text: string,
      sourceUrl: string,
      fetchedAt: string,
    ): ExtractorResult {
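
The body of `parseSecurityTxt` is not visible here. As a rough illustration of the field extraction such a parser needs, the sketch below reads `Contact`, `Policy`, and `Expires` lines in the `Name: value` form used by security.txt (RFC 9116), matching field names case-insensitively. The `SecurityTxtFields` shape and the `parseSecurityTxtFields` name are hypothetical, and mapping the result into the PR's `ExtractorResult` is omitted.

```typescript
// Hypothetical intermediate shape, not the PR's ExtractorResult.
type SecurityTxtFields = {
  contacts: string[] // raw Contact values, e.g. "mailto:..." or "https://..."
  policy?: string
  expires?: string
}

function parseSecurityTxtFields(text: string): SecurityTxtFields {
  const out: SecurityTxtFields = { contacts: [] }
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim()
    // Skip blanks and comments.
    if (line === '' || line.startsWith('#')) continue
    // Split on the first colon only; values such as "mailto:..." contain more.
    const colon = line.indexOf(':')
    if (colon <= 0) continue
    const name = line.slice(0, colon).trim().toLowerCase()
    const value = line.slice(colon + 1).trim()
    if (value === '') continue
    if (name === 'contact') out.contacts.push(value)
    else if (name === 'policy' && out.policy === undefined) out.policy = value
    else if (name === 'expires' && out.expires === undefined) out.expires = value
  }
  return out
}
```

Contact values keep their listed order here, on the assumption that callers may want to treat earlier entries as preferred.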

      if (redirected.text) return parseSecurityInsights(redirected.text, redirect, fetchedAt)
    }

    return mapSecurityInsights(doc, path, fetchedAt)
Redirect failure maps pointer stub
High Severity
When SECURITY-INSIGHTS.yml declares an allowed project-si-source redirect but fetchText returns no body (404/410/422), the extractor still calls mapSecurityInsights on the local pointer document instead of trying the next path or skipping. A full successful refresh can then replace stored contacts with an empty or incomplete set parsed from the stub.
Reviewed by Cursor Bugbot for commit e8d95b3. Configure here.
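One way to address the issue described above is to make the three outcomes explicit, so a declared redirect that returns no body is never confused with "no redirect declared". The sketch below is only a decision helper under that assumption; `Resolution` and `resolveInsightsSource` are hypothetical names, and the real extractor would still do the fetching and parsing.

```typescript
// Hypothetical decision type for a SECURITY-INSIGHTS file that may point elsewhere.
type Resolution =
  | { kind: 'use-local' } // no redirect declared: map the local document
  | { kind: 'use-redirect'; text: string } // redirect fetched: parse the target
  | { kind: 'skip' } // redirect declared but empty: do not map the stub

function resolveInsightsSource(
  redirectUrl: string | undefined,
  redirectedText: string | undefined,
): Resolution {
  if (redirectUrl === undefined) return { kind: 'use-local' }
  if (redirectedText !== undefined && redirectedText.trim() !== '') {
    return { kind: 'use-redirect', text: redirectedText }
  }
  // The local file is only a pointer; mapping it could yield an empty contact set.
  return { kind: 'skip' }
}
```

On `skip`, the caller could try the next candidate path or report the extractor as failed, so that the write step does not replace stored contacts.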
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit b4e91dc. Configure here.
    export const extractPvr: Extractor = async (target, deps) => {
      // GitHub's PVR endpoint rejects archived (and private) repos with 422; skip the call for
      // known-archived repos. Non-archived-yet-unknown repos still get the 422→unknown safety net.
      if (target.archived) return { contacts: [], policies: {} }
Archived repos keep stale PVR
Low Severity
Known-archived repos skip the PVR API and emit no pvrEnabled policy, while writeContacts keeps the previous pvr_enabled and reporting URL via COALESCE when PVR was not resolved this run, so archived repositories can still show PVR as enabled after refresh.
Reviewed by Cursor Bugbot for commit b4e91dc. Configure here.
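The interaction described above can be modelled in a few lines. The sketch assumes that an archived repo should be recorded as not accepting private reports, which is a design decision the PR may or may not share; `pvrPolicyFor` and `coalescePvr` are hypothetical names, with `coalescePvr` standing in for the `COALESCE` behaviour attributed to `writeContacts`.

```typescript
// Hypothetical shapes for illustration only.
type PvrTarget = { archived: boolean }
type PvrPolicy = { pvrEnabled?: boolean }

// Archived repos get an explicit `false` instead of an empty policy, so the
// write step sees a resolved value. `apiResult` is undefined when the API call
// could not determine the state.
function pvrPolicyFor(target: PvrTarget, apiResult: boolean | undefined): PvrPolicy {
  if (target.archived) return { pvrEnabled: false }
  return apiResult === undefined ? {} : { pvrEnabled: apiResult }
}

// Mirrors COALESCE(new, previous): keep the stored value only when this run
// produced no answer.
function coalescePvr(previous: boolean | null, policy: PvrPolicy): boolean | null {
  return policy.pvrEnabled ?? previous
}
```

With the explicit `false`, a repo that had PVR enabled before being archived no longer keeps the stale flag after a refresh.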


This pull request introduces support for processing and storing security contacts for repositories, including database schema changes, new worker service setup, configuration, and supporting code. It also updates dependencies and environment files as needed. Below are the most important changes grouped by theme:
Security Contacts Feature Implementation
- Adds the `security_contacts` table and several related columns to the `repos` table to store security contact information and metadata.
- Adds the `processSecurityContactsBatch` activity and supporting extractor logic to fetch and process security contacts, including HTTP helpers and parsing utilities.
- Adds the `security-contacts-worker` service with Docker Compose configuration, build integration, and npm scripts for running and developing the worker.

Configuration and Environment
- Adds environment variables and configuration for the worker, exposed through the `getSecurityContactsConfig` function.

Dependency and Codebase Maintenance
- Adds `js-yaml` and its types in both the lockfile and `package.json` to support new parsing needs.
- Extracts the `InstallationPool` class into its own file to be reused and removes its inline definition from the enricher loop.

These changes lay the groundwork for collecting, processing, and storing security contact information for repositories in a scalable and configurable way.
Note
Medium Risk
Large new worker with outbound fetches to GitHub, registries, and repo homepages; mitigations include SSRF allowlists and fail-safe writes, but correctness of stored contacts and GitHub rate limits still matter in production.
Overview
Adds a security-contacts-worker on the packages Temporal queue that periodically enriches GitHub repos tied to critical packages with vulnerability-reporting contacts and policy metadata.
Persistence: New `security_contacts` table (scored contacts with JSON provenance) plus `repos` columns for PVR flag, policy/reporting URLs, `security.txt`, and `contacts_last_refreshed`.

Ingestion pipeline: Daily cron (`0 6 * * *`) runs `ingestSecurityContacts`, which processes repos in batches and calls `continueAsNew` until the sweep is empty. Per repo, tiered extractors pull from GitHub (SECURITY-INSIGHTS, PVR API, `SECURITY_CONTACTS`, repo files), homepage `security.txt` (with SSRF guards), `SECURITY.md`, and package registries (npm, PyPI, Maven, Cargo, NuGet, RubyGems, Composer). Results are merged, scored, capped at five contacts, and written in a transaction; failed extractors skip destructive updates and only advance `contacts_last_refreshed`.

Infra: Docker Compose service, build list entry, `SECURITY_CONTACTS_USER_AGENT` env, `js-yaml` dependency. `InstallationPool` is extracted from the GitHub enricher and reused by a rate-limit-aware `githubApiGet` (installation round-robin + concurrency gate).

Reviewed by Cursor Bugbot for commit b4e91dc. Bugbot is set up for automated code reviews on this repo.
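
The installation round-robin mentioned above can be illustrated with a small generic pool. This is not the PR's `InstallationPool`, whose interface is not shown on this page; `RoundRobinPool` and its `next` method are hypothetical, and the rate-limit awareness and concurrency gate are left out.

```typescript
// Minimal round-robin sketch: each call hands out the next item and wraps around.
class RoundRobinPool<T> {
  private readonly items: readonly T[]
  private cursor = 0

  constructor(items: readonly T[]) {
    if (items.length === 0) throw new Error('pool must not be empty')
    this.items = items
  }

  next(): T {
    const item = this.items[this.cursor]
    this.cursor = (this.cursor + 1) % this.items.length
    return item
  }
}
```

Spreading consecutive requests across installations this way spreads rate-limit consumption evenly, assuming each installation has its own quota.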