feat: security contacts worker [CM-1297] #4283
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…CM-1243) Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
…1243) Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Pull request overview
This PR adds a security-contacts ingestion pipeline to packages_worker. It introduces a daily Temporal cron workflow that selects stale GitHub repos tied to critical packages, fans each repo out across six extractor families (SECURITY-INSIGHTS, GitHub PVR, SECURITY_CONTACTS, security.txt, SECURITY.md, and package-registry manifests), reconciles and scores the discovered contacts, and transactionally persists the top results into a new security_contacts table plus new policy columns on repos. It also wires up a dedicated security-contacts-worker entrypoint (bin, schedule, Docker Compose, build list, env/config), extracts the shared InstallationPool out of the enricher for GitHub App token round-robin, and adds js-yaml for SECURITY-INSIGHTS parsing.
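The reconcile step (summarized for `reconcile.ts` as dedupe, identity-link handle→email, sort, cap to 5) can be illustrated with a small sketch. It assumes a hypothetical `ScoredContact` shape and case-insensitive matching for emails and GitHub handles; handle→email identity linking is omitted, and none of these names come from the PR.

```typescript
// Hypothetical contact shape; the PR's types.ts may differ.
export interface ScoredContact {
  channel: 'email' | 'url' | 'github'
  value: string
  score: number
  sources: string[] // extractors that reported this contact
}

const MAX_CONTACTS = 5 // cap taken from the reconcile.ts summary

// Emails and GitHub handles compare case-insensitively; URLs are left as-is
// because URL paths can be case-sensitive.
function contactKey(c: ScoredContact): string {
  const v = c.value.trim()
  return c.channel === 'url' ? `url:${v}` : `${c.channel}:${v.toLowerCase()}`
}

export function reconcile(contacts: ScoredContact[]): ScoredContact[] {
  const byKey = new Map<string, ScoredContact>()
  for (const c of contacts) {
    const key = contactKey(c)
    const prev = byKey.get(key)
    if (prev === undefined) {
      byKey.set(key, { ...c, sources: [...new Set(c.sources)] })
    } else {
      // Same contact seen twice: keep the best score and union the provenance.
      prev.score = Math.max(prev.score, c.score)
      prev.sources = [...new Set([...prev.sources, ...c.sources])]
    }
  }
  // Highest score first; ties broken by value for deterministic output.
  return [...byKey.values()]
    .sort((a, b) => b.score - a.score || a.value.localeCompare(b.value))
    .slice(0, MAX_CONTACTS)
}
```

Keeping the union of `sources` on merge is what lets a later scoring pass reward corroboration across extractors.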
Changes:
- New security-contacts module: extractors, scoring, reconciliation, batch processing, transactional write, schedule, workflow, and activity.
- DB migration adding `security_contacts` and `repos` policy/refresh columns; config + env vars for the worker.
- Supporting refactors/deps: `InstallationPool` extracted to its own file; `js-yaml` added; new worker entrypoint, npm scripts, Compose service, and build wiring.
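The PR's `processBatch.ts` defines a local concurrency helper for the per-repo extractor fan-out. A minimal sketch of such a helper, assuming a simple worker-pool design that preserves input order; `mapWithConcurrency` is a hypothetical name, not the PR's function.

```typescript
// Hypothetical helper: run `fn` over `items` with at most `limit` calls in flight,
// returning results in input order.
export async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer')
  }
  const results = new Array<R>(items.length)
  let next = 0

  // Each worker claims the next unprocessed index until the list is exhausted.
  // `next++` is race-free because JavaScript callbacks run on a single thread.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++
      results[i] = await fn(items[i], i)
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker())
  await Promise.all(workers)
  return results
}
```

If `fn` rejects, `Promise.all` rejects with that error while the other workers keep draining the queue, so per-repo failures are best caught inside `fn` and recorded rather than thrown.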
Reviewed changes
Copilot reviewed 35 out of 37 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `.../security-contacts/processBatch.ts` | Orchestrates batch fetch, per-repo extractor fan-out, PVR veto, reconcile, and write; defines a local concurrency helper |
| `.../security-contacts/score.ts` | Pure scoring (tier/channel/freshness/corroboration) and confidence bands |
| `.../security-contacts/reconcile.ts` | Dedupe, identity-link handle→email, sort, cap to 5 contacts |
| `.../security-contacts/writeContacts.ts` | Transactional replace of contacts + repo policy column refresh |
| `.../security-contacts/types.ts` | Shared types for contacts, provenance, policies, extractors |
| `.../security-contacts/extractors/*` | HTTP helpers, PVR, security.txt/md, SECURITY-INSIGHTS, SECURITY_CONTACTS, registry fetchers |
| `.../security-contacts/extractors/registry/*` | Per-ecosystem manifest fetchers (npm/pypi/maven/cargo/nuget/rubygems/composer) + purl parser |
| `.../security-contacts/{workflows,schedule,activities,githubToken}.ts` | Temporal workflow, cron schedule, activity, cached GitHub token pool |
| `.../bin/security-contacts-worker.ts` | New worker entrypoint (init → schedule → start) |
| `.../enricher/installationPool.ts` & `runEnrichmentLoop.ts` | Extracts `InstallationPool` to its own file and removes the inline copy |
| `.../src/{config,activities,workflows/index}.ts` | Registers config getter, activity, and workflow |
| `backend/src/osspckgs/migrations/V1782950400__security_contacts.sql` | New table, indexes, and `repos` columns |
| `backend/.env.dist.*`, `scripts/...`, `package.json`, `pnpm-lock.yaml` | Env vars, Compose service, build list, npm scripts, `js-yaml` dependency |
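The registry fetchers rely on a purl parser. A minimal sketch of pulling registry coordinates out of a package URL laid out as `pkg:type/namespace/name@version?qualifiers#subpath` (the package-url spec's layout), covering the seven ecosystems listed above (RubyGems uses the `gem` purl type). `parsePurl` and `isSupportedEcosystem` are hypothetical, and this is not a full spec-compliant parser: it skips type-specific name normalization.

```typescript
export interface Purl {
  type: string
  namespace?: string
  name: string
  version?: string
}

// purl types for the ecosystems the PR lists (RubyGems uses "gem").
const SUPPORTED_TYPES = new Set(['npm', 'pypi', 'maven', 'cargo', 'nuget', 'gem', 'composer'])

export function parsePurl(input: string): Purl | undefined {
  if (!input.startsWith('pkg:')) return undefined
  try {
    // Registry lookups only need coordinates, so drop #subpath and ?qualifiers.
    let rest = input.slice(4).split('#')[0].split('?')[0].replace(/^\/+/, '')

    const slash = rest.indexOf('/')
    if (slash <= 0) return undefined
    const type = rest.slice(0, slash).toLowerCase()
    rest = rest.slice(slash + 1)

    // Only an '@' after the last '/' starts a version, so a literal "@scope" survives.
    let version: string | undefined
    const at = rest.lastIndexOf('@')
    if (at > rest.lastIndexOf('/')) {
      version = decodeURIComponent(rest.slice(at + 1))
      rest = rest.slice(0, at)
    }

    const segments = rest.split('/').filter((s) => s.length > 0).map(decodeURIComponent)
    const name = segments.pop()
    if (name === undefined) return undefined
    return { type, namespace: segments.length > 0 ? segments.join('/') : undefined, name, version }
  } catch {
    return undefined // malformed percent-encoding
  }
}

export function isSupportedEcosystem(purl: Purl): boolean {
  return SUPPORTED_TYPES.has(purl.type)
}
```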
Files not reviewed (1)
- pnpm-lock.yaml: Generated file
…nstants Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 082f6a5.
```typescript
const declaredAt: string | undefined =
  typeof root.header?.['last-updated'] === 'string'
    ? root.header['last-updated']
    : typeof root.header?.['last-reviewed'] === 'string'
      ? root.header['last-reviewed']
      : undefined
```
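The `declaredAt` expression above accepts only string values. If the extractor parses SECURITY-INSIGHTS with js-yaml's default `load` schema, an unquoted timestamp such as `last-updated: 2024-01-15` is resolved to a JavaScript `Date`, so that header would fall through to `undefined`. A sketch that accepts both forms, operating on the already-parsed header; `toIsoDate` and `declaredAtFrom` are hypothetical names, and unlike the original it also falls back to `last-reviewed` when `last-updated` is present but unparseable.

```typescript
// Parsed YAML header: values may be strings, Dates (YAML timestamps), or anything else.
type InsightsHeader = Record<string, unknown> | undefined

// Hypothetical helper: normalize a header date field to an ISO-8601 string.
function toIsoDate(value: unknown): string | undefined {
  if (value instanceof Date) {
    return Number.isNaN(value.getTime()) ? undefined : value.toISOString()
  }
  if (typeof value === 'string') {
    const ms = Date.parse(value)
    return Number.isNaN(ms) ? undefined : new Date(ms).toISOString()
  }
  return undefined
}

// Prefer last-updated, then last-reviewed, mirroring the precedence above.
export function declaredAtFrom(header: InsightsHeader): string | undefined {
  return toIsoDate(header?.['last-updated']) ?? toIsoDate(header?.['last-reviewed'])
}
```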
```typescript
function isBlockedHost(h: string): boolean {
  return h === 'localhost' || h === '::1' || h === '0.0.0.0' || h.startsWith('127.')
}
```
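The `isBlockedHost` check above matches only loopback names and `127.*` literals. If it receives `URL.hostname`, an IPv6 literal arrives bracketed (`[::1]`), so the `'::1'` comparison would not match, and private ranges such as `10.0.0.0/8` or `169.254.169.254` pass through. A stricter sketch, assuming the host comes from WHATWG `new URL(...).hostname`, which canonicalizes numeric IPv4 forms like `2130706433` to `127.0.0.1`; `isBlockedHostStrict` is a hypothetical name.

```typescript
import { isIP } from 'node:net'

// WHATWG URL keeps brackets around IPv6 literals, e.g. '[::1]'.
function normalizeHost(host: string): string {
  const h = host.toLowerCase()
  return h.startsWith('[') && h.endsWith(']') ? h.slice(1, -1) : h
}

function isBlockedIPv4(ip: string): boolean {
  const [a, b] = ip.split('.').map(Number)
  return (
    a === 0 || // 0.0.0.0/8
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback
    (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10 carrier-grade NAT
    (a === 169 && b === 254) || // link-local, incl. cloud metadata 169.254.169.254
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  )
}

function isBlockedIPv6(ip: string): boolean {
  return (
    ip === '::' ||
    ip === '::1' ||
    ip.startsWith('::ffff:') || // IPv4-mapped: rejected wholesale for simplicity
    /^f[cd][0-9a-f]{2}:/.test(ip) || // fc00::/7 unique-local
    /^fe[89ab][0-9a-f]:/.test(ip) // fe80::/10 link-local
  )
}

export function isBlockedHostStrict(host: string): boolean {
  const h = normalizeHost(host)
  if (h === 'localhost' || h.endsWith('.localhost')) return true
  switch (isIP(h)) {
    case 4:
      return isBlockedIPv4(h)
    case 6:
      return isBlockedIPv6(h)
    default:
      return false // DNS name: must be re-checked after resolution
  }
}
```

This only inspects the literal host; a DNS name that resolves to a private address still needs a check after resolution (for example on the result of `dns.lookup`) or a connection pinned to the vetted IP.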
```typescript
export function getSecurityContactsConfig() {
  return {
    // Sent on all registry calls; crates.io rejects requests without an identifying UA.
    userAgent: requireEnv('SECURITY_CONTACTS_USER_AGENT'),
  }
}
```
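The config comment notes that crates.io rejects requests without an identifying User-Agent. A minimal sketch of requiring that value at startup and attaching it to every registry request; `requireEnv` here is a stand-in for the worker's own helper (its real behavior is not shown in the PR excerpt), and `registryRequestInit` / `fetchCrateMetadata` are hypothetical. The URL `https://crates.io/api/v1/crates/{name}` is the public crates.io crate metadata endpoint.

```typescript
// Stand-in for the worker's requireEnv helper, assumed to fail fast on missing values.
export function requireEnv(name: string): string {
  const value = process.env[name]
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable ${name}`)
  }
  return value
}

interface RegistryRequestInit {
  headers: Record<string, string>
  signal?: AbortSignal
}

// Hypothetical helper: every registry request carries the configured User-Agent
// and a timeout so one slow registry cannot stall a batch.
export function registryRequestInit(userAgent: string, timeoutMs = 10_000): RegistryRequestInit {
  return {
    headers: { 'User-Agent': userAgent, Accept: 'application/json' },
    signal: AbortSignal.timeout(timeoutMs),
  }
}

// Hypothetical fetcher against the public crates.io metadata endpoint.
export async function fetchCrateMetadata(crate: string, userAgent: string): Promise<unknown> {
  const url = `https://crates.io/api/v1/crates/${encodeURIComponent(crate)}`
  const res = await fetch(url, registryRequestInit(userAgent))
  if (!res.ok) throw new Error(`crates.io returned ${res.status} for ${crate}`)
  return res.json()
}
```

Reading the variable once at startup means a misconfigured deployment fails immediately instead of on the first Cargo manifest lookup.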
```typescript
export function scoreContact(
  contact: RawContact,
  now: Date = new Date(),
): { score: number; confidence: ConfidenceBand } {
```
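`score.ts` is summarized as pure scoring over tier, channel, freshness, and corroboration, mapped to confidence bands. One possible body for the signature above, with a hypothetical `RawContact` shape and illustrative weights and band thresholds; the PR's actual values are not shown here.

```typescript
// Hypothetical shapes; the real RawContact/ConfidenceBand live in types.ts and may differ.
type ConfidenceBand = 'high' | 'medium' | 'low'
type Channel = 'pvr' | 'email' | 'url'
interface RawContact {
  channel: Channel
  tier: 1 | 2 | 3 // 1 = most authoritative source
  declaredAt?: string // ISO date declared by the source file, if any
  sources: string[] // extractors that reported this contact
}

// Illustrative weights only.
const TIER_POINTS = { 1: 50, 2: 35, 3: 20 } as const
const CHANNEL_POINTS: Record<Channel, number> = { pvr: 25, email: 20, url: 10 }
const DAY_MS = 24 * 60 * 60 * 1000

export function scoreContact(
  contact: RawContact,
  now: Date = new Date(),
): { score: number; confidence: ConfidenceBand } {
  let score = TIER_POINTS[contact.tier] + CHANNEL_POINTS[contact.channel]

  // Freshness: full 15 points up to one year old, decaying linearly to 0 at three years.
  if (contact.declaredAt !== undefined) {
    const ageDays = (now.getTime() - Date.parse(contact.declaredAt)) / DAY_MS
    if (Number.isFinite(ageDays)) {
      score += 15 * Math.max(0, Math.min(1, (3 * 365 - ageDays) / (2 * 365)))
    }
  }

  // Corroboration: +5 per additional distinct source, capped at +10.
  const distinct = new Set(contact.sources).size
  score += Math.min(10, 5 * Math.max(0, distinct - 1))

  score = Math.min(100, Math.round(score))
  const confidence: ConfidenceBand = score >= 70 ? 'high' : score >= 45 ? 'medium' : 'low'
  return { score, confidence }
}
```

Injecting `now` keeps the function pure, so freshness behavior can be tested deterministically.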

This pull request introduces support for processing and storing security contacts for repositories, including database schema changes, new worker service setup, configuration, and supporting code. It also updates dependencies and environment files as needed. Below are the most important changes grouped by theme:
Security Contacts Feature Implementation
- Adds a migration creating the `security_contacts` table and several related columns on the `repos` table to store security contact information and metadata.
- Adds the `processSecurityContactsBatch` activity and supporting extractor logic to fetch and process security contacts, including HTTP helpers and parsing utilities.
- Adds a `security-contacts-worker` service with Docker Compose configuration, build integration, and npm scripts for running and developing the worker.

Configuration and Environment
- Adds the worker's environment variables and exposes them through the `getSecurityContactsConfig` function.

Dependency and Codebase Maintenance
- Adds `js-yaml` and its types to both the lockfile and `package.json` to support new parsing needs.
- Moves the `InstallationPool` class into its own file for reuse and removes its inline definition from the enricher loop.

These changes lay the groundwork for collecting, processing, and storing security contact information for repositories in a scalable and configurable way.
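The shared `InstallationPool` is described as round-robin over GitHub App installation tokens with rate-limit handling. A minimal sketch of that idea, assuming each installation tracks GitHub's `x-ratelimit-remaining` and `x-ratelimit-reset` (epoch seconds) response headers; `RoundRobinTokenPool` and its methods are hypothetical, not the PR's `InstallationPool` API.

```typescript
// Hypothetical installation record; field names are illustrative.
export interface Installation {
  id: number
  token: string
  remaining: number // requests left in the current rate-limit window
  resetAtMs: number // epoch ms when the window resets
}

interface HeaderReader {
  get(name: string): string | null
}

export class RoundRobinTokenPool {
  private cursor = 0
  private readonly installations: Installation[]
  private readonly now: () => number

  constructor(installations: Installation[], now: () => number = Date.now) {
    if (installations.length === 0) throw new Error('pool needs at least one installation')
    this.installations = installations
    this.now = now
  }

  // Usable if quota remains or the window has already reset.
  private isAvailable(inst: Installation): boolean {
    return inst.remaining > 0 || this.now() >= inst.resetAtMs
  }

  // Next usable installation in round-robin order, or undefined if all are exhausted.
  acquire(): Installation | undefined {
    const n = this.installations.length
    for (let step = 0; step < n; step++) {
      const inst = this.installations[(this.cursor + step) % n]
      if (this.isAvailable(inst)) {
        this.cursor = (this.cursor + step + 1) % n
        if (inst.remaining > 0) inst.remaining -= 1
        return inst
      }
    }
    return undefined
  }

  // Sync local state from GitHub's rate-limit response headers.
  record(id: number, headers: HeaderReader): void {
    const inst = this.installations.find((i) => i.id === id)
    if (inst === undefined) return
    const remaining = headers.get('x-ratelimit-remaining')
    const reset = headers.get('x-ratelimit-reset') // epoch seconds
    if (remaining !== null && Number.isFinite(Number(remaining))) inst.remaining = Number(remaining)
    if (reset !== null && Number.isFinite(Number(reset))) inst.resetAtMs = Number(reset) * 1000
  }

  // Earliest reset across the pool, useful for sleeping when acquire() returns undefined.
  earliestResetMs(): number {
    return Math.min(...this.installations.map((i) => i.resetAtMs))
  }
}
```

Once an installation's reset time has passed it is handed out again, and the next `record` call restores its real budget from the response headers.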
Note
Medium Risk
Large new outbound-ingestion surface (GitHub + registries) that stores contact emails/URLs and mutates packages DB schema; mitigations include SSRF guards, rate limiting, and non-destructive updates on partial failures.
Overview
Adds a security contacts ingestion path for GitHub repos tied to critical packages: new `security_contacts` rows (channel, value, role, score, confidence, provenance) plus `repos` policy fields (PVR, policy URLs, `contacts_last_refreshed`).

A new `security-contacts-worker` registers a daily Temporal schedule (`0 6 * * *`) and runs `ingestSecurityContacts`, which batches repos (daily vs weekly cadence), runs tiered extractors in parallel, then reconciles, scores, and writes results. Sources include SECURITY-INSIGHTS, GitHub PVR, `SECURITY_CONTACTS`, security.txt (homepage), SECURITY.md, and registry manifests (npm, PyPI, Maven, Cargo, NuGet, RubyGems, Composer). GitHub calls use a shared installation pool with rate-limit handling; registry calls require `SECURITY_CONTACTS_USER_AGENT`.

Failure behavior is conservative: any extractor failure skips destructive updates and only bumps `contacts_last_refreshed`; successful runs replace contacts in a transaction, while policy columns use COALESCE so partial runs do not clear prior values. `InstallationPool` is extracted from the GitHub enricher for reuse.

Deploy wiring adds Docker Compose, build scripts, env samples, `js-yaml`, and npm start/dev scripts on the packages worker task queue.

Reviewed by Cursor Bugbot for commit 2ece757. Bugbot is set up for automated code reviews on this repo.
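The conservative failure rule described above can be expressed as a pure decision function. A minimal sketch, assuming extractor results are collected with `Promise.allSettled` and that a `null` policy value means "not determined this run"; `planWrite`, `mergePolicy`, and the `PolicyColumns` field names are hypothetical, not the PR's schema.

```typescript
// Hypothetical policy fields; null means "unknown this run, keep the stored value".
interface PolicyColumns {
  pvrEnabled: boolean | null
  securityPolicyUrl: string | null
}

type WritePlan<C> =
  | { kind: 'touch-only'; refreshedAt: Date }
  | { kind: 'replace'; refreshedAt: Date; contacts: C[]; policy: PolicyColumns }

// Any rejected extractor -> only bump contacts_last_refreshed;
// otherwise -> replace contacts and update policy columns.
export function planWrite<C>(
  outcomes: PromiseSettledResult<C[]>[],
  policy: PolicyColumns,
  now: Date = new Date(),
): WritePlan<C> {
  if (outcomes.some((o) => o.status === 'rejected')) {
    return { kind: 'touch-only', refreshedAt: now }
  }
  const contacts = outcomes.flatMap((o) => (o.status === 'fulfilled' ? o.value : []))
  return { kind: 'replace', refreshedAt: now, contacts, policy }
}

// COALESCE semantics: a null from this run keeps the stored value,
// mirroring `SET col = COALESCE($new, col)` in SQL.
export function mergePolicy(stored: PolicyColumns, incoming: PolicyColumns): PolicyColumns {
  return {
    pvrEnabled: incoming.pvrEnabled ?? stored.pvrEnabled,
    securityPolicyUrl: incoming.securityPolicyUrl ?? stored.securityPolicyUrl,
  }
}
```

`mergePolicy` uses `??` rather than `||` so that an explicit `false` (for example PVR turned off) still overwrites the stored value, just as `COALESCE` skips only SQL `NULL`.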