Infer column types, ranges, and nullability from a CSV sample and emit a JSON Schema document describing the data.
Reads a CSV file (full or sampled) and produces a JSON Schema (Draft 2020-12) describing each column's type, nullability, observed range or enum, and a one-line example. Useful as the first step of a data-onboarding pipeline.
Inputs:
csv_path: path to the CSV file.
sample_rows: number of data rows to sample (default 10000).
delimiter: auto-detected by default via the Python csv.Sniffer heuristic.
out_path: defaults to <csv_path>.schema.json.

Steps:
1. Detect the delimiter with csv.Sniffer().sniff().
2. Read up to sample_rows rows using a streaming reader (csv.reader in Python or csv-parse in Node).
3. Infer each column's type by trying, in order: integer (^-?\d+$), then float, then ISO date (YYYY-MM-DD), then ISO datetime, then bool (true|false|0|1 whitelist), default string.
4. Mark columns that contain empty values as nullable: true.
5. For numeric columns, record minimum and maximum. For strings, record minLength and maxLength.
6. Where a column's observed values form a small closed set, emit an enum. Else, emit a pattern if all values match a tight regex (e.g., UUID, email).
7. Assemble the schema with $schema, type: object, properties, required (columns with no nulls), and additionalProperties: false.
8. Write the schema to out_path.

Output: a JSON Schema file at out_path that validates rows of the CSV when each row is reshaped to an object. Stdout prints column count, sample size used, and any columns where type detection was ambiguous (mixed types).
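The core of the procedure above (delimiter sniffing, the ordered type checks, range recording, and schema assembly) might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the skill's actual implementation: the names infer_column and infer_schema, the candidate delimiter list, and every regex other than the integer one are hypothetical. The enum and pattern step is left out because the text does not state a cardinality threshold. The nullable key follows the wording above; Draft 2020-12 itself has no nullable keyword and expresses nullability through a type array that includes "null".

```python
import csv
import io
import re

# Checks run in the order the steps give: integer, float, date, datetime, bool.
# Only the integer regex comes from the text; the others are assumed and
# test shape only (DATE_RE would accept 2024-13-40, for example).
INT_RE = re.compile(r"^-?\d+$")
FLOAT_RE = re.compile(r"^-?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
DATETIME_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?$"
)
BOOL_VALUES = {"true", "false", "0", "1"}

CHECKS = [
    ("integer", lambda v: INT_RE.match(v) is not None),
    ("number", lambda v: FLOAT_RE.match(v) is not None),
    ("date", lambda v: DATE_RE.match(v) is not None),
    ("date-time", lambda v: DATETIME_RE.match(v) is not None),
    ("boolean", lambda v: v.lower() in BOOL_VALUES),
]


def infer_column(values):
    """Return (property_schema, has_nulls) for one column of raw strings."""
    present = [v for v in values if v != ""]
    has_nulls = len(present) < len(values)
    kind = "string"  # default when no check accepts every non-empty value
    for name, check in CHECKS:
        if present and all(check(v) for v in present):
            kind = name
            break
    if kind == "integer":
        nums = [int(v) for v in present]
        prop = {"type": "integer", "minimum": min(nums), "maximum": max(nums)}
    elif kind == "number":
        nums = [float(v) for v in present]
        prop = {"type": "number", "minimum": min(nums), "maximum": max(nums)}
    elif kind in ("date", "date-time"):
        prop = {"type": "string", "format": kind}
    elif kind == "boolean":
        prop = {"type": "boolean"}
    else:
        lengths = [len(v) for v in present] or [0]
        prop = {"type": "string", "minLength": min(lengths), "maxLength": max(lengths)}
    if has_nulls:
        prop["nullable"] = True
    return prop, has_nulls


def infer_schema(csv_text, sample_rows=10000):
    """Infer a schema dict from CSV text, reading at most sample_rows data rows."""
    try:
        # The candidate delimiter list is an assumption of this sketch.
        dialect = csv.Sniffer().sniff(csv_text[:4096], delimiters=",;\t|")
    except csv.Error:
        dialect = csv.excel  # fall back to plain comma-separated
    reader = csv.reader(io.StringIO(csv_text), dialect)
    header = next(reader, [])
    columns = {name: [] for name in header}
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for name, value in zip(header, row):
            columns[name].append(value.strip())
    properties, required = {}, []
    for name, values in columns.items():
        prop, has_nulls = infer_column(values)
        properties[name] = prop
        if not has_nulls:
            required.append(name)
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,
    }


SAMPLE = (
    "id,price,signup,plan\n"
    "1,9.5,2024-01-03,free\n"
    "2,12.0,2024-02-10,pro\n"
    "3,,2024-03-15,free\n"
)
schema = infer_schema(SAMPLE)
```

Because integer is tried before bool, a column containing only 0 and 1 comes out as integer under this ordering; only columns with true or false values reach the bool check.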
Round-trip validate: parse N random rows from the CSV as objects and run them through ajv against the produced schema; confirm zero validation errors. Re-run with sample_rows = sample_rows * 2 and confirm the schema's enums or ranges only widen, never narrow. If any column came back as string despite values matching a numeric regex, the sniffer found a non-numeric value — the report should show one example.
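The "only widen, never narrow" part of this check can be automated by comparing the two schema documents directly. The sketch below is one possible way to do it; the function name find_narrowing is hypothetical, and it assumes the larger sample contains every row of the smaller one (for example, both read from the top of the file). With independent random samples, narrowing can occur without indicating a bug.

```python
def find_narrowing(old_schema, new_schema):
    """List the ways new_schema is stricter than old_schema, per column."""
    problems = []
    new_props = new_schema.get("properties", {})
    for name, old_prop in old_schema.get("properties", {}).items():
        new_prop = new_props.get(name)
        if new_prop is None:
            problems.append(f"{name}: missing from the new schema")
            continue
        # Lower bounds may only move down as more rows are seen.
        for key in ("minimum", "minLength"):
            if key in old_prop and key in new_prop and new_prop[key] > old_prop[key]:
                problems.append(f"{name}: {key} rose from {old_prop[key]} to {new_prop[key]}")
        # Upper bounds may only move up.
        for key in ("maximum", "maxLength"):
            if key in old_prop and key in new_prop and new_prop[key] < old_prop[key]:
                problems.append(f"{name}: {key} fell from {old_prop[key]} to {new_prop[key]}")
        # Every previously observed enum value must still be allowed.
        if "enum" in old_prop and "enum" in new_prop:
            lost = [v for v in old_prop["enum"] if v not in new_prop["enum"]]
            if lost:
                problems.append(f"{name}: enum lost values {lost}")
    return problems


small = {"properties": {
    "age": {"type": "integer", "minimum": 18, "maximum": 65},
    "plan": {"type": "string", "enum": ["free", "pro"]},
}}
wider = {"properties": {
    "age": {"type": "integer", "minimum": 16, "maximum": 70},
    "plan": {"type": "string", "enum": ["free", "pro", "team"]},
}}
narrower = {"properties": {
    "age": {"type": "integer", "minimum": 20, "maximum": 65},
    "plan": {"type": "string", "enum": ["free"]},
}}
```

A column whose keyword disappears between runs (for instance an enum replaced by a pattern, or a type change) is not flagged here, since the comparison only applies when both schemas carry the same keyword.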
Mixed-type columns: emit oneOf with both alternatives or a string with pattern.
... properties and a warning.
Same domains or capabilities as amitte/csv-schema-inferer.
Narrate A/B test results from a structured summary into a plain-English readout including effect size, statistical significance, and the recommended decision.
Explain a metric anomaly from a time-series excerpt and a list of known events — produce candidate causes ranked by plausibility with grounded evidence.
Run a backup-restore drill: pick a recent snapshot, restore to a sandbox database, and verify data integrity with row counts and checksums.
Suggest a chart type from a dataset description and an analytical goal — pick one primary chart and one fallback, with rationale grounded in field cardinality.
Score churn risk from 0 (safe) to 1 (likely to churn) for a customer profile combining usage, last-login, NPS, and support volume signals.
Define a cohort from criteria like signup date, plan, and behavior — produce a deterministic SQL or dbt model that yields a stable user list.