
Integration with factor-factory and jellycell

Starting in v1.0.0, nyc311 integrates with two upstream packages:

  • factor-factory — a 17-engine-family causal-inference framework. nyc311 ships a pair of additive adapters that route PanelDatasets and Pipelines into factor-factory engines.
  • jellycell — a reporting / tearsheet library. nyc311's case studies optionally emit jellycell manuscripts alongside the existing FINDINGS.md.

The integration is additive: the existing nyc311.factors.Pipeline, nyc311.temporal.PanelDataset, and nyc311.stats.* APIs are unchanged.

The two adapters

PanelDataset.to_factor_factory_panel()

Converts a nyc311 panel into a factor_factory.tidy.Panel:

from nyc311.temporal import build_complaint_panel, TreatmentEvent
from nyc311.temporal import build_distance_weights, centroids_from_boundaries

# Assumes records, event (a TreatmentEvent), and boundaries are already loaded.
panel = build_complaint_panel(
    records,
    geography="community_district",
    freq="ME",
    treatment_events=[event],
)

# Optional: carry spatial weights through via df.attrs.
weights = build_distance_weights(
    centroids_from_boundaries(boundaries), threshold_meters=2000
)

ff_panel = panel.to_factor_factory_panel(
    outcome_col="complaint_count",
    spatial_weights=weights,
)

Mapping:

| PanelDataset | factor_factory.tidy.Panel |
| --- | --- |
| unit_id | First-level MultiIndex, named unit_id |
| period ("2024-03" string) | Second-level MultiIndex, pandas.Timestamp at start |
| complaint_count | Primary outcome column (configurable via outcome_col) |
| resolution_rate, median_resolution_days, ... | Additional columns (available as covariates) |
| treatment (bool) | treatment int column (0/1) |
| TreatmentEvent tuples | PanelMetadata.treatment_events |
| unit_type | PanelMetadata.dimension |
| spatial_weights=... | panel.df.attrs["nyc311_spatial_weights"] |

Recover the weights with nyc311.temporal.spatial_weights_from_panel(panel).
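The period and treatment rows of the mapping can be illustrated with the standard library alone. This is a minimal sketch, not the adapter's code: `period_start` and `treatment_flag` are hypothetical helper names, the unit id is made up, and the adapter itself produces `pandas.Timestamp` values rather than `datetime` objects.

```python
from datetime import datetime


def period_start(period: str) -> datetime:
    """Parse a "YYYY-MM" period label into the first instant of that month."""
    return datetime.strptime(period, "%Y-%m")


def treatment_flag(treated: bool) -> int:
    """Encode a boolean treatment indicator as 0/1."""
    return int(treated)


# Illustrative row only; the unit id is not taken from real data.
row = {"unit_id": "UNIT-01", "period": "2024-03", "treatment": True}
converted = (
    row["unit_id"],
    period_start(row["period"]),
    treatment_flag(row["treatment"]),
)
```

The resulting `(unit_id, timestamp)` pair corresponds to the two MultiIndex levels described in the table, and the integer flag to the treatment column.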

Pipeline.as_factor_factory_estimate()

Dispatches to factor_factory.engines.<family>.estimate on a panel:

from nyc311.factors import Pipeline, ComplaintVolumeFactor

pipeline = Pipeline().add(ComplaintVolumeFactor())
# dataset is a nyc311 PanelDataset, e.g. the result of build_complaint_panel(...).
ff_panel = dataset.to_factor_factory_panel()

did_results = pipeline.as_factor_factory_estimate(
    ff_panel,
    family="did",
    method="twfe",
    outcome="complaint_count",
)

The returned object is a factor-factory <Family>Results. For family="did" that's DidResults — iterable, with [0].att, [0].se, [0].ci_95, etc.
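Because the result is iterable and each element exposes att, se, and ci_95, a reporting loop needs nothing else. The sketch below uses a hypothetical stand-in class (`FakeEstimate`) so it runs without factor-factory installed. It assumes ci_95 is a (lower, upper) pair, which this page does not confirm, and the numbers are invented for illustration.

```python
from typing import Iterable, NamedTuple


class FakeEstimate(NamedTuple):
    """Hypothetical stand-in for one element of a DidResults object."""

    att: float
    se: float
    ci_95: tuple[float, float]  # assumed to be (lower, upper)


def summarize(results: Iterable) -> list[str]:
    """Format each estimate as one line: ATT, standard error, 95% interval."""
    lines = []
    for est in results:
        lower, upper = est.ci_95
        lines.append(
            f"ATT={est.att:.3f} (SE={est.se:.3f}, 95% CI [{lower:.3f}, {upper:.3f}])"
        )
    return lines


# Invented values, not real estimates.
demo = [FakeEstimate(att=-1.25, se=0.5, ci_95=(-2.23, -0.27))]
summary = summarize(demo)
```

With a real run, `summarize(did_results)` should work the same way provided the attribute shapes match the assumption above.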

Supported family values match factor_factory.engines.*: did, sdid, mediation, rdd, scm, changepoint, stl, panel_reg, inequality, spatial, reporting_bias, hawkes, survival, event_study, het_te, dml, climate, diffusion.
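A caller that builds the family argument dynamically may want to check it against the list above before dispatching. The following is a minimal sketch of such a guard. `check_family` is a hypothetical helper, and whether the adapter performs its own validation (or which error it raises) is not stated here.

```python
# Family names copied from the list above.
SUPPORTED_FAMILIES = frozenset({
    "did", "sdid", "mediation", "rdd", "scm", "changepoint", "stl",
    "panel_reg", "inequality", "spatial", "reporting_bias", "hawkes",
    "survival", "event_study", "het_te", "dml", "climate", "diffusion",
})


def check_family(family: str) -> str:
    """Return the family name unchanged, or raise ValueError listing valid names."""
    if family not in SUPPORTED_FAMILIES:
        valid = ", ".join(sorted(SUPPORTED_FAMILIES))
        raise ValueError(f"Unknown family {family!r}; expected one of: {valid}")
    return family
```

Note that method names such as "twfe" are not families, so `check_family("twfe")` raises.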

Stats-module crosswalk

As of v1.0.0, eleven of nyc311.stats's seventeen modules have a factor-factory equivalent. Their module docstrings now cross-reference the upstream engine with a .. note:: factor-factory preferred block. The homegrown implementation remains authoritative for backwards compatibility but will not grow new methods.

| nyc311.stats module | Method | factor-factory |
| --- | --- | --- |
| _staggered_did | Callaway-Sant'Anna / TWFE / Sun-Abraham / BJS | engines.did.{cs,twfe,sa,bjs} |
| _synthetic_control | SCM | engines.scm.{augmented,matrix_completion,pysyncon} |
| _rdd | CCT robust local poly | engines.rdd.rd_robust |
| _changepoint | PELT / binseg | engines.changepoint.ruptures |
| _decomposition | STL | engines.stl.sktime_stl |
| _panel_models | FE / RE | engines.panel_reg.pyfixest |
| _equity.theil_index | Theil T | engines.inequality.theil_t |
| _spatial.global_morans_i | Global Moran's I | engines.spatial.morans_i |
| _reporting_bias | Latent-EM | engines.reporting_bias.latent_em |
| _hawkes | Hawkes self-exciting | engines.hawkes.tick |
| _anomaly | STL residual anomaly | engines.stl.sktime_stl (residual) |

Not covered upstream (nyc311.stats remains authoritative):

  • _bym2 (Bayesian small-area smoothing — PyMC native)
  • _gwr (geographically-weighted regression)
  • _equity.oaxaca_blinder
  • _power (power analysis helper)
  • _spatial_regression (spatial lag / error)
  • _its (segmented-regression ITS; factor-factory offers a DiD-like approximation but no direct ITS engine)

See .claude/skills/stats-module-discipline.md for the rule: new statistical methods go through factor-factory first; homegrown only with an explicit RFC.

Tearsheets

Installing the optional tearsheets extra pulls in jellycell:

pip install "nyc311[tearsheets]"
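Code that should degrade gracefully when the extra is absent can probe for the package before trying to emit tearsheets. A minimal sketch, assuming the importable module is named `jellycell` (the import name is inferred from the package name and is not confirmed on this page). `extra_available` is a hypothetical helper.

```python
from importlib.util import find_spec


def extra_available(module_name: str = "jellycell") -> bool:
    """Report whether an optional top-level module can be imported."""
    return find_spec(module_name) is not None


if extra_available():
    print("jellycell found: tearsheet output can be enabled")
else:
    print('jellycell missing: install with pip install "nyc311[tearsheets]"')
```

`find_spec` only locates the module without importing it, so the check has no side effects. Pass top-level names only: a dotted name with a missing parent package raises instead of returning `None`.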

With the tearsheets extra installed, the two previous case studies and the two new ones emit factor_factory.jellycell.tearsheets manuscripts in addition to their native FINDINGS.md output. The tearsheet set for each case study:

  • manuscripts/METHODOLOGY.md
  • manuscripts/DIAGNOSTICS_CHECKLIST.md
  • manuscripts/FINDINGS.md
  • manuscripts/MANUSCRIPT.md (stub; human-authored after first run)
  • manuscripts/AUDIT.md
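After a run, the file list above can be checked mechanically. The sketch below is one way to do it with the standard library. `missing_manuscripts` is a hypothetical helper and is not part of nyc311 or jellycell. The temporary directory stands in for a case-study folder.

```python
import tempfile
from pathlib import Path

# File names taken from the tearsheet set listed above.
EXPECTED_MANUSCRIPTS = (
    "METHODOLOGY.md",
    "DIAGNOSTICS_CHECKLIST.md",
    "FINDINGS.md",
    "MANUSCRIPT.md",
    "AUDIT.md",
)


def missing_manuscripts(case_study_dir: Path) -> list[str]:
    """List expected tearsheet files absent from <case_study_dir>/manuscripts."""
    manuscripts = case_study_dir / "manuscripts"
    return [
        name for name in EXPECTED_MANUSCRIPTS
        if not (manuscripts / name).is_file()
    ]


# Demo: a throwaway folder in which only FINDINGS.md has been written.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "manuscripts").mkdir()
    (root / "manuscripts" / "FINDINGS.md").write_text("# Findings\n")
    still_missing = missing_manuscripts(root)
```

An empty return value means the full set is present.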

Each case study ships a jellycell.toml alongside its run_analysis.py so uv run jellycell render works in-place.

Version ranges (v1.0.2)

| Package | Pin |
| --- | --- |
| factor-factory | >=1.0.2,<2 |
| jellycell | >=1.3.5,<2 (via tearsheets extra) |
| nyc-geo-toolkit | >=0.3.0,<0.5 (widened v1.0.2 for upstream v0.4) |
| Python | >=3.12 (dropped 3.10/3.11 in v1.0.0) |

Upstream helpers worth knowing

These live in nyc-geo-toolkit and factor-factory but compose naturally with nyc311.spatial / nyc311.temporal workflows:

  • nyc_geo_toolkit.centroids_from_boundaries(boundaries, *, representative=False), available from v0.4. Turns any polygon BoundaryCollection into a Point BoundaryCollection, which is useful for spatial weights, Moran's I / LISA, nearest-neighbour joins, and choropleth label placement. Use representative=True for concave NYC shorelines.
from nyc_geo_toolkit import centroids_from_boundaries, load_nyc_boundaries
from nyc311.temporal import build_distance_weights

cbs = load_nyc_boundaries("community_district")
point_collection = centroids_from_boundaries(cbs, representative=True)
# Flip (lon, lat) GeoJSON order to (lat, lon) for build_distance_weights.
unit_centroids = {
    f.geography_value: (f.geometry["coordinates"][1], f.geometry["coordinates"][0])
    for f in point_collection.features
}
weights = build_distance_weights(unit_centroids, threshold_meters=2000.0)

nyc311 v1.0.2+ widens the nyc-geo-toolkit pin to >=0.3.0,<0.5, so the v0.4 release that provides this helper can be installed. Install nyc-geo-toolkit with its [spatial] extra to get the shapely dependency.

  • factor_factory.engines.*.estimate(panel, ...) — the engine families listed above. Call via Pipeline.as_factor_factory_estimate() for the chaining convenience, or import directly.
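The coordinate flip and distance threshold used in the centroid example above can be illustrated without either package. This is a sketch of the general idea only: it uses great-circle (haversine) distance and a binary neighbour rule, which is an assumption and not a description of how build_distance_weights is implemented. The helper names and the three points are invented for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def neighbours_within(
    centroids: dict[str, tuple[float, float]], threshold_meters: float
) -> dict[str, list[str]]:
    """Map each unit to the other units whose centroid lies within the threshold."""
    return {
        unit: [
            other for other, q in centroids.items()
            if other != unit and haversine_m(p, q) <= threshold_meters
        ]
        for unit, p in centroids.items()
    }


# GeoJSON stores (lon, lat); flip to (lat, lon) before measuring distance.
geojson_points = {
    "A": (-74.0060, 40.7128),
    "B": (-74.0000, 40.7200),
    "C": (-73.8000, 40.9000),
}
centroids = {unit: (lat, lon) for unit, (lon, lat) in geojson_points.items()}
links = neighbours_within(centroids, threshold_meters=2000.0)
```

Here A and B are under a kilometre apart and become neighbours, while C is roughly 20 km away and has none. Forgetting the flip would swap latitude and longitude and produce meaningless distances, which is why the example above reorders the coordinates explicitly.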

factor-factory roadmap

Follow the factor-factory roadmap for upcoming engine families; most new engines become available to Pipeline.as_factor_factory_estimate automatically the moment the dependency pin accepts them.