Integration with factor-factory and jellycell
Starting in v1.0.0, nyc311 integrates with two upstream packages:
- factor-factory: a 17-engine-family causal-inference framework. nyc311 ships a pair of additive adapters that route PanelDatasets and Pipelines into factor-factory engines.
- jellycell: a reporting / tearsheet library. nyc311's case studies optionally emit jellycell manuscripts alongside the existing FINDINGS.md.
The integration is additive: the existing nyc311.factors.Pipeline,
nyc311.temporal.PanelDataset, and nyc311.stats.* APIs are unchanged.
The two adapters
PanelDataset.to_factor_factory_panel()
Converts a nyc311 panel into a
factor_factory.tidy.Panel:
from nyc311.temporal import build_complaint_panel, TreatmentEvent
from nyc311.temporal import build_distance_weights, centroids_from_boundaries
panel = build_complaint_panel(
records,
geography="community_district",
freq="ME",
treatment_events=[event],
)
# Optional: carry spatial weights through via df.attrs.
weights = build_distance_weights(
centroids_from_boundaries(boundaries), threshold_meters=2000
)
ff_panel = panel.to_factor_factory_panel(
outcome_col="complaint_count",
spatial_weights=weights,
)
Mapping:
| PanelDataset | factor_factory.tidy.Panel |
|---|---|
| unit_id | First-level MultiIndex, named unit_id |
| period ("2024-03" string) | Second-level MultiIndex, pandas.Timestamp at start |
| complaint_count | Primary outcome column (configurable via outcome_col) |
| resolution_rate, median_resolution_days, ... | Additional columns (available as covariates) |
| treatment (bool) | treatment int column (0/1) |
| TreatmentEvent tuples | PanelMetadata.treatment_events |
| unit_type | PanelMetadata.dimension |
| spatial_weights=... | panel.df.attrs["nyc311_spatial_weights"] |
Recover the weights with nyc311.temporal.spatial_weights_from_panel(panel).
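The mapping above can be pictured with plain pandas. The following is only a sketch of the target shape, not the adapter's implementation: the rows, the unit names, and the structure of the weights object are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical flat rows standing in for a nyc311 PanelDataset.
rows = [
    {"unit_id": "BROOKLYN 01", "period": "2024-02", "complaint_count": 120, "treatment": False},
    {"unit_id": "BROOKLYN 01", "period": "2024-03", "complaint_count": 95, "treatment": True},
    {"unit_id": "QUEENS 07", "period": "2024-02", "complaint_count": 80, "treatment": False},
    {"unit_id": "QUEENS 07", "period": "2024-03", "complaint_count": 83, "treatment": False},
]

df = pd.DataFrame(rows)

# "2024-03" string -> pandas.Timestamp at the start of that month.
df["period"] = pd.to_datetime(df["period"], format="%Y-%m")

# bool treatment flag -> 0/1 int column.
df["treatment"] = df["treatment"].astype(int)

# unit_id becomes the first index level, period the second.
df = df.set_index(["unit_id", "period"]).sort_index()

# Spatial weights ride along in df.attrs (the weights structure here is made up).
df.attrs["nyc311_spatial_weights"] = {"BROOKLYN 01": {"QUEENS 07": 1.0}}

print(df)
```

Note that pandas documents DataFrame.attrs as experimental, and attrs are not guaranteed to survive every DataFrame operation, so it may be worth recovering the weights before reshaping the panel.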
Pipeline.as_factor_factory_estimate()
Dispatches to factor_factory.engines.<family>.estimate on a panel:
from nyc311.factors import Pipeline, ComplaintVolumeFactor
pipeline = Pipeline().add(ComplaintVolumeFactor())
ff_panel = dataset.to_factor_factory_panel()
did_results = pipeline.as_factor_factory_estimate(
ff_panel,
family="did",
method="twfe",
outcome="complaint_count",
)
The returned object is a factor-factory <Family>Results. For family="did"
that's DidResults — iterable, with [0].att, [0].se, [0].ci_95, etc.
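A short sketch of consuming such a result follows. It relies only on the attributes named above (att, se, ci_95) and assumes ci_95 is a (lower, upper) pair; FakeEstimate is a hypothetical stand-in so the snippet runs without factor-factory installed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class FakeEstimate:
    """Hypothetical stand-in for one entry of a results object."""
    att: float
    se: float
    ci_95: Tuple[float, float]  # assumed to be (lower, upper)


def summarize(results: Iterable) -> List[str]:
    """Format each estimate in an iterable results object as one line."""
    lines = []
    for i, est in enumerate(results):
        lower, upper = est.ci_95
        lines.append(
            f"[{i}] ATT={est.att:.3f} (SE {est.se:.3f}), "
            f"95% CI [{lower:.3f}, {upper:.3f}]"
        )
    return lines


demo = [FakeEstimate(att=-4.2, se=1.5, ci_95=(-7.14, -1.26))]
for line in summarize(demo):
    print(line)
```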
Supported family values match factor_factory.engines.*: did, sdid,
mediation, rdd, scm, changepoint, stl, panel_reg, inequality,
spatial, reporting_bias, hawkes, survival, event_study, het_te,
dml, climate, diffusion.
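One way to picture the dispatch is a name-based module lookup guarded by the family list above. This is a minimal sketch under that assumption, not the adapter's actual code: resolve_estimator and the fake_engines package are hypothetical, and the stand-in package exists only so the example runs without factor-factory.

```python
import importlib
import sys
import types

# Family names copied from the list above.
SUPPORTED_FAMILIES = frozenset({
    "did", "sdid", "mediation", "rdd", "scm", "changepoint", "stl",
    "panel_reg", "inequality", "spatial", "reporting_bias", "hawkes",
    "survival", "event_study", "het_te", "dml", "climate", "diffusion",
})


def resolve_estimator(family, package="factor_factory.engines"):
    """Return <package>.<family>.estimate, rejecting unknown families early."""
    if family not in SUPPORTED_FAMILIES:
        raise ValueError(f"unsupported family: {family!r}")
    module = importlib.import_module(f"{package}.{family}")
    return module.estimate


# Stand-in package registered in sys.modules purely for demonstration.
fake_pkg = types.ModuleType("fake_engines")
fake_pkg.__path__ = []  # mark it as a package
fake_did = types.ModuleType("fake_engines.did")
fake_did.estimate = lambda panel, **kwargs: {"family": "did", **kwargs}
sys.modules["fake_engines"] = fake_pkg
sys.modules["fake_engines.did"] = fake_did

estimate = resolve_estimator("did", package="fake_engines")
print(estimate(None, method="twfe"))
```

Validating the family before importing turns a typo into a clear ValueError instead of a ModuleNotFoundError from deep inside the import machinery.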
Stats-module crosswalk
As of v1.0.0, eleven of nyc311.stats's seventeen modules have a factor-factory
equivalent. Their module docstrings now cross-reference the upstream engine with
a .. note:: factor-factory preferred block. The homegrown implementation
remains authoritative for backwards compatibility but will not grow new methods.
| nyc311.stats module | Method | factor-factory |
|---|---|---|
| _staggered_did | Callaway-Sant'Anna / TWFE / Sun-Abraham / BJS | engines.did.{cs,twfe,sa,bjs} |
| _synthetic_control | SCM | engines.scm.{augmented,matrix_completion,pysyncon} |
| _rdd | CCT robust local poly | engines.rdd.rd_robust |
| _changepoint | PELT / binseg | engines.changepoint.ruptures |
| _decomposition | STL | engines.stl.sktime_stl |
| _panel_models | FE / RE | engines.panel_reg.pyfixest |
| _equity.theil_index | Theil T | engines.inequality.theil_t |
| _spatial.global_morans_i | Global Moran's I | engines.spatial.morans_i |
| _reporting_bias | Latent-EM | engines.reporting_bias.latent_em |
| _hawkes | Hawkes self-exciting | engines.hawkes.tick |
| _anomaly | STL residual anomaly | engines.stl.sktime_stl (residual) |
Not covered upstream (nyc311.stats remains authoritative):
- _bym2 (Bayesian small-area smoothing, PyMC native)
- _gwr (geographically-weighted regression)
- _equity.oaxaca_blinder
- _power (power analysis helper)
- _spatial_regression (spatial lag / error)
- _its (segmented-regression ITS; a DiD-like approximation is in factor-factory but not a direct ITS)
See .claude/skills/stats-module-discipline.md for the rule: new statistical methods go through factor-factory first; a homegrown implementation is added only with an explicit RFC.
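The crosswalk can also be kept as a small lookup, for example to point a user of a homegrown module at its upstream counterpart. The dictionary below simply transcribes the table; the preferred_engine helper is a hypothetical illustration, not part of nyc311.

```python
from typing import Optional

# Transcribed from the crosswalk table; keys are nyc311.stats names.
UPSTREAM_ENGINE = {
    "_staggered_did": "engines.did.{cs,twfe,sa,bjs}",
    "_synthetic_control": "engines.scm.{augmented,matrix_completion,pysyncon}",
    "_rdd": "engines.rdd.rd_robust",
    "_changepoint": "engines.changepoint.ruptures",
    "_decomposition": "engines.stl.sktime_stl",
    "_panel_models": "engines.panel_reg.pyfixest",
    "_equity.theil_index": "engines.inequality.theil_t",
    "_spatial.global_morans_i": "engines.spatial.morans_i",
    "_reporting_bias": "engines.reporting_bias.latent_em",
    "_hawkes": "engines.hawkes.tick",
    "_anomaly": "engines.stl.sktime_stl (residual)",
}


def preferred_engine(stats_name: str) -> Optional[str]:
    """Return the upstream engine for a stats module, or None if not covered."""
    return UPSTREAM_ENGINE.get(stats_name)


print(preferred_engine("_rdd"))   # covered upstream
print(preferred_engine("_bym2"))  # not covered: nyc311.stats stays authoritative
```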
Tearsheets
Installing the optional tearsheets extra pulls in jellycell:
pip install "nyc311[tearsheets]"
With the tearsheets extra, the two previous case studies and the two new ones
emit
factor_factory.jellycell.tearsheets
manuscripts in addition to their native FINDINGS.md output. The tearsheet set
for each case study:
- manuscripts/METHODOLOGY.md
- manuscripts/DIAGNOSTICS_CHECKLIST.md
- manuscripts/FINDINGS.md
- manuscripts/MANUSCRIPT.md (stub; human-authored after first run)
- manuscripts/AUDIT.md
Each case study ships a jellycell.toml alongside its run_analysis.py so
uv run jellycell render works in-place.
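A quick check that the optional extra is present, and that a case-study run produced the full tearsheet set, might look like the sketch below. It assumes the distribution's import name is jellycell; both helper functions are hypothetical and not part of nyc311.

```python
import importlib.util
from pathlib import Path

# File names taken from the tearsheet set listed above.
EXPECTED_MANUSCRIPTS = (
    "manuscripts/METHODOLOGY.md",
    "manuscripts/DIAGNOSTICS_CHECKLIST.md",
    "manuscripts/FINDINGS.md",
    "manuscripts/MANUSCRIPT.md",
    "manuscripts/AUDIT.md",
)


def tearsheets_available() -> bool:
    """True if a top-level module named jellycell can be imported (assumed name)."""
    return importlib.util.find_spec("jellycell") is not None


def missing_manuscripts(case_study_dir) -> list:
    """List expected tearsheet files that are absent under case_study_dir."""
    root = Path(case_study_dir)
    return [name for name in EXPECTED_MANUSCRIPTS if not (root / name).is_file()]


if __name__ == "__main__":
    print("tearsheets extra installed:", tearsheets_available())
    print("missing in current directory:", missing_manuscripts("."))
```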
Version ranges (v1.0.2)
| Package | Pin |
|---|---|
| factor-factory | >=1.0.2,<2 |
| jellycell | >=1.3.5,<2 (via tearsheets extra) |
| nyc-geo-toolkit | >=0.3.0,<0.5 (widened in v1.0.2 for upstream v0.4) |
| Python | >=3.12 (dropped 3.10/3.11 in v1.0.0) |
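To confirm that an environment satisfies these pins, a stdlib-only sketch along the following lines may help. The comparison is deliberately simplified: it looks only at the leading numeric release components and ignores pre-release and dev suffixes, so packaging.specifiers.SpecifierSet is the better choice when the packaging library is available. The helper names are hypothetical.

```python
from importlib import metadata

# (inclusive lower bound, exclusive upper bound), taken from the table above.
PINS = {
    "factor-factory": ((1, 0, 2), (2,)),
    "jellycell": ((1, 3, 5), (2,)),
    "nyc-geo-toolkit": ((0, 3, 0), (0, 5)),
}


def parse_release(version: str) -> tuple:
    """Extract leading numeric components, e.g. '1.2.3rc1' -> (1, 2, 3)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def within_pin(version: str, lower: tuple, upper: tuple) -> bool:
    """Simplified range check: lower <= release < upper."""
    return lower <= parse_release(version) < upper


def check_installed() -> dict:
    """Report, per pinned package, whether the installed version fits its pin."""
    report = {}
    for name, (lower, upper) in PINS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
        else:
            ok = within_pin(installed, lower, upper)
            report[name] = "ok" if ok else f"{installed} outside pin"
    return report


print(check_installed())
```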
Upstream helpers worth knowing
These live in nyc-geo-toolkit and factor-factory but compose naturally with
nyc311.spatial / nyc311.temporal workflows:
- nyc_geo_toolkit.centroids_from_boundaries(boundaries, *, representative=False) (v0.4+): turns any polygon BoundaryCollection into a Point BoundaryCollection. Useful for spatial weights, Moran's I / LISA, nearest-neighbour joins, and choropleth label placement. Use representative=True for concave NYC shorelines.
from nyc_geo_toolkit import centroids_from_boundaries, load_nyc_boundaries
from nyc311.temporal import build_distance_weights
cbs = load_nyc_boundaries("community_district")
point_collection = centroids_from_boundaries(cbs, representative=True)
# Flip (lon, lat) GeoJSON order to (lat, lon) for build_distance_weights.
unit_centroids = {
f.geography_value: (f.geometry["coordinates"][1], f.geometry["coordinates"][0])
for f in point_collection.features
}
weights = build_distance_weights(unit_centroids, threshold_meters=2000.0)
The nyc311 v1.0.2+ pin (>=0.3.0,<0.5) admits nyc-geo-toolkit v0.4, which this helper requires. Install nyc-geo-toolkit with its [spatial] extra to get the shapely dependency.
- factor_factory.engines.*.estimate(panel, ...): the engine families listed above. Call via Pipeline.as_factor_factory_estimate() for the chaining convenience, or import directly.
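To see why the centroid example flips GeoJSON (lon, lat) pairs into (lat, lon) and what a threshold in meters means, here is a conceptual sketch of distance-threshold neighbours using the haversine formula. It is not nyc311's build_distance_weights implementation; the function names, the binary neighbour-list output, and the coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def threshold_neighbors(centroids, threshold_meters):
    """Map each unit to the sorted list of other units within the threshold."""
    return {
        u: sorted(
            v for v in centroids
            if v != u and haversine_m(centroids[u], centroids[v]) <= threshold_meters
        )
        for u in centroids
    }


# Made-up (lat, lon) centroids: A and B are about 1.1 km apart, C is about 11 km away.
centroids = {
    "A": (40.7000, -74.0000),
    "B": (40.7100, -74.0000),
    "C": (40.8000, -74.0000),
}
neighbors = threshold_neighbors(centroids, threshold_meters=2000.0)
print(neighbors)
```

Passing (lon, lat) by mistake would not raise an error here; it would silently produce wrong distances, which is why the explicit flip in the example above matters.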
factor-factory roadmap
Follow the
factor-factory roadmap
for upcoming engine families; most new engines become available to
Pipeline.as_factor_factory_estimate automatically the moment the dependency
pin accepts them.