Examples
nyc311 examples now live as self-contained consumer projects under
examples/. There are no repo-level example notebooks, no shared
examples/utils/, and no shared examples/output/.
Contract
Every example follows the same structure:
- one semantic-slug folder under examples/
- one local pyproject.toml
- one local README.md
- one local .gitignore
- one main.py entrypoint
- local cache/ and ignored artifacts/ directories created on demand
- optional tracked reports/ for markdown and report figures that should ship with the example
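The file-level part of this contract is easy to check mechanically. A minimal sketch, assuming a hypothetical helper that is not part of nyc311 or the repo tooling: it tests only for the four required files and ignores the on-demand cache/, artifacts/, and reports/ directories.

```python
from pathlib import Path

# Files every example folder must contain (hypothetical check, not repo tooling).
# cache/, artifacts/, and reports/ are optional, so they are not checked here.
REQUIRED_FILES = ("pyproject.toml", "README.md", ".gitignore", "main.py")


def missing_contract_files(example_dir: Path) -> list[str]:
    """Return the required files that are absent from one example folder."""
    return [name for name in REQUIRED_FILES if not (example_dir / name).is_file()]


def audit_examples(examples_root: Path) -> dict[str, list[str]]:
    """Map each example folder name to its missing files (an empty list means OK)."""
    return {
        child.name: missing_contract_files(child)
        for child in sorted(examples_root.iterdir())
        if child.is_dir()
    }
```

Folders that use a different entrypoint, such as the case studies and showcases that run through run_analysis.py, would likely be flagged by this check, so a real audit would need an allow-list.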
Each example imports only nyc311.*, so it exercises the package the same way
an external user would. In the repo, each example resolves nyc311 through a local
editable path dependency. Outside the repo, the same scripts work after installing
nyc311 from PyPI with the listed extras.
The canonical bootstrap starting point for new examples is
examples/example-template/.
Overview
| Example | Focus | Default data mode | Extras | Cache | Artifacts | Reports | Status |
|---|---|---|---|---|---|---|---|
| examples/quickstart-sdk/ | first in-memory SDK walkthrough | packaged sample records | base | no | CSV topic summary | markdown tearsheet | implemented |
| examples/fetch-filtered-snapshot/ | filtered Socrata fetch, cache reuse, and fetch metadata | live fetch with local cache reuse | base | yes | snapshot CSV + metadata JSON/MD | optional publishable tearsheet | implemented |
| examples/community-district-case-study/ | Brooklyn case study with topic, volume, and resolution summaries | cache-backed live slice | plotting | yes | multiple scratch CSV summaries | publish-gated tearsheet + 3 PNGs | implemented |
| examples/topic-eda/ | coverage audit, unmatched descriptors, anomalies, and resolution gaps | cache-backed live slice | dataframes, plotting | yes | baseline report card + CSV summaries | publish-gated tearsheet + 4 PNGs | implemented |
| examples/borough-choropleth/ | borough-level dominant-topic map | packaged sample records | spatial, plotting | no | scratch CSV summaries | tearsheet + 3 PNGs | implemented |
| examples/spatial-join-qa/ | canonical spatial join QA over a larger cached live district audit | cache-backed live slice | spatial, plotting | yes | boundary inventory + join QA CSVs | publish-gated tearsheet + 3 PNGs | implemented |
| examples/community-district-choropleth/ | district-level dominant-topic map with full-layer context | cache-backed live slice | spatial, plotting | yes | scratch CSV summaries | publish-gated tearsheet + 3 PNGs | implemented |
| examples/spatial-topic-comparison/ | joined-district topic comparison after spatial enrichment | cache-backed live slice | spatial, plotting | yes | joined topic CSV + preview tables | publish-gated tearsheet + 4 PNGs | implemented |
Data And Cache Strategy
The examples follow one default runtime story:
- run in memory whenever packaged sample data is enough
- when a story needs more data, write a cache file inside that example folder
- reuse that local cache on later runs instead of refetching by default
- keep ignored scratch outputs in artifacts/ and tracked markdown/figures in reports/
- for live examples, update tracked report assets only through an explicit publish step
That pattern keeps examples reproducible without reintroducing one shared global dump directory.
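The cache-reuse and publish rules can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: load_or_fetch, output_dir, and the fetch callable are hypothetical names, not nyc311 APIs, and the real examples fetch through nyc311 and may cache in a different format.

```python
import csv
from pathlib import Path
from typing import Callable


def load_or_fetch(
    cache_path: Path,
    fetch: Callable[[], list[dict[str, str]]],
    refresh: bool = False,
) -> list[dict[str, str]]:
    """Read rows from a local CSV cache; call fetch() only on a miss or refresh."""
    if cache_path.exists() and not refresh:
        with cache_path.open(newline="") as fh:
            return list(csv.DictReader(fh))
    rows = fetch()
    if not rows:
        # Design choice for this sketch: never cache an empty result.
        raise ValueError("refusing to cache an empty fetch")
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return rows


def output_dir(example_dir: Path, publish: bool) -> Path:
    """Scratch outputs go to ignored artifacts/; tracked reports/ only on publish."""
    target = example_dir / ("reports" if publish else "artifacts")
    target.mkdir(parents=True, exist_ok=True)
    return target
```

Keeping publish as an explicit flag means a plain rerun can never overwrite tracked report assets.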
Local Repo Usage
From any example folder:
```bash
uv sync
uv run python main.py
```
Examples are intentionally not executed in the main CI or test matrix. The package itself remains the tested release surface.
Bootstrap Template
When adding a new example, start from examples/example-template/. It captures
the current conventions for:
- uv path-dependency setup
- ignored cache/ and artifacts/ directories
- tracked reports/ and reports/figures/
- explicit relative markdown image paths like ./figures/example-chart.png
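The relative-image-path convention can be linted before a report is published. A minimal sketch of a hypothetical check (not shipped with the repo) that flags image links in a markdown report that are not explicit ./ paths or that point at missing files:

```python
import re
from pathlib import Path

# Markdown image syntax ![alt](target) with an optional "title" after the target.
IMAGE_LINK = re.compile(r'!\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)')


def bad_image_links(markdown_file: Path) -> list[str]:
    """Return image targets that are not ./-relative or do not exist on disk."""
    problems = []
    for target in IMAGE_LINK.findall(markdown_file.read_text(encoding="utf-8")):
        if not target.startswith("./"):
            problems.append(target)  # implicit or absolute path
        elif not (markdown_file.parent / target).is_file():
            problems.append(target)  # explicit path, but the file is missing
    return problems
```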
Snapshot-First Pattern
For larger workflows, fetch once and then iterate against a local snapshot:
```bash
nyc311 fetch \
  --output local-noise-snapshot.csv \
  --complaint-type "Noise - Residential" \
  --geography borough \
  --geography-value BROOKLYN \
  --start-date 2025-01-01 \
  --end-date 2025-03-31 \
  --page-size 1000 \
  --max-pages 6
```
Then point analysis at the saved file:
```bash
nyc311 topics \
  --source local-noise-snapshot.csv \
  --complaint-type "Noise - Residential" \
  --geography community_district \
  --output topics.csv
```
That same pattern is mirrored inside the cache-backed example projects.
Case Studies
Longer-form analyses live under examples/case_studies/ (the two flagship
research studies, which use real data and are cited in CITATION.cff) and under
examples/sdid-multi-borough-policy/ and examples/mediation-cascade-resolution/
(synthetic-data showcases for the factor-factory engines).
All four ship with jellycell tearsheets when the tearsheets extra is
installed; see factor-factory integration.
examples/case_studies/resolution_equity/
A longitudinal study of NYC 311 resolution times across 59 community districts
over 60 monthly periods (January 2020 through December 2024). It exercises the
nyc311.stats and nyc311.temporal modules end to end:
- nyc311.pipeline.bulk_fetch() downloads 5 years of data, split per borough, with .meta.json integrity sidecars
- nyc311.temporal.build_complaint_panel() builds the balanced (community_district × month) panel
- nyc311.stats.seasonal_decompose() extracts trend, seasonal, and residual components per complaint type
- nyc311.stats.detect_changepoints() finds structural breaks aligned with COVID-19 lockdown and reopening phases
- nyc311.stats.panel_fixed_effects() runs the two-way FE resolution-equity regression
- nyc311.stats.global_morans_i() and local_morans_i() test for spatial clustering of slow/fast resolution districts
- nyc311.factors composes the supporting domain metrics
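Because the panel is balanced, a two-way fixed-effects slope for a single regressor can be computed by double-demeaning each variable. A minimal numpy sketch on simulated data in the 59 × 60 shape described above; the regressor, the true slope of 2.5, and the function names are hypothetical, and this illustrates the textbook within estimator rather than how panel_fixed_effects() is implemented (a real estimator would also report standard errors).

```python
import numpy as np


def twoway_demean(a: np.ndarray) -> np.ndarray:
    """Subtract row (unit) and column (period) means, then add back the grand mean.

    This equals the two-way fixed-effects transform only for a balanced panel.
    """
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


def twfe_slope(y: np.ndarray, x: np.ndarray) -> float:
    """OLS slope of y on x after absorbing unit and period fixed effects."""
    y_w, x_w = twoway_demean(y), twoway_demean(x)
    return float((x_w * y_w).sum() / (x_w**2).sum())


rng = np.random.default_rng(0)
n_districts, n_months = 59, 60
district_fe = rng.normal(size=(n_districts, 1))
month_fe = rng.normal(size=(1, n_months))
# The regressor is correlated with both fixed effects, so pooled OLS would be biased.
x = district_fe + month_fe + rng.normal(size=(n_districts, n_months))
y = (
    3.0 * district_fe
    + month_fe
    + 2.5 * x
    + rng.normal(scale=0.1, size=(n_districts, n_months))
)
print(round(twfe_slope(y, x), 2))  # close to the simulated slope of 2.5
```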
Run it from the case-study directory:
```bash
pip install "nyc311[stats,spatial,dataframes]"
cd examples/case_studies/resolution_equity
python run_analysis.py
```
The numbered scripts (01_fetch_data.py through 06_changepoint_detection.py)
can also be run individually. See FINDINGS.md in that directory for the
written-up results.
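The spatial-clustering step in the resolution-equity study relies on Moran's I. A minimal numpy sketch of the textbook global statistic, I = (n / S0) · (zᵀWz) / (zᵀz), where z holds the mean-centered values and S0 is the sum of all weights, using binary contiguity weights on a hypothetical four-district strip. nyc311's global_morans_i() may use row-standardized weights and permutation inference; this sketch does neither.

```python
import numpy as np


def morans_i(values: np.ndarray, weights: np.ndarray) -> float:
    """Global Moran's I for one attribute and an n x n spatial weights matrix."""
    z = values - values.mean()
    s0 = weights.sum()
    return float(len(values) * (z @ weights @ z) / (s0 * (z @ z)))


# Four districts in a row; 1 marks a shared boundary (hypothetical layout).
w = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)
print(morans_i(np.array([1.0, 2.0, 3.0, 4.0]), w))  # ≈ 0.333: smooth gradient, positive clustering
print(morans_i(np.array([1.0, 2.0, 1.0, 2.0]), w))  # ≈ -1.0: alternating values, negative
```

Under no spatial autocorrelation the expected value is -1 / (n - 1), so values well above that indicate that slow (or fast) districts sit next to each other.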
examples/case_studies/rat_containerization/
Evaluates the 2024 NYC rat containerization mandate using the full causal inference toolkit:
- nyc311.stats.synthetic_control() builds a data-driven counterfactual for the first treated community district
- nyc311.stats.staggered_did() estimates group-time ATTs across the staggered rollout, correcting for TWFE bias
- nyc311.stats.event_study() produces event-time coefficients with pre-trend diagnostics
- nyc311.stats.regression_discontinuity() estimates the local treatment effect at the policy boundary
- nyc311.factors composes supporting metrics via SpatialLagFactor and EquityGapFactor

```bash
pip install "nyc311[stats,spatial,dataframes,tearsheets]"
cd examples/case_studies/rat_containerization
python run_analysis.py
```
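The TWFE bias in a staggered rollout comes from already-treated districts acting as controls for later-treated ones. A minimal numpy sketch of a group-time ATT that avoids this by comparing each cohort only with never-treated units, in the spirit of Callaway and Sant'Anna (2021), with period g - 1 as the base. The array layout, the -1 never-treated code, and the simulated effect of 2.0 are assumptions of this sketch, not the staggered_did() interface.

```python
import numpy as np


def group_time_att(y: np.ndarray, first_treated: np.ndarray, g: int, t: int) -> float:
    """ATT(g, t): cohort g versus never-treated units, relative to period g - 1.

    y is a (units x periods) outcome array; first_treated holds each unit's
    first treated period, with -1 marking never-treated units.
    """
    base = g - 1
    cohort = first_treated == g
    never = first_treated < 0
    cohort_change = y[cohort, t].mean() - y[cohort, base].mean()
    control_change = y[never, t].mean() - y[never, base].mean()
    return float(cohort_change - control_change)


# Hypothetical rollout: cohorts first treated in periods 3 and 5, plus two never-treated units.
rng = np.random.default_rng(1)
first_treated = np.array([3, 3, 5, 5, -1, -1])
periods = np.arange(8)
treated_now = (first_treated[:, None] >= 0) & (periods[None, :] >= first_treated[:, None])
y = (
    rng.normal(size=(6, 1))    # unit fixed effects
    + rng.normal(size=(1, 8))  # period fixed effects
    + 2.0 * treated_now        # constant treatment effect
)
print(group_time_att(y, first_treated, g=3, t=3))  # ≈ 2.0
print(group_time_att(y, first_treated, g=5, t=6))  # ≈ 2.0
print(group_time_att(y, first_treated, g=5, t=3))  # ≈ 0.0, a pre-period placebo
```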
examples/sdid-multi-borough-policy/
Self-contained synthetic-data showcase for factor_factory.engines.sdid
(Arkhangelsky et al. 2021, AER) routed through
PanelDataset.to_factor_factory_panel(). Simulates a synchronized
expanded-311-intake rollout across three treated boroughs (Manhattan, Brooklyn,
Bronx) with two never-treated donor boroughs (Queens, Staten Island) over 36
months.
- no network and no Socrata access; runs offline in seconds
- builds a 5-borough × 36-month PanelDataset with one TreatmentEvent
- adapts to factor_factory.tidy.Panel and fits both TWFE (baseline) and SDID (headline)
- emits the five jellycell tearsheets under manuscripts/

```bash
pip install "nyc311[stats,dataframes,tearsheets]"
cd examples/sdid-multi-borough-policy
python run_analysis.py
```
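The value of a synthetic showcase is that the true effect is known, so an estimator can be checked against it. A minimal numpy sketch of that idea on a 5-borough × 36-month panel; the rollout month, effect size, and seasonal pattern are hypothetical and do not reproduce the example's data-generating process. The difference of group means stands in for the TWFE baseline (for a balanced panel with a single synchronized rollout the two coincide); SDID's unit and time weights are not implemented here.

```python
import numpy as np

BOROUGHS = ["Manhattan", "Brooklyn", "Bronx", "Queens", "Staten Island"]
TREATED = {"Manhattan", "Brooklyn", "Bronx"}
N_MONTHS = 36
ROLLOUT_MONTH = 24  # hypothetical; the example's rollout month is not stated here
TRUE_EFFECT = -1.5  # hypothetical effect baked into the simulation


def simulate_panel(seed: int = 0) -> np.ndarray:
    """Return a (borough x month) outcome array with a known treatment effect."""
    rng = np.random.default_rng(seed)
    months = np.arange(N_MONTHS)
    borough_fe = rng.normal(10.0, 2.0, size=(len(BOROUGHS), 1))
    month_fe = 0.05 * months + np.sin(2 * np.pi * months / 12)  # trend + seasonality
    treated = np.array([b in TREATED for b in BOROUGHS])[:, None]
    post = (months >= ROLLOUT_MONTH)[None, :]
    noise = rng.normal(0.0, 0.2, size=(len(BOROUGHS), N_MONTHS))
    return borough_fe + month_fe + TRUE_EFFECT * (treated & post) + noise


def did_of_means(y: np.ndarray) -> float:
    """2x2 difference in differences of treated/control, pre/post cell means."""
    treated = np.array([b in TREATED for b in BOROUGHS])
    post = np.arange(N_MONTHS) >= ROLLOUT_MONTH

    def cell_mean(rows: np.ndarray, cols: np.ndarray) -> float:
        return float(y[np.ix_(rows, cols)].mean())

    return (cell_mean(treated, post) - cell_mean(treated, ~post)) - (
        cell_mean(~treated, post) - cell_mean(~treated, ~post)
    )


print(f"DiD estimate {did_of_means(simulate_panel()):.2f} vs true effect {TRUE_EFFECT}")
```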
examples/mediation-cascade-resolution/
Self-contained synthetic-data showcase for
factor_factory.engines.mediation.four_way (VanderWeele 2014, Epidemiology).
Exercises the pilot → triage-time → resolution-rate cascade with a synthetic
30-district × 12-month panel.
- decomposes the total effect into CDE, INTref, INTmed, and PIE
- verifies the adapter's mediator-column path (covariates={"triage_time_days": ...} round-trips through PanelDataset.to_factor_factory_panel())
- emits the five jellycell tearsheets under manuscripts/

```bash
pip install "nyc311[stats,dataframes,tearsheets]"
cd examples/mediation-cascade-resolution
python run_analysis.py
```
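For linear outcome and mediator models with an exposure-mediator interaction and no covariates, VanderWeele (2014) gives closed forms for the four components. A minimal numpy sketch on simulated data sized like the 30 × 12 panel, treated here as 360 independent rows; the coefficients, variable meanings, and the choice of the median as the reference mediator level m* are hypothetical, and factor_factory's four_way engine may estimate the components differently.

```python
import numpy as np


def four_way(theta1, theta2, theta3, beta0, beta1, m_star, a=1.0, a_star=0.0):
    """Four-way decomposition (VanderWeele 2014) for linear models without covariates.

    Outcome model:  E[Y | A, M] = theta0 + theta1*A + theta2*M + theta3*A*M
    Mediator model: E[M | A]    = beta0 + beta1*A
    """
    d = a - a_star
    return {
        "CDE": (theta1 + theta3 * m_star) * d,
        "INTref": theta3 * (beta0 + beta1 * a_star - m_star) * d,
        "INTmed": theta3 * beta1 * d * d,
        "PIE": (theta2 * beta1 + theta3 * beta1 * a_star) * d,
    }


# Hypothetical cascade: pilot (A) -> triage time in days (M) -> resolution rate (Y).
rng = np.random.default_rng(42)
n = 30 * 12
pilot = rng.integers(0, 2, size=n).astype(float)
triage = 3.0 - 1.0 * pilot + rng.normal(0.0, 0.5, size=n)
resolution = (
    0.6 + 0.05 * pilot - 0.04 * triage + 0.02 * pilot * triage
    + rng.normal(0.0, 0.05, size=n)
)

# Fit both linear models by ordinary least squares.
beta0, beta1 = np.linalg.lstsq(np.column_stack([np.ones(n), pilot]), triage, rcond=None)[0]
theta = np.linalg.lstsq(
    np.column_stack([np.ones(n), pilot, triage, pilot * triage]), resolution, rcond=None
)[0]
parts = four_way(theta[1], theta[2], theta[3], beta0, beta1, m_star=float(np.median(triage)))
for name, value in parts.items():
    print(f"{name:>6}: {value:+.4f}")
print(f"{'total':>6}: {sum(parts.values()):+.4f}")
```

Whatever m* is chosen, the four parts sum to θ1 + θ2β1 + θ3(β0 + β1), the total effect implied by the two fitted models; m* only shifts weight between CDE and INTref.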
examples/factor-factory-quickstart/
The no-jellycell showcase: a minimal PanelDataset → ff.Panel → engine → pandas
pipeline in about 50 lines. It exercises the adapter without installing the
tearsheets extra, which makes it a good starting point for consumers who want
the causal-inference engine adapter without the reporting machinery.
```bash
pip install "nyc311>=1.0,<2" "factor-factory>=1.0.2,<2"
cd examples/factor-factory-quickstart
python main.py
```