Architecture
nyc311 implements a narrow but end-to-end pipeline for deterministic topic
summarization over NYC 311-style complaint data.
This architecture snapshot reflects the current stable 1.x release surface.
Pipeline
flowchart LR
sourceData[SourceData] --> loaders[load_service_requests]
sourceData --> bulkFetch[bulk_fetch]
loaders --> records[ServiceRequestRecordList]
bulkFetch --> records
records --> extract[extract_topics]
records --> coverage[analyze_topic_coverage]
records --> gaps[analyze_resolution_gaps]
records --> panel[build_complaint_panel]
records --> factors[Factor Pipeline]
extract --> assignments[TopicAssignmentList]
assignments --> aggregate[aggregate_by_geography]
aggregate --> summaries[GeographyTopicSummaryList]
summaries --> anomalies[detect_anomalies]
summaries --> csvExport[export_topic_table]
summaries --> geojsonPrep[load_boundaries / load_nyc_boundaries]
geojsonPrep --> geojsonExport[export_geojson]
summaries --> report[export_report_card]
gaps --> report
anomalies --> report
panel --> panelDS[PanelDataset]
panelDS --> stats[Stats Module]
panelDS --> ffAdapter[to_factor_factory_panel]
factors --> pipeResult[PipelineResult]
factors --> ffBridge[Pipeline.as_factor_factory_estimate]
ffAdapter --> ffPanel[factor_factory.tidy.Panel]
ffPanel --> ffEngines[factor_factory.engines.*]
ffBridge --> ffEngines
ffEngines --> ffResults[DidResults / SdidResults / ScmResults / ...]
ffResults --> jellycell[jellycell tearsheets]
stats --> statsResults[ITSResult / ChangepointResult / MoranResult / ...]
The two additive bridges at the right of the diagram — to_factor_factory_panel
and Pipeline.as_factor_factory_estimate — route nyc311 panels into
factor-factory's 17
causal-inference engine families without changing PanelDataset or Pipeline
themselves. See factor-factory integration for the crosswalk.
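The core path through the diagram (records, then topic assignments, then per-geography summaries) can be sketched with toy stand-ins. Everything below is hypothetical and only illustrates the data flow: the `Record` and `Assignment` classes, the `RULES` table, and the `extract` and `aggregate` functions are assumptions made for this example, not nyc311's real models or signatures.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical stand-ins for illustration; these are not nyc311's real models.
@dataclass(frozen=True)
class Record:
    borough: str
    complaint_type: str
    descriptor: str


@dataclass(frozen=True)
class Assignment:
    record: Record
    topic: str


# Toy descriptor rules: the first matching keyword wins, so the same input
# always produces the same topic (no randomness, no model state).
RULES = (("music", "loud_music"), ("party", "loud_music"), ("banging", "banging"))


def extract(records):
    """Assign one topic per record using the ordered keyword rules."""
    out = []
    for rec in records:
        text = rec.descriptor.lower()
        topic = next((t for kw, t in RULES if kw in text), "other")
        out.append(Assignment(rec, topic))
    return out


def aggregate(assignments):
    """Count assignments per (borough, topic) pair."""
    return Counter((a.record.borough, a.topic) for a in assignments)


records = [
    Record("BROOKLYN", "Noise - Residential", "Loud Music/Party"),
    Record("BROOKLYN", "Noise - Residential", "Banging/Pounding"),
    Record("QUEENS", "Noise - Residential", "Loud Music/Party"),
]
summaries = aggregate(extract(records))
for (borough, topic), count in sorted(summaries.items()):
    print(borough, topic, count)
```

Each stage takes a typed list and returns a new typed value, which is what lets the later stages in the diagram (anomalies, exports, report cards) branch off the same summaries without re-reading the source data.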
Module Responsibilities
| Module | Responsibility |
|---|---|
| nyc311.models | Typed dataclasses, constants, configs, and normalization helpers |
| nyc311.io | CSV and Socrata ingestion for service-request records |
| nyc311.analysis | Deterministic topic extraction, coverage, gaps, and anomalies |
| nyc311.geographies | Compatibility layer over nyc-geo-toolkit plus 311-specific geography adapters |
| nyc311.samples | Packaged sample records and sample-aligned boundary subsets |
| nyc311.export | CSV, GeoJSON, and markdown artifact generation |
| nyc311.dataframes | Optional pandas conversions for typed nyc311 models |
| nyc311.spatial | Optional geopandas spatial helpers and joins |
| nyc311.plotting | Optional in-memory plotting helpers for packaged boundary layers |
| nyc311.presets | Reusable filter and Socrata config builders for common workflows |
| nyc311.pipeline | High-level SDK helpers that mirror the CLI happy path |
| nyc311.factors | Composable factor pipeline with 9 built-in factors including SpatialLagFactor and EquityGapFactor; Pipeline.as_factor_factory_estimate() bridges into factor_factory.engines.* |
| nyc311.temporal | Balanced panel datasets, treatment events, inverse-distance spatial weights; PanelDataset.to_factor_factory_panel() adapter to factor_factory.tidy.Panel |
| nyc311.stats | Statistical modeling: ITS, PELT, STL, panel FE/RE, Moran/LISA, synthetic control, staggered DiD, event study, RDD, spatial lag/error, GWR, Oaxaca-Blinder, Theil, reporting-bias adjustment, BYM2, Hawkes, anomaly detection, power analysis |
| nyc311.cli | Argparse-powered fetch and analysis entry points |
Design Principles
- Keep the implemented surface explicit and namespaced.
- Prefer typed inputs and outputs over implicit dictionaries.
- Make the SDK composable for scripts, workflows, and interactive analysis.
- Expose packaged geography access through a thin compatibility layer over
  nyc-geo-toolkit.
- Keep the CLI thin by delegating real work to importable functions.
- Keep optional dependency boundaries explicit for dataframe, spatial, and notebook helpers.
- Provide publication-quality statistical methods with clear academic references for every modeling primitive.
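One common way to keep an optional dependency boundary explicit is to import the dependency lazily, inside the helper that needs it, and fail with an actionable message. The sketch below is a generic illustration of that pattern, not nyc311's actual code: `require_optional`, `records_to_dataframe`, and the `dataframes` extra name are all hypothetical.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional dependency or raise an actionable ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this helper; "
            f"install the {extra!r} extra to enable it."
        ) from exc


def records_to_dataframe(rows):
    """Convert a list of dicts to a DataFrame, importing pandas only on use."""
    # The extra name "dataframes" is an assumption made for this example.
    pd = require_optional("pandas", extra="dataframes")
    return pd.DataFrame(rows)
```

Because the import happens at call time, importing the module that defines `records_to_dataframe` never fails on a minimal install; only calling the helper without pandas does.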
Toolkit Relationship
nyc311 owns the 311-specific workflow, while nyc-geo-toolkit owns the
generic NYC geography assets and normalization rules.
flowchart TB
toolkit["nyc-geo-toolkit"] --> geographies["nyc311.geographies"]
geographies --> exports["export_geojson()"]
geographies --> spatial["nyc311.spatial"]
geographies --> samples["nyc311.samples"]
That split keeps the package responsibilities clear:
- nyc311 owns complaint loading, topic analysis, exports, reports, and package-specific compatibility helpers
- nyc-geo-toolkit owns reusable boundary data, canonical geography normalization, and generic boundary loaders
- consumer-facing geography helpers in nyc311 stay thin so they can track the stable toolkit contract without duplicating shared implementation
Implemented Scope
- service-request loading from CSV and Socrata
- service-request snapshot export for reproducible local staging
- topic extraction for four supported complaint types
- topic-coverage analysis for descriptor-rule match rates
- aggregation by borough or community district
- resolution-gap summaries
- anomaly detection over aggregated topic counts
- CSV export
- boundary-backed GeoJSON export
- markdown report-card export
- optional pandas dataframe conversion helpers
- packaged NYC borough, community-district, council-district, NTA, ZCTA, and census-tract boundary layers
- packaged sample service-request and boundary loaders for example workflows
- optional in-memory boundary plotting helpers
- a one-call SDK pipeline helper
- thin CLI fetch and export paths
- a namespace-based public API audit script for maintainers
- a composable factor pipeline with nine built-in domain factors (including SpatialLagFactor and EquityGapFactor)
- a balanced temporal panel builder with treatment-event modeling and inverse-distance spatial weights
- a statistics module with interrupted time series, PELT changepoint detection, STL seasonal decomposition, Moran's I / LISA spatial autocorrelation, panel fixed/random-effects regression wrappers, synthetic control, staggered difference-in-differences, event-study plots, regression discontinuity, spatial lag/error models, GWR, Oaxaca-Blinder decomposition, Theil index, reporting-bias adjustment, BYM2 small-area smoothing, Hawkes process, seasonality-adjusted anomaly detection, and power analysis / MDE calculator
- a bulk_fetch() per-borough downloader that emits .meta.json integrity sidecars
- the resolution-equity case study, which exercises the full nyc311.stats surface against ~1M real records
- two additive factor-factory bridges (v1.0.0): PanelDataset.to_factor_factory_panel() and Pipeline.as_factor_factory_estimate() route nyc311 panels into any of factor-factory's 17 causal-inference engine families
- two new factor-factory showcase case studies: examples/sdid-multi-borough-policy/ (synthetic DiD via SDID) and examples/mediation-cascade-resolution/ (four-way mediation decomposition)
- optional jellycell integration via the tearsheets extra; the four case studies emit manuscripts/*.md tearsheets for publication-ready reporting
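As a concrete reference for the inverse-distance spatial weights mentioned in the list above, here is a minimal standard-library sketch. It assumes planar centroid coordinates, a zero diagonal, and row standardization; whether nyc311's builder makes the same choices is not stated here, and the function name, `power` parameter, and coordinates are made up for illustration.

```python
import math


def inverse_distance_weights(centroids, power=1.0):
    """Row-standardized inverse-distance weights with a zero diagonal.

    `centroids` maps a unit name to an (x, y) pair in planar coordinates.
    """
    names = list(centroids)
    weights = {}
    for i in names:
        raw = {}
        for j in names:
            if i == j:
                continue  # a unit is never its own neighbor
            d = math.dist(centroids[i], centroids[j])
            if d == 0:
                raise ValueError(f"{i} and {j} share a centroid")
            raw[j] = 1.0 / d**power
        total = sum(raw.values())
        # Divide by the row total so each unit's weights sum to 1.
        weights[i] = {j: w / total for j, w in raw.items()}
    return weights


# Made-up planar coordinates for three units.
centroids = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (0.0, 10.0)}
w = inverse_distance_weights(centroids)
print(w["A"])
```

Here B is 5 units from A and C is 10 units from A, so B receives twice the weight of C in A's row.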
Boundaries
Boundary-backed exports still expect feature properties with both:
- geography
- geography_value
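A quick pre-flight check for that contract might look like the sketch below. It works on a plain GeoJSON-style dictionary and is not part of nyc311: the helper name is hypothetical, and the sample property values (such as "BROOKLYN") are invented for illustration rather than taken from the packaged layers.

```python
REQUIRED_PROPERTIES = ("geography", "geography_value")


def missing_boundary_properties(feature_collection):
    """Return (feature index, missing keys) pairs for non-conforming features."""
    problems = []
    for index, feature in enumerate(feature_collection.get("features", [])):
        props = feature.get("properties") or {}
        missing = [key for key in REQUIRED_PROPERTIES if key not in props]
        if missing:
            problems.append((index, missing))
    return problems


# Illustrative input: the second feature lacks geography_value.
boundaries = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"geography": "borough", "geography_value": "BROOKLYN"},
            "geometry": None,
        },
        {
            "type": "Feature",
            "properties": {"geography": "borough"},
            "geometry": None,
        },
    ],
}
print(missing_boundary_properties(boundaries))  # [(1, ['geography_value'])]
```

Running a check like this before an export makes a missing property surface as a clear list of offending features rather than as a failed join later on.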
nyc311 now consumes canonical packaged boundary layers from nyc-geo-toolkit
for:
- borough
- community_district
- council_district
- neighborhood_tabulation_area
- zcta
- census_tract
These packaged layers are the preferred notebook and SDK path. File-backed
boundary loading remains available through
nyc311.geographies.load_boundaries() for scripts and custom workflows, while
the generic boundary assets and normalization logic live in nyc-geo-toolkit.
Maintainer Notes
The primary source of truth for public package behavior is the tested code in
src/nyc311/ and the user-facing docs in this folder.