Architecture

nyc311 implements a narrow but end-to-end pipeline for deterministic topic summarization over NYC 311-style complaint data.

This architecture snapshot reflects the current stable 1.x release surface.

Pipeline

flowchart LR
    sourceData[SourceData] --> loaders[load_service_requests]
    sourceData --> bulkFetch[bulk_fetch]
    loaders --> records[ServiceRequestRecordList]
    bulkFetch --> records
    records --> extract[extract_topics]
    records --> coverage[analyze_topic_coverage]
    records --> gaps[analyze_resolution_gaps]
    records --> panel[build_complaint_panel]
    records --> factors[Factor Pipeline]
    extract --> assignments[TopicAssignmentList]
    assignments --> aggregate[aggregate_by_geography]
    aggregate --> summaries[GeographyTopicSummaryList]
    summaries --> anomalies[detect_anomalies]
    summaries --> csvExport[export_topic_table]
    summaries --> geojsonPrep[load_boundaries / load_nyc_boundaries]
    geojsonPrep --> geojsonExport[export_geojson]
    summaries --> report[export_report_card]
    gaps --> report
    anomalies --> report
    panel --> panelDS[PanelDataset]
    panelDS --> stats[Stats Module]
    panelDS --> ffAdapter[to_factor_factory_panel]
    factors --> pipeResult[PipelineResult]
    factors --> ffBridge[Pipeline.as_factor_factory_estimate]
    ffAdapter --> ffPanel[factor_factory.tidy.Panel]
    ffPanel --> ffEngines[factor_factory.engines.*]
    ffBridge --> ffEngines
    ffEngines --> ffResults[DidResults / SdidResults / ScmResults / ...]
    ffResults --> jellycell[jellycell tearsheets]
    stats --> statsResults[ITSResult / ChangepointResult / MoranResult / ...]

The two additive bridges at the right of the diagram, to_factor_factory_panel and Pipeline.as_factor_factory_estimate, route nyc311 panels into factor-factory's 17 causal-inference engine families without modifying PanelDataset or Pipeline themselves. See factor-factory integration for the crosswalk.
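
The core path through the diagram (records, then topic assignments, then per-geography summaries) can be illustrated with a minimal, self-contained sketch. It does not use the nyc311 API: `Record`, `Assignment`, `RULES`, `extract`, and `aggregate` are hypothetical stand-ins, and the complaint type, keyword rules, and sample rows are illustrative assumptions rather than the package's real descriptor rules.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a service-request record."""
    complaint_type: str
    descriptor: str
    borough: str


@dataclass(frozen=True)
class Assignment:
    """Hypothetical stand-in for a topic assignment."""
    record: Record
    topic: str


# Illustrative keyword rules (assumption); the real rules are not shown here.
RULES = {
    "Noise - Residential": {"party": "loud_party", "banging": "banging"},
}


def extract(records, complaint_type):
    """Assign one topic per matching record using first-match keyword rules."""
    rules = RULES[complaint_type]
    assignments = []
    for rec in records:
        if rec.complaint_type != complaint_type:
            continue  # other complaint types are out of scope for this query
        text = rec.descriptor.lower()
        topic = next((t for kw, t in rules.items() if kw in text), "other")
        assignments.append(Assignment(rec, topic))
    return assignments


def aggregate(assignments):
    """Count assignments per (borough, topic), sorted for stable output."""
    counts = Counter((a.record.borough, a.topic) for a in assignments)
    return sorted(counts.items())


records = [
    Record("Noise - Residential", "Loud Music/Party", "BROOKLYN"),
    Record("Noise - Residential", "Banging/Pounding", "BROOKLYN"),
    Record("Noise - Residential", "Loud Music/Party", "QUEENS"),
    Record("Rodent", "Rat Sighting", "QUEENS"),
]
summaries = aggregate(extract(records, "Noise - Residential"))
```

Because the rules are plain keyword lookups applied in a fixed order and the output is sorted, the same input always yields the same summaries, which is the sense of "deterministic" this sketch is meant to convey.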

Module Responsibilities

Module	Responsibility
`nyc311.models`	Typed dataclasses, constants, configs, and normalization helpers
`nyc311.io`	CSV and Socrata ingestion for service-request records
`nyc311.analysis`	Deterministic topic extraction, coverage, gaps, and anomalies
`nyc311.geographies`	Compatibility layer over `nyc-geo-toolkit` plus 311-specific geography adapters
`nyc311.samples`	Packaged sample records and sample-aligned boundary subsets
`nyc311.export`	CSV, GeoJSON, and markdown artifact generation
`nyc311.dataframes`	Optional pandas conversions for typed nyc311 models
`nyc311.spatial`	Optional geopandas spatial helpers and joins
`nyc311.plotting`	Optional in-memory plotting helpers for packaged boundary layers
`nyc311.presets`	Reusable filter and Socrata config builders for common workflows
`nyc311.pipeline`	High-level SDK helpers that mirror the CLI happy path
`nyc311.factors`	Composable factor pipeline with 9 built-in factors including SpatialLagFactor and EquityGapFactor; `Pipeline.as_factor_factory_estimate()` bridges into `factor_factory.engines.*`
`nyc311.temporal`	Balanced panel datasets, treatment events, inverse-distance spatial weights; `PanelDataset.to_factor_factory_panel()` adapter to `factor_factory.tidy.Panel`
`nyc311.stats`	Statistical modeling: ITS, PELT, STL, panel FE/RE, Moran/LISA, synthetic control, staggered DiD, event study, RDD, spatial lag/error, GWR, Oaxaca-Blinder, Theil, reporting-bias adjustment, BYM2, Hawkes, anomaly detection, power analysis
`nyc311.cli`	Argparse-powered fetch and analysis entry points

Design Principles

Keep the implemented surface explicit and namespaced.
Prefer typed inputs and outputs over implicit dictionaries.
Make the SDK composable for scripts, workflows, and interactive analysis.
Expose packaged geography access through a thin compatibility layer over nyc-geo-toolkit.
Keep the CLI thin by delegating real work to importable functions.
Keep optional dependency boundaries explicit for dataframe, spatial, and notebook helpers.
Provide publication-quality statistical methods with clear academic references for every modeling primitive.
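
Two of these principles, typed outputs and explicit optional-dependency boundaries, can be sketched together. The example below is an assumption-laden illustration, not nyc311's implementation: `TopicCount`, `require_optional`, `to_records`, `to_dataframe`, and the `dataframes` extra name are all hypothetical.

```python
import importlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TopicCount:
    """Hypothetical typed result row, used instead of a loose dict."""
    geography: str
    geography_value: str
    topic: str
    count: int


def require_optional(module_name, extra):
    """Import an optional dependency or raise an error naming the extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this helper; "
            f"install the optional '{extra}' extra to enable it."
        ) from exc


def to_records(rows):
    """Convert typed rows into plain dicts; needs no optional dependency."""
    return [asdict(row) for row in rows]


def to_dataframe(rows):
    """Build a DataFrame; pandas is imported only when this is called."""
    pd = require_optional("pandas", "dataframes")  # extra name is hypothetical
    return pd.DataFrame(to_records(rows))


rows = [TopicCount("borough", "BROOKLYN", "loud_party", 12)]
records = to_records(rows)
```

Deferring the import to call time keeps the core module importable without pandas, and the error message tells the user which extra to install instead of surfacing a bare `ModuleNotFoundError`.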

Toolkit Relationship

nyc311 owns the 311-specific workflow, while nyc-geo-toolkit owns the generic NYC geography assets and normalization rules.

flowchart TB
    toolkit["nyc-geo-toolkit"] --> geographies["nyc311.geographies"]
    geographies --> exports["export_geojson()"]
    geographies --> spatial["nyc311.spatial"]
    geographies --> samples["nyc311.samples"]

That split keeps the package responsibilities clear:

nyc311 owns complaint loading, topic analysis, exports, reports, and package-specific compatibility helpers
nyc-geo-toolkit owns reusable boundary data, canonical geography normalization, and generic boundary loaders
consumer-facing geography helpers in nyc311 stay thin so they can track the stable toolkit contract without duplicating shared implementation

Implemented Scope

service-request loading from CSV and Socrata
service-request snapshot export for reproducible local staging
topic extraction for four supported complaint types
topic-coverage analysis for descriptor-rule match rates
aggregation by borough or community district
resolution-gap summaries
anomaly detection over aggregated topic counts
CSV export
boundary-backed GeoJSON export
markdown report-card export
optional pandas dataframe conversion helpers
packaged NYC borough, community-district, council-district, NTA, ZCTA, and census-tract boundary layers
packaged sample service-request and boundary loaders for example workflows
optional in-memory boundary plotting helpers
a one-call SDK pipeline helper
thin CLI fetch and export paths
a namespace-based public API audit script for maintainers
a composable factor pipeline with nine built-in domain factors (including SpatialLagFactor and EquityGapFactor)
a balanced temporal panel builder with treatment-event modeling and inverse-distance spatial weights
a statistics module with interrupted time series, PELT changepoint detection, STL seasonal decomposition, Moran's I / LISA spatial autocorrelation, panel fixed/random-effects regression wrappers, synthetic control, staggered difference-in-differences, event-study plots, regression discontinuity, spatial lag/error models, GWR, Oaxaca-Blinder decomposition, Theil index, reporting-bias adjustment, BYM2 small-area smoothing, Hawkes process, seasonality-adjusted anomaly detection, and a power analysis / minimum detectable effect (MDE) calculator
a bulk_fetch() per-borough downloader that emits .meta.json integrity sidecars
the resolution-equity case study, which exercises the full nyc311.stats surface against ~1M real records
two additive factor-factory bridges (v1.0.0): PanelDataset.to_factor_factory_panel() and Pipeline.as_factor_factory_estimate() route nyc311 panels into any of factor-factory's 17 causal-inference engine families
two new factor-factory showcase case studies: examples/sdid-multi-borough-policy/ (synthetic difference-in-differences, SDID) and examples/mediation-cascade-resolution/ (four-way mediation decomposition)
optional jellycell integration via the tearsheets extra — the four case studies emit manuscripts/*.md tearsheets for publication-ready reporting
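
The balanced temporal panel and inverse-distance spatial weights listed above can be illustrated with a small standard-library sketch. It is not the nyc311.temporal implementation: `balance_panel` and `inverse_distance_weights` are hypothetical names, zero-filling missing cells and row-normalizing the weights are assumptions about common conventions, and the centroids are made-up planar coordinates that must be distinct.

```python
import math
from itertools import product


def balance_panel(counts, units, periods):
    """Return one entry per (unit, period), filling missing cells with 0."""
    return {(u, p): counts.get((u, p), 0) for u, p in product(units, periods)}


def inverse_distance_weights(centroids):
    """Row-normalized inverse-distance weights with no self-weight.

    Assumes all centroids are distinct; coincident points would divide by zero.
    """
    weights = {}
    for i, (xi, yi) in centroids.items():
        raw = {
            j: 1.0 / math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in centroids.items()
            if j != i
        }
        total = sum(raw.values())
        weights[i] = {j: w / total for j, w in raw.items()}
    return weights


# Sparse observed counts keyed by (unit, period).
observed = {("A", "2024-01"): 5, ("B", "2024-02"): 2}
panel = balance_panel(observed, units=["A", "B"], periods=["2024-01", "2024-02"])

# Three units on a line at x = 0, 1, 3.
w = inverse_distance_weights({"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 0.0)})
```

In this example unit A is three times closer to B than to C, so its row assigns B a weight of 0.75 and C a weight of 0.25; every row sums to 1.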

Boundaries

Boundary-backed exports still expect feature properties with both:

geography
geography_value

nyc311 now consumes canonical packaged boundary layers from nyc-geo-toolkit for:

borough
community_district
council_district
neighborhood_tabulation_area
zcta
census_tract

These packaged layers are the preferred notebook and SDK path. File-backed boundary loading remains available through nyc311.geographies.load_boundaries() for scripts and custom workflows, while the generic boundary assets and normalization logic live in nyc-geo-toolkit.
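
The property contract at the start of this section (every feature carries both `geography` and `geography_value`) is what makes a join between summaries and boundaries possible. The sketch below shows one way such a join could work on plain GeoJSON dictionaries. `attach_counts`, the `count` property, and the sample values are hypothetical, and this is not the `export_geojson` implementation.

```python
def attach_counts(feature_collection, counts):
    """Return a new FeatureCollection with a `count` property per feature.

    `counts` maps (geography, geography_value) to an integer. Features
    without a matching key get 0. The input collection is not mutated.
    """
    required = {"geography", "geography_value"}
    features = []
    for feature in feature_collection["features"]:
        props = feature["properties"]
        missing = required - set(props)
        if missing:
            raise ValueError(f"feature is missing properties: {sorted(missing)}")
        key = (props["geography"], props["geography_value"])
        enriched_props = {**props, "count": counts.get(key, 0)}
        features.append({**feature, "properties": enriched_props})
    return {"type": "FeatureCollection", "features": features}


# Geometry is null here to keep the example short; real layers carry polygons.
boundaries = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"geography": "borough", "geography_value": "BROOKLYN"}},
        {"type": "Feature", "geometry": None,
         "properties": {"geography": "borough", "geography_value": "QUEENS"}},
    ],
}
counts = {("borough", "BROOKLYN"): 12}
enriched = attach_counts(boundaries, counts)
```

Failing loudly when either property is absent surfaces a mismatched custom boundary file at export time rather than producing a map with silently empty regions.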

Maintainer Notes

The primary source of truth for public package behavior is the tested code in src/nyc311/ and the user-facing docs in this folder.