API Reference¶
The public API is organized around explicit namespaces rather than a flat root package.
The root nyc311 package is intentionally minimal and only exposes version
metadata. Import functionality from the canonical public modules below.
nyc311.geographies is the one namespace that intentionally fronts another
package: it preserves the 311-facing geography surface while delegating generic
boundary loading and normalization primitives to nyc-geo-toolkit.
Update docstrings and exported symbols in src/nyc311/ rather than editing this
reference structure by hand.
Root Package¶
nyc311 ¶
Minimal root namespace for the nyc311 package.
Models¶
nyc311.models ¶
Public typed models and constants for the nyc311 package.
BOROUGH_STATEN_ISLAND
module-attribute
¶
BOROUGH_STATEN_ISLAND: Final[BoroughName] = 'STATEN ISLAND'
SUPPORTED_BOROUGHS
module-attribute
¶
SUPPORTED_BOROUGHS: Final[tuple[BoroughName, ...]] = (
BOROUGH_BRONX,
BOROUGH_BROOKLYN,
BOROUGH_MANHATTAN,
BOROUGH_QUEENS,
BOROUGH_STATEN_ISLAND,
)
SUPPORTED_BOUNDARY_GEOGRAPHIES
module-attribute
¶
SUPPORTED_BOUNDARY_GEOGRAPHIES: Final[tuple[str, ...]] = (
"borough",
"community_district",
"council_district",
"neighborhood_tabulation_area",
"census_tract",
"zcta",
)
SUPPORTED_GEOGRAPHIES
module-attribute
¶
SUPPORTED_GEOGRAPHIES: Final[tuple[str, ...]] = (
SUPPORTED_RECORD_GEOGRAPHIES
)
SUPPORTED_RECORD_GEOGRAPHIES
module-attribute
¶
SUPPORTED_RECORD_GEOGRAPHIES: Final[tuple[str, ...]] = (
"borough",
"community_district",
)
AnalysisWindow
dataclass
¶
Rolling time window used for trend and anomaly calculations.
Source code in src/nyc311/models/_analysis.py
AnomalyResult
dataclass
¶
A standardized anomaly score for one aggregated topic summary.
Source code in src/nyc311/models/_analysis.py
ExportTarget
dataclass
¶
Destination metadata for supported exporters.
Source code in src/nyc311/models/_analysis.py
GeographyTopicSummary
dataclass
¶
An export-ready summary row for topic counts within one geography.
Source code in src/nyc311/models/_analysis.py
ResolutionGapSummary
dataclass
¶
A first-pass borough-level summary of unresolved complaint volume.
Source code in src/nyc311/models/_analysis.py
TopicCoverageReport
dataclass
¶
Coverage metadata that shows how much a topic ruleset matched.
Source code in src/nyc311/models/_analysis.py
top_unmatched_descriptors
instance-attribute
¶
top_unmatched_descriptors: tuple[tuple[str, int], ...]
TopicQuery
dataclass
¶
Topic-analysis parameters for the implemented rules-based workflow.
Source code in src/nyc311/models/_analysis.py
BoundaryGeoJSONExport
dataclass
¶
Combined boundary + summary payload for GeoJSON export.
Source code in src/nyc311/models/_boundaries.py
GeographyFilter
dataclass
¶
A supported geography selector for implemented loading filters.
Source code in src/nyc311/models/_filters.py
ServiceRequestFilter
dataclass
¶
Filters for CSV and Socrata service-request loading.
Source code in src/nyc311/models/_filters.py
SocrataConfig
dataclass
¶
Configuration for the implemented live Socrata loader path.
extra_where_clauses holds additional $where fragments (Socrata SoQL) that
are AND-joined after the predicates derived from :class:ServiceRequestFilter.
Use for predicates not covered by the filter (e.g. latitude IS NOT NULL).
Values are stripped; empty strings are dropped.
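The stripping and AND-joining behaviour described above can be sketched in a few lines. This is a minimal illustration rather than the library's implementation: build_where is a hypothetical helper, and the base predicate string stands in for whatever ServiceRequestFilter actually produces.

```python
def build_where(base_predicates: list[str], extra_where_clauses: tuple[str, ...]) -> str:
    """AND-join filter-derived predicates with extra SoQL fragments (hypothetical helper)."""
    # Strip whitespace and drop empty fragments, as the docstring describes.
    extras = [clause.strip() for clause in extra_where_clauses if clause.strip()]
    # Extra fragments are appended after the filter-derived predicates.
    return " AND ".join([*base_predicates, *extras])


where = build_where(
    ["complaint_type = 'Rodent'"],
    ("  latitude IS NOT NULL ", "", "longitude IS NOT NULL"),
)
print(where)
```

The empty string in the tuple is dropped and the padded fragment is trimmed before joining.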
Source code in src/nyc311/models/_filters.py
dataset_identifier
class-attribute
instance-attribute
¶
dataset_identifier: str = SOCRATA_DATASET_IDENTIFIER
base_url
class-attribute
instance-attribute
¶
base_url: str = 'https://data.cityofnewyork.us/resource'
created_date_sort
class-attribute
instance-attribute
¶
created_date_sort: Literal['asc', 'desc'] = 'asc'
extra_where_clauses
class-attribute
instance-attribute
¶
extra_where_clauses: tuple[str, ...] = field(
default_factory=tuple
)
ServiceRequestRecord
dataclass
¶
A single loaded NYC 311-style service-request record.
.. note::
   As of nyc311 v1.0.1, ``closed_date`` is carried alongside
   ``created_date`` so resolution-time analyses don't have to
   bypass the SDK. The field is optional because Socrata returns a
   null ``closed_date`` for any unresolved complaint, and
   existing call sites that instantiate the record without it
   keep working unchanged.
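As a rough sketch of the resolution-time analysis the note refers to, the snippet below computes closed_date - created_date while treating a missing closed_date as "unresolved". The Record class is a hypothetical stand-in holding only the two date fields; the real ServiceRequestRecord carries more fields, and its exact date types are not shown in this reference.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in with only the two date fields discussed above."""

    created_date: datetime
    closed_date: Optional[datetime] = None  # None while the complaint is unresolved


def resolution_time(record: Record) -> Optional[timedelta]:
    """Return closed_date - created_date, or None for unresolved complaints."""
    if record.closed_date is None:
        return None
    return record.closed_date - record.created_date


resolved = Record(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 3, 9, 0))
unresolved = Record(datetime(2024, 1, 2, 12, 0))
print(resolution_time(resolved))
print(resolution_time(unresolved))
```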
Source code in src/nyc311/models/_records.py
resolution_description
class-attribute
instance-attribute
¶
resolution_description: str | None = None
geography_value ¶
geography_value(geography: str) -> str
Return the value for a supported geography key.
Source code in src/nyc311/models/_records.py
TopicAssignment
dataclass
¶
A deterministic topic label derived from one service-request record.
Source code in src/nyc311/models/_records.py
supported_topic_queries ¶
supported_topic_queries() -> tuple[str, ...]
Return the complaint types with implemented topic extraction.
Source code in src/nyc311/models/_constants.py
normalize_borough_name ¶
normalize_borough_name(value: str) -> str
Normalize a borough name or borough alias to the canonical public constant.
Source code in src/nyc311/models/_normalize.py
IO¶
nyc311.io ¶
Public loading helpers for service-request data.
REQUIRED_SERVICE_REQUEST_COLUMNS
module-attribute
¶
REQUIRED_SERVICE_REQUEST_COLUMNS: Final[tuple[str, ...]] = (
SERVICE_REQUEST_CSV_COLUMNS
)
cache_path_for_request ¶
cache_path_for_request(
socrata_config: SocrataConfig,
filters: ServiceRequestFilter,
cache_dir: Path,
) -> Path
Return the deterministic CSV path for a Socrata config + filter pair.
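One common way to obtain a deterministic path is to hash a canonical serialization of the request parameters. The sketch below shows that general technique only; the file-name pattern, the digest length, and the deterministic_cache_path helper are assumptions, and the actual naming scheme used by cache_path_for_request is not documented on this page.

```python
import hashlib
import json
from pathlib import Path


def deterministic_cache_path(params: dict[str, object], cache_dir: Path) -> Path:
    """Map request parameters to a stable CSV path (hypothetical naming scheme)."""
    # Sorting keys makes the serialized form independent of dict insertion order.
    canonical = json.dumps(params, sort_keys=True, default=str)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"service_requests_{digest}.csv"


a = deterministic_cache_path({"borough": "BROOKLYN", "complaint_type": "Rodent"}, Path("cache"))
b = deterministic_cache_path({"complaint_type": "Rodent", "borough": "BROOKLYN"}, Path("cache"))
print(a == b)
```

Equal parameters map to the same file regardless of the order in which they were supplied, which is what lets a later call detect an existing cache entry.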
Source code in src/nyc311/io/_cache.py
cached_fetch ¶
cached_fetch(
socrata_config: SocrataConfig,
filters: ServiceRequestFilter,
*,
cache_dir: Path,
refresh: bool = False,
request_open: Callable[..., Any] | None = None,
max_records: int | None = None,
on_page: Callable[[int, int], None] | None = None,
) -> Path
Stream a Socrata query to a CSV file under cache_dir; return the path.
Skips the network fetch when the file already exists and refresh is False.
Rows are filtered with the same rules as :func:load_service_requests_from_socrata.
For multi-gigabyte extracts, prefer this function and analyze with chunked
pandas.read_csv instead of loading via :func:load_service_requests, which
materializes rows in memory.
Optional on_page is forwarded to :func:nyc311.io.iter_service_requests_from_socrata
for per-HTTP-page progress (page index and row count for that page).
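The skip-unless-refresh behaviour can be pictured with the following minimal sketch. It is not the library's code: fetch_to_cache and fake_fetch are hypothetical names, and the fake fetcher writes a two-line CSV in place of a streamed Socrata download.

```python
import tempfile
from pathlib import Path
from typing import Callable


def fetch_to_cache(path: Path, fetch: Callable[[Path], None], *, refresh: bool = False) -> Path:
    """Run ``fetch`` only when the cache file is missing or a refresh is forced."""
    if path.exists() and not refresh:
        return path  # cache hit: no network call
    path.parent.mkdir(parents=True, exist_ok=True)
    fetch(path)
    return path


calls: list[str] = []


def fake_fetch(target: Path) -> None:
    """Stand-in for the streaming Socrata download."""
    calls.append(target.name)
    target.write_text("unique_key,complaint_type\n1,Rodent\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "cache" / "slice.csv"
    fetch_to_cache(csv_path, fake_fetch)                # downloads
    fetch_to_cache(csv_path, fake_fetch)                # skipped, file exists
    fetch_to_cache(csv_path, fake_fetch, refresh=True)  # forced re-download

print(len(calls))
```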
Source code in src/nyc311/io/_cache.py
load_service_requests_from_csv ¶
load_service_requests_from_csv(
source: str | Path, *, filters: ServiceRequestFilter
) -> list[ServiceRequestRecord]
Load and filter service-request records from a local CSV snapshot.
Source code in src/nyc311/io/_csv.py
load_resolution_data ¶
load_resolution_data(
source: str | Path | SocrataConfig,
*,
filters: ServiceRequestFilter | None = None,
cache_dir: Path | str | None = None,
refresh: bool = False,
max_cached_records: int | None = None,
) -> list[ServiceRequestRecord]
Load the subset of service requests that already include resolution text.
Source code in src/nyc311/io/_service_requests.py
load_service_requests ¶
load_service_requests(
source: str | Path | SocrataConfig,
*,
filters: ServiceRequestFilter | None = None,
cache_dir: Path | str | None = None,
refresh: bool = False,
max_cached_records: int | None = None,
) -> list[ServiceRequestRecord]
Load filtered NYC 311-style service-request records from CSV or Socrata.
When source is a :class:~nyc311.models.SocrataConfig and cache_dir
is set, the live API response is streamed to a deterministic CSV under
cache_dir (see :func:cached_fetch), then loaded from disk. Very large
extracts should use :func:cached_fetch with chunked pandas analysis instead
of this helper, which returns an in-memory list.
Source code in src/nyc311/io/_service_requests.py
iter_service_requests_from_socrata ¶
iter_service_requests_from_socrata(
socrata_config: SocrataConfig,
*,
filters: ServiceRequestFilter,
request_open: Callable[..., Any],
on_page: Callable[[int, int], None] | None = None,
) -> Iterator[ServiceRequestRecord]
Yield service-request records from Socrata without holding all pages in memory.
on_page is invoked after each successful HTTP response with
(page_index, row_count_in_page) (0-based page index).
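To illustrate the on_page contract, here is a small self-contained generator that reports (page_index, row_count) for each page while yielding rows lazily. The iter_pages function and the in-memory fake_pages list are hypothetical stand-ins for the real HTTP paging loop.

```python
from typing import Callable, Iterator, Optional


def iter_pages(
    pages: list[list[dict[str, str]]],
    on_page: Optional[Callable[[int, int], None]] = None,
) -> Iterator[dict[str, str]]:
    """Yield rows page by page, reporting (page_index, row_count) for each page.

    ``pages`` is a hypothetical in-memory stand-in for successive HTTP responses.
    """
    for page_index, rows in enumerate(pages):
        if on_page is not None:
            on_page(page_index, len(rows))  # 0-based page index, rows in this page
        yield from rows


progress: list[tuple[int, int]] = []
fake_pages = [[{"unique_key": "1"}, {"unique_key": "2"}], [{"unique_key": "3"}]]
rows = list(iter_pages(fake_pages, on_page=lambda index, count: progress.append((index, count))))
print(progress)
```

A callback of this shape is enough to drive a progress bar or a running row total without buffering the pages themselves.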
Source code in src/nyc311/io/_socrata.py
load_service_requests_from_socrata ¶
load_service_requests_from_socrata(
socrata_config: SocrataConfig,
*,
filters: ServiceRequestFilter,
request_open: Callable[..., Any],
) -> list[ServiceRequestRecord]
Load and filter service-request records from the live Socrata API.
Source code in src/nyc311/io/_socrata.py
Analysis¶
nyc311.analysis ¶
Public analysis helpers for nyc311 complaint workflows.
DEFAULT_TOPIC_RULES
module-attribute
¶
DEFAULT_TOPIC_RULES: Final[dict[str, TopicRuleSet]] = {
"Noise - Residential": (
(
"party_music",
(
"party",
"music",
"speakers",
"stereo",
"bass",
"television",
),
),
(
"construction",
("construction", "drilling", "jackhammer"),
),
("pet_noise", ("dog", "barking", "pet")),
(
"banging",
(
"banging",
"thumping",
"shaking",
"arguing",
"hammering",
),
),
),
"Illegal Parking": (
("hydrant_blocking", ("hydrant", "fire hydrant")),
("crosswalk_blocking", ("crosswalk",)),
("bus_stop_blocking", ("bus stop",)),
(
"double_parked",
(
"double parked",
"double parking",
"double parked",
),
),
),
"Blocked Driveway": (
(
"commercial_driveway",
("commercial van", "delivery truck", "truck"),
),
("overnight_blocking", ("overnight", "all night")),
(
"residential_driveway",
("residential driveway", "driveway", "garage"),
),
),
"Rodent": (
(
"extermination_request",
(
"exterminator",
"extermination",
"infestation",
),
),
("rats_seen", ("rats", "rat", "trash bags")),
("mouse_condition", ("mouse", "mice", "droppings")),
),
"HEAT/HOT WATER": (
(
"no_heat",
(
"no heat",
"without heat",
"radiator cold",
"heat not working",
),
),
(
"no_hot_water",
(
"no hot water",
"without hot water",
"hot water not working",
),
),
(
"intermittent_heat",
(
"intermittent heat",
"heat comes and goes",
"heat inconsistent",
),
),
),
"Street Condition": (
("pothole", ("pothole", "potholes")),
(
"cave_in",
(
"cave in",
"cave-in",
"sinkhole",
"collapsed roadway",
),
),
(
"rough_road",
(
"uneven",
"rough road",
"broken asphalt",
"road surface",
),
),
),
"Noise - Street/Sidewalk": (
(
"construction",
("construction", "drilling", "jackhammer"),
),
(
"loud_vehicle",
(
"car alarm",
"engine idling",
"horn",
"vehicle",
"muffler",
),
),
(
"bar_noise",
(
"bar",
"club",
"restaurant",
"patrons",
"crowd",
),
),
),
"UNSANITARY CONDITION": (
(
"garbage",
("garbage", "trash", "refuse", "debris"),
),
(
"sewage",
("sewage", "feces", "human waste", "overflow"),
),
(
"pest_waste",
(
"rodent",
"rat",
"mouse",
"droppings",
"animal waste",
),
),
),
"Abandoned Vehicle": (
(
"derelict_vehicle",
(
"abandoned",
"derelict",
"stripped",
"wrecked",
),
),
(
"unlicensed_vehicle",
(
"no plate",
"no registration",
"expired registration",
),
),
),
}
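The rule sets above are ordered tuples of (topic, keywords). A plausible reading, sketched below, is that the first rule containing a matching keyword wins and unmatched descriptors fall back to other. That matching semantics (case-insensitive substring search, first match wins) is an assumption for illustration, and match_topic is a hypothetical helper rather than the library's extractor.

```python
RuleSet = tuple[tuple[str, tuple[str, ...]], ...]

# Copy of the "Rodent" rules from DEFAULT_TOPIC_RULES above.
RODENT_RULES: RuleSet = (
    ("extermination_request", ("exterminator", "extermination", "infestation")),
    ("rats_seen", ("rats", "rat", "trash bags")),
    ("mouse_condition", ("mouse", "mice", "droppings")),
)


def match_topic(descriptor: str, rules: RuleSet, fallback: str = "other") -> str:
    """Return the first topic whose keyword occurs in the descriptor (assumed semantics)."""
    text = descriptor.lower()
    for topic, keywords in rules:
        if any(keyword in text for keyword in keywords):
            return topic
    return fallback


print(match_topic("Rat sighting near trash bags", RODENT_RULES))
print(match_topic("Mouse droppings in kitchen", RODENT_RULES))
print(match_topic("Signs of burrowing", RODENT_RULES))
```

Plain substring matching means a short keyword such as "rat" also matches inside longer words such as "separate"; whether the library guards against this is not stated in this reference.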
aggregate_by_geography ¶
aggregate_by_geography(
topic_assignments: list[TopicAssignment], geography: str
) -> list[GeographyTopicSummary]
Aggregate deterministic topic assignments into supported geographies.
Source code in src/nyc311/analysis/_aggregation.py
detect_anomalies ¶
detect_anomalies(
aggregated_data: list[GeographyTopicSummary],
window: AnalysisWindow,
*,
z_threshold: float = 2.0,
) -> list[AnomalyResult]
Score unusually high or low aggregated topic counts via z-scores.
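For readers unfamiliar with z-score flagging, the following sketch standardizes a list of counts and flags values whose absolute score reaches the threshold. It is an illustration under stated assumptions: the use of the population standard deviation, the >= comparison, and the z_scores and flag_anomalies helpers are not taken from the library.

```python
from statistics import mean, pstdev


def z_scores(counts: list[float]) -> list[float]:
    """Standardize counts against their own mean and population standard deviation."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return [0.0 for _ in counts]  # no spread, so nothing stands out
    return [(count - mu) / sigma for count in counts]


def flag_anomalies(counts: list[float], z_threshold: float = 2.0) -> list[int]:
    """Return indices whose absolute z-score meets the threshold (high or low)."""
    return [i for i, z in enumerate(z_scores(counts)) if abs(z) >= z_threshold]


weekly_counts = [10, 12, 11, 9, 10, 11, 10, 50]
print(flag_anomalies(weekly_counts))
```

Using the absolute value flags both unusually high and unusually low counts, matching the "high or low" wording of the docstring.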
Source code in src/nyc311/analysis/_anomalies.py
analyze_topic_coverage ¶
analyze_topic_coverage(
service_requests: list[ServiceRequestRecord],
query: TopicQuery,
*,
custom_rules: TopicRuleSet | None = None,
top_unmatched_n: int = 10,
) -> TopicCoverageReport
Report how much a topic configuration matched versus how much fell into other.
Source code in src/nyc311/analysis/_coverage.py
analyze_resolution_gaps ¶
analyze_resolution_gaps(
service_requests: list[ServiceRequestRecord],
resolution_data: list[ServiceRequestRecord],
) -> list[ResolutionGapSummary]
Summarize unresolved complaint share by borough and complaint type.
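A rough sketch of what an "unresolved share" per borough and complaint type could look like follows. It assumes the share is the fraction of requests that have no matching resolution record; the unresolved_share helper and the tuple-based inputs are hypothetical simplifications of the typed records the real function accepts.

```python
from collections import Counter

Key = tuple[str, str]  # (borough, complaint_type)


def unresolved_share(all_requests: list[Key], resolved_requests: list[Key]) -> dict[Key, float]:
    """Share of requests per (borough, complaint type) with no resolution record."""
    totals = Counter(all_requests)
    resolved = Counter(resolved_requests)
    # Cap resolved counts at the total so the share never drops below zero.
    return {key: (total - min(resolved[key], total)) / total for key, total in totals.items()}


requests = [("BROOKLYN", "Rodent")] * 4 + [("QUEENS", "Rodent")] * 2
with_resolution = [("BROOKLYN", "Rodent")] * 3
shares = unresolved_share(requests, with_resolution)
print(shares)
```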
Source code in src/nyc311/analysis/_resolution.py
extract_topics ¶
extract_topics(
service_requests: list[ServiceRequestRecord],
query: TopicQuery,
*,
custom_rules: TopicRuleSet | None = None,
) -> list[TopicAssignment]
Extract deterministic first-pass topics for one complaint type.
Source code in src/nyc311/analysis/_topics.py
register_topic_rules ¶
register_topic_rules(
complaint_type: str, rules: TopicRuleSet
) -> None
Register or replace topic rules for one complaint type.
Source code in src/nyc311/analysis/_topics.py
Geographies¶
nyc311.geographies ¶
Public access to packaged NYC geography layers and boundary helpers.
boundaries_to_dataframe ¶
boundaries_to_dataframe(
boundaries: BoundaryCollection,
) -> pd.DataFrame
Convert a typed boundary collection into a DataFrame.
Source code in src/nyc311/geographies/_conversions.py
boundaries_to_geojson ¶
boundaries_to_geojson(
boundaries: BoundaryCollection,
) -> dict[str, object]
Convert a typed boundary collection into a GeoJSON FeatureCollection.
Source code in src/nyc311/geographies/_conversions.py
list_boundary_layers ¶
list_boundary_layers() -> tuple[str, ...]
List the packaged NYC boundary layers shipped with nyc311.
Source code in src/nyc311/geographies/_loaders.py
list_boundary_values ¶
list_boundary_values(layer: str) -> tuple[str, ...]
List the canonical values available for one packaged boundary layer.
Source code in src/nyc311/geographies/_loaders.py
load_boundaries ¶
load_boundaries(source: str | Path) -> BoundaryCollection
Load boundaries from a file path or a packaged NYC boundary layer.
Source code in src/nyc311/geographies/_loaders.py
load_nyc_boundaries ¶
load_nyc_boundaries(
layer: str = "community_district",
*,
values: str | tuple[str, ...] | list[str] | None = None,
) -> BoundaryCollection
Load a packaged NYC boundary layer as typed boundary models.
Source code in src/nyc311/geographies/_loaders.py
load_nyc_boundaries_geodataframe ¶
load_nyc_boundaries_geodataframe(
layer: str = "community_district",
*,
values: str | tuple[str, ...] | list[str] | None = None,
) -> gpd.GeoDataFrame
Load a packaged NYC boundary layer directly into a GeoDataFrame.
Source code in src/nyc311/geographies/_loaders.py
load_nyc_census_tracts ¶
load_nyc_census_tracts(
*,
values: str | tuple[str, ...] | list[str] | None = None,
) -> BoundaryCollection
Load the packaged NYC census-tract layer.
Source code in src/nyc311/geographies/_loaders.py
load_nyc_council_districts ¶
load_nyc_council_districts(
*,
values: str | tuple[str, ...] | list[str] | None = None,
) -> BoundaryCollection
Load the packaged NYC city-council-district layer.
Source code in src/nyc311/geographies/_loaders.py
load_nyc_neighborhood_tabulation_areas ¶
load_nyc_neighborhood_tabulation_areas(
*,
values: str | tuple[str, ...] | list[str] | None = None,
) -> BoundaryCollection
Load the packaged NYC neighborhood-tabulation-area layer.
Source code in src/nyc311/geographies/_loaders.py
clip_boundaries_to_bbox ¶
clip_boundaries_to_bbox(
boundaries: BoundaryCollection,
*,
min_longitude: float,
min_latitude: float,
max_longitude: float,
max_latitude: float,
) -> BoundaryCollection
Clip boundary geometries to a longitude/latitude bounding box.
Source code in src/nyc311/geographies/_ops.py
spatially_enrich_records ¶
spatially_enrich_records(
records: list[ServiceRequestRecord],
*,
layer: str = "community_district",
boundaries: BoundaryCollection | None = None,
) -> gpd.GeoDataFrame
Attach packaged boundary attributes to point-capable service requests.
Source code in src/nyc311/geographies/_ops.py
Samples¶
nyc311.samples ¶
Packaged sample data helpers for nyc311 examples and tests.
load_sample_boundaries ¶
load_sample_boundaries(
layer: str = "community_district",
) -> BoundaryCollection
Load the subset of packaged boundaries that overlaps the sample records.
Source code in src/nyc311/samples/_loaders.py
load_sample_service_requests ¶
load_sample_service_requests(
*, filters: ServiceRequestFilter | None = None
) -> list[ServiceRequestRecord]
Load the packaged sample NYC 311 service-request slice.
Source code in src/nyc311/samples/_loaders.py
Export¶
nyc311.export ¶
Public export helpers for nyc311 outputs.
export_anomalies ¶
export_anomalies(
data: list[AnomalyResult], target: ExportTarget
) -> Path
Export anomaly detections to a CSV file.
Source code in src/nyc311/export/_csv.py
export_service_requests_csv ¶
export_service_requests_csv(
data: list[ServiceRequestRecord], target: ExportTarget
) -> Path
Export loaded service-request records to a reproducible CSV snapshot.
Source code in src/nyc311/export/_csv.py
export_topic_table ¶
export_topic_table(
data: list[GeographyTopicSummary], target: ExportTarget
) -> Path
Export geography-topic summaries to a CSV file.
Source code in src/nyc311/export/_csv.py
export_geojson ¶
export_geojson(
data: BoundaryGeoJSONExport, target: ExportTarget
) -> Path
Export supported boundary-backed complaint outputs to GeoJSON.
Source code in src/nyc311/export/_geojson.py
export_report_card ¶
export_report_card(
data: object, target: ExportTarget
) -> Path
Export a markdown report card from summaries, gaps, and anomalies.
Source code in src/nyc311/export/_report.py
Pipeline¶
nyc311.pipeline ¶
High-level workflow helpers for live fetching and topic-analysis pipelines.
fetch_service_requests ¶
fetch_service_requests(
*,
filters: ServiceRequestFilter | None = None,
socrata_config: SocrataConfig | None = None,
output: str | Path | None = None,
cache_dir: Path | str | None = None,
refresh: bool = False,
max_cached_records: int | None = None,
) -> list[ServiceRequestRecord]
Fetch a live Socrata slice into memory and optionally stage it as CSV.
This is the intended SDK helper for notebook and workflow users who want to fetch once, inspect records in memory, and only export a local snapshot when they decide the filtered slice is worth keeping.
When cache_dir is set, responses are streamed to a CSV cache first (see
:func:nyc311.io.cached_fetch) and then loaded. Avoid huge slices unless you use
chunked analysis on the cache file.
Source code in src/nyc311/pipeline.py
run_topic_pipeline ¶
run_topic_pipeline(
source: str | Path | SocrataConfig,
complaint_type: str,
*,
geography: str = "community_district",
filters: ServiceRequestFilter | None = None,
top_n: int = 20,
output: str | Path | None = None,
output_format: str = "csv",
boundaries: str | Path | None = None,
) -> list[GeographyTopicSummary]
Run the implemented load-extract-aggregate-export topic workflow.
When output is provided, this helper also writes either a CSV or GeoJSON
artifact using the same behavior exposed by the current CLI. The aggregated
summaries are always returned to support notebook and workflow use cases.
Source code in src/nyc311/pipeline.py
bulk_fetch ¶
bulk_fetch(
*,
complaint_types: tuple[str, ...] = (),
start_date: date | str | None = None,
end_date: date | str | None = None,
cache_dir: Path | str = Path("data/cache"),
boroughs: tuple[str, ...] | None = None,
app_token: str | None = None,
page_size: int = 5000,
on_progress: Callable[[str, int, int], None]
| None = None,
) -> list[Path]
Fetch full-city 311 data split by borough for manageable file sizes.
Downloads are split per-borough so that each CSV stays under a few
hundred megabytes. Files are written to cache_dir with
deterministic names; subsequent calls skip any borough whose file
already exists. Each completed CSV is paired with a .meta.json
sidecar containing the row count, SHA-256 checksum, fetch
timestamp, and the filter parameters used.
The Socrata $select fragment requests the following columns:
unique_key, created_date, closed_date, complaint_type,
descriptor, borough, community_board, resolution_description,
latitude, longitude. closed_date (added in v1.0.1 per
random-walks/nyc311#20) is nullable: unresolved complaints
serialize it as an empty cell. Downstream resolution-time and
SLA analyses can therefore compute closed_date - created_date
directly without a second round-trip.
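The sidecar described above (row count, SHA-256 checksum, fetch timestamp, filter parameters) could be produced along the following lines. This is a sketch only: the JSON field names, the exact sidecar file name, and the write_meta_sidecar helper are assumptions, since this page does not show the real layout.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_meta_sidecar(csv_path: Path, filters: dict[str, object]) -> Path:
    """Write a ``.meta.json`` file next to ``csv_path`` (field names are hypothetical)."""
    data = csv_path.read_bytes()
    # Count physical lines minus the header; quoted multi-line cells would need csv.reader.
    row_count = max(len(data.splitlines()) - 1, 0)
    meta = {
        "row_count": row_count,
        "sha256": hashlib.sha256(data).hexdigest(),
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "filters": filters,
    }
    meta_path = csv_path.with_suffix(".meta.json")
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return meta_path


with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp) / "brooklyn.csv"
    csv_file.write_text("unique_key,borough\n1,BROOKLYN\n2,BROOKLYN\n", encoding="utf-8")
    sidecar = write_meta_sidecar(csv_file, {"boroughs": ["BROOKLYN"]})
    loaded = json.loads(sidecar.read_text(encoding="utf-8"))

print(loaded["row_count"], sidecar.name)
```

Recomputing the checksum later and comparing it with the stored value is a cheap way to detect a truncated or edited cache file.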
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| complaint_types | tuple[str, ...] | Optional whitelist of complaint types. When empty, every complaint type is included. | () |
| start_date | date \| str \| None | Inclusive lower bound on | None |
| end_date | date \| str \| None | Inclusive upper bound on | None |
| cache_dir | Path \| str | Directory to write per-borough CSV files into. The directory is created on demand. | Path('data/cache') |
| boroughs | tuple[str, ...] \| None | Boroughs to include. Defaults to all five. | None |
| app_token | str \| None | Socrata app token for higher rate limits. | None |
| page_size | int | Rows per Socrata HTTP request. | 5000 |
| on_progress | Callable[[str, int, int], None] \| None | Optional callback invoked after each HTTP page as | None |
Returns:
| Type | Description |
|---|---|
| list[Path] | Paths to the completed per-borough CSV files in the order the boroughs were processed. |
Source code in src/nyc311/pipeline.py
DataFrames¶
nyc311.dataframes ¶
Optional pandas conversion helpers for notebook and data-science workflows.
anomalies_to_dataframe ¶
anomalies_to_dataframe(
anomalies: list[AnomalyResult],
) -> Any
Convert anomaly results into a DataFrame.
Source code in src/nyc311/dataframes/_analysis.py
coverage_to_dataframe ¶
coverage_to_dataframe(
reports: list[TopicCoverageReport],
) -> Any
Convert topic-coverage reports into a DataFrame.
Source code in src/nyc311/dataframes/_analysis.py
gaps_to_dataframe ¶
gaps_to_dataframe(gaps: list[ResolutionGapSummary]) -> Any
Convert resolution-gap summaries into a DataFrame.
Source code in src/nyc311/dataframes/_analysis.py
summaries_to_dataframe ¶
summaries_to_dataframe(
summaries: list[GeographyTopicSummary],
) -> Any
Convert geography-topic summaries into a DataFrame.
Source code in src/nyc311/dataframes/_analysis.py
assignments_to_dataframe ¶
assignments_to_dataframe(
assignments: list[TopicAssignment],
) -> Any
Convert topic assignments into a DataFrame.
Source code in src/nyc311/dataframes/_records.py
dataframe_to_records ¶
dataframe_to_records(
dataframe: Any,
) -> list[ServiceRequestRecord]
Convert a DataFrame back into typed service-request records.
Source code in src/nyc311/dataframes/_records.py
records_to_dataframe ¶
records_to_dataframe(
records: list[ServiceRequestRecord],
) -> Any
Convert service-request records into a notebook-friendly DataFrame.
Source code in src/nyc311/dataframes/_records.py
resample_and_fill ¶
resample_and_fill(
dataframe: Any,
freq: str,
*,
method: Literal["zero", "ffill", "bfill"] = "zero",
) -> Any
Resample a DatetimeIndex-indexed frame and fill missing bins.
method='zero' fills missing values with 0 (typical for counts).
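The idea behind zero filling can be shown without pandas. The sketch below builds a gap-free daily series from sparse counts; fill_daily_zero is a hypothetical, dependency-free stand-in and does not reproduce the DataFrame-based behaviour of resample_and_fill.

```python
from datetime import date, timedelta


def fill_daily_zero(counts: dict[date, int]) -> dict[date, int]:
    """Return a gap-free daily series, inserting 0 for days with no complaints."""
    if not counts:
        return {}
    start, end = min(counts), max(counts)
    days = (end - start).days + 1
    return {
        start + timedelta(days=offset): counts.get(start + timedelta(days=offset), 0)
        for offset in range(days)
    }


sparse = {date(2024, 1, 1): 3, date(2024, 1, 4): 1}
dense = fill_daily_zero(sparse)
print(list(dense.values()))
```

Zero is the natural fill for counts because a missing bin means no complaints were recorded, whereas forward or backward fill suits level-type measurements.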
Source code in src/nyc311/dataframes/_timeseries.py
to_panel ¶
to_panel(
records: list[ServiceRequestRecord],
*,
freq: str = "D",
geography: str = "borough",
) -> Any
Return a panel of complaint counts indexed by (geography_value, period).
Columns are complaint types. Use .xs("BROOKLYN", level=0) for one area.
Source code in src/nyc311/dataframes/_timeseries.py
to_timeseries ¶
to_timeseries(
records: list[ServiceRequestRecord], *, freq: str = "D"
) -> Any
Return complaint counts per period with a :class:~pandas.DatetimeIndex.
Columns are complaint types (wide format). Suitable for .plot(), .rolling(),
and .resample().
Source code in src/nyc311/dataframes/_timeseries.py
to_topic_timeseries ¶
to_topic_timeseries(
assignments: list[TopicAssignment], *, freq: str = "D"
) -> Any
Like :func:to_timeseries but aggregates extracted topic labels.
Source code in src/nyc311/dataframes/_timeseries.py
Spatial¶
nyc311.spatial ¶
Optional geospatial helpers built on top of the typed nyc311 models.
The nyc311.spatial module is the GeoDataFrame-flavoured sibling of
nyc311.geographies — it loads boundary layers and records as
geopandas frames, spatially joins records to boundaries, and
materialises typed summaries as map-ready GeoDataFrames.
.. note::
   For polygon-centroid points (distance-band spatial weights, Moran's I / LISA, nearest-neighbour joins, choropleth label placement), nyc311 deliberately does not ship a centroid helper in this module. Use upstream instead:

   .. code-block:: python

      from nyc_geo_toolkit import (
          centroids_from_boundaries,
          load_nyc_boundaries,
      )

      cbs = load_nyc_boundaries("community_district")
      # representative=True keeps the point inside the polygon,
      # which matters for non-convex NYC shorelines.
      points = centroids_from_boundaries(cbs, representative=True)

   Shipped as a first-class helper in nyc-geo-toolkit v0.4.0 (on
   PyPI as v0.4.1 since 2026-04-21). Requires the [spatial]
   extra on nyc-geo-toolkit for the shapely dependency. See also
   :func:nyc311.temporal.centroids_from_boundaries, which returns
   a shapely-free dict[str, (lat, lon)] suitable for direct
   use with :func:nyc311.temporal.build_distance_weights.
load_boundaries_geodataframe ¶
load_boundaries_geodataframe(
source: str | Path | BoundaryCollection | None = None,
*,
layer: str | None = None,
) -> Any
Load supported boundaries from a path, collection, or packaged layer.
.. note::
   Need polygon centroids for spatial weights / Moran's I / label
   placement? Upstream :func:nyc_geo_toolkit.centroids_from_boundaries
   (v0.4+) converts any polygon BoundaryCollection into a Point
   BoundaryCollection, preserving geography / vintage / properties.
   Pair with representative=True for non-convex polygons. See the
   :mod:nyc311.spatial module docstring for the full recipe.
Source code in src/nyc311/spatial/_boundaries.py
spatial_join_records_to_boundaries ¶
spatial_join_records_to_boundaries(
records_gdf: Any, boundaries_gdf: Any
) -> Any
Join point records to boundary polygons without clobbering record columns.
Source code in src/nyc311/spatial/_joins.py
records_to_geodataframe ¶
records_to_geodataframe(
records: list[ServiceRequestRecord],
) -> Any
Convert point-capable service-request records into a GeoDataFrame.
Source code in src/nyc311/spatial/_points.py
summaries_to_geodataframe ¶
summaries_to_geodataframe(
summaries: list[Any],
boundaries_gdf: Any = None,
*,
layer: str | None = None,
) -> Any
Merge aggregated geography summaries onto boundary geometries.
Source code in src/nyc311/spatial/_summaries.py
Plotting¶
nyc311.plotting ¶
Optional in-memory plotting helpers for NYC boundary maps.
plot_boundary_choropleth ¶
plot_boundary_choropleth(
geodataframe: Any,
*,
column: str,
title: str,
cmap: str = "viridis",
categorical: bool = False,
add_basemap: bool = False,
figsize: tuple[float, float] = (10, 8),
outline_gdf: Any | None = None,
legend_title: str | None = None,
legend_kwds: dict[str, Any] | None = None,
) -> Any
Render a choropleth map and return the matplotlib figure.
Source code in src/nyc311/plotting.py
plot_boundary_preview ¶
plot_boundary_preview(
boundaries_gdf: Any,
*,
title: str,
points_gdf: Any | None = None,
add_basemap: bool = False,
figsize: tuple[float, float] = (10, 8),
) -> Any
Render boundary outlines and optional points, then return the figure.
Source code in src/nyc311/plotting.py
plot_boundary_point_groups ¶
plot_boundary_point_groups(
boundaries_gdf: Any,
*,
title: str,
matched_points_gdf: Any | None = None,
unmatched_points_gdf: Any | None = None,
context_gdf: Any | None = None,
outline_gdf: Any | None = None,
matched_label: str = "Matched",
unmatched_label: str = "Unmatched",
add_basemap: bool = False,
figsize: tuple[float, float] = (10, 8),
) -> Any
Render categorized points over highlighted boundaries and optional context.
Source code in src/nyc311/plotting.py
plot_timeseries ¶
plot_timeseries(
dataframe: Any,
*,
title: str,
figsize: tuple[float, float] = (12, 5),
footnote: str | None = None,
) -> Any
Line chart for a :class:~pandas.DataFrame with a DatetimeIndex or created_date column.
Source code in src/nyc311/plotting.py
plot_complaint_heatmap ¶
plot_complaint_heatmap(
dataframe: Any,
*,
title: str,
time_column: str = "created_date",
figsize: tuple[float, float] = (10, 6),
) -> Any
Hour-of-day x day-of-week density heatmap (expects datetime resolution in time_column).
Source code in src/nyc311/plotting.py
plot_stacked_area ¶
plot_stacked_area(
dataframe: Any,
*,
title: str,
top_n: int = 8,
figsize: tuple[float, float] = (12, 6),
) -> Any
Stacked area chart of the top-N columns (by total) over a DatetimeIndex.
Source code in src/nyc311/plotting.py
plot_bar_counts ¶
plot_bar_counts(
labels: list[str],
counts: list[float],
*,
title: str,
horizontal: bool = False,
figsize: tuple[float, float] = (10, 6),
) -> Any
Simple bar chart for categorical counts.
Source code in src/nyc311/plotting.py
plot_complaint_scatter ¶
plot_complaint_scatter(
points_gdf: Any,
*,
boundaries_gdf: Any | None = None,
title: str,
column: str = "complaint_type",
add_basemap: bool = False,
figsize: tuple[float, float] = (12, 10),
legend_top_n: int | None = None,
) -> Any
Scatter plot of points colored by column over optional boundary outlines.
Source code in src/nyc311/plotting.py
plot_hero_banner ¶
plot_hero_banner(
points_gdf: Any,
*,
boundaries_gdf: Any | None = None,
title: str,
bbox: tuple[float, float, float, float] | None = None,
column: str = "complaint_type",
figsize: tuple[float, float] = (16, 5),
legend_top_n: int | None = None,
) -> Any
Wide horizontal map with OSM basemap, points, and boundaries (Web Mercator).
Source code in src/nyc311/plotting.py
Presets¶
nyc311.presets ¶
Reusable preset builders for common nyc311 example and workflow inputs.
build_filter ¶
build_filter(
*,
start_date: date | str,
end_date: date | str,
geography: str = "borough",
geography_value: str = models.BOROUGH_BROOKLYN,
complaint_types: tuple[str, ...] = (),
) -> models.ServiceRequestFilter
Build a typed service-request filter from string-friendly inputs.
Source code in src/nyc311/presets.py
brooklyn_borough_filter ¶
brooklyn_borough_filter(
*,
start_date: date | str,
end_date: date | str,
complaint_types: tuple[str, ...] = (),
) -> models.ServiceRequestFilter
Build a borough-level Brooklyn filter.
Source code in src/nyc311/presets.py
manhattan_borough_filter ¶
manhattan_borough_filter(
*,
start_date: date | str,
end_date: date | str,
complaint_types: tuple[str, ...] = (),
) -> models.ServiceRequestFilter
Build a borough-level Manhattan filter.
Source code in src/nyc311/presets.py
small_socrata_config ¶
small_socrata_config(
*,
page_size: int = 500,
max_pages: int | None = 1,
app_token: str | None = None,
) -> models.SocrataConfig
Build a small Socrata config suited to examples and local iteration.
Source code in src/nyc311/presets.py
large_socrata_config ¶
large_socrata_config(
*,
page_size: int = 5000,
max_pages: int | None = None,
app_token: str | None = None,
request_timeout_seconds: float = 300.0,
created_date_sort: Literal["asc", "desc"] = "asc",
) -> models.SocrataConfig
Build a high-throughput Socrata config for bulk downloads (e.g. full history).
The default page_size of 5,000 rows keeps each HTTP round-trip smaller than a
very large page would, and each request gets a five-minute read timeout.
Use created_date_sort='desc' when you want the most recent rows first
(e.g. capped smoke samples).
Source code in src/nyc311/presets.py
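To make the paging trade-off concrete, here is a minimal sketch of how a page size, a page cap, and a sort direction could translate into Socrata SoQL query parameters. `$limit`, `$offset`, and `$order` are standard SoQL parameters; the `page_params` helper is hypothetical and is not necessarily how nyc311 builds its requests.

```python
from __future__ import annotations

from typing import Iterator


def page_params(
    page_size: int = 5000,
    max_pages: int | None = None,
    created_date_sort: str = "asc",
) -> Iterator[dict[str, str]]:
    """Yield SoQL paging parameters, one dict per request (hypothetical helper)."""
    page = 0
    # With max_pages=None the generator is unbounded; the caller must stop it.
    while max_pages is None or page < max_pages:
        yield {
            "$limit": str(page_size),
            "$offset": str(page * page_size),
            "$order": f"created_date {created_date_sort.upper()}",
        }
        page += 1


# Two recent-first pages of 5,000 rows each.
first_two = list(page_params(max_pages=2, created_date_sort="desc"))
```

With `max_pages=None`, a real download loop would typically stop once a response returns fewer rows than `page_size`.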
smoke_socrata_config ¶
smoke_socrata_config(
*,
page_size: int = 5000,
app_token: str | None = None,
request_timeout_seconds: float = 120.0,
) -> models.SocrataConfig
Recent-first Socrata config used with a per-borough row cap (see about-the-data --preset smoke).
Source code in src/nyc311/presets.py
Factors¶
nyc311.factors ¶
Composable factor pipeline for NYC 311 complaint analysis.
EquityGapFactor ¶
Bases: Factor
Disparity metric: ratio of unit resolution time to citywide median.
Values above 1.0 indicate the unit resolves complaints slower than the citywide median; below 1.0, faster.
Source code in src/nyc311/factors/_advanced.py
compute ¶
compute(context: FactorContext) -> float
Return the resolution-time equity ratio for context.
Returns:
| Type | Description |
|---|---|
| float | The ratio of the unit's resolution time to the citywide median. A fallback value is returned when no resolved complaints exist or the citywide median is non-positive. |
Source code in src/nyc311/factors/_advanced.py
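The ratio described above can be sketched in a few lines. This is an illustration only: it assumes the unit's resolution time is summarized by its median, and it returns `None` for the undefined cases because the actual fallback value of `EquityGapFactor` is not shown in this reference. The `equity_gap` function is a hypothetical name.

```python
from __future__ import annotations

from statistics import median


def equity_gap(unit_days: list[float], citywide_days: list[float]) -> float | None:
    """Ratio of a unit's median resolution time to the citywide median."""
    if not unit_days or not citywide_days:
        return None  # no resolved complaints: ratio undefined in this sketch
    citywide_median = median(citywide_days)
    if citywide_median <= 0:
        return None  # non-positive denominator: ratio undefined in this sketch
    return median(unit_days) / citywide_median


# Unit median is 8 days, citywide median is 4 days: the unit is twice as slow.
slow_unit = equity_gap([6.0, 8.0, 10.0], [2.0, 4.0, 6.0])
```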
SpatialLagFactor ¶
Bases: Factor
Spatial lag of complaint counts: weighted average of neighbors.
Uses a precomputed spatial weights dict and a values dict to compute the weighted sum of neighboring unit values for the focal unit.
Source code in src/nyc311/factors/_advanced.py
compute ¶
compute(context: FactorContext) -> float
Return the spatial lag for the context's geographic unit.
Returns:
| Type | Description |
|---|---|
| float | The weighted sum of neighboring values. A fallback value is returned when the unit has no neighbors in the weights dict. |
Source code in src/nyc311/factors/_advanced.py
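A spatial lag over dict-based weights can be sketched as follows. The nested `{unit: {neighbor: weight}}` shape matches what `build_distance_weights` returns; the `spatial_lag` function itself, and its choice to return `0.0` for units without neighbors and to ignore neighbors missing from `values`, are assumptions made for this illustration.

```python
from __future__ import annotations


def spatial_lag(
    unit_id: str,
    weights: dict[str, dict[str, float]],
    values: dict[str, float],
) -> float:
    """Weighted sum of the focal unit's neighbor values."""
    neighbors = weights.get(unit_id, {})
    # Neighbors absent from `values` contribute nothing (assumption).
    return sum((w * values.get(n, 0.0) for n, w in neighbors.items()), 0.0)


weights = {"A": {"B": 0.5, "C": 0.5}, "B": {"A": 1.0}, "C": {}}
values = {"A": 10.0, "B": 20.0, "C": 40.0}
lag_a = spatial_lag("A", weights, values)  # 0.5 * 20 + 0.5 * 40
```

When the weights are row-standardized, as in this example, the weighted sum is also a weighted average of the neighbors, which reconciles the two phrasings in the class description.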
Factor ¶
Bases: ABC
Abstract base for a single named computation over a FactorContext.
Source code in src/nyc311/factors/_base.py
compute
abstractmethod
¶
compute(context: FactorContext) -> float | str | bool | int
Return the computed value for context.
Source code in src/nyc311/factors/_base.py
FactorContext
dataclass
¶
Row-level context for factor computation.
Each context represents one geographic unit (community district, NTA, borough) over one time window. Factors compute a single value from this context.
Source code in src/nyc311/factors/_base.py
Pipeline ¶
Immutable builder that executes factors over contexts.
Pipeline never mutates in place: :meth:add returns a new
pipeline with the factor appended.
Source code in src/nyc311/factors/_base.py
add ¶
add(factor: Factor) -> Pipeline
Return a new pipeline with factor appended.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| factor | Factor | The factor to append. Must define a unique name. | required |
Returns:
| Type | Description |
|---|---|
| Pipeline | A new Pipeline with factor appended. |
Source code in src/nyc311/factors/_base.py
as_factor_factory_estimate ¶
as_factor_factory_estimate(
panel: Any,
*,
family: str = "did",
method: str = "twfe",
outcome: str | None = None,
**engine_kwargs: Any,
) -> Any
Run a factor-factory engine on panel as a Pipeline continuation.
Additive bridge: the pipeline itself is not executed here.
Instead, the call dispatches into
factor_factory.engines.<family>.estimate, returning a
factor-factory <Family>Results object that downstream code
can chain off.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| panel | Any | A factor-factory Panel. | required |
| family | str | Engine-family module name under factor_factory.engines. | 'did' |
| method | str | Registry key for a specific adapter inside the family. | 'twfe' |
| outcome | str \| None | Outcome column on the Panel. | None |
| **engine_kwargs | Any | Additional kwargs forwarded to the engine's estimate call. | {} |
Returns:
| Type | Description |
|---|---|
| Any | A factor-factory <Family>Results object. |
Raises:
| Type | Description |
|---|---|
| ImportError | If factor-factory is not installed or the requested engine family's optional dependencies are missing. |
Source code in src/nyc311/factors/_base.py
run ¶
run(contexts: Iterable[FactorContext]) -> PipelineResult
Execute all factors across contexts and return results.
Iterates over each context once and evaluates every factor against
it, producing a columnar :class:PipelineResult keyed by factor
name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| contexts | Iterable[FactorContext] | An iterable of FactorContext instances to evaluate. | required |
Returns:
| Type | Description |
|---|---|
| PipelineResult | A PipelineResult that maps factor names to value tuples; row identifiers align with those columns positionally. |
Source code in src/nyc311/factors/_base.py
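The immutable-builder behaviour described for `Pipeline` can be illustrated with a toy stand-in. `MiniPipeline` below is hypothetical: it takes plain `(name, function)` pairs instead of `Factor` objects, returns a dict instead of a `PipelineResult`, and rejects duplicate names with a `ValueError`, which is an assumption rather than documented behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class MiniPipeline:
    """Toy immutable pipeline: each factor is a (name, function) pair."""

    factors: tuple[tuple[str, Callable[[dict], float]], ...] = ()

    def add(self, name: str, fn: Callable[[dict], float]) -> MiniPipeline:
        if any(existing == name for existing, _ in self.factors):
            raise ValueError(f"duplicate factor name: {name}")
        # Build a new instance instead of mutating self.
        return MiniPipeline(self.factors + ((name, fn),))

    def run(self, contexts: Iterable[dict]) -> dict[str, tuple[float, ...]]:
        columns: dict[str, list[float]] = {name: [] for name, _ in self.factors}
        for context in contexts:  # each context is visited exactly once
            for name, fn in self.factors:
                columns[name].append(fn(context))
        return {name: tuple(vals) for name, vals in columns.items()}


base = MiniPipeline()
extended = base.add("volume", lambda ctx: float(len(ctx["complaints"])))
result = extended.run([{"complaints": [1, 2, 3]}, {"complaints": []}])
```

Because `add` returns a new object, `base` still has no factors after the call, so a shared base pipeline can be extended in several directions without interference.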
PipelineResult
dataclass
¶
Columnar result set produced by :meth:Pipeline.run.
Source code in src/nyc311/factors/_base.py
to_records ¶
to_records() -> tuple[dict[str, Any], ...]
Convert to a tuple of row dictionaries.
Returns:
| Type | Description |
|---|---|
| tuple[dict[str, Any], ...] | A tuple where each element is a dict for one row; row order matches the result's own row order. |
Source code in src/nyc311/factors/_base.py
to_dataframe ¶
to_dataframe() -> Any
Convert to a pandas DataFrame indexed by geography_id.
Returns:
| Type | Description |
|---|---|
| Any | A pandas DataFrame with one column per factor, indexed by geography_id. |
Raises:
| Type | Description |
|---|---|
| ImportError | If pandas is not installed. Install the optional dataframes extra. |
Source code in src/nyc311/factors/_base.py
AnomalyScoreFactor ¶
Bases: Factor
Z-score of this unit's complaint volume.
Because the z-score is relative to the full set of contexts in
the pipeline run, this factor stores intermediate counts and
finalizes during :meth:Pipeline.run. As a stateless compromise
it uses a fixed population_mean and population_std provided at
construction time.
Returns 0.0 when population_std is zero.
Source code in src/nyc311/factors/_builtin.py
compute ¶
compute(context: FactorContext) -> float
Return the z-score of this context's complaint volume.
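Returns:
| Type | Description |
|---|---|
| float | The z-score of the context's complaint volume relative to population_mean and population_std; 0.0 when population_std is zero. |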
Source code in src/nyc311/factors/_builtin.py
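Since the population statistics are supplied at construction time, a caller has to compute them up front. The sketch below shows one way to do that with the standard library; `z_score` is a hypothetical function, and using the population standard deviation (`pstdev`) rather than the sample one is an assumption suggested only by the parameter name.

```python
from statistics import fmean, pstdev


def z_score(count: int, population_mean: float, population_std: float) -> float:
    """Standardize a count against fixed population statistics."""
    if population_std == 0:
        return 0.0  # documented behaviour for a zero standard deviation
    return (count - population_mean) / population_std


# Complaint counts for three units in one window (illustrative numbers).
counts = [70, 100, 130]
mean = fmean(counts)
std = pstdev(counts)
scores = [z_score(c, mean, std) for c in counts]
```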
ComplaintVolumeFactor ¶
Bases: Factor
Total complaint count, optionally per-capita per 10 000 residents.
When per_capita is True and :attr:FactorContext.total_population
is available, the result is count / population * 10_000 (a float).
Otherwise the raw integer count is returned.
Source code in src/nyc311/factors/_builtin.py
compute ¶
compute(context: FactorContext) -> int | float
Return the complaint volume (or per-capita rate) for context.
Returns:
| Type | Description |
|---|---|
| int \| float | The integer count of complaints in the context, or, when per_capita is True and total_population is available, count / population * 10_000 as a float. |
Source code in src/nyc311/factors/_builtin.py
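The per-capita rule above amounts to a single branch. In this minimal sketch the `complaint_volume` function is hypothetical, and treating a population of zero the same as a missing population is an assumption.

```python
from __future__ import annotations


def complaint_volume(
    count: int,
    total_population: int | None = None,
    per_capita: bool = False,
) -> int | float:
    """Raw count, or complaints per 10,000 residents when a population is known."""
    if per_capita and total_population:
        return count / total_population * 10_000
    return count  # raw integer count in every other case


rate = complaint_volume(250, total_population=50_000, per_capita=True)
raw = complaint_volume(250, total_population=None, per_capita=True)
```

Here `rate` works out to 50 complaints per 10,000 residents, while `raw` falls back to the integer count because no population is available.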
RecurrenceFactor ¶
Bases: Factor
Fraction of complaints at locations that appear more than once.
Locations are identified by rounding latitude/longitude to 4 decimal
places (~11 m precision). Returns 0.0 when no complaints have
coordinates.
Source code in src/nyc311/factors/_builtin.py
compute ¶
compute(context: FactorContext) -> float
Return the recurrent-location share for context.
Returns:
| Type | Description |
|---|---|
| float | The fraction of geocoded complaint locations (latitude and longitude rounded to 4 decimal places) that appear more than once in the context. Returns 0.0 when no complaints have coordinates. |
Source code in src/nyc311/factors/_builtin.py
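The rounding-based location key can be sketched with a `Counter`. This illustration follows the class summary and counts the share of geocoded complaints that fall on a repeated location; the `recurrence_share` name and the use of `None` for missing coordinates are assumptions.

```python
from __future__ import annotations

from collections import Counter


def recurrence_share(coords: list[tuple[float, float] | None]) -> float:
    """Share of geocoded complaints whose rounded location occurs more than once."""
    keys = [(round(c[0], 4), round(c[1], 4)) for c in coords if c is not None]
    if not keys:
        return 0.0  # no complaint has coordinates
    counts = Counter(keys)
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / len(keys)


coords = [
    (40.71234, -74.00561),  # rounds to (40.7123, -74.0056)
    (40.71231, -74.00559),  # rounds to the same key
    (40.80000, -73.95000),  # a location seen only once
    None,                   # complaint without coordinates
]
share = recurrence_share(coords)
```

Two of the three geocoded complaints share a rounded location, so `share` is two thirds.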
ResolutionTimeFactor ¶
Bases: Factor
Median or mean days between complaint creation and resolution.
Uses resolution_description is not None as a proxy for resolved.
Returns -1.0 when no resolved complaints exist in the context.
Source code in src/nyc311/factors/_builtin.py
compute ¶
compute(context: FactorContext) -> float
Return the median (or mean) resolution time for context.
Returns:
| Type | Description |
|---|---|
| float | The number of days between complaint creation and the window's end across resolved complaints, aggregated by the configured statistic (median or mean). Returns -1.0 when no complaints in the context have a resolution description. |
Source code in src/nyc311/factors/_builtin.py
ResponseRateFactor ¶
Bases: Factor
Fraction of complaints that received a resolution description.
Range [0.0, 1.0]. Returns 0.0 for empty contexts.
Source code in src/nyc311/factors/_builtin.py
compute ¶
compute(context: FactorContext) -> float
Return the resolved fraction of complaints in context.
Returns:
| Type | Description |
|---|---|
| float | The fraction of complaints with a non-null resolution_description, in the range [0.0, 1.0]. Returns 0.0 for empty contexts. |
Source code in src/nyc311/factors/_builtin.py
SeasonalityFactor ¶
Bases: Factor
Deviation of complaint count from a seasonal baseline.
baseline_monthly_counts maps month number (1-12) to the expected
count for that month. The factor returns (actual - expected) /
expected as a fractional deviation. Returns 0.0 when the
baseline is missing for the context's month or is zero.
Source code in src/nyc311/factors/_builtin.py
compute ¶
compute(context: FactorContext) -> float
Return the fractional deviation from the seasonal baseline.
Returns:
| Type | Description |
|---|---|
| float | (actual - expected) / expected, where actual is the number of complaints in the context and expected is the baseline for the context's start-month. Returns 0.0 when the baseline is missing or non-positive for that month. |
Source code in src/nyc311/factors/_builtin.py
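A worked example of the fractional deviation, as a minimal sketch. The `seasonal_deviation` function is hypothetical; it takes the month number directly, whereas the real factor derives it from the context.

```python
from __future__ import annotations


def seasonal_deviation(
    actual: int,
    month: int,
    baseline_monthly_counts: dict[int, float],
) -> float:
    """Fractional deviation of an observed count from the month's baseline."""
    expected = baseline_monthly_counts.get(month)
    if expected is None or expected <= 0:
        return 0.0  # baseline missing or non-positive for this month
    return (actual - expected) / expected


baseline = {1: 200.0, 7: 400.0}
july = seasonal_deviation(500, 7, baseline)   # (500 - 400) / 400
march = seasonal_deviation(500, 3, baseline)  # no baseline for March
```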
TopicConcentrationFactor ¶
Bases: Factor
Herfindahl-Hirschman Index of complaint-type shares.
HHI = sum(share_i^2) where share_i is the proportion of complaints of type i. Range [1/N, 1.0]; higher values indicate more concentration in fewer complaint types.
Returns 0.0 when the context has no complaints.
Source code in src/nyc311/factors/_builtin.py
compute ¶
compute(context: FactorContext) -> float
Return the HHI of complaint-type shares for context.
Returns:
| Type | Description |
|---|---|
| float | sum(share_i^2), where share_i is the proportion of complaints of type i. The index increases as complaints concentrate in fewer types. Returns 0.0 when the context has no complaints. |
Source code in src/nyc311/factors/_builtin.py
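The index defined above is easy to verify by hand. The sketch below computes it from a plain list of complaint-type labels; the `hhi` function name and the list input are illustrative assumptions, not the factor's actual interface.

```python
from collections import Counter


def hhi(complaint_types: list[str]) -> float:
    """Herfindahl-Hirschman Index of complaint-type shares."""
    total = len(complaint_types)
    if total == 0:
        return 0.0  # no complaints in the context
    return sum((n / total) ** 2 for n in Counter(complaint_types).values())


even = hhi(["noise", "heat", "rodent", "parking"])  # 4 * 0.25**2
skewed = hhi(["noise", "noise", "noise", "heat"])   # 0.75**2 + 0.25**2
```

Four equally common types give the lower bound 1/N = 0.25, while the skewed mix scores 0.625, and a single-type context reaches the maximum of 1.0.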
dispatch_factor_factory_engine ¶
dispatch_factor_factory_engine(
panel: Panel,
*,
family: str = "did",
method: str = "twfe",
outcome: str | None = None,
**engine_kwargs: Any,
) -> Any
Call factor_factory.engines.<family>.estimate on panel.
This is the chaining target behind
:meth:nyc311.factors.Pipeline.as_factor_factory_estimate. It
lazily imports the requested engine family so callers don't pay the
import cost for families they don't use, and it raises a friendly
:class:ImportError when the family's optional dependencies are
missing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| panel | Panel | A factor-factory Panel. | required |
| family | str | Engine-family module name under factor_factory.engines. | 'did' |
| method | str | Registry key for a specific adapter inside the family. | 'twfe' |
| outcome | str \| None | Outcome column on the Panel. | None |
| **engine_kwargs | Any | Additional keyword arguments forwarded to the engine's estimate call. | {} |
Returns:
| Type | Description |
|---|---|
| Any | The factor-factory <Family>Results object that the engine returned. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
ImportError
|
If factor-factory is not installed or the requested engine family's optional dependencies are missing. |
Source code in src/nyc311/factors/_factor_factory.py
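The lazy-import pattern described above can be sketched with `importlib`. This is a generic illustration, not the library's code: `load_engine_family` is a hypothetical helper, and the demo call points it at the standard-library `json` package so that it runs without factor-factory installed.

```python
import importlib
from types import ModuleType


def load_engine_family(family: str, package: str = "factor_factory.engines") -> ModuleType:
    """Import `<package>.<family>` on demand and re-raise failures with a hint."""
    try:
        # Nothing is imported until a caller actually asks for this family.
        return importlib.import_module(f"{package}.{family}")
    except ImportError as exc:
        raise ImportError(
            f"Engine family {family!r} is unavailable; install factor-factory "
            "together with that family's optional dependencies."
        ) from exc


# Demonstration against a package that is always present.
demo_module = load_engine_family("decoder", package="json")
```

Chaining with `from exc` keeps the original import failure visible in the traceback beneath the friendlier message.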
Temporal¶
nyc311.temporal ¶
Temporal panel module for longitudinal 311 complaint analysis.
PanelDataset
dataclass
¶
Balanced panel of (geographic_unit x time_period) observations.
Methods return new :class:PanelDataset instances—the dataset is
never mutated in place.
Source code in src/nyc311/temporal/_models.py
treatment_events
class-attribute
instance-attribute
¶
treatment_events: tuple[TreatmentEvent, ...] = ()
unit_ids
property
¶
unit_ids: tuple[str, ...]
The sorted, unique unit identifiers in the dataset.
Returns:
| Type | Description |
|---|---|
| tuple[str, ...] | A sorted tuple of the distinct unit identifiers in the dataset. |
treatment_group ¶
treatment_group() -> PanelDataset
Return only observations in units that were ever treated.
Returns:
| Type | Description |
|---|---|
| PanelDataset | A new PanelDataset restricted to units that were ever treated. |
Source code in src/nyc311/temporal/_models.py
control_group ¶
control_group() -> PanelDataset
Return only observations in units that were never treated.
Returns:
| Type | Description |
|---|---|
| PanelDataset | A new PanelDataset restricted to units that were never treated. |
Source code in src/nyc311/temporal/_models.py
filter_periods ¶
filter_periods(start: str, end: str) -> PanelDataset
Restrict the dataset to a closed interval of periods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| start | str | Inclusive lower-bound period label. | required |
| end | str | Inclusive upper-bound period label. | required |
Returns:
| Type | Description |
|---|---|
| PanelDataset | A new PanelDataset restricted to periods between start and end, inclusive. |
Source code in src/nyc311/temporal/_models.py
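The three filtering methods above share one pattern: build a new frozen instance from a filtered tuple. The toy classes below illustrate it. `Obs`, `MiniPanel`, and the `ever_treated` field are hypothetical stand-ins, and comparing period labels as strings assumes labels that sort chronologically as text, such as "2024-03".

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Obs:
    unit_id: str
    period: str  # assumed to sort chronologically as text
    ever_treated: bool  # hypothetical stand-in for the real treatment fields


@dataclass(frozen=True)
class MiniPanel:
    observations: tuple[Obs, ...]

    def _keep(self, predicate) -> MiniPanel:
        # replace() builds a new frozen instance; self is left untouched.
        kept = tuple(o for o in self.observations if predicate(o))
        return replace(self, observations=kept)

    def treatment_group(self) -> MiniPanel:
        return self._keep(lambda o: o.ever_treated)

    def control_group(self) -> MiniPanel:
        return self._keep(lambda o: not o.ever_treated)

    def filter_periods(self, start: str, end: str) -> MiniPanel:
        return self._keep(lambda o: start <= o.period <= end)  # closed interval


panel = MiniPanel(
    tuple(
        Obs(unit, period, ever_treated=(unit == "A"))
        for unit in ("A", "B")
        for period in ("2024-01", "2024-02", "2024-03")
    )
)
late = panel.filter_periods("2024-02", "2024-03")
late_treated = panel.treatment_group().filter_periods("2024-02", "2024-03")
```

Because every method returns a new object, the calls chain freely and `panel` keeps all six observations.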
to_factor_factory_panel ¶
to_factor_factory_panel(
*,
outcome_col: str = "complaint_count",
provenance: Any | None = None,
spatial_weights: dict[str, dict[str, float]]
| None = None,
) -> Any
Convert to a :class:factor_factory.tidy.Panel.
The adapter is additive — self is unchanged. Treatment events
are translated to factor-factory's frozen
:class:TreatmentEvent model, and an optional
spatial_weights dict (as produced by
:func:nyc311.temporal.build_distance_weights) is stashed on
panel.df.attrs["nyc311_spatial_weights"] for in-memory
round-trip.
See :mod:nyc311.temporal._factor_factory for details on the
column crosswalk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| outcome_col | str | Column name to tag as the primary outcome in the Panel metadata. Defaults to "complaint_count". | 'complaint_count' |
| provenance | Any \| None | Optional factor-factory Provenance object. | None |
| spatial_weights | dict[str, dict[str, float]] \| None | Optional nested weights dict from nyc311.temporal.build_distance_weights. | None |
Returns:
| Type | Description |
|---|---|
| Any | A fully-validated factor_factory.tidy.Panel. |
Raises:
| Type | Description |
|---|---|
| ImportError | If factor-factory or pandas is not installed. |
| ValueError | If the dataset is empty or another validation check fails. |
Source code in src/nyc311/temporal/_models.py
to_dataframe ¶
to_dataframe() -> Any
Convert to a pandas DataFrame with a (unit_id, period) MultiIndex.
Each per-type complaint count is exploded into a
complaints_<type> column, and any per-unit covariates are
merged in as additional columns.
Returns:
| Type | Description |
|---|---|
| Any | A pandas DataFrame with a (unit_id, period) MultiIndex and one column per panel measure. The frame has no rows when the dataset is empty. |
Raises:
| Type | Description |
|---|---|
| ImportError | If pandas is not installed. Install the optional dataframes extra. |
Source code in src/nyc311/temporal/_models.py
PanelObservation
dataclass
¶
One row in a balanced panel: (geographic_unit x time_period).
Source code in src/nyc311/temporal/_models.py
TreatmentEvent
dataclass
¶
A policy intervention applied to specific geographic units.
Source code in src/nyc311/temporal/_models.py
panel_dataset_to_factor_factory ¶
panel_dataset_to_factor_factory(
dataset: PanelDataset,
*,
outcome_col: str = "complaint_count",
provenance: Provenance | None = None,
spatial_weights: dict[str, dict[str, float]]
| None = None,
) -> ff_tidy.Panel
Convert a :class:PanelDataset to a :class:factor_factory.tidy.Panel.
Maps nyc311's panel model onto factor-factory's tidy Panel contract:
- unit_id → Panel first-level MultiIndex, named unit_id.
- period (string label) → pandas Timestamp at the period start, second-level index named period.
- complaint_count → primary outcome column (configurable via outcome_col).
- treatment (bool) → int 0/1 column named treatment.
- resolution_rate, median_resolution_days, population, per-type complaint counts, and covariates flow through as additional columns the engine can consume as covariates.
- TreatmentEvent tuples are translated to factor-factory's frozen :class:TreatmentEvent pydantic model (geography maps to dimension).
- A spatial_weights dict (as produced by :func:nyc311.temporal.build_distance_weights) is attached to the resulting :attr:Panel.df.attrs under the key "nyc311_spatial_weights" for in-memory round-trip.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | PanelDataset | The balanced PanelDataset to convert. | required |
| outcome_col | str | Column name to tag as the primary outcome in the Panel metadata. Must be one of the supported outcome columns. | 'complaint_count' |
| provenance | Provenance \| None | Optional factor-factory Provenance object. | None |
| spatial_weights | dict[str, dict[str, float]] \| None | Optional nested dict as produced by nyc311.temporal.build_distance_weights. | None |
Returns:
| Type | Description |
|---|---|
| Panel | A fully-validated factor_factory.tidy.Panel. |
Raises:
| Type | Description |
|---|---|
| ImportError | If factor-factory or pandas is not installed. |
| ValueError | If the dataset or outcome_col fails validation. |
Source code in src/nyc311/temporal/_factor_factory.py
spatial_weights_from_panel ¶
spatial_weights_from_panel(
panel: Panel,
) -> dict[str, dict[str, float]] | None
Recover spatial weights previously attached via the adapter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| panel | Panel | A factor-factory Panel. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, dict[str, float]] \| None | The nested weights dict, or None when no weights were attached. |
Source code in src/nyc311/temporal/_factor_factory.py
build_complaint_panel ¶
build_complaint_panel(
records: Sequence[ServiceRequestRecord],
*,
geography: str = "community_district",
freq: str = "ME",
treatment_events: Sequence[TreatmentEvent] = (),
population_data: dict[str, int] | None = None,
covariates: dict[str, dict[str, float]] | None = None,
) -> PanelDataset
Construct a balanced panel from service-request records.
Aggregates records into one observation per
(geographic-unit, period) cell, filling missing cells so the
resulting :class:PanelDataset is fully balanced across both
dimensions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| records | Sequence[ServiceRequestRecord] | Raw complaint records to aggregate. | required |
| geography | str | Geographic unit to group by. | 'community_district' |
| freq | str | Pandas offset alias controlling the period length. | 'ME' |
| treatment_events | Sequence[TreatmentEvent] | Policy interventions to code as treatment indicators on each observation. | () |
| population_data | dict[str, int] \| None | Mapping of population counts keyed by unit. | None |
| covariates | dict[str, dict[str, float]] \| None | Nested mapping of covariate values. | None |
Returns:
| Type | Description |
|---|---|
| PanelDataset | A PanelDataset balanced across every (unit, period) cell. An empty input yields a dataset with no observations and no periods. |
Raises:
| Type | Description |
|---|---|
| ImportError | If pandas is not installed. Install the optional dataframes extra. |
Source code in src/nyc311/temporal/_panel.py
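Balancing means that every unit appears in every period, even when it logged no complaints. The standard-library sketch below illustrates the idea on bare `(unit, period)` pairs. `balanced_counts` is a hypothetical function, the unit IDs are made up, and filling empty cells with a zero count is an assumption about how the real builder fills them; the sketch also only balances over periods that appear in the input.

```python
from __future__ import annotations

from collections import Counter
from itertools import product


def balanced_counts(events: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Count events per (unit, period) cell and fill absent cells with zero."""
    counts = Counter(events)
    units = sorted({unit for unit, _ in events})
    periods = sorted({period for _, period in events})
    # The Cartesian product guarantees one cell per unit and period.
    return {(u, p): counts.get((u, p), 0) for u, p in product(units, periods)}


events = [
    ("BK01", "2024-01"),
    ("BK01", "2024-01"),
    ("BK01", "2024-02"),
    ("MN05", "2024-02"),
]
cells = balanced_counts(events)
```

The input covers only three distinct cells, but `cells` has four entries, with `("MN05", "2024-01")` present and set to zero.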
build_distance_weights ¶
build_distance_weights(
unit_centroids: dict[str, tuple[float, float]],
*,
threshold_meters: float = 2000.0,
row_standardize: bool = True,
) -> dict[str, dict[str, float]]
Build an inverse-distance spatial weights matrix.
Units within threshold_meters are neighbors, weighted by
1 / distance. The resulting matrix is symmetric before
row-standardization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
unit_centroids
|
dict[str, tuple[float, float]]
|
Mapping |
required |
threshold_meters
|
float
|
Maximum great-circle distance, in meters, for two units to be considered neighbors. |
2000.0
|
row_standardize
|
bool
|
If |
True
|
Returns:
| Type | Description |
|---|---|
dict[str, dict[str, float]]
|
Nested dictionary |
dict[str, dict[str, float]]
|
neighbors map to an empty inner dict. |
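A rough sketch of the inverse-distance scheme follows. It assumes centroids are `(lat, lon)` pairs in degrees, uses the haversine formula with a mean Earth radius, and treats the threshold as inclusive; none of these details are confirmed by the reference, and `inverse_distance_weights` is a hypothetical name.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def inverse_distance_weights(centroids, threshold_meters=2000.0, row_standardize=True):
    """Weight each neighbor within the threshold by 1 / distance."""
    raw = {unit: {} for unit in centroids}
    for i, ci in centroids.items():
        for j, cj in centroids.items():
            if i == j:
                continue
            d = haversine_m(ci, cj)
            if 0.0 < d <= threshold_meters:
                raw[i][j] = 1.0 / d
    if not row_standardize:
        return raw
    # Row-standardize: each non-empty row sums to 1.
    result = {}
    for i, row in raw.items():
        total = sum(row.values())
        result[i] = {j: w / total for j, w in row.items()} if total > 0 else {}
    return result


centroids = {"A": (40.7000, -74.0), "B": (40.7090, -74.0), "C": (40.8000, -74.0)}
w = inverse_distance_weights(centroids)
```

A and B sit roughly 1 km apart and become each other's only neighbor, while C is more than 10 km away and maps to an empty inner dict.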
Source code in src/nyc311/temporal/_spatial_weights.py
centroids_from_boundaries ¶
centroids_from_boundaries(
boundaries: Any,
) -> dict[str, tuple[float, float]]
Extract centroids from a :class:BoundaryCollection.
Computes a per-feature centroid as the mean of the exterior-ring coordinates. This is approximate but cheap and avoids a hard dependency on shapely.
Note: As of nyc-geo-toolkit v0.4.0, nyc_geo_toolkit.centroids_from_boundaries is available as a shapely-backed, publication-grade centroid helper. It returns a BoundaryCollection of GeoJSON Point features at either the geometric centroid (default) or shapely's representative_point (guaranteed to lie inside concave polygons such as NYC's jagged community districts). Prefer it when you already have shapely installed and need defensible geometry for a published analysis.
nyc311's helper is intentionally the shapely-free path (it returns a plain dict[str, (lat, lon)] suitable for feeding directly into build_distance_weights) and is preserved for workflows that need to stay on the lean base install. The two helpers return different shapes and slightly different numbers, so don't swap them mid-analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
boundaries
|
Any
|
A boundary collection exposing a |
required |
Returns:
| Type | Description |
|---|---|
dict[str, tuple[float, float]]
|
Mapping |
dict[str, tuple[float, float]]
|
feature whose exterior ring is non-empty. |
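To see why the vertex-mean centroid is only approximate, consider this small sketch. It assumes GeoJSON-style rings with `(lon, lat)` positions and a repeated closing vertex; whether the library drops that closing vertex is not stated here, and `ring_mean_centroid` is a hypothetical helper.

```python
def ring_mean_centroid(exterior_ring):
    """Mean of exterior-ring vertices, returned as (lat, lon)."""
    lons = [pt[0] for pt in exterior_ring]
    lats = [pt[1] for pt in exterior_ring]
    return (sum(lats) / len(lats), sum(lons) / len(lons))


# A 2x2 square in (lon, lat) order; the closing vertex repeats the first.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)]
centroid = ring_mean_centroid(square)
```

The geometric centroid of this square is (1.0, 1.0), but the vertex mean lands at (0.8, 0.8) because the repeated corner is counted twice. Densely digitized edges pull the result in a similar way.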
Source code in src/nyc311/temporal/_spatial_weights.py
weights_to_pysal ¶
weights_to_pysal(
weights: dict[str, dict[str, float]],
) -> Any
Convert a weights dict to a :class:libpysal.weights.W object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
weights
|
dict[str, dict[str, float]]
|
Nested dictionary |
required |
Returns:
| Type | Description |
|---|---|
Any
|
A |
Any
|
spatial autocorrelation routines. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If libpysal is not installed. Install the optional
stats extra with |
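The conversion is mostly a reshaping exercise. The sketch below shows one plausible reshaping of the nested dict into the parallel neighbor and weight lists that `libpysal.weights.W(neighbors, weights)` accepts. It does not import libpysal, and `to_neighbors_and_weights` is a hypothetical helper, not the library's code.

```python
def to_neighbors_and_weights(weights):
    """Split {unit: {neighbor: weight}} into two parallel dicts of lists."""
    neighbors = {i: list(row) for i, row in weights.items()}
    weight_lists = {i: [row[j] for j in neighbors[i]] for i, row in weights.items()}
    return neighbors, weight_lists


nested = {"a": {"b": 0.5, "c": 0.5}, "b": {"a": 1.0}, "c": {}}
neighbors, weight_lists = to_neighbors_and_weights(nested)
# With libpysal installed, these could be passed on as
# libpysal.weights.W(neighbors, weight_lists).
```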
Source code in src/nyc311/temporal/_spatial_weights.py
Stats¶
nyc311.stats ¶
PhD-level statistical modeling for NYC 311 complaint analysis.
STLAnomalyResult
dataclass
¶
Result of STL-residual anomaly detection.
Source code in src/nyc311/stats/_anomaly.py
BYM2Result
dataclass
¶
Result of BYM2 small-area smoothing.
Source code in src/nyc311/stats/_bym2.py
ChangepointResult
dataclass
¶
Detected structural breaks in a time series.
Source code in src/nyc311/stats/_changepoint.py
DecompositionResult
dataclass
¶
Seasonal + trend + residual decomposition.
Source code in src/nyc311/stats/_decomposition.py
OaxacaBlinderResult
dataclass
¶
Oaxaca-Blinder decomposition of an outcome gap.
Source code in src/nyc311/stats/_equity.py
TheilResult
dataclass
¶
Population-weighted Theil T index with group decomposition.
Source code in src/nyc311/stats/_equity.py
GWRResult
dataclass
¶
Result of a geographically weighted regression.
Source code in src/nyc311/stats/_gwr.py
HawkesResult
dataclass
¶
Result of a Hawkes process estimation.
Source code in src/nyc311/stats/_hawkes.py
ITSResult
dataclass
¶
Result of a segmented interrupted-time-series regression.
Source code in src/nyc311/stats/_its.py
PanelRegressionResult
dataclass
¶
Summary of a panel regression fit.
Source code in src/nyc311/stats/_panel_models.py
PowerResult
dataclass
¶
Result of a power / minimum detectable effect calculation.
Source code in src/nyc311/stats/_power.py
RDResult
dataclass
¶
Result of a regression discontinuity estimation.
Source code in src/nyc311/stats/_rdd.py
LatentReportingResult
dataclass
¶
Result of latent reporting-bias EM estimation.
Source code in src/nyc311/stats/_reporting_bias.py
ReportingAdjustmentResult
dataclass
¶
Result of ecometric reporting-rate adjustment.
Source code in src/nyc311/stats/_reporting_bias.py
LISAResult
dataclass
¶
Local Indicators of Spatial Association.
Source code in src/nyc311/stats/_spatial.py
MoranResult
dataclass
¶
Global Moran's I test result.
Source code in src/nyc311/stats/_spatial.py
SpatialErrorResult
dataclass
¶
Result of a spatial error (SEM) model.
Source code in src/nyc311/stats/_spatial_regression.py
SpatialLagResult
dataclass
¶
Result of a spatial lag (SAR) model.
Source code in src/nyc311/stats/_spatial_regression.py
EventStudyResult
dataclass
¶
Event-study coefficients with pre-trend diagnostics.
Source code in src/nyc311/stats/_staggered_did.py
GroupTimeATT
dataclass
¶
A single group-time average treatment effect.
Source code in src/nyc311/stats/_staggered_did.py
StaggeredDiDResult
dataclass
¶
Result of a staggered difference-in-differences estimation.
Source code in src/nyc311/stats/_staggered_did.py
SyntheticControlResult
dataclass
¶
Result of a synthetic control analysis.
Source code in src/nyc311/stats/_synthetic_control.py
detect_stl_anomalies ¶
detect_stl_anomalies(
series: Any,
*,
period: int | None = None,
threshold: float = 2.0,
) -> STLAnomalyResult
Detect anomalies using STL decomposition residuals.
Decomposes series via STL and flags observations whose
absolute residual z-score exceeds threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
series
|
Any
|
A |
required |
period
|
int | None
|
Seasonal period in observations. When |
None
|
threshold
|
float
|
Absolute z-score threshold above which an
observation is flagged as anomalous. Defaults to |
2.0
|
Returns:
| Name | Type | Description |
|---|---|---|
An |
STLAnomalyResult
|
class: |
STLAnomalyResult
|
z-scores, and summary statistics of the residual distribution. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If statsmodels or pandas is not installed.
Install with |
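The flagging rule itself is simple once residuals exist. The sketch below skips the STL step and applies only the z-score threshold to a hand-written residual list. Using the population standard deviation is an assumption, and `flag_by_zscore` is a hypothetical name.

```python
from statistics import mean, pstdev


def flag_by_zscore(residuals, threshold=2.0):
    """Return (z_scores, flags) where flags mark |z| > threshold."""
    mu = mean(residuals)
    sigma = pstdev(residuals)
    if sigma == 0:
        # A constant residual series has no outliers by this rule.
        return [0.0] * len(residuals), [False] * len(residuals)
    z = [(r - mu) / sigma for r in residuals]
    return z, [abs(v) > threshold for v in z]


residuals = [0.1, -0.2, 0.0, 0.15, -0.1, 5.0, 0.05, -0.05]
z_scores, flags = flag_by_zscore(residuals)
```

Only the sixth observation (residual 5.0) exceeds the threshold in this toy series.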
Source code in src/nyc311/stats/_anomaly.py
bym2_smooth ¶
bym2_smooth(
observed_counts: dict[str, int],
expected_counts: dict[str, float],
adjacency: dict[str, tuple[str, ...]],
*,
n_samples: int = 2000,
n_tune: int = 1000,
random_seed: int = 42,
) -> BYM2Result
Smooth area-level rates with the BYM2 model.
Estimates: y_i ~ Poisson(E_i * exp(mu + phi_i))
where phi_i = sqrt(rho) * spatial_i + sqrt(1 - rho) * iid_i
The mixing parameter rho controls the balance between spatially structured and unstructured random effects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
observed_counts
|
dict[str, int]
|
Mapping |
required |
expected_counts
|
dict[str, float]
|
Mapping |
required |
adjacency
|
dict[str, tuple[str, ...]]
|
Mapping |
required |
n_samples
|
int
|
Number of posterior draws after tuning. |
2000
|
n_tune
|
int
|
Number of warmup / tuning iterations. |
1000
|
random_seed
|
int
|
Random seed for reproducibility. |
42
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
BYM2Result
|
class: |
BYM2Result
|
intervals, and variance decomposition. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If pymc is not installed. |
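The role of rho in the formula above can be checked numerically. This is a small worked sketch of the linear predictor only, with no sampling involved; `bym2_expected_count` is a hypothetical helper, and the effect values passed in are made up.

```python
import math


def bym2_expected_count(expected, mu, rho, spatial, iid):
    """Poisson mean E_i * exp(mu + phi_i) with the BYM2 mixing of effects."""
    phi = math.sqrt(rho) * spatial + math.sqrt(1.0 - rho) * iid
    return expected * math.exp(mu + phi)


# rho = 1: only the spatially structured effect contributes.
all_spatial = bym2_expected_count(100.0, 0.0, 1.0, spatial=0.0, iid=5.0)
# rho = 0: only the unstructured (iid) effect contributes.
all_iid = bym2_expected_count(100.0, 0.0, 0.0, spatial=5.0, iid=0.0)
```

Both calls return the baseline of 100, because in each case the nonzero effect is the one that rho switches off.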
Source code in src/nyc311/stats/_bym2.py
detect_changepoints ¶
detect_changepoints(
series: Any,
*,
method: Literal["pelt", "binseg"] = "pelt",
penalty: float | None = None,
min_segment_size: int = 5,
) -> ChangepointResult
Detect structural breaks in a complaint time series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
series
|
Any
|
A |
required |
method
|
Literal['pelt', 'binseg']
|
Detection algorithm; one of |
'pelt'
|
penalty
|
float | None
|
Penalty value passed to the underlying |
None
|
min_segment_size
|
int
|
Minimum number of observations between consecutive changepoints. |
5
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
ChangepointResult
|
class: |
ChangepointResult
|
indices, their corresponding dates, the resulting segment count, |
|
ChangepointResult
|
and the penalty actually used. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If |
TypeError
|
If |
Source code in src/nyc311/stats/_changepoint.py
seasonal_decompose ¶
seasonal_decompose(
series: Any, *, period: int | None = None
) -> DecompositionResult
Decompose series into trend, seasonal, and residual components.
Wraps :class:statsmodels.tsa.seasonal.STL. The series must be
indexed by a DatetimeIndex.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
series
|
Any
|
A |
required |
period
|
int | None
|
Seasonal period in observations. When |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
DecompositionResult
|
class: |
DecompositionResult
|
residual |
Raises:
| Type | Description |
|---|---|
ImportError
|
If statsmodels or pandas is not installed. Install
the optional stats extra with |
TypeError
|
If |
Source code in src/nyc311/stats/_decomposition.py
oaxaca_blinder_decomposition ¶
oaxaca_blinder_decomposition(
group_a: Any,
group_b: Any,
outcome: str,
regressors: tuple[str, ...],
) -> OaxacaBlinderResult
Decompose the mean-outcome gap between two groups.
Uses the Oaxaca-Blinder twofold decomposition with group B coefficients as the reference:
gap = (mean(X_a) - mean(X_b)) @ beta_b [explained]
+ mean(X_a) @ (beta_a - beta_b) [unexplained]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
group_a
|
Any
|
|
required |
group_b
|
Any
|
|
required |
outcome
|
str
|
Name of the outcome column. |
required |
regressors
|
tuple[str, ...]
|
Column names to include as explanatory variables. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
An |
OaxacaBlinderResult
|
class: |
OaxacaBlinderResult
|
and unexplained components, and per-variable contributions. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If numpy or pandas is not installed. |
ValueError
|
If fewer than 2 observations exist in either group. |
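The twofold identity can be verified by hand on a toy case. This sketch assumes one regressor plus an intercept and fits each group with a closed-form least-squares line; the data are made up, and `ols_line` is a hypothetical helper rather than anything the library exposes.

```python
from statistics import mean


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


xa, ya = [1.0, 2.0, 3.0, 4.0], [3.0, 5.1, 6.9, 9.0]  # group A
xb, yb = [0.0, 1.0, 2.0, 3.0], [1.0, 2.1, 2.9, 4.0]  # group B (reference)

a_a, b_a = ols_line(xa, ya)
a_b, b_b = ols_line(xb, yb)

gap = mean(ya) - mean(yb)
# Explained: endowment difference priced at group B coefficients.
# The intercept column is 1 in both groups, so it cancels.
explained = (mean(xa) - mean(xb)) * b_b
# Unexplained: coefficient differences evaluated at group A means.
unexplained = (a_a - a_b) + mean(xa) * (b_a - b_b)
```

Because each fitted line passes through its group means, `explained + unexplained` reproduces `gap` up to floating-point error.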
Source code in src/nyc311/stats/_equity.py
theil_index ¶
theil_index(
values: dict[str, float],
populations: dict[str, int],
*,
groups: dict[str, str] | None = None,
) -> TheilResult
Compute the population-weighted Theil T index.
When groups is provided, decomposes the total index into
between-group and within-group components.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
values
|
dict[str, float]
|
Mapping |
required |
populations
|
dict[str, int]
|
Mapping |
required |
groups
|
dict[str, str] | None
|
Optional mapping |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
TheilResult
|
class: |
TheilResult
|
components, per-unit contributions, and count. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If numpy is not installed. |
ValueError
|
If values and populations have mismatched keys. |
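A compact sketch of a population-weighted Theil T and its group decomposition is given below. It uses the textbook form with the population-weighted mean as reference and assumes strictly positive values; the library's exact handling of zeros and its result fields are not shown here, and both function names are hypothetical.

```python
import math


def theil_t(values, populations):
    """Population-weighted Theil T over the units present in `values`."""
    total_pop = sum(populations[u] for u in values)
    mu = sum(populations[u] * values[u] for u in values) / total_pop
    return sum(
        (populations[u] / total_pop) * (values[u] / mu) * math.log(values[u] / mu)
        for u in values
    )


def theil_decomposition(values, populations, groups):
    """Return (between, within) components that sum to the total index."""
    total_pop = sum(populations[u] for u in values)
    mu = sum(populations[u] * values[u] for u in values) / total_pop
    between = within = 0.0
    for g in sorted(set(groups.values())):
        members = {u: values[u] for u in values if groups[u] == g}
        pop_g = sum(populations[u] for u in members)
        mu_g = sum(populations[u] * members[u] for u in members) / pop_g
        share = pop_g * mu_g / (total_pop * mu)  # group's share of the total
        between += share * math.log(mu_g / mu)
        within += share * theil_t(members, populations)
    return between, within


values = {"a": 10.0, "b": 20.0, "c": 30.0, "d": 60.0}
populations = {"a": 100, "b": 100, "c": 200, "d": 100}
groups = {"a": "north", "b": "north", "c": "south", "d": "south"}

total = theil_t(values, populations)
between, within = theil_decomposition(values, populations, groups)
```

The between and within components add back up to the total index, which is the property the decomposition relies on.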
Source code in src/nyc311/stats/_equity.py
geographically_weighted_regression ¶
geographically_weighted_regression(
values: dict[str, float],
regressors: dict[str, dict[str, float]],
coordinates: dict[str, tuple[float, float]],
*,
bandwidth: float | None = None,
kernel: str = "bisquare",
) -> GWRResult
Fit a geographically weighted regression.
Estimates locally varying coefficients, allowing the relationship between outcome and regressors to change across space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
values
|
dict[str, float]
|
Mapping |
required |
regressors
|
dict[str, dict[str, float]]
|
Mapping
|
required |
coordinates
|
dict[str, tuple[float, float]]
|
Mapping |
required |
bandwidth
|
float | None
|
Fixed bandwidth. When |
None
|
kernel
|
str
|
Kernel function. One of |
'bisquare'
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
GWRResult
|
class: |
GWRResult
|
local R-squared values, bandwidth, and fit statistics. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If mgwr is not installed. |
ValueError
|
If fewer than 5 observations are provided. |
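The default kernel name suggests the usual bisquare weighting, sketched below in its standard textbook form. Whether the library applies exactly this form, or an adaptive rather than fixed bandwidth, is an assumption here.

```python
def bisquare(distance, bandwidth):
    """Standard bisquare kernel: smooth decay to zero at the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


# Weights for observations 0, 5, and 10 units from the regression point.
local_weights = [bisquare(d, 10.0) for d in (0.0, 5.0, 10.0)]
```

Observations at the regression point get full weight, and anything at or beyond the bandwidth drops out of the local fit entirely.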
Source code in src/nyc311/stats/_gwr.py
fit_hawkes_process ¶
fit_hawkes_process(
event_times: Any,
*,
kernel: str = "exponential",
max_iter: int = 1000,
) -> HawkesResult
Fit a univariate Hawkes process to event timestamps.
The conditional intensity is:
lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i))
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
event_times
|
Any
|
Array-like of event timestamps as floats (e.g. seconds since epoch, or days since start). |
required |
kernel
|
str
|
Triggering kernel type. Currently only
|
'exponential'
|
max_iter
|
int
|
Maximum iterations for the EM algorithm. |
1000
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
HawkesResult
|
class: |
HawkesResult
|
kernel parameters, branching ratio, and log-likelihood. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If numpy or scipy is not installed. |
ValueError
|
If fewer than 3 events are provided. |
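The conditional intensity above is straightforward to evaluate for given parameters. This sketch only evaluates the formula and does not fit anything; the parameter values are made up, and `hawkes_intensity` is a hypothetical name.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """lambda(t) = mu + sum over past events of alpha * beta * exp(-beta * (t - t_i))."""
    return mu + sum(
        alpha * beta * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )


baseline_only = hawkes_intensity(0.0, [0.0], mu=0.5, alpha=0.4, beta=2.0)
after_event = hawkes_intensity(1.0, [0.0], mu=0.5, alpha=0.4, beta=2.0)
```

With this parameterization the kernel `alpha * beta * exp(-beta * s)` integrates to `alpha` over `s >= 0`, so `alpha` plays the role of the branching ratio (the expected number of follow-on events per event).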
Source code in src/nyc311/stats/_hawkes.py
interrupted_time_series ¶
interrupted_time_series(
series: Any,
intervention_date: date,
*,
covariates: Any | None = None,
) -> ITSResult
Fit a segmented interrupted-time-series regression.
Estimates pre-intervention level and trend, the immediate level
change at intervention_date, and the post-intervention trend
change, following the standard ITS regression specification.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
series
|
Any
|
A |
required |
intervention_date
|
date
|
The date the intervention began. Observations on or after this date are treated as post-intervention. |
required |
covariates
|
Any | None
|
Optional |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
An |
ITSResult
|
class: |
ITSResult
|
changes at |
|
ITSResult
|
trend coefficients, and the full model summary string. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If statsmodels or pandas is not installed. Install
the optional stats extra with |
TypeError
|
If |
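The segmented specification boils down to four regressors per observation. The sketch below builds that design matrix for sorted dates. The column order and the convention that time-since-intervention starts at 0 are assumptions, and `its_design_rows` is a hypothetical helper.

```python
from datetime import date


def its_design_rows(dates, intervention_date):
    """Rows of (intercept, time, post, time_since_intervention) for sorted dates."""
    rows = []
    first_post = None
    for t, d in enumerate(dates):
        post = 1 if d >= intervention_date else 0
        if post and first_post is None:
            first_post = t
        since = t - first_post if post else 0
        rows.append((1, t, post, since))
    return rows


months = [date(2024, m, 1) for m in range(1, 7)]
rows = its_design_rows(months, date(2024, 4, 1))
```

In a regression on these columns, the coefficient on `post` is the immediate level change and the coefficient on the last column is the change in trend after the intervention.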
Source code in src/nyc311/stats/_its.py
panel_fixed_effects ¶
panel_fixed_effects(
panel: PanelDataset,
outcome: str,
regressors: tuple[str, ...],
*,
time_effects: bool = False,
cluster: Literal["entity", "time", "both"] = "entity",
) -> PanelRegressionResult
Estimate a fixed-effects panel regression.
Wraps :class:linearmodels.panel.PanelOLS with entity fixed effects
by default and optional two-way fixed effects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
panel
|
PanelDataset
|
A :class: |
required |
outcome
|
str
|
Name of the dependent variable column. |
required |
regressors
|
tuple[str, ...]
|
Names of independent variable columns. |
required |
time_effects
|
bool
|
When |
False
|
cluster
|
Literal['entity', 'time', 'both']
|
Cluster standard errors by |
'entity'
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
PanelRegressionResult
|
class: |
PanelRegressionResult
|
errors, p-values, R-squared, observation counts, and the full |
|
PanelRegressionResult
|
|
Raises:
| Type | Description |
|---|---|
ImportError
|
If |
ValueError
|
If |
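Entity fixed effects are equivalent to demeaning each variable within its entity before fitting. The sketch below shows that within transformation for a single regressor in plain Python. It illustrates the idea only and is not how linearmodels is invoked; `within_slope` and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def within_slope(rows):
    """rows: (entity, x, y). Demean x and y per entity, then fit one slope."""
    by_entity = defaultdict(list)
    for entity, x, y in rows:
        by_entity[entity].append((x, y))
    num = den = 0.0
    for obs in by_entity.values():
        xbar = mean(x for x, _ in obs)
        ybar = mean(y for _, y in obs)
        for x, y in obs:
            num += (x - xbar) * (y - ybar)
            den += (x - xbar) ** 2
    return num / den


# Two entities with very different levels but the same slope of 2.
rows = [
    ("A", 1.0, 12.0), ("A", 2.0, 14.0), ("A", 3.0, 16.0),
    ("B", 1.0, -3.0), ("B", 2.0, -1.0), ("B", 3.0, 1.0),
]
slope = within_slope(rows)
```

The level difference between A and B is absorbed by the demeaning, so the common slope is recovered.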
Source code in src/nyc311/stats/_panel_models.py
panel_random_effects ¶
panel_random_effects(
panel: PanelDataset,
outcome: str,
regressors: tuple[str, ...],
) -> PanelRegressionResult
Estimate a random-effects panel regression.
Wraps :class:linearmodels.panel.RandomEffects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
panel
|
PanelDataset
|
A :class: |
required |
outcome
|
str
|
Name of the dependent variable column. |
required |
regressors
|
tuple[str, ...]
|
Names of independent variable columns. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
A |
PanelRegressionResult
|
class: |
PanelRegressionResult
|
errors, p-values, R-squared, observation counts, and the full |
|
PanelRegressionResult
|
|
Raises:
| Type | Description |
|---|---|
ImportError
|
If |
ValueError
|
If |
Source code in src/nyc311/stats/_panel_models.py
minimum_detectable_effect ¶
minimum_detectable_effect(
n_units: int,
n_periods: int,
*,
icc: float = 0.05,
alpha: float = 0.05,
power: float = 0.8,
proportion_treated: float = 0.5,
outcome_variance: float = 1.0,
r_squared: float = 0.0,
) -> PowerResult
Compute the minimum detectable effect for a panel experiment.
Uses the standard cluster-RCT MDE formula:
MDE = (z_{alpha/2} + z_{beta}) * sqrt(2 * sigma^2 * DE / (N_t * T))
where DE = 1 + (T - 1) * ICC is the design effect.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_units | int | Total number of geographic units (clusters). | required |
| n_periods | int | Number of time periods observed. | required |
| icc | float | Intra-cluster correlation coefficient. | 0.05 |
| alpha | float | Significance level. | 0.05 |
| power | float | Statistical power (1 - beta). | 0.8 |
| proportion_treated | float | Fraction of units assigned to treatment. | 0.5 |
| outcome_variance | float | Variance of the outcome variable. | 1.0 |
| r_squared | float | Proportion of variance explained by covariates. | 0.0 |
Returns:
| Name | Type | Description |
|---|---|---|
A |
PowerResult
|
class: |
PowerResult
|
parameters. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If scipy is not installed. Install with
|
ValueError
|
If any parameter is out of its valid range. |
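The formula above can be evaluated with the standard library alone. This sketch assumes N_t means the number of treated units (`n_units * proportion_treated`) and reads the z terms as the upper normal quantiles for `1 - alpha / 2` and `power`. It ignores `r_squared`, which the stated formula does not include, and `mde_sketch` is a hypothetical name.

```python
import math
from statistics import NormalDist


def mde_sketch(n_units, n_periods, icc=0.05, alpha=0.05, power=0.8,
               proportion_treated=0.5, outcome_variance=1.0):
    """Evaluate MDE = (z_a + z_b) * sqrt(2 * sigma^2 * DE / (N_t * T))."""
    std_normal = NormalDist()
    multiplier = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    design_effect = 1 + (n_periods - 1) * icc
    n_treated = n_units * proportion_treated  # assumption: N_t = treated units
    return multiplier * math.sqrt(
        2 * outcome_variance * design_effect / (n_treated * n_periods)
    )


short_panel = mde_sketch(59, 12)
long_panel = mde_sketch(59, 24)
```

Doubling the number of periods shrinks the detectable effect, but by less than a factor of sqrt(2), because the design effect grows with T.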
Source code in src/nyc311/stats/_power.py
regression_discontinuity ¶
regression_discontinuity(
running_variable: Any,
outcome: Any,
cutoff: float = 0.0,
*,
kernel: str = "triangular",
bandwidth: float | None = None,
polynomial_order: int = 1,
) -> RDResult
Estimate a local treatment effect at a sharp cutoff.
Fits local polynomials on each side of the cutoff, using the
Imbens-Kalyanaraman (IK) or Calonico-Cattaneo-Titiunik (CCT)
bandwidth selector when bandwidth is None.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
running_variable
|
Any
|
Array-like running (assignment) variable. |
required |
outcome
|
Any
|
Array-like outcome variable of the same length. |
required |
cutoff
|
float
|
The threshold value of the running variable.
Defaults to |
0.0
|
kernel
|
str
|
Kernel for local weighting. One of |
'triangular'
|
bandwidth
|
float | None
|
Bandwidth for the local polynomial fit. When
|
None
|
polynomial_order
|
int
|
Degree of the local polynomial.
Defaults to |
1
|
Returns:
| Name | Type | Description |
|---|---|---|
An |
RDResult
|
class: |
RDResult
|
robust standard error, bias-corrected confidence interval, |
|
RDResult
|
effective sample sizes, and bandwidth. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If numpy or scipy is not installed. |
ValueError
|
If arrays are mismatched or too few observations exist on either side. |
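A bare-bones sharp RD estimate with a triangular kernel and local linear fits might look like the following. It takes the bandwidth as given and omits the IK/CCT selectors, robust standard errors, and bias correction mentioned above; all function names are hypothetical.

```python
def triangular(u):
    """Triangular kernel on [-1, 1]."""
    return max(0.0, 1.0 - abs(u))


def local_linear_intercept(x, y, cutoff, bandwidth):
    """Kernel-weighted least-squares fit; returns the fitted value at the cutoff."""
    w = [triangular((xi - cutoff) / bandwidth) for xi in x]
    xc = [xi - cutoff for xi in x]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, xc)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xc))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xc, y))
    slope = sxy / sxx
    return ybar - slope * xbar


def sharp_rd(x, y, cutoff, bandwidth):
    """Difference of right and left local-linear fits at the cutoff."""
    left = [(xi, yi) for xi, yi in zip(x, y) if xi < cutoff]
    right = [(xi, yi) for xi, yi in zip(x, y) if xi >= cutoff]
    lx, ly = zip(*left)
    rx, ry = zip(*right)
    return (local_linear_intercept(rx, ry, cutoff, bandwidth)
            - local_linear_intercept(lx, ly, cutoff, bandwidth))


# Noise-free data with a jump of 2 at zero.
x = [i / 10 for i in range(-10, 11)]
y = [1.0 + 0.5 * xi + (2.0 if xi >= 0 else 0.0) for xi in x]
effect = sharp_rd(x, y, cutoff=0.0, bandwidth=0.8)
```

Since the data are exactly linear on each side, the estimated jump equals the true jump of 2 up to rounding.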
Source code in src/nyc311/stats/_rdd.py
latent_reporting_bias_em ¶
latent_reporting_bias_em(
complaint_counts: dict[str, int],
populations: dict[str, int],
covariates: dict[str, dict[str, float]] | None = None,
*,
max_iter: int = 200,
tol: float = 1e-06,
) -> LatentReportingResult
Estimate true complaint rates via expectation-maximization.
Models observed counts as a product of a latent true rate and a reporting probability. The EM algorithm alternates between two updates: estimating the true rates (Poisson MLE) and estimating the reporting probabilities (logistic regression on covariates).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
complaint_counts
|
dict[str, int]
|
Mapping |
required |
populations
|
dict[str, int]
|
Mapping |
required |
covariates
|
dict[str, dict[str, float]] | None
|
Optional mapping
|
None
|
max_iter
|
int
|
Maximum EM iterations. |
200
|
tol
|
float
|
Convergence tolerance on log-likelihood change. |
1e-06
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
LatentReportingResult
|
class: |
LatentReportingResult
|
reporting probabilities, and convergence diagnostics. |
Raises:
| Type | Description |
|---|---|
| ImportError | If numpy or scipy is not installed. |
Source code in src/nyc311/stats/_reporting_bias.py
reporting_rate_adjustment ¶
reporting_rate_adjustment(
panel: PanelDataset,
outcome: str,
demographic_covariates: tuple[str, ...],
) -> ReportingAdjustmentResult
Adjust complaint rates for neighborhood reporting propensity.
Fits a mixed-effects model with unit random intercepts:
outcome ~ covariates + (1 | unit_id)
The random intercepts capture unit-level reporting propensity after controlling for demographic covariates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
panel
|
PanelDataset
|
A :class: |
required |
outcome
|
str
|
Column name for the complaint rate to adjust. |
required |
demographic_covariates
|
tuple[str, ...]
|
Column names for demographic controls (e.g. median income, population density). |
required |
Returns:
| Name | Type | Description |
|---|---|---|
A |
ReportingAdjustmentResult
|
class: |
ReportingAdjustmentResult
|
rates, random intercepts, ICC, and model summary. |
Raises:
| Type | Description |
|---|---|
| ImportError | If statsmodels or pandas is not installed. |
Source code in src/nyc311/stats/_reporting_bias.py
global_morans_i ¶
global_morans_i(
values: dict[str, float],
weights: dict[str, dict[str, float]],
) -> MoranResult
Compute Global Moran's I for values under spatial weights.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
values
|
dict[str, float]
|
Mapping |
required |
weights
|
dict[str, dict[str, float]]
|
Nested dict |
required |
Returns:
| Name | Type | Description |
|---|---|---|
A |
MoranResult
|
class: |
MoranResult
|
permutation-based p-value, the standardized z-score, and the |
|
MoranResult
|
expected value under the null hypothesis. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If |
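For intuition, the statistic itself (without the permutation inference the function reports) can be computed directly from the two dictionaries. This is a sketch of the textbook formula, not the library's code path; `morans_i_statistic` is a hypothetical name.

```python
def morans_i_statistic(values, weights):
    """I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = deviations from the mean."""
    n = len(values)
    mean_value = sum(values.values()) / n
    z = {u: v - mean_value for u, v in values.items()}
    s0 = sum(w for row in weights.values() for w in row.values())
    cross = sum(w * z[i] * z[j] for i, row in weights.items() for j, w in row.items())
    m2 = sum(v * v for v in z.values())
    return (n / s0) * cross / m2


# Four units on a line with a steadily increasing value.
values = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
weights = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
moran = morans_i_statistic(values, weights)
expected_under_null = -1.0 / (len(values) - 1)
```

The smooth gradient gives a positive statistic (1/3), above the null expectation of -1/(n - 1).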
Source code in src/nyc311/stats/_spatial.py
local_morans_i ¶
local_morans_i(
values: dict[str, float],
weights: dict[str, dict[str, float]],
*,
permutations: int = 999,
) -> LISAResult
Compute Local Moran's I (LISA) for hotspot/coldspot identification.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
values
|
dict[str, float]
|
Mapping |
required |
weights
|
dict[str, dict[str, float]]
|
Nested dict |
required |
permutations
|
int
|
Number of conditional permutations used to derive pseudo p-values. |
999
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
LISAResult
|
class: |
LISAResult
|
p-values, and quadrant cluster labels ( |
|
LISAResult
|
|
Raises:
| Type | Description |
|---|---|
ImportError
|
If |
Source code in src/nyc311/stats/_spatial.py
spatial_error_model ¶
spatial_error_model(
panel: PanelDataset,
weights: dict[str, dict[str, float]],
outcome: str,
regressors: tuple[str, ...],
*,
period: str | None = None,
) -> SpatialErrorResult
Fit a spatial error (SEM) model via maximum likelihood.
Estimates: y = X @ beta + u, u = lambda * W @ u + epsilon
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
panel
|
PanelDataset
|
A :class: |
required |
weights
|
dict[str, dict[str, float]]
|
Nested dict |
required |
outcome
|
str
|
Column name for the dependent variable. |
required |
regressors
|
tuple[str, ...]
|
Column names for the independent variables. |
required |
period
|
str | None
|
If given, extract only this period as a cross-section.
If |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
SpatialErrorResult
|
class: |
SpatialErrorResult
|
spatial error parameter (lambda), and fit statistics. |
Raises:
| Type | Description |
|---|---|
| ImportError | If spreg or libpysal is not installed. |
Source code in src/nyc311/stats/_spatial_regression.py
spatial_lag_model ¶
spatial_lag_model(
panel: PanelDataset,
weights: dict[str, dict[str, float]],
outcome: str,
regressors: tuple[str, ...],
*,
period: str | None = None,
) -> SpatialLagResult
Fit a spatial lag (SAR) model via maximum likelihood.
Estimates: y = rho * W @ y + X @ beta + epsilon
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
panel
|
PanelDataset
|
A :class: |
required |
weights
|
dict[str, dict[str, float]]
|
Nested dict |
required |
outcome
|
str
|
Column name for the dependent variable. |
required |
regressors
|
tuple[str, ...]
|
Column names for the independent variables. |
required |
period
|
str | None
|
If given, extract only this period as a cross-section.
If |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
SpatialLagResult
|
class: |
SpatialLagResult
|
spatial autoregressive parameter (rho), and fit statistics. |
Raises:
| Type | Description |
|---|---|
| ImportError | If spreg or libpysal is not installed. |
Source code in src/nyc311/stats/_spatial_regression.py
event_study ¶
event_study(
panel: PanelDataset,
outcome: str,
*,
covariates: tuple[str, ...] = (),
pre_periods: int = 5,
post_periods: int = 5,
reference_period: int = -1,
) -> EventStudyResult
Estimate event-study coefficients with pre-trend diagnostics.
Computes mean differences between treated and control units at
each relative time period, with reference_period normalized
to zero.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| panel | PanelDataset | A PanelDataset. | required |
| outcome | str | Column name for the outcome variable. | required |
| covariates | tuple[str, ...] | Additional control variable column names. | () |
| pre_periods | int | Number of pre-treatment periods to include. | 5 |
| post_periods | int | Number of post-treatment periods to include. | 5 |
| reference_period | int | Relative period to normalize to zero. Defaults to -1. | -1 |
Returns:
| Type | Description |
|---|---|
| EventStudyResult | An EventStudyResult whose fields include per-period estimates, confidence intervals, and a pre-trend F-test. |
Raises:
| Type | Description |
|---|---|
| ImportError | If required packages are not installed. |
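The normalization described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's implementation: it takes hypothetical `(relative_period, treated, outcome)` tuples rather than a PanelDataset, and it omits covariates, confidence intervals, and the pre-trend F-test.

```python
from collections import defaultdict
from statistics import mean


def event_study_means(rows, reference_period=-1):
    """rows: iterable of (relative_period, treated: bool, outcome) tuples."""
    treated, control = defaultdict(list), defaultdict(list)
    for rel, is_treated, value in rows:
        (treated if is_treated else control)[rel].append(value)

    # Raw treated-minus-control gap at each relative period seen in both groups.
    gaps = {
        rel: mean(treated[rel]) - mean(control[rel])
        for rel in sorted(treated.keys() & control.keys())
    }

    # Subtract the reference-period gap so that period is exactly zero.
    base = gaps[reference_period]
    return {rel: gap - base for rel, gap in gaps.items()}


# Hypothetical data: one treated and one control observation per period.
rows = [
    (-2, True, 5.0), (-2, False, 4.0),
    (-1, True, 6.0), (-1, False, 5.0),
    (0, True, 9.0), (0, False, 5.0),
    (1, True, 11.0), (1, False, 6.0),
]
coefs = event_study_means(rows)
```

In this toy data the pre-treatment coefficients are both zero, which is the pattern a pre-trend test looks for, and the post-treatment coefficients measure the gap relative to period -1.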
Source code in src/nyc311/stats/_staggered_did.py
staggered_did
staggered_did(
panel: PanelDataset,
outcome: str,
*,
covariates: tuple[str, ...] = (),
cluster: str = "entity",
) -> StaggeredDiDResult
Estimate group-time ATTs under staggered treatment adoption.
Uses two-way fixed effects with interaction terms for each treatment cohort and post-treatment period, avoiding the well-documented bias of naive TWFE under staggered rollouts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| panel | PanelDataset | A PanelDataset. | required |
| outcome | str | Column name for the outcome variable. | required |
| covariates | tuple[str, ...] | Additional control variable column names. | () |
| cluster | str | Clustering level for standard errors. One of … | 'entity' |
Returns:
| Type | Description |
|---|---|
| StaggeredDiDResult | A StaggeredDiDResult whose fields include the aggregated ATT and confidence intervals. |
Raises:
| Type | Description |
|---|---|
| ImportError | If required packages are not installed. |
| ValueError | If no treatment events are found. |
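One plausible reading of "interaction terms for each treatment cohort and post-treatment period" is a set of 0/1 indicators, one per (cohort, period) pair, where a cohort is identified by its first treated period. The sketch below builds such indicators for hypothetical unit IDs and periods. It illustrates the design-matrix idea only, and the helper name and input shape are assumptions rather than part of nyc311.

```python
def cohort_period_dummies(first_treated, periods):
    """Build cohort-by-period indicators.

    first_treated: unit -> first treated period, or None if never treated.
    Returns the list of (cohort, period) terms and, for every (unit, period)
    observation, a dict of 0/1 indicators keyed by term.
    """
    cohorts = sorted({g for g in first_treated.values() if g is not None})
    # One interaction term per cohort g and period t with t >= g.
    terms = [(g, t) for g in cohorts for t in periods if t >= g]

    design = {}
    for unit, g_unit in first_treated.items():
        for t_obs in periods:
            # An indicator is 1 only for the unit's own cohort in that period.
            design[(unit, t_obs)] = {
                term: int(term == (g_unit, t_obs)) for term in terms
            }
    return terms, design


# Hypothetical units: two treated cohorts and one never-treated unit.
first_treated = {"BK01": 2, "QN05": 3, "MN03": None}
terms, design = cohort_period_dummies(first_treated, periods=[1, 2, 3])
```

Each term gets its own coefficient in the regression, so effects are allowed to differ by cohort and by period instead of being pooled into a single post-treatment dummy.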
Source code in src/nyc311/stats/_staggered_did.py
synthetic_control
synthetic_control(
panel: PanelDataset,
treated_unit: str,
outcome: str,
*,
predictors: tuple[str, ...] = (),
n_placebo_runs: int = 0,
) -> SyntheticControlResult
Estimate a treatment effect using the synthetic control method.
Constructs a weighted combination of untreated donor units that best reproduces the treated unit's pre-treatment trajectory, then measures the post-treatment divergence as the treatment effect.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| panel | PanelDataset | A PanelDataset. | required |
| treated_unit | str | The unit ID of the treated unit. | required |
| outcome | str | Column name for the outcome variable. | required |
| predictors | tuple[str, ...] | Additional predictor columns for matching. | () |
| n_placebo_runs | int | Number of in-space placebos for inference. When … | 0 |
Returns:
| Type | Description |
|---|---|
| SyntheticControlResult | A SyntheticControlResult whose fields include the counterfactual series, treatment effects, and optionally a placebo p-value. |
Raises:
| Type | Description |
|---|---|
| ImportError | If pysyncon is not installed. |
| ValueError | If the treated unit is not found in the panel. |
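The idea behind the method can be shown with a toy case of two donor units. The sketch below substitutes a simple grid search over a single weight for the constrained optimizer a real implementation would use, and every series and name in it is hypothetical. It does not call nyc311 or pysyncon.

```python
def fit_two_donor_weights(treated_pre, donor_a_pre, donor_b_pre, steps=100):
    """Grid-search w in [0, 1] so that w*A + (1-w)*B best matches the
    treated unit's pre-treatment series (least squared error)."""

    def loss(w):
        return sum(
            (y - (w * a + (1 - w) * b)) ** 2
            for y, a, b in zip(treated_pre, donor_a_pre, donor_b_pre)
        )

    return min((i / steps for i in range(steps + 1)), key=loss)


# Hypothetical outcome series, split at the treatment date.
treated_pre, treated_post = [10.0, 12.0, 14.0], [20.0, 22.0]
a_pre, a_post = [8.0, 10.0, 12.0], [14.0, 16.0]
b_pre, b_post = [16.0, 18.0, 20.0], [22.0, 24.0]

# Donor weights are fitted on pre-treatment data only.
w = fit_two_donor_weights(treated_pre, a_pre, b_pre)

# The counterfactual applies the same weights to the donors' post-treatment data.
synthetic_post = [w * a + (1 - w) * b for a, b in zip(a_post, b_post)]

# Treatment effect: observed minus synthetic in each post-treatment period.
effects = [y - s for y, s in zip(treated_post, synthetic_post)]
```

With more than two donors the weights form a vector constrained to be non-negative and sum to one, which is why a dedicated solver is normally used instead of a grid.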
Source code in src/nyc311/stats/_synthetic_control.py
CLI
nyc311.cli
Command-line entrypoints for nyc311.
main
main(argv: Sequence[str] | None = None) -> int
Run the implemented fetch and complaint-topic export commands.
Source code in src/nyc311/cli/_main.py