Bird Data

To evaluate the relationship between birds and light-polluted areas using spatial data, I needed both light pollution data and bird distribution data. Light pollution data is easy: I can obtain everything I need from the National Oceanic and Atmospheric Administration (NOAA). That data is stored in GeoTIFF files. The format is very interesting and I'm going to explore it more after this class ends. It has a lot of potential, but I didn't end up needing it for this project. When I created my interactive map using Carto, I discovered that one of the provided base layer choices was built from NOAA data. Using Carto took care of the light pollution map for me.

To obtain data on the geographical distribution of birds I went to an organization called eBird. This organization, created by the Cornell Lab of Ornithology and the National Audubon Society, is at the center of how the birding community reports sightings of bird species. The reported data is aggregated and studied. They use computer models to clean and process the data and to estimate the actual distribution of each reported bird, independent of where the birders happened to make observations and what they actually reported. That modeled data is available for download and use for 107 different bird species.

I'm grateful that this high-quality data was made available to me. I found that usable bird distribution data like the kind used in my light pollution map is hard to come by.

Here is the data citation:

Fink, D., T. Auer, A. Johnston, M. Strimas-Mackey, M. Iliff, and S. Kelling. eBird Status and Trends. Version: November 2018. https://ebird.org/science/status-and-trends. Cornell Lab of Ornithology, Ithaca, New York.

After downloading the data I had to organize it a bit before uploading it to Carto. This is easy to do in Python with the geopandas library.

First I must import some packages.

In [1]:
from pathlib import Path
import geopandas as gpd

This dataset consists of about 2 GB of GeoPackage (GPKG) data files. The code below reads the file for each bird and combines them into a single GeoDataFrame.

In [2]:
import pandas as pd

source_files = Path('/local/ITP/temporary_expert/birddistributiondata/rangedata/')

# Read the first layer of each range file, then concatenate them into one GeoDataFrame.
frames = [gpd.read_file(file.as_posix(), layer=0) for file in source_files.glob('*-range-*.gpkg')]
df = pd.concat(frames)

First I dropped some unnecessary columns:

In [3]:
df.drop(['version_year', 'scientific_name', 'layer', 'start_dt', 'end_dt'], axis=1, inplace=True)

Looking at the first few rows of data I see a special column called geometry. This column contains the latitude and longitude coordinates of the polygons describing the region each bird occupies during the breeding, nonbreeding, and migration periods. This column is responsible for the large file sizes.

In [4]:
df.head()
Out[4]:
   species_code       common_name             season_name      area_km2  geometry
0        buwtea  Blue-winged Teal             nonbreeding  2.144623e+06  (POLYGON ((-81.59041412053013 7.34583282073276...
1        buwtea  Blue-winged Teal   prebreeding_migration  1.337519e+07  (POLYGON ((-78.08007332219998 7.44096626161466...
2        buwtea  Blue-winged Teal                breeding  6.728110e+06  (POLYGON ((-92.61402318683363 18.3542833044898...
3        buwtea  Blue-winged Teal  postbreeding_migration  1.205513e+07  (POLYGON ((-80.26645458232068 8.86175781619752...
0        amerob    American Robin             nonbreeding  1.202956e+07  (POLYGON ((-95.71327227812202 16.6139574007840...

Some of these birds occupy one region year round. I'm not interested in those birds so I will remove those data rows.

In [5]:
df.season_name.unique()
Out[5]:
array(['nonbreeding', 'prebreeding_migration', 'breeding', 'postbreeding_migration', 'year_round'], dtype=object)

Note there are currently 107 unique birds in the dataset.

In [6]:
len(df.species_code.unique())
Out[6]:
107

Keep only the rows with 'nonbreeding', 'prebreeding_migration', 'breeding', or 'postbreeding_migration' seasons.

In [7]:
df = df[df.season_name.isin(['nonbreeding', 'prebreeding_migration', 'breeding', 'postbreeding_migration'])]

Check that it worked:

In [8]:
df.season_name.unique()
Out[8]:
array(['nonbreeding', 'prebreeding_migration', 'breeding', 'postbreeding_migration'], dtype=object)

Now there are only 95 birds in the dataset.

In [9]:
len(df.species_code.unique())
Out[9]:
95

Note that there are now 375 rows in the table. Since 95 * 4 = 380, that is five fewer rows than I would expect. A few birds don't have location data for all four seasons. I will leave those birds in the dataset.

In [10]:
len(df)
Out[10]:
375
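To find out which birds account for the missing rows, I could count the seasons present for each species. Below is a minimal plain-Python sketch of that check. The rows and species codes in it are made up for illustration, and the helper name species_missing_seasons is my own; with the real GeoDataFrame the same idea could be expressed with df.groupby('species_code')['season_name'].nunique().

```python
from collections import defaultdict

# The four seasons kept after filtering out 'year_round'.
SEASONS = {"nonbreeding", "prebreeding_migration", "breeding", "postbreeding_migration"}


def species_missing_seasons(rows):
    """Map each incomplete species code to the set of seasons it lacks."""
    seen = defaultdict(set)
    for species_code, season_name in rows:
        seen[species_code].add(season_name)
    result = {}
    for code, seasons in seen.items():
        lacking = SEASONS - seasons
        if lacking:
            result[code] = lacking
    return result


# Hypothetical (species_code, season_name) pairs, not taken from the eBird data.
rows = [
    ("sp_a", "nonbreeding"), ("sp_a", "prebreeding_migration"),
    ("sp_a", "breeding"), ("sp_a", "postbreeding_migration"),
    ("sp_b", "nonbreeding"), ("sp_b", "breeding"),
]

missing = species_missing_seasons(rows)
print(missing)
```

In this toy example only sp_b is reported, because it has no rows for either migration season.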

The data files are very large and the geometry is unnecessarily detailed for my use case. I can simplify the geometries to reduce the file sizes, although this step is very slow.

The default free Carto account provides 250 MB of space for data. If you are an NYU student you can sign up with your NYU account to receive 500 MB of space. The simplification below brings the total file size to 256 MB, leaving me space for future projects.

In [11]:
df['geometry'] = df.simplify(0.0005)

print('done simplifying')
done simplifying
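The tolerance passed to simplify is in the units of the coordinates, which here are degrees, so 0.0005 is roughly 55 meters of latitude. The simplification in geopandas is based on the Douglas-Peucker algorithm. To make the effect of the tolerance concrete, here is a small pure-Python sketch of that algorithm on an open line. It only illustrates the idea and is not the code geopandas actually runs; the function names and sample points are my own.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Drop points that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify the two halves around it.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


# Made-up line: the second point deviates by only 0.0001 from a straight run.
line = [(0, 0), (1, 0.0001), (2, 0), (3, 1), (4, 0)]
simplified = douglas_peucker(line, 0.0005)
print(simplified)
```

The point that wobbles by 0.0001 is removed because it is within the tolerance, while the peak at (3, 1) is kept. A larger tolerance removes more vertices and shrinks the files further, at the cost of coarser range outlines.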

Carto has difficulty importing giant datasets all at once, so the data will be broken up into several small files and uploaded one at a time.

In [12]:
basepath = Path("/local/ITP/temporary_expert/birddistributiondata/processed_range_data")

df.iloc[0:100].to_file(basepath.joinpath("bird_data_000_100.gpkg"), driver="GPKG")
df.iloc[100:200].to_file(basepath.joinpath("bird_data_100_200.gpkg"), driver="GPKG")
df.iloc[200:300].to_file(basepath.joinpath("bird_data_200_300.gpkg"), driver="GPKG")
df.iloc[300:].to_file(basepath.joinpath("bird_data_300_375.gpkg"), driver="GPKG")
WARNING:Fiona:CPLE_NotSupported in b'dataset /local/ITP/temporary_expert/birddistributiondata/processed_range_data/bird_data_000_100.gpkg does not support layer creation option ENCODING'
WARNING:Fiona:CPLE_NotSupported in b'dataset /local/ITP/temporary_expert/birddistributiondata/processed_range_data/bird_data_100_200.gpkg does not support layer creation option ENCODING'
WARNING:Fiona:CPLE_NotSupported in b'dataset /local/ITP/temporary_expert/birddistributiondata/processed_range_data/bird_data_200_300.gpkg does not support layer creation option ENCODING'
WARNING:Fiona:CPLE_NotSupported in b'dataset /local/ITP/temporary_expert/birddistributiondata/processed_range_data/bird_data_300_375.gpkg does not support layer creation option ENCODING'
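If the number of rows or the chunk size changed, the slice bounds and file names could be generated instead of typed by hand. This is a minimal sketch of that idea; chunk_bounds and chunk_filename are hypothetical helpers of my own, and the write step is only shown as a comment because it needs the real GeoDataFrame.

```python
def chunk_bounds(n_rows, chunk_size):
    """Yield (start, stop) pairs that cover range(n_rows) in chunk_size steps."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def chunk_filename(start, stop):
    """Build a file name following the bird_data_<start>_<stop>.gpkg pattern."""
    return f"bird_data_{start:03d}_{stop:03d}.gpkg"


bounds = list(chunk_bounds(375, 100))
filenames = [chunk_filename(start, stop) for start, stop in bounds]
print(filenames)

# In the notebook, each chunk could then be written with something like:
# for start, stop in bounds:
#     df.iloc[start:stop].to_file(basepath.joinpath(chunk_filename(start, stop)), driver="GPKG")
```

For 375 rows and a chunk size of 100 this produces the same four file names used above.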

The four datasets can be combined into one dataset within Carto using SQL INSERT statements like this:

INSERT INTO jim18133.ebird_ranges (the_geom, fid, species_code, common_name, season_name, area_km2)
SELECT the_geom, fid, species_code, common_name, season_name, area_km2
FROM jim18133.bird_data_300_387
WHERE fid >= 20 AND fid < 40;
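The statement above copies one batch of rows (fid 20 through 39) from one uploaded table. Writing one statement per batch by hand gets tedious, so the statements could be generated in Python. The sketch below assumes batches of 20 rows and an fid range of 0 to 100, both of which are guesses for illustration; insert_statements is a hypothetical helper. Building SQL with string formatting is only reasonable here because all the values are constants I control.

```python
COLUMNS = "the_geom, fid, species_code, common_name, season_name, area_km2"


def insert_statements(target, source, fid_start, fid_stop, batch_size):
    """Build one INSERT ... SELECT statement per batch of fid values."""
    statements = []
    for low in range(fid_start, fid_stop, batch_size):
        high = min(low + batch_size, fid_stop)
        statements.append(
            f"INSERT INTO {target} ({COLUMNS})\n"
            f"SELECT {COLUMNS}\n"
            f"FROM {source}\n"
            f"WHERE fid >= {low} AND fid < {high};"
        )
    return statements


# The fid range and batch size are assumptions, not values from the notebook.
statements = insert_statements(
    "jim18133.ebird_ranges",
    "jim18133.bird_data_300_387",
    fid_start=0,
    fid_stop=100,
    batch_size=20,
)
print(statements[1])
```

With these assumed values the second generated statement is the same as the one shown above.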

Finally, I use Python to generate the HTML option elements for the species dropdown. The output is copied into the light pollution map.

In [13]:
for _, row in df[~df.duplicated('species_code')][['species_code', 'common_name']].iterrows():
    print(f"<option value=\"{row['species_code']}\">{row['common_name']}</option>")
<option value="buwtea">Blue-winged Teal</option>
<option value="amerob">American Robin</option>
<option value="magwar">Magnolia Warbler</option>
<option value="comloo">Common Loon</option>
<option value="buggna">Blue-gray Gnatcatcher</option>
<option value="easpho">Eastern Phoebe</option>
<option value="buwwar">Blue-winged Warbler</option>
<option value="yebcha">Yellow-breasted Chat</option>
<option value="killde">Killdeer</option>
<option value="chiswi">Chimney Swift</option>
<option value="gnttow">Green-tailed Towhee</option>
<option value="macwar">MacGillivray's Warbler</option>
<option value="larbun">Lark Bunting</option>
<option value="bewwre">Bewick's Wren</option>
<option value="norfli">Northern Flicker</option>
<option value="moudov">Mourning Dove</option>
<option value="rehwoo">Red-headed Woodpecker</option>
<option value="lewwoo">Lewis's Woodpecker</option>
<option value="belvir">Bell's Vireo</option>
<option value="treswa">Tree Swallow</option>
<option value="wesmea">Western Meadowlark</option>
<option value="brnthr">Brown Thrasher</option>
<option value="eastow">Eastern Towhee</option>
<option value="forter">Forster's Tern</option>
<option value="easmea">Eastern Meadowlark</option>
<option value="truswa">Trumpeter Swan</option>
<option value="prowar">Prothonotary Warbler</option>
<option value="brespa">Brewer's Sparrow</option>
<option value="sctfly">Scissor-tailed Flycatcher</option>
<option value="hamfly">Hammond's Flycatcher</option>
<option value="whtspa">White-throated Sparrow</option>
<option value="rufhum">Rufous Hummingbird</option>
<option value="vigswa">Violet-green Swallow</option>
<option value="rewbla">Red-winged Blackbird</option>
<option value="paibun">Painted Bunting</option>
<option value="gockin">Golden-crowned Kinglet</option>
<option value="purfin">Purple Finch</option>
<option value="redhea">Redhead</option>
<option value="bulori">Bullock's Oriole</option>
<option value="blujay">Blue Jay</option>
<option value="btywar">Black-throated Gray Warbler</option>
<option value="libher">Little Blue Heron</option>
<option value="graspa">Grasshopper Sparrow</option>
<option value="doccor">Double-crested Cormorant</option>
<option value="yelwar">Yellow Warbler</option>
<option value="hoowar">Hooded Warbler</option>
<option value="whwdov">White-winged Dove</option>
<option value="orcori">Orchard Oriole</option>
<option value="warvir">Warbling Vireo</option>
<option value="balori">Baltimore Oriole</option>
<option value="wooduc">Wood Duck</option>
<option value="fragul">Franklin's Gull</option>
<option value="horlar">Horned Lark</option>
<option value="rthhum">Ruby-throated Hummingbird</option>
<option value="lazbun">Lazuli Bunting</option>
<option value="sagthr">Sage Thrasher</option>
<option value="sancra">Sandhill Crane</option>
<option value="amwpel">American White Pelican</option>
<option value="nrwswa">Northern Rough-winged Swallow</option>
<option value="baleag">Bald Eagle</option>
<option value="grcfly">Great Crested Flycatcher</option>
<option value="purmar">Purple Martin</option>
<option value="kenwar">Kentucky Warbler</option>
<option value="yebsap">Yellow-bellied Sapsucker</option>
<option value="easblu">Eastern Bluebird</option>
<option value="ovenbi1">Ovenbird</option>
<option value="wesblu">Western Bluebird</option>
<option value="amekes">American Kestrel</option>
<option value="ruckin">Ruby-crowned Kinglet</option>
<option value="moublu">Mountain Bluebird</option>
<option value="calgul">California Gull</option>
<option value="eargre">Eared Grebe</option>
<option value="comyel">Common Yellowthroat</option>
<option value="canwar">Canada Warbler</option>
<option value="indbun">Indigo Bunting</option>
<option value="cedwax">Cedar Waxwing</option>
<option value="lobcur">Long-billed Curlew</option>
<option value="brnpel">Brown Pelican</option>
<option value="amecro">American Crow</option>
<option value="westan">Western Tanager</option>
<option value="margod">Marbled Godwit</option>
<option value="herthr">Hermit Thrush</option>
<option value="ambduc">American Black Duck</option>
<option value="fiespa">Field Sparrow</option>
<option value="pibgre">Pied-billed Grebe</option>
<option value="grnher">Green Heron</option>
<option value="rusbla">Rusty Blackbird</option>
<option value="reevir1">Red-eyed Vireo</option>
<option value="wilfly">Willow Flycatcher</option>
<option value="barswa">Barn Swallow</option>
<option value="cavswa">Cave Swallow</option>
<option value="btbwar">Black-throated Blue Warbler</option>
<option value="woothr">Wood Thrush</option>
<option value="logshr">Loggerhead Shrike</option>
<option value="ameavo">American Avocet</option>
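Several of the common names contain apostrophes, which happen to be harmless inside an option element. If a name ever contained a character like & or <, though, the generated HTML would be wrong. A slightly more defensive sketch could escape the values with the standard library's html.escape. The option_tag helper is my own name, and the two pairs are taken from the output above.

```python
from html import escape


def option_tag(species_code, common_name):
    """Return one <option> element with the value and label HTML-escaped."""
    value = escape(species_code, quote=True)
    # quote=False leaves apostrophes alone, which is fine inside element text.
    label = escape(common_name, quote=False)
    return f'<option value="{value}">{label}</option>'


pairs = [("buwtea", "Blue-winged Teal"), ("macwar", "MacGillivray's Warbler")]
options = [option_tag(code, name) for code, name in pairs]
print("\n".join(options))
```

For the names in this dataset the result is identical to the output above, so this only matters if the species list changes.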