Your Ultimate Guide to Wrangling Satellite Data
Weather and land data are unique and increasingly important

The weather is probably the world’s oldest ice-breaker (excuse the pun). It’s also more important than ever: more than 55 percent of the world’s economy depends on natural resources. These resources grow and perish with the fluctuations of the weather.
Because of climate change, more and more natural resources are at risk. We depend on these to power just about everything from food to textiles to building materials, so being exposed to weather extremes really isn’t that great.
But this post isn’t about climate change or weather threats. What you’ll learn here can certainly be used to examine them; for now, however, I’d like to show you how to work with weather data.
Luckily, high-quality open-access weather data exists and can be retrieved by anyone. It’s mostly satellite data, but it can also come from weather stations and other sources. Aside from pure weather data, you can often also access data on land use, soil health, urbanization, or vegetation levels through those same channels.
Where to Get Satellite Weather Data?
We’ll be focusing on public data sources here. Commercial satellite data is available, notably through Planet, Maxar, and Spire. However, it usually costs a lot of money—and public data can frankly get you a very long way.
The good news is that public satellite datasets are fairly centralized and can be accessed through fewer than a handful of platforms. Usually, you’ll only need one or two of them anyway. We’ll go through each in more detail below so that you know what to do, whatever you might be building. The table below shows which data sources are available and when to use them:
Setting up Copernicus Climate Data Store (CDS)
If you are interested in historical climate data or various forecasts, then Copernicus is your friend. To set it up, you’ll need to do the following:
1. Install the API client by running pip install cdsapi in your console.
2. Add your API credentials to ~/.cdsapirc.
More detailed instructions can be found on the Copernicus site.
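If you have never written ~/.cdsapirc by hand: it is a short text file with a url line and a key line. The sketch below writes it from Python. It assumes the current CDS endpoint (https://cds.climate.copernicus.eu/api) and a personal access token copied from your CDS profile page; both are worth double-checking against the Copernicus instructions, and write_cdsapirc is a hypothetical helper of our own.

```python
from pathlib import Path
from typing import Optional

# Assumption: the current CDS API endpoint; check the Copernicus site if requests fail.
CDS_URL = "https://cds.climate.copernicus.eu/api"


def write_cdsapirc(key: str, url: str = CDS_URL, path: Optional[Path] = None) -> Path:
    """Write the two-line credentials file that cdsapi.Client() reads by default.

    write_cdsapirc is a hypothetical helper; `key` is your personal access token.
    """
    target = path if path is not None else Path.home() / ".cdsapirc"
    target.write_text(f"url: {url}\nkey: {key}\n", encoding="utf-8")
    target.chmod(0o600)  # keep the token readable by your user only (on POSIX systems)
    return target
```

Calling write_cdsapirc with your token once is enough. Alternatively, cdsapi.Client(url=..., key=...) accepts the same two values directly, and the client also picks them up from the CDSAPI_URL and CDSAPI_KEY environment variables.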
Setting up EOSDIS
To get NASA’s EOSDIS data, for example for climate monitoring, proceed as follows:
1. Install the API client by running pip install earthaccess in your console.
2. Retrieve the data, e.g., by working through the example on their site.
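To give a rough idea of what retrieval looks like, here is a sketch that logs in, searches NASA’s catalogue and downloads a few matching files with earthaccess. It assumes you have a free NASA Earthdata Login account; the helper names are hypothetical, and MOD11A1 (MODIS daily land surface temperature) in the usage example is only an illustration. earthaccess expects bounding boxes in (west, south, east, north) order, which make_bbox checks before any network call.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]


def make_bbox(west: float, south: float, east: float, north: float) -> BBox:
    """Validate and return a (west, south, east, north) box in degrees."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    if not (-90 <= south < north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south < north <= 90")
    return (west, south, east, north)


def download_granules(short_name: str, bbox: BBox, start: str, end: str,
                      out_dir: str = "./data", count: int = 5) -> list:
    """Search NASA's catalogue for matching granules and download up to `count` of them."""
    import earthaccess  # imported lazily so make_bbox works without the package

    # Looks for Earthdata credentials (environment variables or ~/.netrc), otherwise prompts
    earthaccess.login()
    results = earthaccess.search_data(
        short_name=short_name,
        bounding_box=bbox,
        temporal=(start, end),
        count=count,
    )
    return earthaccess.download(results, out_dir)  # list of local file paths
```

For example, download_granules("MOD11A1", make_bbox(-10, 30, 40, 60), "2024-01-01", "2024-01-02") would fetch a handful of MODIS tiles over Europe, assuming the product is still listed under that short name.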
Setting up NOAA NCEI
To access real-time weather data and reanalysis data through NOAA, you will need to do the following:
Make a request following the instructions on the site.
It’s worth pointing out that NOAA’s site is not half as easy to navigate as the other sources featured here. Beginners might find this rather frustrating.
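One programmatic route into NCEI is its Access Data Service, a plain HTTP API that can return JSON. The sketch below builds a request URL with the standard library and leaves the actual download to a separate function. The endpoint, the daily-summaries dataset, the parameter names and the station ID USW00094728 (New York, Central Park) reflect our reading of NOAA’s documentation and should be checked there; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of NCEI's Access Data Service; verify against NOAA's documentation.
NCEI_ENDPOINT = "https://www.ncei.noaa.gov/access/services/data/v1"


def build_ncei_url(dataset: str, stations: list, start: str, end: str,
                   data_types: list, units: str = "metric") -> str:
    """Assemble a query URL for station data in JSON format."""
    params = {
        "dataset": dataset,
        "stations": ",".join(stations),
        "startDate": start,
        "endDate": end,
        "dataTypes": ",".join(data_types),
        "units": units,
        "format": "json",
    }
    return f"{NCEI_ENDPOINT}?{urlencode(params)}"


def fetch_ncei(url: str):
    """Perform the request and decode the JSON response."""
    with urlopen(url, timeout=60) as response:
        return json.load(response)


# Daily maximum and minimum temperatures for January 2024 (no network call yet)
url = build_ncei_url("daily-summaries", ["USW00094728"], "2024-01-01", "2024-01-31",
                     ["TMAX", "TMIN"])
```

Calling fetch_ncei(url) then performs the request and should return the decoded records.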
Setting up Google Earth Engine (GEE)
GEE differs from the other data sources in that it is a whole cloud platform through which you can access Copernicus and NASA data, as well as a selection of NOAA datasets. If you prefer working in the cloud and visualizing your data in simple ways, GEE is for you. Here’s how to set it up:
Install the API client by running pip install earthengine-api in your console. Alternatively, you can work in the online Code Editor; see the Earth Engine documentation for details.
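To give a feel for the Python API, here is a sketch that averages ERA5-Land 2m temperature over a longitude/latitude box for one day. It assumes you have authenticated once with ee.Authenticate() and have a Google Cloud project registered for Earth Engine (the project argument). The collection ID ECMWF/ERA5_LAND/HOURLY and its temperature_2m band (in Kelvin, land pixels only) serve as an example, and the function names are hypothetical. Because filterDate excludes its end date, a small helper computes the following day.

```python
from datetime import date, timedelta


def exclusive_end(day: str) -> str:
    """Return the day after `day` (ISO format), since filterDate excludes its end date."""
    return (date.fromisoformat(day) + timedelta(days=1)).isoformat()


def mean_temperature(day: str, west: float, south: float, east: float, north: float,
                     project: str) -> dict:
    """Mean ERA5-Land 2m temperature (Kelvin) over a lon/lat box for one day."""
    import ee  # earthengine-api; imported lazily so exclusive_end works without it

    ee.Initialize(project=project)
    region = ee.Geometry.Rectangle([west, south, east, north])
    daily_mean = (
        ee.ImageCollection("ECMWF/ERA5_LAND/HOURLY")
        .filterDate(day, exclusive_end(day))
        .select("temperature_2m")
        .mean()  # average the hourly images of that day into one image
    )
    stats = daily_mean.reduceRegion(
        reducer=ee.Reducer.mean(),
        geometry=region,
        scale=11132,  # roughly the 0.1 degree ERA5-Land grid spacing, in metres
        maxPixels=1e9,
    )
    return stats.getInfo()  # e.g. {"temperature_2m": <value in K>}
```

For Europe on New Year’s Day, that would be mean_temperature("2024-01-01", -10, 30, 40, 60, project="your-project-id"). The work runs on Google’s servers; only the final number travels back to your machine.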
GEE is very beginner-friendly, but it tends to hit its limits when you request and process more granular data or run complex machine learning workflows. That said, Copernicus and NASA data access are also very easy to set up, so the choice of what to use is really yours.
Handling Satellite Weather Data in Python
Many satellite datasets are not stored as simple CSV files, which would be far too big and bulky in many cases. Instead, they come in NetCDF or HDF5 formats, which store multi-dimensional arrays (e.g., time × latitude × longitude) compactly and can compress them. NetCDF is slightly easier to get started with than HDF5.
In Python, the following packages will be worth installing:
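- xarray (together with the netCDF4 package, which it can use to read NetCDF files) for multi-dimensional climate datasets.
- rasterio & GDAL for geospatial processing of satellite imagery; rioxarray brings rasterio’s functionality to xarray objects.
- h5py for HDF5 files.
- matplotlib and cartopy for plotting geospatial weather data.
- geopandas to edit geographical data, for example using a shapefile.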
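HDF5 files are organised a bit like a file system, with groups containing datasets, so the first step with an unfamiliar file is usually to list what is inside. A minimal sketch with h5py (list_hdf5_datasets is our own, hypothetical helper):

```python
import h5py


def list_hdf5_datasets(path: str) -> dict:
    """Map every dataset path in an HDF5 file to its (shape, dtype)."""
    found = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset; keep only datasets
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

Since NetCDF-4 files are HDF5 files under the hood, the same function works on many .nc files too, although xarray is the more convenient tool for those.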
Let’s walk through an example of extracting and processing ERA5 data, which is available on the Copernicus CDS. Once your Copernicus account is set up, the following code retrieves the worldwide 2m temperature on January 1st, 2024:
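```python
import cdsapi  # Copernicus API client

c = cdsapi.Client()  # reads the URL and key from ~/.cdsapirc
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': ['reanalysis'],
        'variable': ['2m_temperature'],
        'year': '2024',
        'month': '01',
        'day': '01',
        'time': [f'{hour:02d}:00' for hour in range(24)],  # all 24 hourly fields
        'data_format': 'netcdf',  # called 'format' in the legacy CDS API
        'download_format': 'unarchived',  # a plain .nc file rather than a zip
    },
    'era5_temperature.nc',
)
```

You’ll note that we are downloading the file as NetCDF in the code above. To process this file, we’ll need xarray. If we wanted to show the temperatures in Europe on the first day of 2024, we’d use the following code: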
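```python
import xarray as xr
import matplotlib.pyplot as plt

# Load the NetCDF file
ds = xr.open_dataset("era5_temperature.nc")
# ERA5 longitudes typically run from 0 to 360; shift them to -180..180 so slice(-10, 40) covers western Europe
ds = ds.assign_coords(longitude=((ds.longitude + 180) % 360) - 180).sortby("longitude")
# Select a region (e.g., Europe); ERA5 latitudes run north to south, hence slice(60, 30)
europe = ds.sel(latitude=slice(60, 30), longitude=slice(-10, 40))
# Plot the regional average over the hours of the day (in Kelvin, ERA5's native unit)
europe['t2m'].mean(dim=['longitude', 'latitude']).plot()
plt.title("Average 2m Temperature Over Europe (Jan 2024)")
plt.show()
```

Sometimes, we need to reproject data to another coordinate system. This is fairly straightforward with rioxarray: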
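```python
import rioxarray  # noqa: F401 (registers the .rio accessor on xarray objects)

# Reuse the European subset from above and convert Kelvin to °C
temp_celsius = europe['t2m'] - 273.15
# Declare the spatial dimensions and the CRS of the lat/lon grid (WGS 84)
temp_rio = temp_celsius.rio.set_spatial_dims(x_dim="longitude", y_dim="latitude").rio.write_crs("EPSG:4326")
# Reproject to another coordinate system, e.g., ETRS89 / LAEA Europe (EPSG:3035)
temp_laea = temp_rio.rio.reproject("EPSG:3035")
# Save the result as a GeoTIFF (each hourly field becomes a band)
temp_laea.rio.to_raster("era5_temperature_reprojected.tif")
```

On the whole, these techniques are not awfully difficult. However, for large datasets, challenges can arise when you hit the limits of your machine. Also, this type of data is structurally different from most other data you might have come across in the past. This means that you’ll need to get used to these packages and commands if you plan to use weather data regularly.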
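A common way around memory limits is to let xarray process the data lazily in chunks with dask (which needs to be installed separately). The sketch below computes a regional daily mean chunk by chunk. It assumes ERA5-style dimension names (latitude, longitude and a time dimension, which recent CDS downloads typically call valid_time), and regional_daily_mean is a hypothetical helper.

```python
import xarray as xr


def regional_daily_mean(da: xr.DataArray, lat_range: tuple, lon_range: tuple,
                        time_dim: str = "valid_time") -> xr.DataArray:
    """Daily mean over a lat/lon box, built lazily with dask and computed at the end."""
    lazy = da.chunk({time_dim: 24})  # one day of hourly fields per chunk (requires dask)
    region = lazy.sel(latitude=slice(*lat_range), longitude=slice(*lon_range))
    spatial_mean = region.mean(dim=["latitude", "longitude"])
    daily = spatial_mean.resample({time_dim: "1D"}).mean()
    return daily.compute()  # the deferred, chunk-wise work only runs here
```

With files on disk, you would open them lazily in the first place, for example xr.open_dataset("era5_temperature.nc", chunks={"valid_time": 24}), or xr.open_mfdataset for many files at once, and pass the t2m variable to the function with lat_range=(60, 30) and lon_range=(-10, 40) for Europe.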
Next, we’ll look at a couple of use cases of weather data for assessing stock prices and climate risks. We’ll then conclude this piece with some surprising insights.



