The data come with flags you should understand. The definitions below are taken from the USHCN Status file.
Data Measurement Flag
blank = no measurement information applicable
a-i = number of days missing in calculation of monthly mean
temperature
E = The value is estimated using values from surrounding
stations because a monthly value could not be computed
from daily data; or,
the pairwise homogenization algorithm removed the value
because of too many apparent inhomogeneities occurring
close together in time.
Quality Control Flag
BLANK = no failure of quality control check or could not be
evaluated.
D = monthly value is part of an annual series of values that
are exactly the same (e.g. duplicated) within another
year in the station's record.
I = checks for internal consistency between TMAX and TMIN.
Flag is set when TMIN > TMAX for a given month.
L = monthly value is isolated in time within the station
record, and this is defined by having no immediate non-
missing values 18 months on either side of the value.
M = Manually flagged as erroneous.
O = monthly value that is >= 5 bi-weight standard deviations
from the bi-weight mean. Bi-weight statistics are
calculated from a series of all non-missing values in
the station's record for that particular month.
S = monthly value has failed spatial consistency check.
Any value found to be between 2.5 and 5.0 bi-weight
standard deviations from the bi-weight mean, is more
closely scrutinized by examining the 5 closest neighbors
(not to exceed 500.0 km) and determining their associated
distribution of respective z-scores. At least one of
the neighbor stations must have a z score with the same
sign as the target and its z-score must be greater than
or equal to the z-score listed in column B (below),
where column B is expressed as a function of the target
z-score ranges (column A).
----------------------------
A | B
----------------------------
4.0 - 5.0 | 1.9
----------------------------
3.0 - 4.0 | 1.8
----------------------------
2.75 - 3.0 | 1.7
----------------------------
2.50 - 2.75 | 1.6
W = monthly value is duplicated from the previous month,
based upon regional and spatial criteria and is only
applied from the year 2000 to the present.
Quality Controlled Adjusted (QCA) QC Flags:
A = alternative method of adjustment used.
M = values with a non-blank quality control flag in the "qcu"
dataset are set to missing in the adjusted dataset and given
an "M" quality control flag.
Data Source Flag
Blank = Value was computed from daily data available in GHCN-Daily
Not Blank = Daily data are not available so the monthly value was
obtained from the USHCN version 1 dataset. The possible
Version 1 DSFLAGS are as follows:
1 = NCDC Tape Deck 3220, Summary of the Month Element Digital File
2 = Means Book - Smithsonian Institute, C.A. Schott (1876, 1881 thru 1931)
3 = Manuscript - Original Records, National Climatic Data Center
4 = Climatological Data (CD), monthly NCDC publication
5 = Climate Record Book, as described in History of Climatological Record
Books, U.S. Department of Commerce, Weather Bureau, USGPO (1960)
6 = Bulletin W - Summary of the Climatological Data for the United States (by
section), F.H. Bigelow, U.S. Weather Bureau (1912); and, Bulletin W -
Summary of the Climatological Data for the United States, 2nd Ed.
7 = Local Climatological Data (LCD), monthly NCDC publication
8 = State Climatologists, various sources
B = Professor Raymond Bradley - Refer to Climatic Fluctuations of the Western
United States During the Period of Instrumental Records, Bradley, et al.,
Contribution No. 42, Dept. of Geography and Geology, University of
Massachusetts (1982)
D = Dr. Henry Diaz, a compilation of data from Bulletin W, LCD, and NCDC Tape
Deck 3220 (1983)
G = Professor John Griffiths - primarily from Climatological Data
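The spatial consistency rule behind the S flag is the hardest of these definitions to read, so here is a minimal Python sketch of how I understand it. The function names are my own and are not part of any USHCN software. Two points are assumptions, because the status file does not spell them out: that each range in column A includes its lower bound and excludes its upper bound, and that the neighbor's z-score is compared by magnitude.

```python
# Column A (target z-score range) mapped to column B (required neighbor z-score).
# Assumption: each range includes its lower bound and excludes its upper bound.
SPATIAL_THRESHOLDS = [
    (4.0, 5.0, 1.9),
    (3.0, 4.0, 1.8),
    (2.75, 3.0, 1.7),
    (2.5, 2.75, 1.6),
]


def required_neighbor_z(target_z):
    """Return the column B value for a target z-score, or None if out of range."""
    z = abs(target_z)
    for low, high, needed in SPATIAL_THRESHOLDS:
        if low <= z < high:
            return needed
    return None  # below 2.5 (not checked) or 5.0 and above (O flag territory)


def passes_spatial_check(target_z, neighbor_zs):
    """True if at least one neighbor supports the target value.

    A neighbor supports the target when its z-score has the same sign and
    (assumption) its magnitude is at least the column B value.
    Returns None when the target z-score is outside the 2.5 to 5.0 range.
    """
    needed = required_neighbor_z(target_z)
    if needed is None:
        return None
    return any(nz * target_z > 0 and abs(nz) >= needed for nz in neighbor_zs)
```

For example, a target z-score of 3.2 needs a same-sign neighbor at 1.8 or more, so `passes_spatial_check(3.2, [0.5, 1.9])` is true while `passes_spatial_check(3.2, [-2.5, 1.0])` is false, and the value would get the S flag.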
I ignore most of these flags because they either won't change the annual averages or they are the consequence of an opinion. The only flag I care about is the I flag, which means the reported Tmin is larger than the Tmax for that month. I'll report later how many flags there are in the example dataset I'll use.
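Counting flags is simple once the records are parsed. The sketch below assumes each record has already been reduced to a `(year, month, value, qc_flag)` tuple. That layout is hypothetical, chosen only for illustration, and the sample values are made up rather than taken from any station.

```python
from collections import Counter


def count_qc_flags(records):
    """Count quality control flags in (year, month, value, qc_flag) tuples.

    Blank flags mean no QC failure and are skipped.
    """
    counts = Counter()
    for _year, _month, _value, qc_flag in records:
        flag = qc_flag.strip()
        if flag:
            counts[flag] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    (1950, 1, -1.2, " "),
    (1950, 2, 0.4, "I"),
    (1950, 3, 5.1, "S"),
    (1950, 4, 9.8, "I"),
]

counts = count_qc_flags(sample)
print(counts["I"])  # number of months where Tmin > Tmax was flagged
```

A `Counter` keeps the tally for every flag in one pass, so the same result also shows how often the flags I am ignoring turn up.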
Next up: Part 5: Preparing the Data