Thursday, July 30, 2015

Moyhu data updates

For a while now, I have been maintaining updated data tables and graphics. Many are collected in the latest data page, but there are also the trend viewer, GHCN stations monthly map, and the hi-res NOAA SST (with updated movies). These mostly just grew, with a patchwork of daily and weekly updates.

I'm now trying to be more systematic, and to test for new data every hour. My scheme downloads only if there is new data, and consumes few resources otherwise. My hope is that all data will be processed and visible within an hour of its appearance.
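One common way to get "download only if there is new data" is to compare the timestamp a server reports with the one recorded at the previous check. The sketch below assumes the source exposes an HTTP-style `Last-Modified` date; the function name `is_new` and the stored `last_seen` value are illustrative, not the actual Moyhu code.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def is_new(remote_last_modified: str, last_seen: Optional[datetime]) -> bool:
    """Return True if the remote timestamp is later than the one last processed."""
    remote = parsedate_to_datetime(remote_last_modified)
    if remote.tzinfo is None:
        # Assumption: a date without a zone is treated as UTC.
        remote = remote.replace(tzinfo=timezone.utc)
    # No stored timestamp means the dataset has never been fetched.
    return last_seen is None or remote > last_seen
```

Only the small header value has to be fetched each hour; the full download happens when `is_new` returns `True`.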

I have upgraded the log at the bottom of the latest data page. It is meant to record new data on arrival, with some diagnostics. Column 1 is the date listed at origin, translated to Melbourne time. This date isn't totally reliable; it is the outcome of various systems, and may predate the actual availability of the data on the web. The second column is the dataset name, with a link to the actual data. For the bigger files (size is in column 3), a dialog box will ask whether to download. The column headed "Delay" is the difference between the origin date and the time when processing finished and the result should be on the website. I'm using this to find bugs. "Time taken" is the time my computer spent processing (again, for my diagnostics). Where several datasets are processed in the same hourly batch, this time is written against each of them. Currently, only the few most recent lines at the top of the log are useful, but new data should be correctly recorded in future.
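As a rough illustration of how the date and "Delay" columns relate, here is a minimal sketch. It assumes both timestamps are held in UTC and uses a fixed UTC+10 offset for Melbourne, which ignores daylight saving; `log_row` and its arguments are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumption: fixed UTC+10 offset (Melbourne standard time); daylight saving is ignored.
MELBOURNE = timezone(timedelta(hours=10), "AEST")


def log_row(origin_utc: datetime, finished_utc: datetime) -> Tuple[str, timedelta]:
    """Return (origin date shown in Melbourne time, delay until processing finished)."""
    local = origin_utc.astimezone(MELBOURNE)
    delay = finished_utc - origin_utc
    return local.strftime("%Y-%m-%d %H:%M"), delay
```

For example, data stamped 01:30 UTC and finished at 02:15 UTC would be listed as 11:30 Melbourne time with a delay of 45 minutes.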

NOAA temperature is a special case. NOAA doesn't keep the files I use in an FTP directory, but serves them with the current time attached, so I have to use roundabout methods to decide whether they are new and need to be downloaded (I use their RSS file). By default they show as new every hour. I have measures to correct this, but they may not be perfect. In any case, the times in the log for NOAA are not meaningful.
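The post doesn't say how the RSS file is read, so the following is only a sketch of one possibility: take the most recent `pubDate` among the feed items and compare it with the date of the last processed release. It assumes a standard RSS 2.0 layout (`channel/item/pubDate`); the real NOAA feed may be organised differently, and `latest_pub_date` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def latest_pub_date(rss_text: str) -> Optional[datetime]:
    """Return the most recent <pubDate> among the feed's items, or None if there are none."""
    root = ET.fromstring(rss_text)
    dates = [
        parsedate_to_datetime(node.text.strip())
        for node in root.iterfind("./channel/item/pubDate")
        if node.text
    ]
    return max(dates) if dates else None
```

If the returned date is no later than the one stored from the previous run, the files can be treated as unchanged even though the server stamps them with the current time.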

I have a scheme for doing the hourly listening only when an update is likely (assuming approximate periodicity). If data arrives unexpectedly, it will be caught in a nightly processing run.
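A minimal sketch of such a scheme, under assumptions of my own choosing: predict the next arrival as the last arrival plus the median gap between past arrivals, and poll hourly only within a window around that prediction. The three-day window and the name `should_poll` are illustrative, not taken from the actual setup.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List


def should_poll(arrivals: List[datetime], now: datetime,
                window: timedelta = timedelta(days=3)) -> bool:
    """Poll hourly only when `now` is within `window` of the predicted next arrival."""
    if len(arrivals) < 2:
        return True  # too little history to predict; poll anyway
    ordered = sorted(arrivals)
    # Gaps between successive arrivals, in seconds.
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    expected = ordered[-1] + timedelta(seconds=median(gaps))
    return abs(now - expected) <= window
```

Anything that arrives outside the window is missed by the hourly check, which is why a nightly catch-all run is still needed.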

It is still a bit experimental - I can't conveniently test some aspects other than just waiting for new (monthly) data to appear and be processed. But I think the basics are working.




5 comments:

  1. Nifty. Thanks for doing this.

    FWIW, I noticed the NCEP/NCAR reanalysis surface temp anomaly area weighted global average table hasn't been updated since 25 July. Not sure if that's a problem on their end or your end.

    Replies
    1. Thanks, Ned, yes that was a problem associated with the new scheme. Fixed now.

    2. Thank you! That was quick.

      So ... July 2015 looks a lot like July 2014, according to NCEP/NCAR.

    3. Ned,
      I have a theory that I'll be putting in the next NCEP review. NCEP has been underpredicting GISS/NOAA for about three months. I think it is because NCEP reports air temp, and basically the bottom grid cell - so about 40 m altitude at center. The surface indices use SST over ocean. Lately SSTs have been rising, while land temps have been mixed. I think NCEP may be lagging the SST rise.

      This fits with the new paper of Cowtan et al., which I'll probably be writing about.

    4. Yes, that makes sense.

      Off-topic, but ... Anthony's post about Cowtan et al. 2015 for some reason includes a graph showing model-vs-data comparisons, not for the surface (as in the paper) but for the "tropical troposphere hotspot". This reminded me that there have been a variety of developments in recent months that seem to just appear and then sink without a trace.

      * Po-Chedley et al. 2015 seemed to present an improved processing method for MSU Middle Troposphere temperatures, which had much higher trends than UAH-TMT and went a long way towards "solving" the trop-trop hotspot issue. But almost nobody discussed it and the usual suspects continue to talk about this "missing hotspot".

      * In a post a few months back, Tamino showed graphs of radiosonde temperature data for the troposphere that look nothing at all like the graphs that S&C or Watts keep posting. What's up with that?

      I'm glad to see Cowtan et al. making improvements in model-data comparisons for the surface, but it'd be great if somebody were to bring that level of attention to the mid-troposphere model-data comparison problem. Are there ways in which those comparisons have been done badly in the past, analogous to what Cowtan et al. are describing for the surface?

      I have no particular point here, just vague stuff that's been on my mind lately.
