Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WarpedVRT.read(masked=True) and WarpedVRT.read_masks don't use overviews #1373

Closed
mojodna opened this issue Jun 12, 2018 · 22 comments
Closed

Comments

@mojodna
Copy link
Contributor

mojodna commented Jun 12, 2018

In marblecutter, I've been using external masks with overviews and treating them as normal sources in order to take advantage of the overviews:

https://github.com/mojodna/marblecutter/blob/a4f66cbf7124f25cf15bec8bc0e37ef195119de6/marblecutter/__init__.py#L263-L327

I was thinking that #1361 would eliminate the need for this, but it turns out that the main reason that I'd done that in the first place was that overviews aren't used by read_masks (and when masked=True). (This can be observed both by watching a clock (masked reads are noticeably slower) and by enabling CPL_CURL_VERBOSE when reading remote sources.)

@sgillies
Copy link
Member

Thanks for pointing me at marblecutter's mask reader. That's what I'd like to accomplish in Rasterio, but for nodata more generally (nodata values, alpha bands, sidecars, and internal bitmasks).

As far as I can tell, reading a mask from Rasterio's WarpedVRT when the source has a sidecar or internal bitmask is going to be super slow (requiring materialization of much of the source) until we can get a proper VRT mask band added to the result of GDALCreateWarpedVRT in Rasterio's WarpedVRTReaderBase.

@mojodna
Copy link
Contributor Author

mojodna commented Jun 12, 2018

Yup. I would love to use internal masks some day, but this is a reasonable workaround in the meantime.

@sgillies
Copy link
Member

@mojodna #1378 forbids boundless reads from WarpedVRTs, because the boundless read implementation now uses an ordinary VRT (as an XML string) and there's no way to reference a WarpedVRT from the XML. I think that you should be able to make the call at https://github.com/mojodna/marblecutter/blob/a4f66cbf7124f25cf15bec8bc0e37ef195119de6/marblecutter/__init__.py#L277-L279 obsolete by setting the parent WarpedVRT's extents to this window.

Before I removed WarpedVRT from the boundless read implementation I found that I needed to buffer its extents by a few pixels in order to trigger GDAL's overview use. That's why the width and height of the WarpedVRT in the old boundless read impl were incremented by 1.

@mojodna
Copy link
Contributor Author

mojodna commented Jun 18, 2018

by setting the parent WarpedVRT's extents to this window

Sorry, I don't follow. Are you suggesting that I set the extent for: https://github.com/mojodna/marblecutter/blob/a4f66cbf7124f25cf15bec8bc0e37ef195119de6/marblecutter/__init__.py#L264-L272

If so, how would I do that? Through the combination of dst_width, dst_height, and dst_transform? I'm unclear on how I'd calculate the transform.

Before I removed WarpedVRT from the boundless read implementation I found that I needed to buffer its extents by a few pixels in order to trigger GDAL's overview use. That's why the width and height of the WarpedVRT in the old boundless read impl were incremented by 1.

Does rasterio include a simple way to buffer? (It's been a while since I've touched this bit of code, so I'm less confident in my ability to buffer it myself.)

@sgillies
Copy link
Member

sgillies commented Jun 19, 2018

@mojodna I haven't tested the code below, but given bounds and dst_transform it should be equivalent to your current function and work with Rasterio 1.0b2 unless I've slipped up.

From the bounds and dst_transform we can compute the transform of a VRT that will cover the bounds at the same resolution (have its origin at the upper left of the bounding box). Then we compute the number of rows and columns of pixels it would take to cover the bounds at that resolution.

Use these to construct a WarpedVRT and then read the full extent of it into the target_shape.

    def _read_data():

        w, s, e, n = bounds.bounds
        vrt_transform = Affine.translation(w, n) * Affine.scale(dst_transform.a, dst_transform.e) * Affine.identity()
        vrt_width = math.ceil((e - w) / dst_transform.a)
        vrt_height = math.ceil((s - n) / dst_transform.e)

        with WarpedVRT(
            src,
            src_nodata=nodata,
            dst_crs=bounds.crs,
            dst_width=vrt_width,
            dst_height=vrt_height,
            dst_transform=vrt_transform,
            resampling=resampling,
        ) as vrt:

            data = vrt.read(out_shape=(vrt.count,) + target_shape)

            # mask with NODATA values
            if nodata is not None and vrt.nodata is not None:
                data = _mask(data, vrt.nodata)
            else:
                data = np.ma.masked_array(data, mask=np.ma.nomask)

            return data

Please disregard what I wrote about the pixel buffer, it's not relevant. But if you did want to pad the VRT for other reasons, you could translate its transform matrix west and north by a pixel's distance, add 2 pixels to the width and height, and then call vrt.read(window=vrt.window(*bounds.bounds)).

The key to triggering use of the overviews is this: the resolution of your output, defined by out_shape in your case, must be smaller than the VRT resolution. For example, if your VRT is 1024 x 1024 pixels, you could expect to use any 2x overviews when you call read(out_shape=(3, 512, 512)).

I have an example at https://github.com/mapbox/rasterio/blob/master/tests/test_warpedvrt.py#L329 of external overviews being used when decreasing the resolution of read data by a factor of two.

@vincentsarago
Copy link
Member

FYI, here is how I manage to fetch the overview and doing a boundless like tile read (using ☝️ Sean guidance) https://gist.github.com/vincentsarago/bf2949dc0e3488def3caab71f3ced416

@mojodna
Copy link
Contributor Author

mojodna commented Jul 6, 2018

I'm still having problems with this. When overviews are triggered on a source with an internal mask, the materialized mask appears to be incorrect:

# rio insp https://mojodna-temp.s3.amazonaws.com/internal-mask.tif

from affine import Affine
from rasterio.vrt import WarpedVRT


# 21/1251419/852238
transform1 = Affine(0.04, 0, 3876179.03, 0, -0.04, 3751873.31)
bounds1 = (3876179.0321125016, 3751854.20560666, 3876198.141369573, 3751873.314863731)

# 20/625709/426119
transform2 = Affine(0.04, 0, 3876159.92, 0, -0.04, 3751873.31)
bounds2 = (3876159.9228554303, 3751835.0963495895, 3876198.141369573, 3751873.314863731)

with WarpedVRT(
    src,
    src_nodata=None,
    crs="EPSG:3857",
    width=480,
    height=480,
    transform=transform1,
    add_alpha=True,
) as vrt:
    alpha = vrt.read(4, out_shape=(512, 512), window=vrt.window(*bounds1))
    print(alpha)
    assert(alpha[0][-1] == 0)
    
with WarpedVRT(
    src,
    src_nodata=None,
    crs="EPSG:3857",
    width=960,
    height=960,
    transform=transform2,
    add_alpha=True,
) as vrt:
    alpha = vrt.read(4, out_shape=(512, 512), window=vrt.window(*bounds2))
    print(alpha)
    assert(alpha[0][-1] == 0)

21/1251419/852238:
iterm2 jgordj

20/625709/426119:
iterm2 swyqwh

@vincentsarago
Copy link
Member

@mojodna I'm pretty sure it's a conflict because your file has internal nodata value

gdalinfo internal-mask.tif | grep "NoData"
  NoData Value=-10000
  NoData Value=-10000
  NoData Value=-10000

I ran your file through rio-cogeo and the result looks 👌

capture d ecran 2018-07-06 a 19 00 38

@mojodna
Copy link
Contributor Author

mojodna commented Jul 6, 2018

-10000 isn't a valid NODATA value for a byte source though...

I'll see if I can track down the original to figure out where that came from.

@mojodna
Copy link
Contributor Author

mojodna commented Jul 6, 2018

Oh, and I'm explicitly stating that src_nodata=None

@mojodna
Copy link
Contributor Author

mojodna commented Jul 6, 2018

>>> with WarpedVRT(
...     src,
...     src_nodata=None,
...     crs="EPSG:3857",
...     width=480,
...     height=480,
...     transform=transform1,
...     add_alpha=True,
... ) as vrt:
...     print(vrt.nodata)
...     print(src.nodata)
None
None

@mojodna
Copy link
Contributor Author

mojodna commented Jul 6, 2018

@vincentsarago can you share the generated COG? I'd like to see how it's handling masking.

@vincentsarago
Copy link
Member

@mojodna https://s3-us-west-2.amazonaws.com/remotepixel-pub/cog/internal-mask_cogeo.tif

created it just by doing rio cogeo internal-mask.tif internal-mask_cogeo.tif

@mojodna
Copy link
Contributor Author

mojodna commented Jul 6, 2018

Hmm. According to gdalinfo, NoData is the only substantive difference.

According to rio info, they're identical. rio insp says src.nodata == None.

rasterio says: https://github.com/mapbox/rasterio/blob/6e1b93d7089c82301dba73e0337a52ba8a44ba6c/rasterio/_base.pyx#L458-L463

But GDAL is apparently doing something else under the hood under these circumstances.

@sgillies
Copy link
Member

sgillies commented Jul 9, 2018

@mojodna is it correct to say that at zoom level 21, a proper alpha channel is sourced from the internal bitmask, but at zoom 20, the alpha channel is not sourced from the internal bitmask?

What is the native resolution of your source image? Does it have overviews?

@rouault does this sound like a known (or unknown) GDAL bug?

@rouault
Copy link
Contributor

rouault commented Jul 9, 2018

I've tried

gdalwarp /vsicurl/https://mojodna-temp.s3.amazonaws.com/internal-mask.tif out.vrt -overwrite -t_srs EPSG:3857 -ts 960 960 -te 3876159.9228554303 3751835.0963495895 3876198.141369573 3751873.314863731 -dstalpha  -of vrt -srcnodata none
gdal_translate out.vrt out.tif

and the result looks good.
That said I see that rasterio can take GDALAutoCreateWarpedVRT() or GDALCreateWarpedVRT(). If it takes GDALAutoCreateWarpedVRT(), then the source nodata value is automatically applied in the warping options (gdalwarp -of VRT takes another code path), which might explain the conflict with the internal mask. Having both nodata and mask is a recipee for trouble.

@mojodna
Copy link
Contributor Author

mojodna commented Jul 9, 2018

Correct. I'm treating zoom 21 (w/ 512x512px output) as "native resolution".

Native resolution is ~3.3cm (UTM). Image overviews and mask overviews, both.

Size is 13266, 12631
Coordinate System is:
PROJCS["WGS 84 / UTM zone 36N",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",33],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AXIS["Easting",EAST],
    AXIS["Northing",NORTH],
    AUTHORITY["EPSG","32636"]]
Pixel Size = (0.033760000000000,-0.033760000000000)
Band 1 Block=512x512 Type=Byte, ColorInterp=Red
  NoData Value=-10000
  Overviews: 6633x6316, 3317x3158, 1659x1579, 830x790, 415x395
  Mask Flags: PER_DATASET
  Overviews of mask band: 6633x6316, 3317x3158, 1659x1579, 830x790, 415x395
Band 2 Block=512x512 Type=Byte, ColorInterp=Green
  NoData Value=-10000
  Overviews: 6633x6316, 3317x3158, 1659x1579, 830x790, 415x395
  Mask Flags: PER_DATASET
  Overviews of mask band: 6633x6316, 3317x3158, 1659x1579, 830x790, 415x395
Band 3 Block=512x512 Type=Byte, ColorInterp=Blue
  NoData Value=-10000
  Overviews: 6633x6316, 3317x3158, 1659x1579, 830x790, 415x395
  Mask Flags: PER_DATASET
  Overviews of mask band: 6633x6316, 3317x3158, 1659x1579, 830x790, 415x395

@vincentsarago pix4dmapper seems to be the culprit which introduced the bogus NODATA value.

@mojodna
Copy link
Contributor Author

mojodna commented Jul 9, 2018

@rouault try gdal_translate -outsize 512 512 out.vrt out.png to trigger use of overviews--that exhibits the same problem as above.

@rouault
Copy link
Contributor

rouault commented Jul 9, 2018

OK, I could indeed reproduce the issue with gdal_translate -outsize 512 512 out.vrt out.png. Stepping through the execution shows that the mask is still being read. So I've just looked at the content of the TIFF, and apart from the full resolution image whose mask has values at 0 and 255, all masks of overviews have only values at 255. So the issue is in the data (or the software / software procedure that generated it)

@sgillies
Copy link
Member

sgillies commented Jul 9, 2018

@rouault thanks for the diagnosis! 🙏

@mojodna
Copy link
Contributor Author

mojodna commented Jul 9, 2018

@rouault awesome, thanks.

I'm still puzzling through how that might happen; the procedure is here: https://github.com/mojodna/marblecutter-tools/blob/46fed51b65f0f4cb2876cdb01cb214c0575ca64b/bin/transcode.sh#L146-L199

The step that loses the correct mask values is the 2nd gdal_translate where the COG is produced; breaking it down, it seems to be a conflict between -co COPY_SRC_OVERVIEWS=YES and --config GDAL_TIFF_INTERNAL_MASK YES

@mojodna
Copy link
Contributor Author

mojodna commented Jul 9, 2018

COPY_SRC_OVERVIEWS=YES doesn't appear to copy the overviews from the mask band of the source (esp. if it's an external mask containing overviews).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants