Could you clarify timeouts for the Query? #413

Open
un000 opened this issue Aug 19, 2023 · 10 comments

Comments

@un000
Contributor

un000 commented Aug 19, 2023

I don't understand how to manage timeouts for Query.
I set 20 retries with a sleep of 2 seconds between them and still get timeouts.

	cp := aerospike.NewClientPolicy()
	cp.Timeout = 5*time.Second
	cp.IdleTimeout = 10*time.Second


	qp := aerospike.NewQueryPolicy()
	qp.IncludeBinData = true
	qp.RecordQueueSize = 16 * 1024
	qp.FilterExpression = where
	qp.MaxRetries = 20
	qp.SleepBetweenRetries = 2*time.Second

	rs, err := c.client.Query(qp, statement)
	if err != nil {
		return fmt.Errorf("error executing Query: %w", err)
	}

	for result := range rs.Results() {
		if result.Err != nil {
			c.logger.Error("error scanning results", field.Error(result.Err))     // < Timeout error here
			continue
		}

		if err := processFunc(result); err != nil {
			return fmt.Errorf("process func returned error: %w", err)
		}
	}

Per-record processing time is 150-250ms with 300 goroutines.
What should I change to increase the timeout on the Aerospike side? After 20 retries, roughly 60-70 seconds of running, the code fails.

AS: Aerospike Community Edition build 5.6.0.5
Client: v6.13.0

@khaf
Collaborator

khaf commented Aug 25, 2023

Do you know what causes the timeouts? Do you have an unstable cluster/network? I have a bit of trouble reproducing this issue. We have found a case in which, under some default configurations, adding a new node to the cluster could exhaust the max retries, but I presume that's not what you are observing here.

@un000
Contributor Author

un000 commented Aug 25, 2023

@khaf the cluster is stable. It is connected over a 10GB local network and there are no issues except long partition scans.

@khaf
Collaborator

khaf commented Aug 25, 2023

Can you also include your Statement code? And the ExpressionFilter?

@un000
Contributor Author

un000 commented Aug 25, 2023

@khaf sure

	statement := aerospike.NewStatement(r.namespace, r.set)
	statement.Filter = aerospike.NewEqualFilter("intbin", 5555)

@khaf
Collaborator

khaf commented Aug 25, 2023

And the ExpressionFilter? How many records are there in the set? Do you have an estimate of how many records are going to be returned? And is it an in-memory or flash namespace?

@un000
Contributor Author

un000 commented Aug 25, 2023

set:

disable-eviction: "false"
ns: "namespace"
index_populating: "false"
objects: "36306534"
stop-writes-count: "0"
set: "setname"
enable-index: "false"
sindexes: "2"
memory_data_bytes: "33855994789"
device_data_bytes: "31576275424"
truncate_lut: "0"
tombstones: "0"

Indexes:

*************************** 1. row ***************************
ns: "namespace"
bin: "stringbin"
indextype: "NONE"
set: "setname"
state: "RW"
indexname: "stringbin_idx"
path: "stringbin"
type: "STRING"
*************************** 2. row ***************************
ns: "namespace"
bin: "stringbin"
indextype: "NONE"
set: "setname"
state: "RW"
indexname: "intbin_idx"
path: "stringbin"
type: "NUMERIC"

3-node setup with multicast

service {
    cluster-name cluster
    user aerospike
    group aerospike
    paxos-single-replica-limit 1
    proto-fd-max 15000
    migrate-threads 6
}

namespace namespace {
    memory-size 110G
    replication-factor 2
    default-ttl 0
    nsup-period 120
    storage-engine device {
        cold-start-empty true

        file /var/aerospike/a.p1.db
        file /var/aerospike/a.p2.db
        file /var/aerospike/a.p3.db
        file /var/aerospike/a.p4.db
        filesize 64G
        data-in-memory true
        write-block-size 128K
    }
    migrate-sleep 0
    defrag-sleep 0
}

Estimated ~6 million records out of ~60 million in the set

ExpressionFilter isn't set

@khaf
Collaborator

khaf commented Aug 25, 2023

Thanks for the detailed info. I'm on it, may take a couple of days though.

@artursh

artursh commented Aug 26, 2023

Hi. Looks like I have a similar problem. 3-node cluster (aerospike-server:5.7.0.24) in k8s. Local network. 1+ billion records in the set. In-memory storage.

clientPolicy := aerospike.NewClientPolicy()
clientPolicy.Timeout = 10 * time.Second
clientPolicy.IdleTimeout = 20 * time.Second

sp := aerospike.NewScanPolicy()
sp.RecordQueueSize = 5000
sp.IncludeBinData = false
sp.MaxRetries = 10
sp.SleepBetweenRetries = time.Second

recordset, err := aeroClient.ScanAll()

Reading results in 10 threads. After processing 560 million records I got this error:

ResultCode: NETWORK_ERROR, Iteration: 0, InDoubt: false, Node: A0 10.206.195.132:3000: network error.

/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/connection.go:96 github.com/aerospike/aerospike-client-go/v6.errToAerospikeErr()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/connection.go:262 github.com/aerospike/aerospike-client-go/v6.(*Connection).Read()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/buffered_connection.go:92 github.com/aerospike/aerospike-client-go/v6.(*bufferedConn).readConn()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/buffered_connection.go:106 github.com/aerospike/aerospike-client-go/v6.(*bufferedConn).read()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/multi_command.go:250 github.com/aerospike/aerospike-client-go/v6.(*baseMultiCommand).readBytes()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/multi_command.go:202 github.com/aerospike/aerospike-client-go/v6.(*baseMultiCommand).parseKey()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/multi_command.go:292 github.com/aerospike/aerospike-client-go/v6.(*baseMultiCommand).parseRecordResults()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/multi_command.go:174 github.com/aerospike/aerospike-client-go/v6.(*baseMultiCommand).parseResult()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/command.go:2745 github.com/aerospike/aerospike-client-go/v6.(*baseCommand).executeAt()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/command.go:2570 github.com/aerospike/aerospike-client-go/v6.(*baseCommand).execute()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/multi_command.go:415 github.com/aerospike/aerospike-client-go/v6.(*baseMultiCommand).execute()

I ran the app many times. It scans normally until 560 million records and then always breaks at the same point, so the full scan never finishes. The cluster is stable and all nodes are alive. I tried running the scan at different times, when the cluster is not under high load.

@odinsy

odinsy commented Dec 12, 2023

+1 Same problem for us

@merlindeep

We have encountered the same issue
