-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
statsMgr information get for studio #194
Conversation
.idea/.gitignore
Outdated
@@ -0,0 +1,8 @@ | |||
# Default ignored files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove files in .idea
directory.
pkg/stats/statsmgr.go
Outdated
var wg sync.WaitGroup | ||
wg.Add(numReadingFiles) | ||
for _, file := range files { | ||
path := file.FailDataPath | ||
go func(path string) { | ||
totalLine += countLines(path) | ||
defer wg.Done() | ||
}(*path) | ||
} | ||
wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be no need for multiple goroutine here?
pkg/stats/statsmgr.go
Outdated
@@ -92,3 +133,16 @@ func (s *StatsMgr) startWorker(numReadingFiles int) { | |||
} | |||
} | |||
} | |||
|
|||
func countLines(path string) int64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will cause the file to be read twice, will it affect performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the size of the file be considered, not the number of lines in the file? Add TotalSize
, ProcessedSize
?
pkg/stats/statsmgr.go
Outdated
func countLines(path string) int64 { | ||
file, err := os.Open(path) | ||
if err != nil { | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't eat err
?
pkg/stats/statsmgr.go
Outdated
totalCount int64 | ||
totalBatches int64 | ||
totalLatency int64 | ||
totalReqTime int64 | ||
} | ||
|
||
func NewStatsMgr(numReadingFiles int) *StatsMgr { | ||
type StatsQuery struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about define StatsMgr
as follows?
type StatsMgr struct {
StatsCh chan base.Stats
DoneCh chan bool
statsRW sync.RWMutex
stats Stats
}
type Stats struct {
NumFailed int64
NumReadFailed int64
TotalLine int64
TotalCount int64
TotalBatches int64
TotalLatency int64
TotalReqTime int64
}
Then, you can define a function func (s *StatsMgr) Stats() Stats
replace the current StatsQuery
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job!
.gitignore
Outdated
@@ -21,3 +21,4 @@ nebula-importer | |||
|
|||
# IDE | |||
.vscode/ | |||
.idea/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please configure your ide/editor to auto add a new blank line at the end of a file
pkg/stats/statsmgr.go
Outdated
lineCount := int64(0) | ||
for fileScanner.Scan() { | ||
lineCount++ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it is faster? https://stackoverflow.com/a/24563853
pkg/stats/statsmgr.go
Outdated
// There is no \n in the last line | ||
count := 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you know?
Maybe ?
a
b
or
a
b
pkg/stats/statsmgr.go
Outdated
@@ -92,3 +139,27 @@ func (s *StatsMgr) startWorker(numReadingFiles int) { | |||
} | |||
} | |||
} | |||
|
|||
func countLines(path string) (int64, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be considered?
type CSVConfig struct {
WithHeader *bool `json:"withHeader" yaml:"withHeader"`
WithLabel *bool `json:"withLabel" yaml:"withLabel"`
Delimiter *string `json:"delimiter" yaml:"delimiter"`
}
https://github.com/vesoft-inc/nebula-importer/blob/master/pkg/config/config.go#L86-L90
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jievince How do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pkg/stats/statsmgr.go
Outdated
path := file.Path | ||
withHeader := file.CSV.WithHeader | ||
go func(path string) { | ||
lines, err := csv.CountLines(path, *withHeader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think counting file lines is impractical for huge file in new coroutine. Why not to do this when reading these data files?
Because Studio needs to know in advance how many lines the file needs to be imported , before importing. |
pkg/csv/reader.go
Outdated
} | ||
|
||
func CountFileBytes(path string, withHeader bool) (int64, error) { | ||
file, err := os.Open(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The statistical methods of the two functions are different.
countLineBytes
is the sum of bytes of the csv field.CountFileBytes
is to get the file size and then subtract the header bytes.
You can take statistics bytes and don't care about headers.
countLineBytes
get the bytes read.CountFileBytes
get file size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var _ io.Reader = (*recordReader)(nil)
type recordReader struct {
io.Reader
lastReadBytes int
}
func (r *recordReader) Read(p []byte) (n int, err error) {
n, err = r.Reader.Read(p)
r.lastReadBytes = n
return
}
func (r *CSVReader) InitReader(file *os.File) {
r.rr = &recordReader{Reader: bufio.NewReader(file)}
r.reader = csv.NewReader(r.rr)
...
}
func (r *CSVReader) ReadLine() (base.Data, error) {
line, err := r.reader.Read()
// r.rr.lastReadBytes // get the last read bytes
...
}
pkg/base/stats.go
Outdated
Latency int64 | ||
ReqTime int64 | ||
BatchSize int | ||
ImportedBytesNum int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about use ImportedBytes
?
pkg/base/types.go
Outdated
@@ -34,26 +34,30 @@ func (op OpType) String() string { | |||
type Data struct { | |||
Type OpType | |||
Record Record | |||
BytesNum int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about use Bytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pkg/stats/statsmgr.go
Outdated
s.totalCount += int64(stat.BatchSize) | ||
s.totalReqTime += stat.ReqTime | ||
s.totalLatency += stat.Latency | ||
s.statsRW.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this lock? Are these stats not be sent by one channel?
please sign the CLA firstly! |
Ok, I hava signed it. |
It's need not to use package main
import (
"bufio"
"encoding/csv"
"fmt"
"io"
"os"
)
type ProgressReader struct {
r io.Reader
totalBytes int64
bytes int64
remainingBytes int64
}
func NewProgressReader(totalBytes int64, r io.Reader) *ProgressReader {
if totalBytes <= 0 {
panic("the totalBytes must greater than 0")
}
return &ProgressReader{
r: r,
totalBytes: totalBytes,
}
}
func (r *ProgressReader) Read(p []byte) (n int, err error) {
n, err = r.r.Read(p)
r.bytes += int64(n)
r.remainingBytes += int64(n)
return n, err
}
func (r *ProgressReader) GetCurrentBytes(offset int64) int64 {
n := r.remainingBytes + offset
r.remainingBytes -= n
return n
}
func (r *ProgressReader) Percentage() float64 {
return 100 * float64(r.bytes) / float64(r.totalBytes)
}
func (r *ProgressReader) PercentageOffset(offset int64) float64 {
return 100 * float64(r.bytes+offset) / float64(r.totalBytes)
}
func (r *ProgressReader) Bytes() int64 {
return r.bytes
}
func main() {
f, err := os.Open("/Users/veezhang/Downloads/annual-enterprise-survey-2020-financial-year-provisional-csv.csv")
if err != nil {
panic(err)
}
stat, err := f.Stat()
if err != nil {
panic(err)
}
totalBytes := stat.Size()
pr := NewProgressReader(totalBytes, f)
br := bufio.NewReader(pr)
r := csv.NewReader(br)
for {
_, err := r.Read()
if err != nil {
if err == io.EOF {
break
}
panic(err)
}
fmt.Printf(
"%.3f %.3f %d %d\n",
pr.Percentage(),
pr.PercentageOffset(int64(-br.Buffered())),
pr.GetCurrentBytes(int64(-br.Buffered())),
-br.Buffered(),
)
}
} The output :
|
pkg/csv/reader.go
Outdated
"github.com/vesoft-inc/nebula-importer/pkg/base" | ||
"github.com/vesoft-inc/nebula-importer/pkg/config" | ||
"github.com/vesoft-inc/nebula-importer/pkg/logger" | ||
"io" | ||
"os" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sort it ?
pkg/csv/reader.go
Outdated
func (r *CSVReader) TotalBytes() (int64) { | ||
for { | ||
if r.initComplete { | ||
return r.totalBytes | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's inappropriate?
if r.Readers == nil { | ||
return &r.stataMgr.Stats | ||
} | ||
r.stataMgr.StatsCh <- base.NewOutputStats() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add chan when it can be read directly?
pkg/cmd/runner.go
Outdated
|
||
func (r *Runner) QueryStats() *stats.Stats { | ||
if r.stataMgr != nil { | ||
if r.Readers == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why need to check the r.Readers
?
pkg/cmd/runner.go
Outdated
} | ||
} | ||
|
||
func (r *Runner) QueryStats() *stats.Stats { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about return stats.Stats
nor *stats.Stats
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
At present, Nebula Studio is doing multi task import. It need to know the total number of imported lines and the number of imported lines of each task.