Separate document corpus definition from indices #366

Closed
danielmitterdorfer opened this issue Nov 17, 2017 · 1 comment
Labels
enhancement Improves the status quo :Track Management New operations, changes in the track format, track download changes and the like

danielmitterdorfer commented Nov 17, 2017

Currently, the data file definition is tied to an index type. For example:

  "indices": [
    {
      "name": "geonames",
      "types": [
        {
          "name": "type",
          "mapping": "mappings.json",
          "documents": "documents.json.bz2",
          "document-count": 11396505,
          "compressed-bytes": 264698741,
          "uncompressed-bytes": 3547614383
        }
      ]
    }
  ]

To allow for more flexibility, we should separate the two, e.g. by defining the data files in a separate corpora block. This would also make it easier to support other formats (e.g. snapshots, see #341):

"corpora": [
  {
    "name": "geonames",
    "documents": [
      {
        "source-file": "documents.json.bz2",
        "document-count": 11396505,
        "compressed-bytes": 264698741,
        "uncompressed-bytes": 3547614383
      }
    ]
  }
]
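
As a rough illustration of how the two layouts relate, the following Python sketch derives a corpora block from the existing per-type properties. The function name indices_to_corpora and the choice to name each corpus after its index are assumptions made for this example; they are not part of Rally.

```python
def indices_to_corpora(track):
    """Build a "corpora" list from the document properties on index types."""
    doc_keys = ("document-count", "compressed-bytes", "uncompressed-bytes")
    corpora = []
    for index in track.get("indices", []):
        documents = []
        for type_ in index.get("types", []):
            if "documents" not in type_:
                continue  # this type has no data file attached
            entry = {"source-file": type_["documents"]}
            # Copy over only the size properties that are actually present.
            entry.update({k: type_[k] for k in doc_keys if k in type_})
            documents.append(entry)
        if documents:
            # Assumption: the corpus is named after the index.
            corpora.append({"name": index["name"], "documents": documents})
    return corpora


legacy_track = {
    "indices": [{
        "name": "geonames",
        "types": [{
            "name": "type",
            "mapping": "mappings.json",
            "documents": "documents.json.bz2",
            "document-count": 11396505,
            "compressed-bytes": 264698741,
            "uncompressed-bytes": 3547614383,
        }],
    }]
}

corpora = indices_to_corpora(legacy_track)
```

Applied to the geonames definition above, this should yield a single corpus with one documents entry that carries the same file name and size properties.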

We will also add a parameter (called corpora) to the bulk index runner that tells it which document corpus to use.
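
One way the runner could resolve that parameter is sketched below in Python. The function name select_corpora, the second corpus name, and the behavior for a missing parameter (use every corpus) or a single string value are all assumptions for illustration; the issue only states that the parameter is called corpora.

```python
def select_corpora(track, params):
    """Return the corpora that a bulk operation should index.

    Assumption: "corpora" may be a single name or a list of names, and
    omitting it means "use every corpus defined in the track".
    """
    available = {c["name"]: c for c in track.get("corpora", [])}
    requested = params.get("corpora")
    if requested is None:
        return list(available.values())
    if isinstance(requested, str):
        requested = [requested]
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError(f"Unknown corpora: {unknown}")
    return [available[name] for name in requested]


track = {
    "corpora": [
        {"name": "geonames",
         "documents": [{"source-file": "documents.json.bz2"}]},
        # Hypothetical second corpus, added only to show the selection.
        {"name": "extra",
         "documents": [{"source-file": "extra.json.bz2"}]},
    ]
}

selected = select_corpora(track, {"corpora": "geonames"})
```

Failing on unknown names, rather than silently skipping them, is a design choice of this sketch so that a typo in a track file surfaces early.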

Note: For backwards-compatibility we should not remove the old document-related properties from our standard tracks yet.
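
A reader that stays backwards-compatible could prefer the corpora block and fall back to the old properties when it is absent. The Python sketch below shows that idea under stated assumptions: the helper name document_sources is hypothetical, and the precedence rule (corpora wins when both are present) is a guess rather than a decided behavior.

```python
def document_sources(track):
    """List data files, preferring "corpora" over the legacy properties."""
    if "corpora" in track:
        return [doc["source-file"]
                for corpus in track["corpora"]
                for doc in corpus.get("documents", [])]
    # Legacy layout: the data file is a property of each index type.
    return [type_["documents"]
            for index in track.get("indices", [])
            for type_ in index.get("types", [])
            if "documents" in type_]


old_style = {"indices": [{"name": "geonames", "types": [
    {"name": "type", "documents": "documents.json.bz2"}]}]}
new_style = {"corpora": [{"name": "geonames", "documents": [
    {"source-file": "documents.json.bz2"}]}]}

old_files = document_sources(old_style)
new_files = document_sources(new_style)
```

Both layouts resolve to the same file list here, which is the property a track would need while it carries both sets of properties.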

danielmitterdorfer added the :Track Management and enhancement labels Nov 17, 2017
danielmitterdorfer added this to the 0.9.0 milestone Nov 17, 2017
danielmitterdorfer commented

Note: Ensure that the scenario described in #325 is supported by our implementation.

danielmitterdorfer added four commits to elastic/rally-tracks that referenced this issue Jan 10, 2018