getDocumentsPerTopicsProbabilities Undefined offset: 0 #64

Open
slava-vishnyakov opened this issue Apr 21, 2018 · 2 comments · May be fixed by #68


I'm trying to follow http://php-nlp-tools.com/posts/introducing-latent-dirichlet-allocation.html, but I run into an error when calling getDocumentsPerTopicsProbabilities at the end:

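// Imports assumed from the NlpTools namespace layout
use NlpTools\FeatureFactories\DataAsFeatures;
use NlpTools\Tokenizers\WhitespaceTokenizer;
use NlpTools\Documents\TokensDocument;
use NlpTools\Documents\TrainingSet;
use NlpTools\Models\Lda;

$docs = [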
    'The queen does something',
    'Queen is very good queen',
    'Mission mission mission',
    'What is mission your mission'
];

$tok = new WhitespaceTokenizer();
$tset = new TrainingSet();
foreach ($docs as $line) {
    $tset->addDocument(
        '', // the class is not used by the lda model
        new TokensDocument(
            $tok->tokenize(
                mb_strtolower($line)
            )
        )
    );
}

$lda = new Lda(
    new DataAsFeatures(), // a feature factory to transform the document data
    2, // the number of topics we want
    1, // the dirichlet prior assumed for the per document topic distribution
    1  // the dirichlet prior assumed for the per word topic distribution
);

$lda->train($tset,50);

$lda->getDocumentsPerTopicsProbabilities(2);

This results in:

Undefined offset: 0 at
vendor/nlp-tools/nlp-tools/src/NlpTools/Models/Lda.php:243

This probably requires something along the lines of:

if (!isset($count_topics_docs[$doc])) {
    $count_topics_docs[$doc] = [];
}
if (!isset($count_topics_docs[$doc][$t])) {
    $count_topics_docs[$doc][$t] = 0;
}
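
The guard above can be tried outside the library. Here is a minimal standalone sketch, assuming a nested array of counts indexed by document and then by topic. The function name incrementCount and the sample assignments are hypothetical and not part of NlpTools:

```php
<?php
// Hypothetical helper, not part of NlpTools: increments a nested counter,
// creating the missing keys first so PHP never reads an undefined offset.
function incrementCount(array &$counts, $doc, $topic)
{
    if (!isset($counts[$doc])) {
        $counts[$doc] = [];
    }
    if (!isset($counts[$doc][$topic])) {
        $counts[$doc][$topic] = 0;
    }
    $counts[$doc][$topic]++;
}

$count_topics_docs = [];
// Made-up (document, topic) assignments for illustration only.
foreach ([[0, 1], [0, 1], [2, 0]] as $pair) {
    incrementCount($count_topics_docs, $pair[0], $pair[1]);
}
print_r($count_topics_docs);
```

Without the two isset checks, the first increment for a new document or topic would read a key that does not exist yet, which is the kind of access that triggers the notice.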

Also, further down there is a variable $limit_docs that is undefined. Maybe the method signature, public function getDocumentsPerTopicsProbabilities($limit_docs = -1), is incorrect, or maybe it should be $limit_words there?

Anyway, after running this method on this input:

$docs = [
    'The queen does something',
    'Queen is very good queen',

    'Mission mission mission',
    'What is mission your mission'
];
...
$lda->getDocumentsPerTopicsProbabilities(2);

I get this result:

[
0.3333333333333333,
0.3333333333333333,
0.3333333333333333,
0.3333333333333333
]

And I'm not sure how to interpret that... :)

Thanks!

slava-vishnyakov commented Apr 21, 2018

One possibility is that the method should return $p_t_d instead of $p, but that contains no useful information either.

$p_t_d is:
array:2 [▼
  0 => array:4 [
    0 => 0.33333333333333
    1 => 0.33333333333333
    2 => 0.33333333333333
    3 => 0.33333333333333
  ]
  1 => & array:4 [
    0 => 0.33333333333333
    1 => 0.33333333333333
    2 => 0.33333333333333
    3 => 0.33333333333333
  ]
]
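
For comparison, the usual smoothed estimate of the per-document topic distribution in collapsed Gibbs LDA is (n_dt + alpha) / (n_d + K * alpha), which sums to 1 over the K topics of each document. The sketch below uses made-up counts and a hypothetical function name; it is not the NlpTools implementation. With 2 topics, two entries of 0.333 per document sum to only about 0.667, which suggests the values above are not normalized per document.

```php
<?php
// Hypothetical helper, not the NlpTools implementation.
// $topicCounts: number of tokens of one document assigned to each topic.
// $alpha: symmetric Dirichlet prior on the per-document topic distribution.
function topicDistribution(array $topicCounts, $alpha)
{
    $denominator = array_sum($topicCounts) + count($topicCounts) * $alpha;
    $probabilities = [];
    foreach ($topicCounts as $topic => $count) {
        $probabilities[$topic] = ($count + $alpha) / $denominator;
    }
    return $probabilities;
}

// Made-up counts: 4 tokens in topic 0, none in topic 1, alpha = 1.
$theta = topicDistribution([4, 0], 1);
print_r($theta);
```

With these counts the two probabilities are 5/6 and 1/6, so a document whose tokens all landed in one topic is clearly separated from the other topic, and the entries add up to 1.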

@slava-vishnyakov

Ok, maybe I have figured this out in PR #67

slava-vishnyakov linked a pull request Apr 21, 2018 that will close this issue