metadata not defined in forest_local.py #120

Closed
sipesk opened this issue Apr 29, 2024 · 7 comments

@sipesk

sipesk commented Apr 29, 2024

Hello,

PhyloFisher is great and was running smoothly until I brought the sgt_constructor_out .tar.gz archive to my local machine.

I downloaded forest_local.py and now get an error about "metadata". I've checked the .tar.gz to make sure that the metadata.tsv file is there and contains text:

$ python3 forest_local.py -i sgt_constructor_out_Apr.24.2024-local.tar.gz -t 10

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Users/au706677/Documents/AU/DeepPurple/Cryobio/Leftovers/EUKBINS/phylofisher/forest_local.py", line 197, in suspicious_clades
    groups.add(metadata[org]['Higher Taxonomy'])
NameError: name 'metadata' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/au706677/Documents/AU/DeepPurple/Cryobio/Leftovers/EUKBINS/phylofisher/forest_local.py", line 684, in <module>
    suspicious = parallel_susp_clades(trees)
  File "/Users/au706677/Documents/AU/DeepPurple/Cryobio/Leftovers/EUKBINS/phylofisher/forest_local.py", line 503, in parallel_susp_clades
    suspicious = list(pool.map(suspicious_clades, trees))
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
NameError: name 'metadata' is not defined
@robert-ervin-jones
Member

Hi @sipesk,

Can you try re-running without the -t option? There may be an issue with the parallelization.

Let me know if that works or not.

Thanks for using PhyloFisher!

Best,
Robert
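For context on the parallelization hypothesis: Python's `multiprocessing` can start pool workers with "fork", "spawn" or "forkserver". Windows always uses "spawn", and macOS has defaulted to "spawn" since Python 3.8, so on both systems each worker begins from a fresh import of the script rather than a copy of the parent's memory. The short check below is a sketch, not part of PhyloFisher; it only prints the default on your machine. Whether this explains the error depends on where forest_local.py assigns `metadata`, which is an assumption here.

```python
import multiprocessing
import sys


def describe_start_method():
    """Return the platform string and the default multiprocessing start method."""
    return sys.platform, multiprocessing.get_start_method()


if __name__ == "__main__":
    plat, method = describe_start_method()
    print(f"{plat}: default start method is {method!r}")
    if method != "fork":
        # With "spawn" or "forkserver", workers re-import the script instead of
        # inheriting the parent's memory, so a global that is only assigned
        # under `if __name__ == "__main__":` does not exist in the workers.
        print("Globals set in the __main__ block will not reach pool workers.")
```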

@sipesk
Author

sipesk commented May 1, 2024

No dice. I tried both python and python3 as well.

(fisher) au706677@d46989 phylofisher % python3 forest_local.py -i sgt_constructor_out_Apr.24.2024-local.tar.gz
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Users/au706677/Documents/AU/DeepPurple/Cryobio/Leftovers/EUKBINS/phylofisher/forest_local.py", line 197, in suspicious_clades
    groups.add(metadata[org]['Higher Taxonomy'])
NameError: name 'metadata' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/au706677/Documents/AU/DeepPurple/Cryobio/Leftovers/EUKBINS/phylofisher/forest_local.py", line 684, in <module>
    suspicious = parallel_susp_clades(trees)
  File "/Users/au706677/Documents/AU/DeepPurple/Cryobio/Leftovers/EUKBINS/phylofisher/forest_local.py", line 503, in parallel_susp_clades
    suspicious = list(pool.map(suspicious_clades, trees))
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
NameError: name 'metadata' is not defined

The directory contains metadata.tsv, and it's a non-empty file.
(Screenshot 2024-05-01 at 12.16.02)
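To confirm the archive itself is fine without unpacking it, the standard-library `tarfile` module can list every metadata.tsv entry and its size. This is a minimal sketch; the function name `metadata_members` and the script name used below are hypothetical, and it makes no assumption about where metadata.tsv sits inside the archive.

```python
import sys
import tarfile
from pathlib import Path


def metadata_members(archive):
    """Return (name, size in bytes) for every metadata.tsv file in a .tar.gz."""
    with tarfile.open(archive, "r:gz") as tar:
        return [
            (member.name, member.size)
            for member in tar.getmembers()
            # Match on the final path component so any directory layout works.
            if member.isfile() and Path(member.name).name == "metadata.tsv"
        ]


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, size in metadata_members(sys.argv[1]):
        print(f"{name}\t{size} bytes")
```

Run it as, for example, `python check_metadata.py sgt_constructor_out_Apr.24.2024-local.tar.gz`. A non-zero size means the file is present and not empty, which points back at the code rather than the data.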

@shuiyujinlan
Contributor

shuiyujinlan commented May 11, 2024

Hi @robert-ervin-jones, I hit the same error in my test with the original "metadata.tsv". How can I bypass multiprocessing? Also, my test on a remote server produced no results (except for the empty directory "forest_out_M.D.Y" itself).

Here is the error from my local test:

python forest_local.py -i sgt_constructor_out_Apr.28.2024-local.tar.gz
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "D:\2024\05\forest_local.py", line 197, in suspicious_clades
    groups.add(metadata[org]['Higher Taxonomy'])
NameError: name 'metadata' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "forest_local.py", line 684, in <module>
    suspicious = parallel_susp_clades(trees)
  File "forest_local.py", line 503, in parallel_susp_clades
    suspicious = list(pool.map(suspicious_clades, trees))
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
NameError: name 'metadata' is not defined

After I added "metadata = {}" at line 20, the error changed to:

>python forest_local.py -i sgt_constructor_out_Apr.28.2024-local.tar.gz
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "D:\2024\05\forest_local.py", line 198, in suspicious_clades
    groups.add(metadata[org]['Higher Taxonomy'])
KeyError: 'Tisolute'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "forest_local.py", line 671, in <module>
    suspicious = parallel_susp_clades(trees)
  File "forest_local.py", line 504, in parallel_susp_clades
    suspicious = list(pool.map(suspicious_clades, trees))
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
KeyError: 'Tisolute'
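Why would "metadata = {}" turn the NameError into a KeyError? A plausible explanation, assuming forest_local.py fills `metadata` inside its `if __name__ == "__main__":` block (not verified here): under the "spawn" start method each worker re-runs the script under the name `__mp_main__`, so the main block is skipped and the worker only sees the module-level default. The sketch below imitates that with `runpy.run_path` on a hypothetical stand-in script; it starts no processes, and the stand-in's contents are placeholders.

```python
import runpy
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for forest_local.py (assumed structure, not the real
# script): a module-level default like the "metadata = {}" added at line 20,
# plus a __main__ block that fills it in. The entry is placeholder data.
STAND_IN_SCRIPT = textwrap.dedent("""
    metadata = {}

    if __name__ == "__main__":
        metadata = {"Tisolute": {"Higher Taxonomy": "PlaceholderGroup"}}
""")


def globals_when_run_as(run_name):
    """Execute the stand-in script under `run_name` and return its globals."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "stand_in_forest.py"
        script.write_text(STAND_IN_SCRIPT)
        return runpy.run_path(str(script), run_name=run_name)


# How the script runs when you launch it from the shell.
parent = globals_when_run_as("__main__")
# Roughly how a "spawn" worker re-imports the script: the __main__ block is
# skipped, so only the module-level default survives. Without the
# "metadata = {}" line the name would not exist at all (the NameError).
child = globals_when_run_as("__mp_main__")

print("parent sees:", parent["metadata"])
print("worker sees:", child["metadata"])  # {} -> metadata[org] raises KeyError
```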

Then I commented out the function "def parallel_susp_clades(trees)" and changed "suspicious = parallel_susp_clades(trees)" to "suspicious = suspicious_clades(trees)". The error changed to:

>python forest_local.py -i sgt_constructor_out_Apr.28.2024-local.tar.gz
Traceback (most recent call last):
  File "forest_local.py", line 672, in <module>
    suspicious = suspicious_clades(trees)
  File "forest_local.py", line 175, in suspicious_clades
    t = Tree(tree)
  File "C:\Program Files\Python38\lib\site-packages\ete3\coretype\tree.py", line 212, in __init__
    read_newick(newick, root_node = self, format=format,
  File "C:\Program Files\Python38\lib\site-packages\ete3\parser\newick.py", line 269, in read_newick
    raise NewickError("'newick' argument must be either a filename or a newick string.")
ete3.parser.newick.NewickError: 'newick' argument must be either a filename or a newick string.
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.

@shuiyujinlan
Contributor

Solved in a silly way: I replaced the multiprocessing call with a simple for loop, and the results came out smoothly in 1 minute. To do this, change:

    if not args.backpropagate:
        suspicious = parallel_susp_clades(trees)

to

    suspicious = []

    if not args.backpropagate:
        # suspicious = parallel_susp_clades(trees)
        for tree in trees:
            suspicious.append(suspicious_clades(tree))
        print(suspicious)

You will see the list of suspicious genes for each tree printed in your terminal. The result files are shown below:

(Image: result files)

Hope this helps! @robert-ervin-jones @sipesk

@robert-ervin-jones
Member

Hi @shuiyujinlan,

Would it be possible for you to open a PR with your proposed code changes?

Best,
Robert

@maggielawton

Had the same issue, and the fix by shuiyujinlan worked for me as well. Thanks!

shuiyujinlan added a commit to shuiyujinlan/PhyloFisher that referenced this issue May 16, 2024
Remove multiprocessing to fix the bug in issue TheBrownLab#120 "metadata not defined in forest_local.py"(and import PyQt5 for ete3, I'm not sure if it's necessary.).
shuiyujinlan added a commit to shuiyujinlan/PhyloFisher that referenced this issue May 16, 2024
Only fix the bug in issue TheBrownLab#120.
@shuiyujinlan
Contributor

Sure, I just opened a PR. I hope my reply and the PR description give you some clues for fixing it more gracefully (e.g., keeping the multiprocessing).
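One way to keep the multiprocessing is to stop relying on a global and hand `metadata` to each worker explicitly. The sketch below binds it to the worker with `functools.partial` and falls back to a plain loop when only one process is requested, which is equivalent to the for-loop fix above. Here `suspicious_clades` is a hypothetical, simplified stand-in (it takes a list of organism names per tree instead of parsing a Newick tree with ete3), and the `threads` parameter is an assumption, not forest_local.py's actual signature.

```python
from functools import partial
from multiprocessing import Pool


def suspicious_clades(tree_orgs, metadata):
    """Hypothetical stand-in for the real worker.

    Takes the organisms of ONE tree and returns the set of higher-taxonomy
    groups they belong to.
    """
    groups = set()
    for org in tree_orgs:
        groups.add(metadata[org]["Higher Taxonomy"])
    return groups


def parallel_susp_clades(trees, metadata, threads=1):
    """Run suspicious_clades on every tree, passing metadata explicitly.

    Binding `metadata` with functools.partial means it is pickled and sent to
    the workers, so this works with "spawn" (macOS, Windows) as well as "fork".
    """
    worker = partial(suspicious_clades, metadata=metadata)
    if threads <= 1:
        # Serial path: same results in the same order as pool.map.
        return [worker(tree) for tree in trees]
    with Pool(processes=threads) as pool:
        return pool.map(worker, trees)


if __name__ == "__main__":
    # Placeholder data, not real PhyloFisher metadata.
    metadata = {
        "orgA": {"Higher Taxonomy": "Group1"},
        "orgB": {"Higher Taxonomy": "Group2"},
    }
    trees = [["orgA"], ["orgA", "orgB"]]
    print(parallel_susp_clades(trees, metadata, threads=2))
```

A documented alternative with the same effect is `Pool(processes=threads, initializer=..., initargs=(metadata,))`, where the initializer stores `metadata` in a module global inside each worker. That sends the dictionary once per worker rather than once per chunk of tasks, which may matter if metadata.tsv is large.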

robert-ervin-jones added a commit that referenced this issue May 29, 2024
…121)

* Update forest_local.py

Remove multiprocessing to fix the bug in issue #120 "metadata not defined in forest_local.py"(and import PyQt5 for ete3, I'm not sure if it's necessary.).

* Update forest_local.py

Only fix the bug in issue #120.

* Update matrix_constructor.py

To fix the snakemake bug "MissingRuleException:
No rule to produce snakemake (if you use input functions make sure that they don't raise unexpected exceptions)." 
It's caused by "--of phylip": in the original code, one of the output files, "matrix.fas", written by Python didn't match the name "f'{out_dir}/matrix.{out_dict[out_format.lower()]}'" in the Snakemake file (with the default --of fasta, they are the same).

---------

Co-authored-by: Robert E. Jones <56359883+robert-ervin-jones@users.noreply.github.com>
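The matrix_constructor.py part of this commit describes a name mismatch: the Python side wrote a fixed "matrix.fas" while the Snakemake rule expected `f'{out_dir}/matrix.{out_dict[out_format.lower()]}'`. A simple way to avoid that class of bug is to build the path in one helper that both sides use. In this sketch `OUT_EXTENSIONS` and `matrix_path` are hypothetical names; only fasta to "fas" is implied by the commit message, and "phy" for phylip is an assumed value.

```python
# Hypothetical format -> extension table. Only "fasta" -> "fas" is implied by
# the commit message; "phy" for phylip is an assumed value for illustration.
OUT_EXTENSIONS = {"fasta": "fas", "phylip": "phy"}


def matrix_path(out_dir, out_format):
    """Return the matrix file path for the requested output format.

    Calling this one helper both where the file is written and where the
    Snakemake rule names its output keeps the two names from drifting apart.
    """
    return f"{out_dir}/matrix.{OUT_EXTENSIONS[out_format.lower()]}"


if __name__ == "__main__":
    for fmt in ("fasta", "PHYLIP"):
        print(fmt, "->", matrix_path("out", fmt))
```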

4 participants