rc3 installation issue - can't make blast dbs (SequenceServer::CommandFailed) #469

Closed
mjcoynejr opened this issue May 23, 2020 · 11 comments

Comments

@mjcoynejr

Installation of sequenceserver-2.0.0.rc3 seemed to go ok until it attempted to create the blast databases when it died with SequenceServer::CommandFailed (SequenceServer::CommandFailed).

For details, see the attached terminal session file (includes commands issued, and info about ruby and gem installations and environments).

My goal, ultimately, is to integrate SequenceServer with Apache version 2.4.37. My system is CentOS Linux 8.1.1911 Linux 4.18.0-147.8.1.el8_1.x86_64 on x86_64.

sequenceserver.session.txt

Thanks for any help!

@mjcoynejr
Author

On further inspection, it seems like the preparation of the blast databases actually gets started, but bombs out as described above.

After passing SequenceServer the path to the blast installation and the directories to be formatted, here's what happens:

[mcoyne@helix Desktop]$ sequenceserver

Your BLAST+ version 2.9.0+ is incompatible.
SequenceServer needs NCBI BLAST+ version 2.10.0+.

If you have downloaded NCBI BLAST+ already, please enter the path
below. Otherwise just press Enter and SequenceServer will download
a copy of NCBI BLAST+ for itself.

Press Ctrl+C to abort setup.

/home/mcoyne/.sequenceserver/ncbi-blast-2.10.0+/

Database dir not set.

SequenceServer needs to know where your FASTA files or BLAST+ databases are.
Please enter the path to the relevant directory (default: current directory).

Press Ctrl+C to quit.

/var/www/html/SequenceServer/

Could not find BLAST+ databases in: /var/www/html/SequenceServer.

Search for FASTA files (.fa, .fasta, .fna) in '/var/www/html/SequenceServer' and try
creating BLAST+ databases? [y/n] (Default: y).

y

Searching ...

FASTA file: /var/www/html/SequenceServer/All_Bacteroidales/All_Bacteroidales.faa
FASTA type: protein
Proceed? [y/n] (Default: y):
Enter a database title or will use 'All Bacteroidales':
Enter taxid (optional):
Traceback (most recent call last):
12: from /home/mcoyne/bin/sequenceserver:23:in `<main>'
11: from /home/mcoyne/bin/sequenceserver:23:in `load'
10: from /home/mcoyne/.gem/ruby/gems/sequenceserver-2.0.0.rc3/bin/sequenceserver:48:in `<top (required)>'
9: from /home/mcoyne/.gem/ruby/gems/slop-3.6.0/lib/slop.rb:65:in `parse!'
8: from /home/mcoyne/.gem/ruby/gems/slop-3.6.0/lib/slop.rb:260:in `parse!'
7: from /home/mcoyne/.gem/ruby/gems/sequenceserver-2.0.0.rc3/bin/sequenceserver:186:in `block (2 levels) in <top (required)>'
6: from /home/mcoyne/.gem/ruby/gems/sequenceserver-2.0.0.rc3/bin/sequenceserver:312:in `rescue in block (2 levels) in <top (required)>'
5: from /home/mcoyne/.gem/ruby/gems/sequenceserver-2.0.0.rc3/lib/sequenceserver/database.rb:215:in `make_blast_databases'
4: from /home/mcoyne/.gem/ruby/gems/sequenceserver-2.0.0.rc3/lib/sequenceserver/database.rb:215:in `select'
3: from /home/mcoyne/.gem/ruby/gems/sequenceserver-2.0.0.rc3/lib/sequenceserver/database.rb:216:in `block in make_blast_databases'
2: from /home/mcoyne/.gem/ruby/gems/sequenceserver-2.0.0.rc3/lib/sequenceserver/database.rb:245:in `make_blast_database'
1: from /home/mcoyne/.gem/ruby/gems/sequenceserver-2.0.0.rc3/lib/sequenceserver/database.rb:252:in `_make_blast_database'
/home/mcoyne/.gem/ruby/gems/sequenceserver-2.0.0.rc3/lib/sequenceserver/sys.rb:83:in `sys': SequenceServer::CommandFailed (SequenceServer::CommandFailed)

But, if I now look in /var/www/html/SequenceServer/All_Bacteroidales, files All_Bacteroidales.faa.pos, .pot, .ptf, and .pto have been created...

@yannickwurm
Member

Dear @mjcoynejr - sorry for these frustrations.
Some thoughts already:

  • Can you try running makeblastdb outside of sequenceserver to format the FASTA file into a BLAST database? If it fails, try running your FASTA file through seqtk seq, which may clean up some weird characters.
  • @yeban - I think that we should display the makeblastdb command that we use, to help debug this kind of thing.
  • FWIW (but I think it's unrelated): can you use BLAST 2.10? And could you use only lowercase directory names? (You have some uppercase letters in there.)

Cheers,
Yannick
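
The first suggestion above, running makeblastdb by hand so that its own output is visible, could be sketched as follows. This is only an illustration: the function names are hypothetical, and the flags (-in, -dbtype, -title) are the ones used in the manual run later in this thread, not necessarily the ones SequenceServer passes.

```python
import shlex
import shutil
import subprocess


def makeblastdb_command(fasta_path, dbtype, title):
    """Build the argument list for a manual makeblastdb run (hypothetical helper)."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-title", title]


def run_manually(fasta_path, dbtype, title):
    """Run makeblastdb if it is on PATH and return (exit code, stdout, stderr)."""
    cmd = makeblastdb_command(fasta_path, dbtype, title)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("makeblastdb not found on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Show the command that would be run for the file from this issue.
command = makeblastdb_command(
    "/var/www/html/SequenceServer/All_Bacteroidales/All_Bacteroidales.faa",
    "prot",
    "All_Bacteroidales",
)
print(shlex.join(command))
```

Capturing stderr separately matters here because makeblastdb writes its error messages there, and that is the part a wrapper can otherwise hide.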

@mjcoynejr
Author

@yannickwurm --

Thank you for your suggestions -- I'll give them a try and report back. Sequenceserver told me my system-wide install of blast 2.9.0+ was incompatible and thus downloaded 2.10.0+ and extracted it to /home/mcoyne/.sequenceserver/ncbi-blast-2.10.0+/.
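
As an aside on why 2.9.0+ counts as older than 2.10.0+: version numbers have to be compared component by component, not as plain strings. The sketch below is not SequenceServer's own check (which is written in Ruby and not shown in this thread); the function names are made up for illustration.

```python
import re


def parse_blast_version(text):
    """Turn a string such as '2.10.0+' into a comparable tuple like (2, 10, 0)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(installed, minimum="2.10.0+"):
    """Return True if the installed version is at least the minimum version."""
    return parse_blast_version(installed) >= parse_blast_version(minimum)


print(meets_minimum("2.9.0+"))
print(meets_minimum("2.10.0+"))
```

A plain string comparison would get this wrong, because the character "9" sorts after "1".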

I will try running makeblastdb by hand from a terminal and see what happens. I'll also change the /var/www/html/SequenceServer folder name to lower case. For what it's worth, these .faa and .fna files were copied directly over from another server of mine that is running sequenceserver 1.something, which works correctly -- I've got some new hardware and I'm trying to migrate things over.

As with all things Linux, I was thinking it was a path or permissions problem...

I'll try the things you mentioned now.

@mjcoynejr
Author

mjcoynejr commented May 23, 2020

@yannickwurm

When run from a terminal opened in /home/mcoyne/.sequenceserver/ncbi-blast-2.10.0+/bin, makeblastdb completed successfully:

[mcoyne@helix bin]$ makeblastdb -in /var/www/html/SequenceServer/All_Bacteroidales/All_Bacteroidales.faa -title "All_Bacteroidales" -dbtype prot


Building a new DB, current time: 05/23/2020 11:34:56
New DB name:   /var/www/html/SequenceServer/All_Bacteroidales/All_Bacteroidales.faa
New DB title:  All_Bacteroidales
Sequence type: Protein
Deleted existing Protein BLAST database named /var/www/html/SequenceServer/All_Bacteroidales/All_Bacteroidales.faa
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 11726947 sequences in 378.692 seconds.
[mcoyne@helix bin]$

When I renamed the /var/www/html/SequenceServer directory to sequenceserver and restarted the gem, it behaved quite differently:

[mcoyne@helix Desktop]$ sequenceserver

Your BLAST+ version 2.9.0+ is incompatible.
SequenceServer needs NCBI BLAST+ version 2.10.0+.

If you have downloaded NCBI BLAST+ already, please enter the path
below. Otherwise just press Enter and SequenceServer will download
a copy of NCBI BLAST+ for itself.

Press Ctrl+C to abort setup.

>> /home/mcoyne/.sequenceserver/ncbi-blast-2.10.0+


Database dir not set.

SequenceServer needs to know where your FASTA files or BLAST+ databases are.
Please enter the path to the relevant directory (default: current directory).

Press Ctrl+C to quit.

>> /var/www/html/sequenceserver

Do you want to be notified of SequenceServer releases and any
other important announcements (3-12 messages a year)? If yes,
please provide your email address below or press enter to
continue (you won't be prompted again).
>> n

[2020-05-23 11:51:59] WARN  Will listen on all interfaces (0.0.0.0). Consider using 127.0.0.1 (--host option).
** SequenceServer is ready.
   Go to http://localhost:4567 in your browser and start BLASTing!
   To share your setup, please try one of the following: 
     -  http://192.168.1.218:4567
     -  http://helix.xxxxxxxx.home:4567
   Press CTRL+C to quit.

It opened a browser session on http://localhost:4567/ and displays correctly, showing the single protein db that I created manually. I entered a protein query, and it worked correctly.

So, probably a case-sensitivity issue? I presume I'll be able to scan the rest by reinvoking sequenceserver with the appropriate command-line switches (-m and -d, if memory serves)?

@mjcoynejr
Author

Running sequenceserver with the -m switch from a terminal still fails in the same manner as before...

Trying from sudo results in

sudo: sequenceserver: command not found

yeban added a commit that referenced this issue May 24, 2020
And print diagnostic information to help understand what might have
gone wrong (see #469)

Signed-off-by: Anurag Priyam <anurag08priyam@gmail.com>
@yeban
Collaborator

yeban commented May 24, 2020

Hi. I just pushed an update that will print diagnostic information when makeblastdb fails. Could you update, try again, and let us know?

gem install --pre sequenceserver
sequenceserver -m
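
For readers curious what "print diagnostic information when makeblastdb fails" can look like, here is a minimal sketch. It is not the actual SequenceServer change (that code is Ruby and is not reproduced in this thread); the function name is hypothetical, and the Tried/stdout/stderr layout simply mirrors the rc4 output quoted later in this issue.

```python
import shlex
import subprocess
import sys


def run_with_diagnostics(cmd):
    """Run cmd; return None on success, or a Tried/stdout/stderr report on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return None
    return "\n".join([
        f"Tried: {shlex.join(cmd)}",
        f"stdout: {result.stdout.strip()}",
        f"stderr: {result.stderr.strip()}",
    ])


# Demo with a command that always fails, standing in for makeblastdb.
failing = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('no write permission'); sys.exit(1)",
]
print(run_with_diagnostics(failing))
```

Echoing the exact command is what lets a user rerun it by hand and see the failure without the wrapper in the way.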

@mjcoynejr
Author

mjcoynejr commented May 25, 2020

@yeban

Absolutely -- I need to first warn my users it's going down; I'll take it off-line about an hour from now and re-install.

I made all the blastable dbs using a Perl script and the /home/mcoyne/.sequenceserver/ncbi-blast-2.10.0+/bin binaries, and SequenceServer sees them fine on start-up...

BTW, how does one properly install this outside of a user directory, say into /usr/local/bin or (since I'm ultimately going to run it under Apache), into /var/www/html? (PS. If you want to blow this off for now while we work on makeblastdb, I'm good with that).

Be back soon...

@mjcoynejr
Author

mjcoynejr commented May 25, 2020

@yeban

I commented above that "As with all things Linux, I was thinking it was a path or permissions problem...". I should have just gone with my instincts -- that's what it turned out to be:

[mcoyne@helix Desktop]$ sequenceserver

[2020-05-23 16:02:47] INFO  Reading configuration file: /home/mcoyne/.sequenceserver.conf.
[2020-05-23 16:02:47] WARN  Will listen on all interfaces (0.0.0.0). Consider using 127.0.0.1 (--host option).
** SequenceServer is ready.
   Go to http://localhost:nnnn in your browser and start BLASTing!
   To share your setup, please try one of the following: 
     -  http://192.168.1.218:nnnn
     -  http://xxx.xxx.xxx.home:nnnn
   Press CTRL+C to quit.
^C
** Thank you for using SequenceServer :).
   Please cite: 
       Priyam A, Woodcroft BJ, Rai V, Moghul I, Munagala A, Ter F,
       Chowdhary H, Pieniak I, Maynard LJ, Gibbins MA, Moon H,
       Davis-Richardson A, Uludag M, Watson-Haigh N, Challis R,
       Nakamura H, Favreau E, Gómez EA, Pluskal T, Leonard G,
       Rumpf W & Wurm Y.
       Sequenceserver: A modern graphical user interface for
       custom BLAST databases.
       Molecular Biology and Evolution (2019)

[mcoyne@helix Desktop]$ gem install --pre sequenceserver
Fetching: sequenceserver-2.0.0.rc4.gem (100%)

------------------------------------------------------------------------
  Thank you for installing SequenceServer :)

  To launch SequenceServer execute 'sequenceserver' from command line.

    $ sequenceserver


  Visit http://sequenceserver.com for more.
------------------------------------------------------------------------

Successfully installed sequenceserver-2.0.0.rc4
Parsing documentation for sequenceserver-2.0.0.rc4
Installing ri documentation for sequenceserver-2.0.0.rc4
Done installing documentation for sequenceserver after 0 seconds
1 gem installed

[mcoyne@helix Desktop]$ sequenceserver -m
[2020-05-24 21:07:02] INFO  Reading configuration file: /home/mcoyne/.sequenceserver.conf.

FASTA file: /var/www/html/sequenceserver/fragilis_not_fragilis/fragilis_not_fragilis.faa
FASTA type: protein
Proceed? [y/n] (Default: y): 
Enter a database title or will use 'fragilis not fragilis': 
Enter taxid (optional): 
Could not create BLAST database for: /var/www/html/sequenceserver/fragilis_not_fragilis/fragilis_not_fragilis.faa
Tried: makeblastdb -parse_seqids -hash_index -in /var/www/html/sequenceserver/fragilis_not_fragilis/fragilis_not_fragilis.faa -dbtype prot -title 'fragilis not fragilis' -taxid 0
stdout: 
stderr: Error: NCBI C++ Exception:
    T0 "/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_260005_130.14.18.128_9008__PrepareRelease_Linux64-Centos_1575413971/c++/compilers/unix/../../src/objtools/blast/seqdb_writer/build_db.cpp", line 1025: Error: BLASTDB::ncbi::CBuildDatabase::CreateDirectories() - You do not have write permissions on 'fragilis_not_fragilis'

[mcoyne@helix Desktop]$ sudo sequenceserver -m
[sudo] password for mcoyne: 
sudo: sequenceserver: command not found

[mcoyne@helix Desktop]$ ls -l /var/www/html/sequenceserver/fragilis_not_fragilis
total 82100
-rw-rw-r--. 1 root root 22728309 Feb  7  2019 fragilis_not_fragilis.faa
-rw-rw-r--. 1 root root 61341301 Feb  7  2019 fragilis_not_fragilis.fna

[mcoyne@helix Desktop]$ ls -l /var/www/html/sequenceserver
total 24
drwxrwxrwx. 2 root root 4096 May 23 14:58 All_Bacteroidales
drwxrwxrwx. 2 root root 4096 May 23 15:17 All_Bacteroides_and_Parabacteroides
drwxr-xr-x. 2 root root   84 May 24 21:06 fragilis_not_fragilis
drwxr-xr-x. 2 root root 4096 May 23 15:17 Leighs_genomes
drwxr-xr-x. 2 root root 4096 May 23 15:17 PacBio_genomes
drwxr-xr-x. 2 root root 4096 May 23 15:18 Prevotella_genomes
drwxr-xr-x. 2 root root 4096 May 23 15:18 Selected_species

[mcoyne@helix Desktop]$ sudo chmod -R 0777 /var/www/html/sequenceserver
[sudo] password for mcoyne: 

[mcoyne@helix Desktop]$ ls -l /var/www/html/sequenceserver
total 24
drwxrwxrwx. 2 root root 4096 May 23 14:58 All_Bacteroidales
drwxrwxrwx. 2 root root 4096 May 23 15:17 All_Bacteroides_and_Parabacteroides
drwxrwxrwx. 2 root root   84 May 24 21:06 fragilis_not_fragilis
drwxrwxrwx. 2 root root 4096 May 23 15:17 Leighs_genomes
drwxrwxrwx. 2 root root 4096 May 23 15:17 PacBio_genomes
drwxrwxrwx. 2 root root 4096 May 23 15:18 Prevotella_genomes
drwxrwxrwx. 2 root root 4096 May 23 15:18 Selected_species

[mcoyne@helix Desktop]$ sequenceserver -m
[2020-05-24 21:19:13] INFO  Reading configuration file: /home/mcoyne/.sequenceserver.conf.

FASTA file: /var/www/html/sequenceserver/fragilis_not_fragilis/fragilis_not_fragilis.faa
FASTA type: protein
Proceed? [y/n] (Default: y): 
Enter a database title or will use 'fragilis not fragilis': 
Enter taxid (optional): 

Building a new DB, current time: 05/24/2020 21:19:18
New DB name:   /var/www/html/sequenceserver/fragilis_not_fragilis/fragilis_not_fragilis.faa
New DB title:  fragilis not fragilis
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 48568 sequences in 2.05515 seconds.

FASTA file: /var/www/html/sequenceserver/fragilis_not_fragilis/fragilis_not_fragilis.fna
FASTA type: nucleotide
Proceed? [y/n] (Default: y): 
Enter a database title or will use 'fragilis not fragilis': 
Enter taxid (optional): 

Building a new DB, current time: 05/24/2020 21:19:24
New DB name:   /var/www/html/sequenceserver/fragilis_not_fragilis/fragilis_not_fragilis.fna
New DB title:  fragilis not fragilis
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1286 sequences in 1.06068 seconds.

Sorry for the fuss... Nice to have the debug code in there though! Might also be nice to solve the directory name case sensitivity problem as well, but that's not such a big deal.
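
The permissions problem found above could be spotted before running sequenceserver -m with a check along these lines. It is a rough sketch: it applies only the classic owner/group/other write bits (no ACLs, no SELinux, no special handling of root), the function names are hypothetical, and the uid and gid values in the comments are assumed for illustration.

```python
import os
import stat


def can_write(st_mode, st_uid, st_gid, uid, gids):
    """Apply the owner/group/other write-bit rule for the given user."""
    if uid == st_uid:
        return bool(st_mode & stat.S_IWUSR)
    if st_gid in gids:
        return bool(st_mode & stat.S_IWGRP)
    return bool(st_mode & stat.S_IWOTH)


def unwritable_dirs(root, uid, gids):
    """Return every directory under root (inclusive) that the user cannot write to."""
    blocked = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        info = os.stat(dirpath)
        if not can_write(info.st_mode, info.st_uid, info.st_gid, uid, gids):
            blocked.append(dirpath)
    return sorted(blocked)


# drwxr-xr-x root root, seen by a non-root user (uid/gid 1000 are assumed values)
print(can_write(0o040755, 0, 0, 1000, {1000}))
# drwxrwxrwx root root, as after the chmod above
print(can_write(0o040777, 0, 0, 1000, {1000}))
```

For the current user it could be called as unwritable_dirs("/var/www/html/sequenceserver", os.getuid(), set(os.getgroups())). Knowing exactly which directories are blocked also makes it possible to open up only those, where chmod -R 0777 changes every file and directory under the tree.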

Now, I need to get this operational under Apache. Should I start a new thread about how to install SequenceServer into a directory other than a user directory to facilitate use under Apache (and more in keeping with the Linux FHS)?

I am a programmer and long-time (45+ years) geek (which is why I feel pretty silly that this came down to a permissions issue) and a long-time bench-type molecular biologist (30+ years), but I know nada about Ruby, or gem, or passenger...:)

I do want to say that this is an excellent piece of software (NCBI blast has become virtually unusable -- I have such a love-hate relationship with NCBI), and it is very needed -- if I can be of any help to you in further development, please let me know. I have many computers (currently 8 desktops, 3 laptops, and 4 rack mounted servers at my house alone, never mind at the lab), and would be more than willing to set up a sandboxed dedicated machine to help troubleshoot your application and perhaps give advice on usability from a bench scientist cum bioinformatician cum lab advisor to incoming post-docs point of view...

@mjcoynejr
Author

mjcoynejr commented May 25, 2020

@yeban

For example:

You should consider making the -parse_seqids command line switch optional. When I was re-making my databases from within SequenceServer, it repeatedly bombed out with things like:

stderr: BLAST Database creation error: Near line 2201288, the local id is too long.  Its length is 51 but the maximum allowed local id length is 50.  Please find and correct all local ids that are too long.

The local id (the part between the > and the first space) on line 2201288 in my All_Bacteroidales.faa file reads:

Anaerophaga_thermohalophila_DSM_12881.ATH1_RS0118475

I have (through a Perl script) reformatted the NCBI faa files for this genome set (>2,100 genomes) so that the fasta name is of the form 'genus_species_strain.locus_tag contig:start..stop annotation [genome name]', because this is MUCH more useful to lab members than the default NCBI fasta format (which is mostly useful to NCBI database administrators, and not to real-world scientists or post-docs just getting started on this bioinformatics stuff).

However, this string is 52 characters long (even though makeblastdb reports it as 51), and makeblastdb only allows 50 characters when -parse_seqids is included on the command line, so an error is reported (through stderr in rc4).

The great strength of your software is clearly that it allows people to slice-and-dice their own blastable databases (I have had "no sequences found" returns from NCBI blast queries matching genomes I know are there -- because I submitted them). Allowing -parse_seqids to be optional (I agree, it's useful sometimes) would be much more user friendly.
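
Until that changes, overlong IDs can be found before makeblastdb trips over them. The sketch below scans FASTA header lines and reports any ID (the text between > and the first whitespace) longer than 50 characters. The function name is hypothetical, and the description text in the sample is made up; only the ID itself comes from this issue. It counts characters the plain way, which may differ from how makeblastdb counts.

```python
def overlong_ids(fasta_lines, limit=50):
    """Return (line_number, seq_id, length) for each header whose ID exceeds limit."""
    hits = []
    for line_number, line in enumerate(fasta_lines, start=1):
        if not line.startswith(">"):
            continue
        fields = line[1:].split(None, 1)
        seq_id = fields[0] if fields else ""
        if len(seq_id) > limit:
            hits.append((line_number, seq_id, len(seq_id)))
    return hits


sample = [
    ">Anaerophaga_thermohalophila_DSM_12881.ATH1_RS0118475 example description",
    "MKT",
    ">short_id.tag example description",
    "MSE",
]
print(overlong_ids(sample))
```

Because the function only iterates over lines, an open file handle can be passed in directly, so even a multi-gigabyte .faa file is never loaded into memory at once.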

@yannickwurm
Member

Thanks for all the thoughts and details @mjcoynejr
The -parse_seqids option is required so that users are able to download the FASTA sequences of their hits; without it, we are unable to extract them from the database.

To avoid the exact problem you mention, for internal stuff we thus often use slightly shortened names (e.g. shortening the genus... or relegating parts of the id to description)

Cheers
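
The two workarounds mentioned above (shortening the genus, or relegating part of the ID to the description) might look roughly like this. Both helpers are hypothetical and assume headers of the form '>genus_species_strain.locus_tag description' as described earlier in this thread; neither is something SequenceServer does for you.

```python
def abbreviate_genus(header):
    """Rewrite '>Genus_species_...' as '>G_species_...', leaving the description alone."""
    seq_id, _, description = header[1:].partition(" ")
    genus, sep, rest = seq_id.partition("_")
    if not sep or not genus:
        return header  # no underscore to split on, leave the header untouched
    return f">{genus[0]}_{rest} {description}".rstrip()


def relegate_prefix(header):
    """Keep only the part after the first '.' as the ID and move the rest to the description."""
    seq_id, _, description = header[1:].partition(" ")
    prefix, sep, locus_tag = seq_id.partition(".")
    if not sep or not locus_tag:
        return header  # no '.' separator, leave the header untouched
    return f">{locus_tag} {prefix} {description}".rstrip()


original = ">Anaerophaga_thermohalophila_DSM_12881.ATH1_RS0118475 example description"
print(abbreviate_genus(original))
print(relegate_prefix(original))
```

Either rewrite needs a uniqueness check afterwards: abbreviating can make two genera with the same initial collide, and relegating the prefix relies on locus tags being unique across the whole genome set.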

@yeban
Collaborator

yeban commented May 25, 2020

I haven't tested it, but you might also be able to get around the 50-character limit by adding | between the species name and the actual sequence identifier: Anaerophaga_thermohalophila|DSM_12881.ATH1_RS0118475

@yeban closed this as completed Jun 10, 2020