Question about HHpred parser
Dear CCTBX developers,
I am a postdoc at EPFL working with HHpred for homology modeling of membrane proteins.
I have been trying to write my own HHpred alignment parser until I found the python script under “cctbx_fork/iotbx/bioinformatics/__init__.py/” that contains an HHpred parser.
My goal is to correctly parse the raw HHpred output file (.hhr), which involves unwrapping every alignment, parsing out a lot of text to finally obtain something like this:
pdb_name
query-sequence
column score
Example:
4U15
VYGFIGGIFGFMSIMTMAMISIDRYNVIGRPMAASKKMSHRRAFIMIIFVWLWS
+........+..+..++|+++|++++.++.+.++++ +..+.++.+|+++|++.++...+........ +...|..
Being somewhat new to python, I was wondering whether the people who wrote this script are still around and could help me figure out whether the parser could be implemented in such a way.
Thanks for any help you can provide!
Best,
Louis D
Dear Louis,
apologies for the late response; I wrote the code a long time ago
and had to look at it again to be able to answer your query.
I am assuming that you want to process .hhr files from hhsearch (i.e. with
multiple hits) as opposed to hhalign. If this is the case, the parser covers
roughly 50% of what you need, in that it captures the PDB id and also the
alignment sequence, but not the midline. It would not be impossible to extend
the parser to handle this, but currently it does not. Would this be sufficient?
However, if you plan to process hhalign output, the parser gets everything
out, including the midline.
Best wishes, Gabor
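
To illustrate what capturing the midline from hhsearch output might involve, here is a minimal standalone sketch in Python. It does not use or reproduce the iotbx parser; the function name parse_hhr_hits and the regular expression are hypothetical. It assumes the usual .hhr alignment layout: each hit starts with a ">" header line, query rows look like "Q <name> <start> <sequence> <end> (<length>)", and the column-score midline is the line directly after the "Q Consensus" row, aligned to the same columns. Because the midline can contain spaces, it is sliced by column position rather than split on whitespace.

```python
import re

# Assumed layout of a sequence row: "Q <name> <start> <sequence> <end> (<length>)".
# Rows without residue numbers (e.g. "Q ss_pred ...") do not match this pattern.
Q_ROW = re.compile(r"^Q\s+(\S+)\s+(\d+)\s+(\S+)\s+(\d+)\s+\((\d+)\)\s*$")


def parse_hhr_hits(text):
    """Return one dict per hit with keys 'hit', 'query' and 'midline'.

    Wrapped alignment chunks are concatenated, so 'query' and 'midline'
    end up with the same length (one midline character per column).
    """
    hits = []
    current = None          # dict for the hit being read
    span = None             # (start, end) columns of the last Q Consensus sequence
    expect_midline = False  # True right after a Q Consensus row

    for line in text.splitlines():
        if line.startswith(">"):
            # New alignment block, e.g. ">4U15_A description"
            fields = line[1:].split()
            current = {"hit": fields[0] if fields else "", "query": "", "midline": ""}
            hits.append(current)
            expect_midline = False
            continue
        if current is None:
            continue  # still in the summary table before the first hit
        if expect_midline:
            # Slice by column; pad because trailing blanks may have been stripped.
            start, end = span
            current["midline"] += line[start:end].ljust(end - start)
            expect_midline = False
            continue
        match = Q_ROW.match(line)
        if match is None:
            continue
        if match.group(1) == "Consensus":
            span = (match.start(3), match.end(3))
            expect_midline = True
        else:
            current["query"] += match.group(3)
    return hits
```

Each returned entry then holds the three items asked for: the hit identifier (splitting it on "_" would give a bare PDB code such as 4U15 if the identifier has the form 4U15_A), the unwrapped query sequence, and the unwrapped column-score line. Files that lack the "Q Consensus" rows would need a different way of locating the midline.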
On Wed, Apr 22, 2020 at 2:51 PM Louis Dumas wrote:
> [...]
_______________________________________________
cctbxbb mailing list
http://phenix-online.org/mailman/listinfo/cctbxbb
participants (2): Gabor Bunkoczi, Louis Dumas