how to remove hydrogen from pdb
Dear Experts,
Could you kindly tell me how I can remove the hydrogens from my coordinates? I refined my structure with phenix including all hydrogens and it is now almost done, but before depositing the model in the PDB I want to remove the hydrogens from the coordinates, and I do not know how to do this. Suggestions would be appreciated.
Best Regards
AFSHAN
===========================================
Dr. Afshan Begum
Hi Afshan,
This is from the documentation online: http://www.phenix-online.org/documentation/refinement.htm
7. To remove hydrogens from a model:
% phenix.pdbtools model.pdb remove="element H or element D"
or the Reduce program can be used for this:
% phenix.reduce model_h.pdb -trim > model_noH.pdb
We strongly recommend not removing hydrogen atoms after refinement, since doing so makes the refinement statistics (R-factors, etc.) unreproducible without repeating exactly the same refinement protocol.
Ryan
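To check that either command really left no hydrogens behind, one can count the H/D records before and after. The following is a minimal sketch, assuming a PDB-format file whose element symbols are filled in (columns 77-78, right-justified, as the format specifies) and a POSIX awk; count_h_atoms is a hypothetical helper, not part of Phenix.

```shell
#!/bin/sh
# count_h_atoms FILE: hypothetical helper (not a Phenix tool).
# Prints how many ATOM/HETATM records carry element H or D,
# reading the element symbol from the fixed PDB columns 77-78.
count_h_atoms() {
    awk '
        {
            rec = substr($0, 1, 6)
            if (rec != "ATOM  " && rec != "HETATM") next
            elem = substr($0, 77, 2)
            gsub(/ /, "", elem)              # " H" -> "H"
            if (elem == "H" || elem == "D") n++
        }
        END { print n + 0 }                  # prints 0 when nothing matched
    ' "$1"
}
```

For example, running count_h_atoms on model.pdb and then on model_noH.pdb should give a non-zero count and then 0. Records with an empty element field are not counted at all, so a file written without element columns would misleadingly report 0.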
Dear AFSHAN, the simple and quick command
egrep -v "^ATOM|HETATM.*H$" your.pdb > your_noH.pdb
should also work.
Best,
Tim
_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb
-- Dr Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen GPG Key ID = A46BEE1A
Hi Tim, On 2/6/14, 10:52 AM, Tim Gruene wrote:
the simple and quick command
egrep -v "^ATOM|HETATM.*H$" your.pdb > your_noH.pdb
should also work.
just out of curiosity I did (copy-paste of your example)
egrep -v "^ATOM|HETATM.*H$" m.pdb > m_noH.pdb
and I got:
Illegal variable name.
Pavel
Of course, because in the shells that I use the shell attempts variable-name substitution inside double-quoted strings. (I make no warranties about all possible shells.) However, if you use single quotes:
egrep -v '^ATOM|HETATM.*H$' your.pdb > your_noH.pdb
it should work just fine in tcsh and csh at the very least.
Phil
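The difference comes down to the shell rather than grep: csh and tcsh try to expand $" inside double quotes and stop with "Illegal variable name", whereas Bourne-style shells such as sh and bash keep a $ that sits right before the closing quote as a literal character. A small sketch in POSIX sh showing that, there, both spellings hand grep the same string; the variable names are illustrative only, and this concerns quoting, not whether the pattern itself is right.

```shell
#!/bin/sh
# Single quotes: every character, including '$', reaches grep literally.
single='^ATOM|HETATM.*H$'
# Double quotes: sh/bash also keep a '$' right before the closing quote,
# but csh/tcsh read '$"' as a variable reference and abort with
# "Illegal variable name". Single quotes are the portable choice.
double="^ATOM|HETATM.*H$"

if [ "$single" = "$double" ]; then
    echo "sh/bash: both spellings give grep the same pattern: $single"
fi
```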
Thanks Phil,
did this:
egrep -v '^ATOM|HETATM.*H$' m.pdb > m_noH.pdb
Result:
in input file (m.pdb) I have:
ATOM 1 N GLY A 1 0.504 -0.494 0.924 1.00 7.85
ATOM 2 CA GLY A 1 1.272 0.589 0.277 1.00 6.79
ATOM 3 C GLY A 1 1.700 1.614 1.301 1.00 5.59
ATOM 4 O GLY A 1 1.434 1.460 2.496 1.00 6.04
ATOM 0 H1 GLY A 1 0.452 -1.280 0.308 1.00 7.85
ATOM 0 H2 GLY A 1 0.959 -0.765 1.772 1.00 7.85
ATOM 0 H3 GLY A 1 -0.420 -0.171 1.131 1.00 7.85
ATOM 0 HA2 GLY A 1 2.157 0.171 -0.225 1.00 6.79
ATOM 0 HA3 GLY A 1 0.659 1.070 -0.499 1.00 6.79
END
Output file (m_noH.pdb) contains only:
END
Pavel
So I guess your contribution at this point in the thread is just to be as difficult as possible? I dare say if you use it on PROTIN format it won't work either. Try it on a file that actually conforms to the PDB standard, with the element column at the end, not something that almost conforms to the standard, has non-distinct index numbers, and apparently has missing spaces on the GLY:N atom. I would hope, modulo the usual list of bugs, that phenix.refine actually writes out something more closely resembling the correct format, in which case Tim's regular expression would actually work. From the original post:
Actually i refine my structure with phenix along all hydrogen now
Phil Jeffrey Princeton
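Phil's point is that a pattern keyed on the last character of the line only makes sense when every coordinate record carries its element symbol in columns 77-78. Before relying on such a pattern, one could count the ATOM/HETATM records that leave those columns blank. A minimal sketch, assuming a POSIX awk; count_missing_elements is a hypothetical helper name.

```shell
#!/bin/sh
# count_missing_elements FILE: hypothetical helper.
# Counts ATOM/HETATM records whose element field (columns 77-78) is blank,
# including lines too short to reach those columns.
count_missing_elements() {
    awk '
        {
            rec = substr($0, 1, 6)
            if (rec != "ATOM  " && rec != "HETATM") next
            elem = substr($0, 77, 2)
            gsub(/ /, "", elem)
            if (elem == "") n++
        }
        END { print n + 0 }
    ' "$1"
}
```

A result of 0 means every coordinate record has an element symbol; anything else means an end-of-line pattern will silently keep those atoms, hydrogens included.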
Hi Phil,
ok, now I tried this:
egrep -v '^ATOM|HETATM.*H$' m.pdb > m_noH.pdb
Input (m.pdb - file right from PDB):
ATOM 1 N GLY A 1 0.504 -0.494 0.924 1.00 7.85 N
ATOM 2 CA GLY A 1 1.272 0.589 0.277 1.00 6.79 C
ATOM 3 C GLY A 1 1.700 1.614 1.301 1.00 5.59 C
ATOM 4 O GLY A 1 1.434 1.460 2.496 1.00 6.04 O
ATOM 0 H1 GLY A 1 0.408 -1.171 0.354 1.00 7.85 H
ATOM 0 H2 GLY A 1 0.939 -0.775 1.648 1.00 7.85 H
ATOM 0 H3 GLY A 1 -0.298 -0.189 1.160 1.00 7.85 H
ATOM 0 HA2 GLY A 1 2.052 0.220 -0.166 1.00 6.79 H
ATOM 0 HA3 GLY A 1 0.731 1.013 -0.407 1.00 6.79 H
END
Output (m_noH.pdb):
END
Just to be clear: neither of the two commands suggested so far worked on a valid PDB file (above), so I thought it might be useful to point this out.
All the best, Pavel
For old or new-style PDB files, try:
awk '$1!~/ATOM|HETATM/ || $3!~/^H/' old.pdb > new.pdb
But I understand some new H-atom names don't start with H, in which case this won't work. And if the number of atoms is large enough to fill the second field and fuse with HETATM in the first field, it won't work unless you define fixed field widths:
awk '$1!~/ATOM|HETATM/ || $3!~/^ *H/' \
    FIELDWIDTHS="6 5 5 4 2 4 4 8 8 8 6 6" \
    old.pdb > new.pdb
Maybe a specialized tool like phenix.refine is safest!
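FIELDWIDTHS is a GNU awk (gawk) extension; other awk implementations, such as the BSD awk on macOS, treat it as an ordinary variable and keep splitting on whitespace. A sketch of a portable alternative, assuming a POSIX awk and element symbols present in columns 77-78: it selects records by fixed columns with substr(), so atom names and fused serial numbers no longer matter. It also drops the ANISOU records of the removed atoms, which the one-liners in this thread would leave behind. strip_h is a hypothetical name.

```shell
#!/bin/sh
# strip_h IN OUT: hypothetical helper. Removes ATOM, HETATM and ANISOU
# records whose element symbol (columns 77-78) is H or D; every other
# line (headers, TER, END, ...) is copied unchanged.
# Atoms are not renumbered and CONECT records are left untouched.
strip_h() {
    awk '
        {
            rec = substr($0, 1, 6)
            if (rec == "ATOM  " || rec == "HETATM" || rec == "ANISOU") {
                elem = substr($0, 77, 2)
                gsub(/ /, "", elem)
                if (elem == "H" || elem == "D") next   # drop hydrogen/deuterium
            }
            print
        }
    ' "$1" > "$2"
}
```

For example: strip_h model.pdb model_noH.pdb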
On Thu, Feb 6, 2014 at 1:07 PM, Edward A. Berry wrote:
Maybe a specialized tool like phenix.refine is safest!
At the risk of sounding like a broken record again: Unix shell tricks are great *if you already know how to use these tools* and have a detailed knowledge of the PDB format specification. For everyone else, I do not recommend trying to edit models with anything other than a tool specifically designed for this purpose. This is going to be especially important as the field slowly migrates to mmCIF.
For the record, I had to add parentheses to get the egrep command to work on my Mac:
egrep -v '^(ATOM|HETATM).*H$' m.pdb
which does the job. But I do not think this solution is preferable to using phenix.reduce or phenix.pdbtools (or any number of other utilities).
A final note: in my experience, if you deposit a file containing hydrogens to the PDB, they're just going to delete them for you (whether you want this or not!), so it is not necessary to do any additional preparation. In fact, I just did this last week, so we'll see what happens.
-Nat
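The parentheses matter because | has the lowest precedence in an extended regular expression: ^ATOM|HETATM.*H$ means "starts with ATOM" or "contains HETATM and ends in H", so -v discards every ATOM line, which is exactly the lone END Pavel reported. Grouping the alternatives restricts the H$ test to coordinate records. A sketch reproducing both behaviours on two made-up records modelled on Pavel's example; it uses grep -E, the POSIX spelling of egrep.

```shell
#!/bin/sh
# Two made-up coordinate records (element symbol in columns 77-78) plus END.
sample='ATOM      1  N   GLY A   1       0.504  -0.494   0.924  1.00  7.85           N
ATOM      2  H1  GLY A   1       0.408  -1.171   0.354  1.00  7.85           H
END'

# Ungrouped: parsed as (^ATOM) | (HETATM.*H$), so -v drops every ATOM line.
broken=$(printf '%s\n' "$sample" | grep -E -v '^ATOM|HETATM.*H$')

# Grouped: a line is dropped only if it is an ATOM/HETATM record ending in H.
fixed=$(printf '%s\n' "$sample" | grep -E -v '^(ATOM|HETATM).*H$')

printf 'ungrouped keeps:\n%s\n\n' "$broken"
printf 'grouped keeps:\n%s\n' "$fixed"
```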
Hi Nat, On 02/06/2014 10:26 PM, Nathaniel Echols wrote:
On Thu, Feb 6, 2014 at 1:07 PM, Edward A. Berry
wrote: Maybe a specialized tool like phenix.refine is safest!
At the risk of sounding like a broken record again: Unix shell tricks are great *if you already know how to use these tools* and have a detailed knowledge of the PDB format specification. For everyone else, I do not recommend trying to edit models with anything other than a tool specifically designed for this purpose. This is going to be especially important as the field slowly migrates to mmCIF.
Some time ago (about 6 months), Eugene Krissinel put forward a similar argument, together with the statement that if a program to do a particular job does not exist, it is the developers' task to fix that. At that time I described a problem a student of mine had in data mining, but I never received a response. I am glad somebody else here wrote a little script to sort out the problem, because the numbers were important. Had we heeded the warning above and waited for the developers, I still would not know. So I am happy about the availability of scripting and the possibility to use it to push my research forward, rather than having to wait for someone to implement my ideas in some API.
Cheers,
Tim
P.S.: I checked the command on some output from phenix and noticed that the element symbol is correctly the last character of the line. Some (older) files from the PDB have two trailing blank spaces, in which case the regular expression does not work.
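Tim's P.S. points to a second fragility: H$ requires the element symbol to be the very last character, so records padded with trailing blanks slip through untouched. A sketch of a slightly more tolerant pattern, still only a heuristic that assumes element columns are present: it allows trailing blanks and requires a blank before the H or D, so that two-letter symbols ending in D or H are not matched. The records below are made up for illustration.

```shell
#!/bin/sh
# Strict pattern: the element symbol must be the very last character.
strict='^(ATOM|HETATM).*H$'
# Tolerant pattern (heuristic): a blank, then H or D, then optional blanks.
# The leading blank keeps two-letter symbols such as CD (cadmium) or
# TH (thorium) from matching.
tolerant='^(ATOM|HETATM).* [HD] *$'

# Made-up records padded with two trailing blanks, as in some older files.
records() {
    printf '%s  \n' \
        'ATOM      1  N   GLY A   1       0.504  -0.494   0.924  1.00  7.85           N' \
        'ATOM      2  H1  GLY A   1       0.408  -1.171   0.354  1.00  7.85           H' \
        'HETATM    3 CD    CD A 101      10.000  10.000  10.000  1.00 20.00          CD'
    echo 'END'
}

records | grep -E -v "$strict"     # keeps all four lines: nothing ends in H
records | grep -E -v "$tolerant"   # drops only the H1 record
```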
On Thu, Feb 6, 2014 at 1:26 PM, Nathaniel Echols wrote:
A final note: in my experience, if you deposit a file containing hydrogens to the PDB, they're just going to delete them for you (whether you want this or not!), so it is not necessary to do any additional preparation. In fact, I just did this last week, so we'll see what happens.
In confirmation of this, here's what the PDB annotator said: "The H atoms in the coordinates all have zero occupancy, would you like to remove them from the coordinates?" This obviously indicates a bug in phenix.refine (specifically in the mmCIF output; the PDB-format file is okay), but my point stands.
-Nat
Hi Afshan, it is not a good idea to do this, for reasons explained in great detail here: http://www.phenix-online.org/newsletter/CCN_2012_01.pdf (see "On contribution of hydrogen atoms to X-ray scattering"). Yes, the easiest way is:
phenix.reduce -trim model_with_h.pdb > model_no_h.pdb
If you do this, make sure you update the refinement statistics (including R-factors!).
All the best, Pavel
Hi Afshan,
Here is my little Linux shell script for removing hydrogen atoms from PDB files. It uses the PDBCUR program in the CCP4 package (my apologies to the PhenixBB!). With a little modification you can also turn it into scripts for other things, like removing alternative conformations, ANISOU records, etc. Please read the CCP4 manual for PDBCUR.
###
# remove_Hydrogen.sh
pdbcur XYZIN $1 XYZOUT $1_noH.pdb <<EOF
DELHYDROGEN
EOF
###
Make it executable:
chmod 755 remove_Hydrogen.sh
To use it:
./remove_Hydrogen.sh xxxx.pdb
Zhijie
participants (8)
- Afshan Begum
- Edward A. Berry
- Nathaniel Echols
- Pavel Afonine
- Phil Jeffrey
- Ryan Spencer
- Tim Gruene
- Zhijie Li