UNIT I INTRODUCTION TO BIOINFORMATICS


Introduction to Bioinformatics

Bioinformatics is the application of computer technology to the management of biological information. Computers are used to gather, store, analyze and integrate biological and genetic information, which can then be applied to gene-based drug discovery and development. The need for Bioinformatics capabilities has been precipitated by the explosion of publicly available genomic information resulting from the Human Genome Project. The goal of this project - determination of the sequence of the entire human genome (approximately three billion base pairs) - was reached in 2003. The science of Bioinformatics, which is the melding of molecular biology with computer science, is essential to the use of genomic information in understanding human diseases and in the identification of new molecular targets for drug discovery. In recognition of this, many universities, government institutions and pharmaceutical firms have formed bioinformatics groups, consisting of computational biologists and bioinformatics computer scientists. Such groups are key to unraveling the mass of information generated by large-scale sequencing efforts underway in laboratories around the world.


Scope of Bioinformatics


Bioinformatics Applications: Genomics, Proteomics and Transcriptomics

Bioinformatics is a combination of molecular biology and computer science. It is a technology in which computers are used to gather, store, analyze and integrate biological and genetic information. The need for Bioinformatics arose when a project to determine the sequence of the entire human genome was initiated; this project was called the Human Genome Project. Bioinformatics is very important for the use of genomic information to understand human diseases and to identify new approaches for gene-based drug discovery and development. Therefore, many universities, government institutions and pharmaceutical companies have formed bioinformatics groups to carry out research in computational biology and to make these processes more efficient and less time-consuming.


Bioinformatics in Proteomics

Proteomics is a branch of biotechnology that applies the techniques of molecular biology, biochemistry, and genetics to analyze the structure, function, and interactions of the proteins produced by the genes of a particular cell, tissue, or organism. The technology is being improved continuously and new techniques are being introduced, so that it is now possible to acquire proteome-scale data. Bioinformatics makes it easier to develop new algorithms that handle these large and heterogeneous data sets and so improve the analysis process. To date, algorithms for image analysis of 2D gels have been developed, and in the case of mass spectrometry, data analysis algorithms for peptide mass fingerprinting and peptide fragmentation fingerprinting have been developed.
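To make the idea concrete, the short Python sketch below illustrates the principle behind peptide mass fingerprinting: a protein sequence is digested in silico with trypsin and approximate peptide masses are computed, which could then be matched against peaks observed in a mass spectrum. The example sequence and the rounded residue masses are illustrative values only, not any particular published algorithm.

RESIDUE_MASS = {  # approximate monoisotopic residue masses in daltons
    "G": 57.021, "A": 71.037, "S": 87.032, "P": 97.053, "V": 99.068,
    "T": 101.048, "C": 103.009, "L": 113.084, "I": 113.084, "N": 114.043,
    "D": 115.027, "Q": 128.059, "K": 128.095, "E": 129.043, "M": 131.040,
    "H": 137.059, "F": 147.068, "R": 156.101, "Y": 163.063, "W": 186.079,
}
WATER = 18.011  # mass of H2O added to every free peptide

def tryptic_peptides(sequence):
    """Cut the sequence after K or R, but not when the next residue is P."""
    peptides, current = [], ""
    for i, residue in enumerate(sequence):
        current += residue
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides

def peptide_mass(peptide):
    """Approximate peptide mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[res] for res in peptide) + WATER

protein = "MKWVTFISLLFLFSSAYSRGVFRR"  # hypothetical example sequence
for pep in tryptic_peptides(protein):
    print(f"{pep:>18s}  {peptide_mass(pep):9.3f} Da")

The list of theoretical masses printed by such a digest is what would be compared against the observed peak list from the mass spectrometer.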

Bioinformatics and Genomics

Genomics is the study of complete sets of genes, their expression and the vital roles they play in biology. The most important application of bioinformatics in genomics is the Human Genome Project, through which approximately 20,000-25,000 human genes have been identified by sequencing the roughly three billion chemical base pairs that make up human DNA. It has thus enabled us to learn how these genes inter-relate and what functions they perform. Treatments for many diseases are being sought through these inter-relationships, in which bioinformatics plays a pivotal role.

Bioinformatics and Transcriptomics

Transcriptomics deals with the study of the messenger RNA molecules produced in an individual or in a population of a particular cell type. It is also referred to as expression profiling, in which the expression level of mRNA in a given cell population is determined through DNA microarray technology. Bioinformatics is thus used for transcriptome analysis, where mRNA expression levels can be determined in order to understand how a certain disease, such as cancer, might be treated.
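As a simple illustration of expression profiling, the following Python sketch compares hypothetical microarray signal intensities for a diseased sample against a normal control and reports the log2 fold change for each gene; the gene names and intensity values are invented for the example.

from math import log2

# Hypothetical microarray intensities for three genes in a normal and a diseased sample.
normal  = {"TP53": 820.0, "MYC": 310.0, "GAPDH": 5400.0}
disease = {"TP53": 410.0, "MYC": 2450.0, "GAPDH": 5350.0}

for gene in normal:
    fold_change = log2(disease[gene] / normal[gene])   # > 0 means higher expression in disease
    direction = "up" if fold_change > 0 else "down"
    print(f"{gene:6s} log2 fold change = {fold_change:+.2f} ({direction}-regulated in disease)")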

In addition to the fields mentioned above, Bioinformatics is also used in the following areas, each of which is debatable in its own capacity and will be discussed in further detail:
  • Molecular medicine
  • Personalised medicine
  • Preventative medicine
  • Gene therapy
  • Drug development
  • Microbial genome applications
  • Waste cleanup
  • Climate change studies
  • Alternative energy sources
  • Biotechnology
  • Antibiotic resistance
  • Forensic analysis of microbes
  • Bio-weapon creation
  • Evolutionary studies
  • Crop improvement
  • Insect resistance
  • Improved nutritional quality
  • Development of drought-resistant varieties
  • Veterinary science



Elementary Commands and Protocols: FTP, Telnet, HTTP


File eXchange Protocol (FXP, also known as FXSP) is a method of data transfer which uses the FTP protocol to transfer data from one remote server to another (inter-server) without routing this data through the client’s connection. Conventional FTP involves a single server and a single client; all data transmission is done between these two. In an FXP session, a client maintains a standard FTP connection to two servers, and can direct either server to connect to the other to initiate a data transfer. The advantage of using FXP over FTP is evident when a high-bandwidth server demands resources from another high-bandwidth server, but only a low-bandwidth client, such as a network administrator working off-site, has the authority to access the resources on both servers.
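The following Python sketch outlines how a client could drive such a server-to-server transfer using the standard ftplib module. The host names, credentials and file name are placeholders, the reply handling is simplified, and many FTP servers refuse FXP transfers unless the feature is explicitly enabled, so treat this as an illustration of the command sequence rather than production code.

from ftplib import FTP
import re

# Hosts, credentials and the file name are placeholders.
src = FTP("ftp-src.example.org")
src.login("user", "password")
dst = FTP("ftp-dst.example.org")
dst.login("user", "password")

# Ask the source server to listen (passive mode) and pull the six address numbers
# out of its "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" reply.
nums = re.findall(r"\d+", src.sendcmd("PASV"))[-6:]

# Tell the destination server to open its data connection to that address.
dst.sendcmd("PORT " + ",".join(nums))

# Start the transfer: the source sends the file, the destination stores it, and the
# data flows directly between the two servers rather than through this client.
src.sendcmd("RETR genome.fasta")
dst.sendcmd("STOR genome.fasta")
src.voidresp()   # wait for "226 Transfer complete" from each side
dst.voidresp()

src.quit()
dst.quit()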

Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer (SSL), are cryptographic protocols that provide security and data integrity for communications over networks such as the Internet. TLS and SSL encrypt the segments of network connections at the Transport Layer end-to-end.

The HyperText Transfer Protocol (HTTP) is the standard application-level protocol used for exchanging files on the World Wide Web. HTTP runs on top of the TCP/IP protocol suite. Web browsers are HTTP clients that send file requests to Web servers, which in turn handle the requests via an HTTP service. HTTP was originally proposed in 1989 by Tim Berners-Lee, who was a coauthor of the 1.0 specification. HTTP in its 1.0 version was “stateless”: each new request from a client established a new connection instead of handling all similar requests through the same connection between a specific client and server. Version 1.1 introduced persistent connections, decompression of HTML files by client browsers, and multiple domain names sharing the same IP address.
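The sketch below ties these two ideas together: a single HTTP/1.1 request is written by hand and sent over a TLS-encrypted TCP connection using Python's standard ssl and socket modules. The host name is a placeholder.

import socket
import ssl

# "example.org" is a placeholder host; port 443 is the conventional HTTPS port.
context = ssl.create_default_context()
with socket.create_connection(("example.org", 443)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname="example.org") as tls_sock:
        # A complete HTTP/1.1 request written by hand: request line, headers, blank line.
        request = (
            "GET / HTTP/1.1\r\n"
            "Host: example.org\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        tls_sock.sendall(request.encode("ascii"))
        # Read the response (status line, headers and body) until the server closes.
        response = b""
        while True:
            chunk = tls_sock.recv(4096)
            if not chunk:
                break
            response += chunk

print(response.split(b"\r\n")[0])   # e.g. b'HTTP/1.1 200 OK'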

Telnet (teletype network) is a network protocol used on the Internet or local area networks to provide a bidirectional interactive communications facility. Typically, telnet provides access to a command-line interface on a remote host via a virtual terminal connection which consists of an 8-bit byte-oriented data connection over the Transmission Control Protocol (TCP). User data is interspersed in-band with Telnet control information. The term telnet may also refer to the software that implements the client part of the protocol. Telnet client applications are available for virtually all computer platforms. Most network equipment and operating systems with a TCP/IP stack support a Telnet service for remote configuration (including systems based on Windows NT). Because of security issues with Telnet, its use has waned in favor of SSH for remote access. The SSH File Transfer Protocol (also known as the Secure File Transfer Protocol, or SFTP) is a network protocol that provides file transfer and manipulation functionality over any reliable data stream.

It is typically used with version two of the SSH protocol (TCP port 22) to provide secure file transfer, but is intended to be usable with other protocols as well. Whereas SCP allows only file transfers, the SFTP protocol allows for a range of operations on remote files – it is more like a remote file system protocol. An SFTP client’s extra capabilities compared to an SCP client include resuming interrupted transfers, directory listings, and remote file removal. [1] For these reasons it is relatively simple to implement a GUI SFTP client compared with a GUI SCP client. SMTP (Simple Mail Transfer Protocol) is a protocol for sending e-mail messages between servers. Most e-mail systems that send mail over the Internet use SMTP to send messages from one server to another; the messages can then be retrieved with an e-mail client using either POP or IMAP. In addition, SMTP is generally used to send messages from a mail client to a mail server. This is why you need to specify both the POP or IMAP server and the SMTP server when you configure your e-mail application.
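As a minimal illustration of SMTP, the following Python sketch uses the standard smtplib module to submit one message to a mail server; the server name, port, credentials and addresses are placeholders.

import smtplib
from email.message import EmailMessage

# Server name, port, credentials and addresses are placeholders.
msg = EmailMessage()
msg["From"] = "alice@example.org"
msg["To"] = "bob@example.org"
msg["Subject"] = "Test message"
msg.set_content("Hello from smtplib.")

with smtplib.SMTP("mail.example.org", 587) as smtp:
    smtp.starttls()                  # upgrade the connection to TLS before authenticating
    smtp.login("alice", "password")
    smtp.send_message(msg)           # the SMTP dialogue (MAIL FROM, RCPT TO, DATA) happens here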

IMAP (Internet Message Access Protocol) is a protocol for retrieving e-mail messages. The latest version, IMAP4, is similar to POP3 but supports some additional features. For example, with IMAP4 you can search through your e-mail messages for keywords while the messages are still on the mail server, and then choose which messages to download to your machine. POP (short for Post Office Protocol) is a protocol used to retrieve e-mail from a mail server. Most e-mail applications (sometimes called e-mail clients) use the POP protocol, although some can use the newer IMAP (Internet Message Access Protocol).
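The sketch below contrasts the two retrieval protocols using Python's standard imaplib and poplib modules: with IMAP4 the keyword search runs on the server and only the selected messages are fetched, whereas with POP3 messages are simply counted and downloaded. Host names and credentials are placeholders.

import imaplib
import poplib

# IMAP4: the keyword search runs on the server; only the chosen messages are downloaded.
with imaplib.IMAP4_SSL("mail.example.org") as imap:
    imap.login("bob", "password")                       # placeholder credentials
    imap.select("INBOX")
    status, data = imap.search(None, 'SUBJECT "Test"')  # server-side search
    for num in data[0].split():
        status, msg_data = imap.fetch(num, "(RFC822)")  # fetch just this message

# POP3: messages are simply counted and downloaded to the client.
pop = poplib.POP3_SSL("mail.example.org")
pop.user("bob")
pop.pass_("password")
count, size = pop.stat()
if count:
    response, lines, octets = pop.retr(1)               # retrieve the first message
pop.quit()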



FTP


File Transfer Protocol (FTP) is a standard network protocol used to transfer files from one host to another host over a TCP-based network, such as the Internet. It is often used to upload web pages and other documents from a private development machine to a public web-hosting server. FTP is built on a client-server architecture and uses separate control and data connections between the client and the server.[1] FTP users may authenticate themselves using a clear-text sign-in protocol, normally in the form of a username and password, but can connect anonymously if the server is configured to allow it. For secure transmission that hides (encrypts) your username and password, as well as encrypts the content, you can try using a client that uses SSH File Transfer Protocol.
The first FTP client applications were interactive command-line tools, implementing standard commands and syntax. Graphical user interfaces have since been developed for many of the popular desktop operating systems in use today,[2][3] including general web design programs like Microsoft Expression Web, and specialist FTP clients such as CuteFTP.
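A minimal command-line-style FTP session can also be scripted with Python's standard ftplib module, as in the sketch below; the host, directory and file name are placeholders, and login() with no arguments performs an anonymous sign-in.

from ftplib import FTP

# Host, directory and file name are placeholders.
with FTP("ftp.example.org") as ftp:
    ftp.login()                      # no arguments -> anonymous sign-in
    ftp.cwd("/pub")
    print(ftp.nlst())                # list the file names in the current directory
    with open("readme.txt", "wb") as fh:
        ftp.retrbinary("RETR readme.txt", fh.write)   # download over the data connection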

Telnet

Telnet is a network protocol used on the Internet or local area networks to provide a bidirectional interactive text-oriented communications facility using a virtual terminal connection. User data is interspersed in-band with Telnet control information in an 8-bit byte oriented data connection over the Transmission Control Protocol (TCP).
Telnet was developed in 1969 beginning with RFC 15, extended in RFC 854, and standardized as Internet Engineering Task Force (IETF) Internet Standard STD 8, one of the first Internet standards.
Historically, Telnet provided access to a command-line interface (usually, of an operating system) on a remote host. Most network equipment and operating systems with a TCP/IP stack support a Telnet service for remote configuration (including systems based on Windows NT). Because of security issues with Telnet, its use for this purpose has waned in favor of SSH.
The term telnet may also refer to the software that implements the client part of the protocol. Telnet client applications are available for virtually all computer platforms. Telnet is also used as a verb: to telnet means to establish a connection with the Telnet protocol, either with a command-line client or with a programmatic interface. For example, a common directive might be: "To change your password, telnet to the server, log in and run the passwd command." Most often, a user will telnet to a Unix-like server system or a network device (such as a router) and obtain a login prompt to a command-line text interface or a character-based full-screen manager.
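The following Python sketch scripts such a session with the standard telnetlib module (note that telnetlib was removed from the standard library in Python 3.13); the host, prompts, credentials and commands are placeholders for whatever the remote device actually presents.

from telnetlib import Telnet   # removed from the standard library in Python 3.13

# Host, prompts, credentials and commands are placeholders.
with Telnet("router.example.org", 23, timeout=10) as tn:
    tn.read_until(b"login: ")
    tn.write(b"admin\n")
    tn.read_until(b"Password: ")
    tn.write(b"secret\n")
    tn.write(b"show version\n")
    tn.write(b"exit\n")
    print(tn.read_all().decode("ascii", errors="replace"))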

HTTP

The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems.[1] HTTP is the foundation of data communication for the World Wide Web. Hypertext is a multi-linear set of objects, building a network by using logical links (the so-called hyperlinks) between the nodes (e.g. text or words). HTTP is the protocol used to exchange or transfer hypertext.
The standards development of HTTP was coordinated by the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C), culminating in the publication of a series of Requests for Comments (RFCs), most notably RFC 2616 (June 1999), which defines HTTP/1.1, the version of HTTP in common use.

HTTP functions as a request-response protocol in the client-server computing model. In HTTP, a web browser, for example, acts as a client, while an application running on a computer hosting a web site functions as a server. The client submits an HTTP request message to the server. The server, which stores content, or provides resources, such as HTML files, or performs other functions on behalf of the client, returns a response message to the client. A response contains completion status information about the request and may contain any content requested by the client in its message body.
A web browser (or client) is often referred to as a user agent (UA). Other user agents can include the indexing software used by search providers, known as web crawlers, or variations of the web browser such as voice browsers, which present an interactive voice user interface.

HTTP is designed to permit intermediate network elements to improve or enable communications between clients and servers. High-traffic websites often benefit from web cache servers that deliver content on behalf of the original, so-called origin server, to improve response time. HTTP proxy servers at network boundaries facilitate communication when clients without a globally routable address are located in private networks by relaying the requests and responses between clients and servers.
HTTP is an Application Layer protocol designed within the framework of the Internet Protocol Suite. The protocol definitions presume a reliable Transport Layer protocol for host-to-host data transfer.[2] The Transmission Control Protocol (TCP) is the dominant protocol in use for this purpose. However, HTTP has found application even with unreliable protocols, such as the User Datagram Protocol (UDP) in methods such as the Simple Service Discovery Protocol (SSDP).

HTTP resources are identified and located on the network by Uniform Resource Identifiers (URIs), or more specifically Uniform Resource Locators (URLs), using the http or https URI schemes. URIs and the Hypertext Markup Language (HTML) form a system of inter-linked resources, called hypertext documents, on the Internet, which led to the establishment of the World Wide Web in 1990 by English computer scientist and innovator Tim Berners-Lee.

The original version of HTTP (HTTP/1.0) was revised in HTTP/1.1. HTTP/1.0 uses a separate connection to the same server for every request-response transaction, while HTTP/1.1 can reuse a connection multiple times, to download, for instance, images for a just delivered page. Hence HTTP/1.1 communications experience less latency as the establishment of TCP connections presents considerable overhead.
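The difference is easy to see with Python's standard http.client module, which keeps an HTTP/1.1 connection open by default; in the sketch below one TCP connection is reused for several request-response exchanges. The host and paths are placeholders.

import http.client

# "example.org" and the paths are placeholders.
conn = http.client.HTTPConnection("example.org", 80)
for path in ("/", "/index.html", "/images/logo.png"):
    conn.request("GET", path)
    resp = conn.getresponse()
    body = resp.read()               # the body must be read before the connection is reused
    print(resp.status, resp.reason, len(body), path)
conn.close()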




Primer on information theory

(Refer to the following URL for this topic.)