Scalable big data analytics for protein bioinformatics: efficient computational solutions for protein structures
978-3-319-98839-9, 3319988395, 9783319988405, 3319988409, 978-3-319-98838-2
This book focuses on proteins and their structures. The text describes various scalable solutions for protein s
Table of contents:

Intro
Foreword
Preface
Scope of the Book
Chapter Overview
Summary
Acknowledgements
Contents
Acronyms

Part I Background

1 Formal Model of 3D Protein Structures for Functional Genomics, Comparative Bioinformatics, and Molecular Modeling
  1.1 Introduction
  1.2 General Definition of Protein Spatial Structure
  1.3 A Reference to Representation Levels
    1.3.1 Primary Structure
    1.3.2 Secondary Structure
    1.3.3 Tertiary Structure
    1.3.4 Quaternary Structure
  1.4 Relative Coordinates of Protein Structures
  1.5 Energy Properties of Protein Structures
  1.6 Summary
  References

2 Technological Roadmap
  2.1 Cloud Computing
    2.1.1 Cloud Service Models
    2.1.2 Cloud Deployment Models
  2.2 Big Data Challenge
    2.2.1 The 5V Model of Big Data
    2.2.2 Hadoop Platform
  2.3 Multi-threading and Multi-threaded Applications
  2.4 Graphics Processing Units and the CUDA
    2.4.1 Graphics Processing Units
    2.4.2 CUDA Architecture and Threads
  2.5 Relational Databases and SQL
    2.5.1 Relational Database Management Systems
    2.5.2 SQL For Manipulating Relational Data
  2.6 Scalability
  2.7 Summary
  References

Part II Cloud Services for Scalable Computations

3 Azure Cloud Services
  3.1 Microsoft Azure
  3.2 Virtual Machines, Series, and Sizes
  3.3 Cloud Services in Action
  3.4 Summary
  References

4 Scaling 3D Protein Structure Similarity Searching with Azure Cloud Services
  4.1 Introduction
    4.1.1 Why We Need Cloud Computing in Protein Structure Similarity Searching
    4.1.2 Algorithms for Protein Structure Similarity Searching
    4.1.3 Other Cloud-Based Solutions for Bioinformatics
  4.2 Cloud4PSi for 3D Protein Structure Alignment
    4.2.1 Use Case: Interaction with the Cloud4PSi
    4.2.2 Architecture and Processing Model of the Cloud4PSi
    4.2.3 Scaling Cloud4PSi
  4.3 Scalability of the Cloud4PSi
    4.3.1 Horizontal Scalability
    4.3.2 Vertical Scalability
    4.3.3 Influence of the Package Size
    4.3.4 Scaling Up or Scaling Out?
  4.4 Discussion
  4.5 Summary
  References

5 Cloud Services for Efficient Ab Initio Predictions of 3D Protein Structures
  5.1 Introduction
    5.1.1 Computational Approaches for 3D Protein Structure Prediction
    5.1.2 Cloud and Grid Computing in Protein Structure Determination
  5.2 Cloud4PSP for 3D Protein Structure Prediction
    5.2.1 Prediction Method
    5.2.2 Cloud4PSP Architecture
    5.2.3 Cloud4PSP Processing Model
    5.2.4 Extending Cloud4PSP
    5.2.5 Scaling the Cloud4PSP
  5.3 Performance of the Cloud4PSP
    5.3.1 Vertical Scalability
    5.3.2 Horizontal Scalability
    5.3.3 Influence of the Task Size
    5.3.4 Scale Up, Scale Out, or Combine?
  5.4 Discussion
  5.5 Summary
  5.6 Availability
  References

Part III Big Data Analytics in Protein Bioinformatics

6 Foundations of the Hadoop Ecosystem
  6.1 Big Data
  6.2 Hadoop
    6.2.1 Hadoop Distributed File System
    6.2.2 MapReduce Processing Model
    6.2.3 MapReduce 1.0 (MRv1)
    6.2.4 MapReduce 2.0 (MRv2)
  6.3 Apache Spark
  6.4 Hadoop Ecosystem
  6.5 Summary
  References