Machine Learning with R
Updated and upgraded to the latest libraries and most modern thinking, Machine Learning with R, Second Edition provides you with a rigorous introduction to this essential skill of professional data science. Without shying away from technical theory, it is written to provide focused and practical knowledge to get you building algorithms and crunching your data, with minimal previous experience. With this book, you'll discover all the analytical tools you need to gain insights from complex data and learn how to choose the correct algorithm for your specific needs. Through full engagement with the sort of real-world problems data-wranglers face, you'll learn to apply machine learning methods to deal with common tasks, including classification, prediction, forecasting, market analysis, and clustering.
An introduction to a broad range of topics in deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives.
Big Data For Dummies
Find the right big data solution for your business or organization. Big data management is one of the major challenges facing business, industry, and not-for-profit organizations. Data sets such as customer transactions for a mega-retailer, weather patterns monitored by meteorologists, or social network activity can quickly outpace the capacity of traditional data management tools. If you need to develop or manage big data solutions, you'll appreciate how these four experts define, explain, and guide you through this new and often confusing concept. You'll learn what it is, why it matters, and how to choose and implement solutions that work.
• Effectively managing big data is an issue of growing importance to businesses, not-for-profit organizations, government, and IT professionals.
• The authors are experts in information management, big data, and a variety of solutions.
• Explains big data in detail and discusses how to select and implement a solution, security concerns to consider, data storage and presentation issues, analytics, and much more.
• Provides essential information in a no-nonsense, easy-to-understand style that is empowering.
Big Data For Dummies cuts through the confusion and helps you take charge of big data solutions for your organization.
Big Data et Machine Learning: Manuel du data scientist (2nd edition)
This book is for anyone looking to take advantage of the enormous potential of "Big Data technologies", whether they are data scientists, IT directors, project managers, or business specialists. Big Data has established itself as a major innovation for every company seeking to build a competitive advantage by exploiting its data on customers, suppliers, products, processes, machines, and so on. But which technical solution should you choose? Which business skills should the IT department develop? This book is a guide to understanding what is at stake in a Big Data project, grasping the underlying concepts (in particular Machine Learning), and acquiring the skills needed to set up a data lab. It combines the presentation of:
• theoretical concepts (statistical data processing, distributed computing...);
• the most widely used tools (the Hadoop ecosystem, Storm...);
• example applications;
• a typical organization of a data science project.
This second edition has been expanded and enriched with updates on neural networks and Deep Learning, as well as on Spark.
Big Data Fundamentals
Big Data Fundamentals offers a comprehensive, easy-to-understand, and up-to-date introduction to Big Data for all business professionals and technologists. Leading enterprise technology author Thomas Erl introduces key Big Data concepts, theory, terminology, technologies, analysis and analytics techniques, and more, all logically organized, presented in plain English, and supported by easy-to-understand diagrams and case study examples. Erl provides a uniquely valuable methodology for Big Data analysis and introduces the underlying analysis techniques and enabling technological constructs that constitute a Big Data solution environment. He presents vendor-neutral guidance on implementing Big Data for competitive advantage and on successfully integrating Big Data with existing enterprise systems. Coverage includes:
• Big Data's fundamental concepts and key business and technology drivers
• The "5 V" characteristics of data in Big Data environments: volume, velocity, variety, veracity, and value
• Types of Big Data: structured, unstructured, semi-structured, and metadata
• Big Data's relationships with OLTP, OLAP, ETL, data warehouses, and data marts
• Fundamental types of analysis, analytics, and machine learning
• Requirements and tools for visualizing Big Data
• Adoption and planning: business cases, privacy, security, provenance, performance, governance, and more
• Big Data technologies, including clusters, NoSQL, distributed and parallel data processing, Hadoop, cloud computing, and storage
• Big Data analysis and analytics across the full lifecycle
• And much more
La qualité et la gouvernance des données au service de la performance des entreprises
High-quality data is now the cornerstone of every organization. Managing and improving that quality are costly and difficult tasks, but they cannot be avoided. This book surveys the various tools and approaches that support data quality and data governance specialists. Drawing on the experience of the French-speaking community led by the ExQI association (Excellence Qualité, Information), it presents, clearly and pragmatically, an overview of the key concepts of data quality management and how they are applied in companies (Business Intelligence, Data Quality Management, Key Performance Indicator, Model Driven Engineering, Master Data Management, etc.). Effective theoretical and technical solutions are described in detail, and numerous case studies illustrate the best practices to adopt. Combining industrial and academic contributions, this book is a French-language reference on data quality and data governance in the enterprise.
Data Analysis with Open Source Tools
Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then how to feed your understanding back into the organization through business plans, metrics dashboards, and other applications. Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve, rather than relying on tools to think for you.
• Use graphics to describe data with one, two, or dozens of variables
• Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
• Mine data with computationally intensive methods such as simulation and clustering
• Make your conclusions understandable through reports, dashboards, and other metrics programs
• Understand financial calculations, including the time value of money
• Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
• Become familiar with different open source programming environments for data analysis
"Finally, a concise reference for understanding how to conquer piles of data." --Austin King, Senior Web Developer, Mozilla
"An indispensable text for aspiring data scientists." --Michael E. Driscoll, CEO/Founder, Dataspora
Advanced Analytics with Spark
In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition serves as an introduction to these techniques and to other best practices in Spark programming. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques, including classification, clustering, collaborative filtering, and anomaly detection, to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find the book’s patterns useful for working on your own data applications. With this book, you will:
• Familiarize yourself with the Spark programming model
• Become comfortable within the Spark ecosystem
• Learn general approaches in data science
• Examine complete implementations that analyze large public data sets
• Discover which machine learning tools make sense for particular problems
• Acquire code that can be adapted to many uses
Hadoop Application Architectures
Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use various components in the Hadoop ecosystem, this practical book takes you through the architectural considerations necessary to tie those components together into a complete application tailored to your particular use case. To reinforce those lessons, the book’s second section provides detailed examples of architectures used in some of the most common Hadoop applications. Whether you’re designing a new Hadoop application or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures will skillfully guide you through the process. This book covers:
• Factors to consider when using Hadoop to store and model data
• Best practices for moving data in and out of the system
• Data processing frameworks, including MapReduce, Spark, and Hive
• Common Hadoop processing patterns, such as removing duplicate records and using windowing analytics
• Giraph, GraphX, and other tools for large graph processing on Hadoop
• Using workflow orchestration and scheduling tools such as Apache Oozie
• Near-real-time stream processing with Apache Storm, Apache Spark Streaming, and Apache Flume
• Architecture examples for clickstream analysis, fraud detection, and data warehousing
High Performance Spark
Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes while using fewer resources. Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing. With this book, you’ll explore:
• How Spark SQL’s new interfaces improve performance over Spark’s RDD data structure
• The choice between data joins in Core Spark and Spark SQL
• Techniques for getting the most out of standard RDD transformations
• How to work around performance issues in Spark’s key/value pair paradigm
• Writing high-performance Spark code without Scala or the JVM
• How to test for functionality and performance when applying suggested improvements
• Using the Spark MLlib and Spark ML machine learning libraries
• Spark’s Streaming components and external community packages