The Resource Data analysis with open source tools, Philipp K. Janert

Label
Data analysis with open source tools
Title
Data analysis with open source tools
Statement of responsibility
Philipp K. Janert
Language
eng
Summary
Provides information on the techniques of data analysis using a variety of open source tools
Cataloging source
UKM
http://library.link/vocab/creatorName
Janert, Philipp K
Illustrations
illustrations
Index
index present
Literary form
non fiction
Nature of contents
bibliography
http://library.link/vocab/subjectName
  • Data mining
  • Open source software
Label
Data analysis with open source tools, Philipp K. Janert
Instantiates
Publication
Bibliography note
Includes bibliographical references and index
Contents
1. Introduction -- Data Analysis -- What's in This Book -- What's with the Workshops? -- What's with the Math? -- What You'll Need -- What's Missing -- pt. I Graphics: Looking at Data -- 2. A Single Variable: Shape and Distribution -- Dot and Jitter Plots -- Histograms and Kernel Density Estimates -- The Cumulative Distribution Function -- Rank-Order Plots and Lift Charts -- Only When Appropriate: Summary Statistics and Box Plots -- Workshop: Numpy -- Further Reading -- 3. Two Variables: Establishing Relationships -- Scatter Plots -- Conquering Noise: Smoothing -- Logarithmic Plots -- Banking -- Linear Regression and All That -- Showing What's Important -- Graphical Analysis and Presentation Graphics -- Workshop: matplotlib -- Further Reading -- 4. Time As a Variable: Time-Series Analysis -- Examples -- The Task -- Smoothing -- Don't Overlook the Obvious! -- The Correlation Function -- Optional: Filters and Convolutions -- Workshop: Scipy.signal -- Further Reading -- 5. More than Two Variables: Graphical Multivariate Analysis -- False-Color Plots -- A Lot at a Glance: Multiplots -- Composition Problems -- Novel Plot Types -- Interactive Explorations -- Workshop: Tools for Multivariate Graphics -- Further Reading -- 6. Intermezzo: A Data Analysis Session -- A Data Analysis Session -- Workshop: gnuplot -- Further Reading -- pt. II Analytics: Modeling Data -- 7. Guesstimation and the Back of the Envelope -- Principles of Guesstimation -- How Good Are Those Numbers? -- Optional: A Closer Look at Perturbation Theory and Error Propagation -- Workshop: The Gnu Scientific Library (GSL) -- Further Reading -- 8. Models from Scaling Arguments -- Models -- Arguments from Scale -- Mean-Field Approximations -- Common Time-Evolution Scenarios -- Case Study: How Many Servers Are Best? -- Why Modeling? -- Workshop: Sage -- Further Reading -- 9. 
Arguments from Probability Models -- The Binomial Distribution and Bernoulli Trials -- The Gaussian Distribution and the Central Limit Theorem -- Power-Law Distributions and Non-Normal Statistics -- Other Distributions -- Optional: Case Study---Unique Visitors over Time -- Workshop: Power-Law Distributions -- Further Reading -- 10. What You Really Need to Know About Classical Statistics -- Genesis -- Statistics Defined -- Statistics Explained -- Controlled Experiments Versus Observational Studies -- Optional: Bayesian Statistics---The Other Point of View -- Workshop: R -- Further Reading -- 11. Intermezzo: Mythbusting---Bigfoot, Least Squares, And All That -- How to Average Averages -- The Standard Deviation -- Least Squares -- Further Reading -- pt. III Computation: Mining Data -- 12. Simulations -- A Warm-Up Question -- Monte Carlo Simulations -- Resampling Methods -- Workshop: Discrete Event Simulations with Simpy -- Further Reading -- 13. Finding Clusters -- What Constitutes a Cluster? -- Distance and Similarity Measures -- Clustering Methods -- Pre- and Postprocessing -- Other Thoughts -- A Special Case: Market Basket Analysis -- A Word of Warning -- Workshop: Pycluster and the C Clustering Library -- Further Reading -- 14. Seeing the Forest For the Trees: Finding Important Attributes -- Principal Component Analysis -- Visual Techniques -- Kohonen Maps -- Workshop: PCA with R -- Further Reading -- 15. Intermezzo: When More is Different -- A Horror Story -- Some Suggestions -- What About Map/Reduce? -- Workshop: Generating Permutations -- Further Reading -- pt. IV Applications: Using Data -- 16. Reporting, Business Intelligence, and Dashboards -- Business Intelligence -- Corporate Metrics and Dashboards -- Data Quality Issues -- Workshop: Berkeley DB and SQLite -- Further Reading -- 17. Financial Calculations and Modeling -- The Time Value of Money -- Uncertainty in Planning and Opportunity Costs -- Cost Concepts and Depreciation -- Should You Care? 
-- Is This All That Matters? -- Workshop: The Newsvendor Problem -- Further Reading -- 18. Predictive Analytics -- Introduction -- Some Classification Terminology -- Algorithms for Classification -- The Process -- The Secret Sauce -- The Nature of Statistical Learning -- Workshop: Two Do-It-Yourself Classifiers -- Further Reading -- 19. Epilogue: Facts are Not Reality -- A. Programming Environments for Scientific Computation and Data Analysis -- Software Tools -- A Catalog of Scientific Software -- Writing Your Own -- Further Reading -- B. Results From Calculus -- Common Functions -- Calculus -- Useful Tricks -- Notation and Basic Math -- Where to Go from Here -- Further Reading -- C. Working with Data -- Sources for Data -- Cleaning and Conditioning -- Sampling -- Data File Formats -- The Care and Feeding of Your Data Zoo -- Skills -- Terminology -- Further Reading
Control code
ocn624414159
Dimensions
23 cm
Extent
xviii, 509 p.
Isbn
9780596802356
Isbn Type
(pbk.)
Other physical details
ill.
System control number
(OCoLC)624414159

Library Locations

    • Wellington Library
      Wellington - Massey University Library, Block 5, 63 Wallace Street, Wellington, 6021, NZ
      -40.385395 175.617407