Graphs with Big data

Vision. Spark ML. MLlib. DataFrame. SVD. import org.apache.spark. CRIM: Centre recherche.. Spark. groupBy, reduceByKey. Spark is not a storage layer like HDFS, HBase, or Cassandra. Git. Micro-batching. Why graphs: Web semantics. Communities. GraphLab. MS. Graph construction. Post-processing. Triangles. GraphX: vertex and edge tables. val relationships: an RDD of edges; a vertex RDD with (String, String) attributes, built with sc.parallelize(Array(...)). val graph = Graph(...). Save: vertices.saveAsObjectFile. Use cases: PageRank. Triangles. Shortest[…]
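
The vertex and edge tables and two of the use cases named in these notes (triangles and PageRank) can be sketched without a cluster. What follows is a plain-Python stand-in and not the GraphX API; the toy `vertices` and `edges` data, the function names, and the damping factor of 0.85 are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: a vertex table (id -> name) and an edge table (src, dst).
vertices = {1: "alice", 2: "bob", 3: "carol", 4: "dave"}
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]


def neighbours(edge_list):
    """Build an undirected adjacency map from the edge table."""
    adj = {v: set() for v in vertices}
    for src, dst in edge_list:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def count_triangles(edge_list):
    """Count vertex triples that are pairwise connected."""
    adj = neighbours(edge_list)
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def pagerank(edge_list, damping=0.85, iterations=50):
    """Power-iteration PageRank over the directed edge table."""
    n = len(vertices)
    out_links = {v: [] for v in vertices}
    for src, dst in edge_list:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in vertices}
        for v, targets in out_links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling vertex: spread its rank evenly over all vertices.
                for t in vertices:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank


triangle_count = count_triangles(edges)
ranks = pagerank(edges)
```

On a real cluster the same tables would live in RDDs and the iteration would be distributed; the sketch only shows the shape of the computation.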

BI Roles & Responsibilities Matrix

Here are the roles and responsibilities of BI specialists as a matrix. Columns: Role, Responsibilities, Qualifications. Senior Data Analytics Strategist: – Responsible for developing data analytics and measurement strategies to facilitate fact-based planning and decision making at the City. – Enable service improvement projects. – A technical leader and subject matter expert for data and analytics.[…]

SAP Business Objects

Business Objects: A set of SAP tools that help build a data analytics solution: – Data integration: BO Data Services. – Formatted reports: Crystal Reports. – Dashboarding: Xcelsius. – Data exploration: Explorer. – Ad hoc reporting: Web Intelligence. – In-depth analysis: BEx Analyzer, an Excel component. First experience with SAP BO: Creation of a universe. A[…]

Big Data and Security – Cloudera

Security issues to consider when securing the data: What can be accessed, by whom, when, and from where. Authentication. Authorization. Encryption. Key management. Identity management system. Cloudera has a distribution of Hadoop that contains advanced security features, serving two objectives: 1. To protect the data contained in the Hadoop cluster. 2. To analyze streaming data to detect where security might have[…]

Data Hub

A data hub is a centralized location for data, a special case of a data lake, where data are well structured, homogeneous, and highly reusable: it serves data in multiple formats, from multiple sources, to multiple potential destinations. Multiple data hub architectures exist: the Publish-Subscribe Data Hub, the Integration Hub, the Operational Data Store (ODS)[…]
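
As a rough illustration of the publish-subscribe style of hub mentioned above, here is a minimal in-memory sketch. The `DataHub` class, the topic name `"leads"`, and the record shape are hypothetical; a production hub would sit on a message broker and persist its data.

```python
from collections import defaultdict


class DataHub:
    """Toy publish-subscribe hub: sources publish, destinations subscribe."""

    def __init__(self):
        # topic name -> list of handler callables
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a destination for every record published on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, record):
        """Deliver a record to all subscribers; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(record)
        return len(handlers)


hub = DataHub()
warehouse, dashboard = [], []

# Two destinations consume the same topic independently.
hub.subscribe("leads", warehouse.append)
hub.subscribe("leads", dashboard.append)

delivered = hub.publish("leads", {"id": 1, "source": "web"})
```

The point of the pattern is that the publishing source never needs to know which destinations exist, which is what makes the hub's data reusable.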

Software Development Life Cycle of Machine Learning Projects

Software Development Life Cycle of Machine Learning Projects is split into two phases: R&D: Preprocessing: Business objective and rules R&D: BPM, UML, use cases, etc. Data R&D: data profiling. ETL: extraction, cleanup, integration, transformation, aggregation. Solution model R&D: understand the problem and the solution required (classification, numeric forecasting, etc.). Use different techniques, get the most accurate[…]

Dashboard & BI Requirements Design

Here is a template and an example for Dashboard & BI Requirements Design: Vision: Optimize profit by optimizing the lead funneling process: Reduce costs. Increase profit. Sub-Objectives: Lead Source Optimization; Marketing Optimization; Call Center Optimization; Future: Call Center Internal Optimization. Actor: Lead Manager; Campaign Director; Campaign Director; Internal Call Center Manager. Objectives: Choose best and[…]

SQL Functions

SQL functions return one or more data points when they are called, and they can be called within queries. There are multiple types of functions: Scalar functions: return a single value, such as the LEN() function. In-line table-valued functions (ITVF): can be queried like tables. CREATE FUNCTION s1.F1(@b1 int) RETURNS TABLE AS RETURN SELECT a FROM T WHERE[…]
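
To try the scalar-function idea without a SQL Server instance, the sketch below uses SQLite through Python's standard `sqlite3` module. This is a swapped-in engine: SQLite does not accept the T-SQL `CREATE FUNCTION` syntax, so the function is registered from Python instead. The table contents, the column `b`, and the name `name_length` are assumptions for illustration only.

```python
import sqlite3

# In-memory database with a hypothetical table T(a, b).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (a TEXT, b INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [("alpha", 1), ("be", 2), ("gamma", 1)],
)


def name_length(value):
    """Scalar function: one input value in, one value out."""
    return len(value)


# Register the Python callable so queries can call it by name.
conn.create_function("name_length", 1, name_length)

# The scalar function is evaluated once per row of the result.
rows = conn.execute(
    "SELECT a, name_length(a) FROM T WHERE b = ? ORDER BY a",
    (1,),
).fetchall()
```

A table-valued function has no direct SQLite equivalent; the closest everyday substitute is a parameterized query or a view.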