Monday, April 18, 2016

A quick look at the Apache Incubator (2016)

I check the Apache Incubator about once a year, hoping it will give me a sense of where things are heading.

# To be honest, the projects vary wildly in quality, so I only skim the summaries and move on.

Until fairly recently the list seemed dominated by database projects, presumably riding the NoSQL boom, but these days it feels rather chaotic. A few also look like they will end as nothing more than a dream.

I wonder how many of these will graduate this year.


Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines.
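The core idea behind Airflow-style pipelines is that tasks form a directed acyclic graph and execute in dependency order. A toy sketch of that concept (illustrative names only, not Airflow's actual API):

```python
# Minimal sketch: tasks as callables, dependencies as a DAG,
# executed in topological order. Not Airflow's real API.
from graphlib import TopologicalSorter

def run_pipeline(tasks, deps):
    """tasks: name -> callable; deps: name -> set of upstream task names."""
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()          # each task runs only after its upstreams
    return order

order = run_pipeline(
    {"extract": lambda: "raw", "transform": lambda: "clean", "load": lambda: "done"},
    {"transform": {"extract"}, "load": {"transform"}},
)
# order == ["extract", "transform", "load"]
```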
Apex is an enterprise grade native YARN big data-in-motion platform that unifies stream processing as well as batch processing.
Apache AsterixDB is a scalable big data management system (BDMS) that provides storage, management, and query capabilities for large collections of semi-structured data.
Apache Atlas is a scalable and extensible set of core foundational governance services that enables enterprises to effectively and efficiently meet their compliance requirements within Hadoop, and allows integration with the complete enterprise data ecosystem.
The BatchEE project aims to provide a JBatch (JSR-352) implementation and a set of useful extensions for that specification.
Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines simplify the mechanics of large-scale batch and streaming data processing and can run on a number of runtimes like Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service). Beam also brings DSL in different languages, allowing users to easily implement their data integration processes.
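Beam's "unified model" means the same chain of transforms can describe both batch and streaming processing, since both are just sequences of elements. A toy illustration of that idea (this mimics the concept only; it is not Beam's SDK):

```python
# Conceptual sketch of a unified batch/stream pipeline:
# transforms are composed lazily over any element sequence.
def pipeline(source, *transforms):
    for fn in transforms:
        source = fn(source)
    return source

def par_do(fn):                      # element-wise transform (like Beam's ParDo)
    return lambda elems: (fn(e) for e in elems)

def filter_by(pred):
    return lambda elems: (e for e in elems if pred(e))

batch = [1, 2, 3, 4, 5]
result = list(pipeline(batch, par_do(lambda x: x * x), filter_by(lambda x: x > 5)))
# result == [9, 16, 25]
```

Because the transforms are generator-based, the same `pipeline` call would work over an unbounded source as well, which is the property Beam's runners exploit.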
Blur is a search platform capable of searching massive amounts of data in a cloud computing environment.
CMDA provides web services for multi-aspect physics-based and phenomenon-oriented climate model performance evaluation and diagnosis through the comprehensive and synergistic use of multiple observational data, reanalysis data, and model outputs.
Commons RDF is a set of interfaces and classes for RDF 1.1 concepts and behaviours. The commons-rdf-api module defines interfaces and testing harness. The commons-rdf-simple module provides a basic reference implementation to exercise the test harness and clarify API contracts.
Apache Concerted is a Do-It-Yourself toolkit for building in-memory data engines.
DataFu provides a collection of Hadoop MapReduce jobs and functions in higher level languages based on it to perform data analysis. It provides functions for common statistics tasks (e.g. quantiles, sampling), PageRank, stream sessionization, and set and bag operations. DataFu also provides Hadoop jobs for incremental data processing in MapReduce.
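One of the common statistics tasks mentioned above is quantile computation. A minimal, non-distributed sketch of the idea (nearest-rank method; not DataFu's actual implementation or semantics):

```python
# Toy nearest-rank quantile, illustrating the kind of statistic
# DataFu provides as Pig/Hadoop functions.
def quantile(values, q):
    """Nearest-rank quantile of a list, with 0 <= q <= 1."""
    s = sorted(values)
    idx = min(int(q * len(s)), len(s) - 1)
    return s[idx]

data = [3, 1, 4, 1, 5, 9, 2, 6]
median = quantile(data, 0.5)
# median == 4
```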
Eagle is a monitoring solution for Hadoop that instantly identifies access to sensitive data, recognizes attacks and malicious activities, and takes action in real time.
Fineract is an open source system for core banking as a platform.
FreeMarker is a template engine, i.e. a generic tool to generate text output based on templates. FreeMarker is implemented in Java as a class library for programmers.
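A template engine fills placeholders in a template with values from a data model. A toy sketch of that general idea using `${name}` placeholders; FreeMarker's real template language is far richer (conditionals, loops, macros) and is implemented in Java, not Python:

```python
# Minimal placeholder-substitution sketch of the template-engine concept.
import re

def render(template, model):
    # Replace each ${name} with the model value for that name.
    return re.sub(r"\$\{(\w+)\}", lambda m: str(model[m.group(1)]), template)

out = render("Hello, ${user}! You have ${count} messages.",
             {"user": "Ada", "count": 3})
# out == "Hello, Ada! You have 3 messages."
```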
Gearpump is a reactive real-time streaming engine based on the micro-service Actor model.
Geode is a data management platform that provides real-time, consistent access to data-intensive applications throughout widely distributed cloud architectures.
Guacamole is an enterprise-grade, protocol-agnostic, remote desktop gateway. Combined with cloud hosting, Guacamole provides an excellent alternative to traditional desktops. Guacamole aims to make cloud-hosted desktop access preferable to traditional, local access.
HAWQ is an advanced enterprise SQL on Hadoop analytic engine built around a robust and high-performance massively-parallel processing (MPP) SQL framework evolved from Pivotal Greenplum Database.
HORN is a neuron-centric programming API and execution framework for large-scale deep learning, built on top of Apache Hama.
HTrace is a tracing framework intended for use with distributed systems written in Java.
Impala is a high-performance C++ and Java SQL query engine for data stored in Apache Hadoop-based clusters.
An open source system that enables the orchestration of IoT devices.
An implementation of the JSR-353 Java API for JSON Processing (renamed from Fleece).
Joshua is a statistical machine translation toolkit.
Kudu is a distributed columnar storage engine built for the Apache Hadoop ecosystem.
A logging framework for C++.
Big Data Machine Learning in SQL for Data Scientists.
Metron is a project dedicated to providing an extensible and scalable advanced network security analytics tool. It has strong foundations in the Apache Hadoop ecosystem.
Distributed cryptography; the M-Pin protocol for identity and trust.
Mnemonic is a Java based non-volatile memory library for in-place structured data processing and computing.
MRQL is a query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, Spark, and Flink.
Mynewt is a real-time operating system for constrained embedded systems like wearables, lightbulbs, locks and doorbells. It works on a variety of 32-bit MCUs (microcontrollers), including ARM Cortex-M and MIPS architectures.
Myriad enables co-existence of Apache Hadoop YARN and Apache Mesos together on the same cluster and allows dynamic resource allocations across both Hadoop and other applications running on the same physical data center infrastructure.
Java modules that allow programmatic creation, scanning, and manipulation of OpenDocument Format (ODF, ISO/IEC 26300) documents.
Omid is a flexible, reliable, high-performance, and scalable ACID transactional framework that allows client applications to execute transactions on top of MVCC key/value-based NoSQL datastores (currently Apache HBase), providing snapshot isolation guarantees on the accessed data.
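The snapshot-isolation read rule over an MVCC store is worth a sketch: a transaction sees only the versions committed before its start timestamp. A toy illustration of that rule (hypothetical structure, not Omid's API):

```python
# Toy MVCC store with a snapshot-isolation read rule:
# a reader at start_ts sees the newest version committed at or before start_ts.
class MVCCStore:
    def __init__(self):
        self.versions = {}          # key -> list of (commit_ts, value), ts-ordered
        self.clock = 0              # stand-in for a timestamp oracle

    def commit(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot_read(self, key, start_ts):
        visible = [v for ts, v in self.versions.get(key, []) if ts <= start_ts]
        return visible[-1] if visible else None

store = MVCCStore()
store.commit("x", "v1")                 # commit_ts = 1
snap = store.clock                      # transaction starts, start_ts = 1
store.commit("x", "v2")                 # commit_ts = 2, after the snapshot
old = store.snapshot_read("x", snap)    # sees "v1", not "v2"
```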
Tools and libraries for developing Attribute-based Access Control (ABAC) Systems in a variety of languages.
Quarks is a stream processing programming model and lightweight runtime to execute analytics at devices on the edge or at the gateway.
Quickstep is a high-performance database engine.
The Ranger project is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform.
Rya (pronounced "ree-uh" /rēə/) is a cloud-based RDF triple store that supports SPARQL queries. Rya is a scalable RDF data management system built on top of Accumulo. Rya uses novel storage methods, indexing schemes, and query processing techniques that scale to billions of triples across multiple nodes. Rya provides fast and easy access to the data through SPARQL, a conventional query mechanism for RDF data.
S2Graph is a distributed and scalable OLTP graph database built on Apache HBase to support fast traversal of extremely large graphs.
SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms that run on top of distributed stream processing engines (DSPEs). It features a pluggable architecture that allows it to run on several DSPEs such as Apache Storm, Apache S4, and Apache Samza.
Singa is a distributed deep learning platform.
Monitoring Solution.
Slider is a collection of tools and technologies to package, deploy, and manage long running applications on Apache Hadoop YARN clusters.
Apache Streams is a lightweight server for ActivityStreams.
SystemML provides declarative large-scale machine learning (ML) that aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans ranging from single node, in-memory computations, to distributed computations such as Apache Hadoop MapReduce and Apache Spark.
Tamaya is a highly flexible configuration solution based on a modular, extensible, and injectable key/value design, which aims to provide a minimal but extensible, modern, and functional API that works across Java SE, ME, and EE environments.
Taverna is a domain-independent suite of tools used to design and execute data-driven workflows.
Tephra is a system for providing globally consistent transactions on top of Apache HBase and other storage engines.
TinkerPop is a graph computing framework written in Java.
Toree provides applications with a mechanism to interactively and remotely access Apache Spark.
Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Hadoop.
Twill is an abstraction over Apache Hadoop YARN that reduces the complexity of developing distributed applications, allowing developers to focus more on their business logic.
Unomi is a reference implementation of the OASIS Context Server specification currently being worked on by the OASIS Context Server Technical Committee. It provides a high-performance user profile and event tracking server.
A wave is a hosted, live, concurrent data structure for rich communication. It can be used like email, chat, or a document.
A collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc.
