Friday, April 29, 2016

Alpine Linux is lightweight and lovely

For containers like Docker, a lightweight OS is a very good thing, and the same lightness is just as welcome in VMs and other virtual machines.

Among lightweight distributions, Alpine Linux has been something of a trend lately. The features touted on its site:

Alpine Linux is built around musl libc and busybox. This makes it smaller and more resource efficient than traditional GNU/Linux distributions.
In other words, it slims down by building on musl libc and BusyBox rather than the traditional GNU userland.
A container requires no more than 8 MB and a minimal installation to disk requires around 130 MB of storage.
In other words, a distribution whose disk usage is measured in mere megabytes.

So I decided to give it a try.
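
Before anything else, a quick feel for the size claim. Pulling the Docker image is enough to see it (just an illustration; exact sizes vary by release):

$ docker pull alpine
$ docker images alpine   # the image weighed in at only a few MB when I tried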

The basics


・The default shell is ash
・The package manager is apk
Command details:
http://wiki.alpinelinux.org/wiki/Alpine_Linux_package_management#Overview
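
To get a feel for apk, here is a minimal sketch of everyday commands (the package name is only an example):

$ apk update        # refresh the package index
$ apk search nginx  # search for a package
$ apk add nginx     # install a package
$ apk del nginx     # remove a package
$ apk info          # list installed packages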

Monday, April 18, 2016

A quick look at the Apache Incubator (2016)

Hoping to catch a glimpse of where things are headed, I check the Apache Incubator about once a year.

# Honestly, the quality runs the gamut, so I only skim the overviews and move on.

Until recently the list seemed heavy on database projects, probably riding the NoSQL wave, but now it feels fairly chaotic. There are also a few that look likely to end as nothing but dreams.

I wonder how many of these will graduate this year.


Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines.
Apex is an enterprise grade native YARN big data-in-motion platform that unifies stream processing as well as batch processing.
Apache AsterixDB is a scalable big data management system (BDMS) that provides storage, management, and query capabilities for large collections of semi-structured data.
Apache Atlas is a scalable and extensible set of core foundational governance services that enables enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the complete enterprise data ecosystem.
The BatchEE project aims to provide a JBatch implementation (aka JSR 352) and a set of useful extensions for this specification.
Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines simplify the mechanics of large-scale batch and streaming data processing and can run on a number of runtimes like Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service). Beam also brings DSL in different languages, allowing users to easily implement their data integration processes.
Blur is a search platform capable of searching massive amounts of data in a cloud computing environment.
CMDA provides web services for multi-aspect physics-based and phenomenon-oriented climate model performance evaluation and diagnosis through the comprehensive and synergistic use of multiple observational data, reanalysis data, and model outputs.
Commons RDF is a set of interfaces and classes for RDF 1.1 concepts and behaviours. The commons-rdf-api module defines interfaces and testing harness. The commons-rdf-simple module provides a basic reference implementation to exercise the test harness and clarify API contracts.
Apache Concerted is a Do-It-Yourself toolkit for building in-memory data engines.
DataFu provides a collection of Hadoop MapReduce jobs and functions in higher level languages based on it to perform data analysis. It provides functions for common statistics tasks (e.g. quantiles, sampling), PageRank, stream sessionization, and set and bag operations. DataFu also provides Hadoop jobs for incremental data processing in MapReduce.
Eagle is a monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks and malicious activities, and take action in real time.
Fineract is an open source system for core banking as a platform.
FreeMarker is a template engine, i.e. a generic tool to generate text output based on templates. FreeMarker is implemented in Java as a class library for programmers.
Gearpump is a reactive real-time streaming engine based on the micro-service Actor model.
Geode is a data management platform that provides real-time, consistent access to data-intensive applications throughout widely distributed cloud architectures.
Guacamole is an enterprise-grade, protocol-agnostic, remote desktop gateway. Combined with cloud hosting, Guacamole provides an excellent alternative to traditional desktops. Guacamole aims to make cloud-hosted desktop access preferable to traditional, local access.
HAWQ is an advanced enterprise SQL on Hadoop analytic engine built around a robust and high-performance massively-parallel processing (MPP) SQL framework evolved from Pivotal Greenplum Database.
HORN is a neuron-centric programming API and execution framework for large-scale deep learning, built on top of Apache Hama.
HTrace is a tracing framework intended for use with distributed systems written in Java.
Impala is a high-performance C++ and Java SQL query engine for data stored in Apache Hadoop-based clusters.
Open source system that enables the orchestration of IoT devices.
Implementation of the JSR-353 Java™ API for JSON Processing (renamed from Fleece)
Joshua is a statistical machine translation toolkit.
Kudu is a distributed columnar storage engine built for the Apache Hadoop ecosystem.
Logging for C++
Big Data Machine Learning in SQL for Data Scientists.
Metron is a project dedicated to providing an extensible and scalable advanced network security analytics tool. It has strong foundations in the Apache Hadoop ecosystem.
Distributed Cryptography; M-Pin protocol for Identity and Trust
Mnemonic is a Java based non-volatile memory library for in-place structured data processing and computing.
MRQL is a query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, Spark, and Flink.
Mynewt is a real-time operating system for constrained embedded systems like wearables, lightbulbs, locks and doorbells. It works on a variety of 32-bit MCUs (microcontrollers), including ARM Cortex-M and MIPS architectures.
Myriad enables co-existence of Apache Hadoop YARN and Apache Mesos together on the same cluster and allows dynamic resource allocations across both Hadoop and other applications running on the same physical data center infrastructure.
Java modules that allow programmatic creation, scanning and manipulation of OpenDocument Format (ISO/IEC 26300 == ODF) documents
Omid is a flexible, reliable, highly performant, and scalable ACID transactional framework that allows client applications to execute transactions on top of MVCC key/value-based NoSQL datastores (currently Apache HBase) providing Snapshot Isolation guarantees on the accessed data.
Tools and libraries for developing Attribute-based Access Control (ABAC) Systems in a variety of languages.
Quarks is a stream processing programming model and lightweight runtime to execute analytics at devices on the edge or at the gateway.
Quickstep is a high-performance database engine.
The Ranger project is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform.
Rya (pronounced "ree-uh" /rēə/) is a cloud-based RDF triple store that supports SPARQL queries. Rya is a scalable RDF data management system built on top of Accumulo. Rya uses novel storage methods, indexing schemes, and query processing techniques that scale to billions of triples across multiple nodes. Rya provides fast and easy access to the data through SPARQL, a conventional query mechanism for RDF data.
S2Graph is a distributed and scalable OLTP graph database built on Apache HBase to support fast traversal of extremely large graphs.
SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms that run on top of distributed stream processing engines (DSPEs). It features a pluggable architecture that allows it to run on several DSPEs such as Apache Storm, Apache S4, and Apache Samza.
Singa is a distributed deep learning platform.
Monitoring Solution.
Slider is a collection of tools and technologies to package, deploy, and manage long running applications on Apache Hadoop YARN clusters.
Apache Streams is a lightweight server for ActivityStreams.
SystemML provides declarative large-scale machine learning (ML) that aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans ranging from single node, in-memory computations, to distributed computations such as Apache Hadoop MapReduce and Apache Spark.
Tamaya is a highly flexible configuration solution based on a modular, extensible, and injectable key/value based design, which should provide a minimal but extendible modern and functional API leveraging SE, ME and EE environments.
Taverna is a domain-independent suite of tools used to design and execute data-driven workflows.
Tephra is a system for providing globally consistent transactions on top of Apache HBase and other storage engines.
TinkerPop is a graph computing framework written in Java.
Toree provides applications with a mechanism to interactively and remotely access Apache Spark.
Trafodion is a webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Hadoop.
Twill is an abstraction over Apache Hadoop YARN that reduces the complexity of developing distributed applications, allowing developers to focus more on their business logic.
Unomi is a reference implementation of the OASIS Context Server specification currently being worked on by the OASIS Context Server Technical Committee. It provides a high-performance user profile and event tracking server.
A wave is a hosted, live, concurrent data structure for rich communication. It can be used like email, chat, or a document.
A collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc.

Saturday, April 16, 2016

How to pin the display resolution when running VoyagerX2 on VMware Fusion

Nothing I poked at got it working, so I'm jotting this down.

In the end, writing it in /etc/default/grub made it take effect.

#GRUB_GFXMODE=640x480
GRUB_GFXMODE=2560x1600   # the resolution you want
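
One caveat as a note to self: on most GRUB 2 distributions this file is only an input, so the change takes effect after regenerating the config (command names vary by distro; these are the usual ones):

$ sudo update-grub                            # Debian/Ubuntu family
$ sudo grub-mkconfig -o /boot/grub/grub.cfg   # generic equivalent

If the resolution still reverts once the kernel boots, setting GRUB_GFXPAYLOAD_LINUX=keep in the same file is worth a try.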

Monday, April 11, 2016

Searching for a good terminal app for Windows 10, I ditched Terminals and ended up choosing RLogin

I previously wrote "Searching for a good terminal app for Windows 10, I picked Terminals," but it turned out to be all show and rather underwhelming, so "this time for sure," I thought, and went with something called RLogin.

It seems fairly feature-rich: it supports port forwarding, and apparently you can even script it, so this is the one.

# Personally, I connect Remote Desktop through an SSH tunnel, so port forwarding is absolutely essential for me.

SSH connected just fine, without the diffie-hellman-group1-sha1 problem that came up with Terminals. Oh come on. So it does work normally after all.


Now it's port forwarding's turn. This is what I wanted to use.

Choose "File" → "Connect to Server", then "New", and you get a screen like the one below.

Choose "ssh" under "Protocol" and enter the remote server's IP address, account, and password.


From the menu on the left, select "Protocol" and click "Port Forward" at the bottom.


Click "New",


then enter the port to listen on and the destination port to forward to.


Once you've confirmed you can connect to the remote host like this,


open "Remote Desktop Connection" and connect;


Windows Firewall will ask whether to allow the traffic; choose "Yes" and you can connect over RDP.
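
For reference, what the GUI is setting up here is equivalent to a plain OpenSSH local forward; a rough sketch with made-up ports and hostname:

$ ssh -L 13389:localhost:3389 user@remote-server
# then point Remote Desktop Connection at localhost:13389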


This will do just fine!

Saturday, April 9, 2016

Giving the Go language a try

I keep dabbling in all sorts of things for no particular reason, but bear with me. I believe plenty of flashes of insight come only from actually touching something. Totally beside the point, but when I recently played with a RICOH THETA, use cases where it could shine popped into my head one after another. That's a feeling I want to treasure.

Anyway, Go has been looking interesting lately, so I gave it a shot. As usual, I'm installing on Linux.


Download the Linux tar.gz from the site.

$ wget https://storage.googleapis.com/golang/go1.6.linux-amd64.tar.gz

Extract it under /usr/local so that it matches the GOROOT set below.

$ sudo tar -C /usr/local -xvzf go1.6.linux-amd64.tar.gz

Set the environment variables GOROOT, GOPATH, and PATH in ~/.bash_profile or the like so they are picked up.

export GOROOT=/usr/local/go
export GOPATH=$HOME/work
export PATH=$GOROOT/bin:$PATH:$HOME/bin
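
After editing, reload the profile and make sure the toolchain is visible; a quick sanity check (the version string below is what I'd expect for this tarball):

$ source ~/.bash_profile
$ go version
go version go1.6 linux/amd64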

As shown above, create the work directory that $GOPATH points to.

$ mkdir $HOME/work

Under work, you then need to create src and bin directories.

$ mkdir $GOPATH/src
$ mkdir $GOPATH/bin

Next, under $GOPATH/src, create a directory named after the package. Predictably, I'm calling it hello.

$ mkdir $GOPATH/src/hello

Write the source under $GOPATH/src/hello and try it out. This is straight from Getting Started.

package main

import "fmt"

func main() {
	fmt.Printf("Hello, world.\n")
}

Compile it. The argument after install is the package name (= the directory name).

$ go install hello

An executable called hello is created under $GOPATH/bin, so run it.

$ $GOPATH/bin/hello
Hello, world.
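
Incidentally, for quick iteration there is also go run, which compiles and executes in one step without installing a binary (assuming the source above was saved as hello.go):

$ go run $GOPATH/src/hello/hello.go
Hello, world.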

It only does a Printf, but I hope to dig a little deeper.