Here you will find links related to the BDEM Project and its area of interest.
Open Source Tools
Apache Hadoop is a project developing open-source software for reliable, scalable, distributed computing
Apache Flink is an open-source platform for distributed stream and batch data processing
Apache Spark is a general engine for large-scale data processing
Enabling Big Data through Europe’s New Data Protection Regulation. Viktor Mayer-Schönberger & Yann Padova
Privacy in the Age of Big Data, The Stanford Law Review
Ethics and Big Data
Perspectives on Big Data, Ethics, and Society. May 23, 2016 / By Jacob Metcalf, Emily F. Keller, and Danah Boyd
The Social, Cultural, & Ethical Dimensions of “Big Data”, March 17, 2014 – New York, NY
Advanced AI Tools
TensorFlow - an open source software library for numerical computation using data flow graphs.
The Microsoft Cognitive Toolkit: A free, open-source, commercial-grade toolkit that trains deep learning algorithms to learn like the human brain.
Global Map: A set of consistent GIS layers covering the whole globe at 1km resolution including: transportation, elevation, drainage, vegetation, administrative boundaries, land cover, land use and population centres. Produced by the International Steering Committee on Global Mapping.
Koordinates: GIS data aggregation site including data in a number of categories such as elevation, environment, climate etc. Some global datasets, some based on continents, some for specific countries. Registration required.
European Environment Agency: Maps and datasets from the European Environment Agency, covering a huge range of physical geography and environmental topics. Europe only.
Satellite Application Facility on Climate Monitoring: Provides near real-time and retroactively-generated datasets of cloud cover, type and temperature, surface radiation budget and temperatures, among others.
Gridded climatic data for North America, South America and Europe: A huge range of climatic data at 1km and 4km resolution, derived from various models, including temperature, precipitation, snow and derived variables such as water deficit.
Natural Disaster Hazards: Hazard Frequency, Mortality and Economic Loss Risk as gridded data for the globe. Covers cyclones, drought, earthquakes, flood, landslide, volcano and a combination of them all.
Natural Disaster Hotspots: A wide range of geographic data on natural disasters (including volcanoes, earthquakes, landslide, flood and 'multihazards') with hazard frequency, economic loss etc.
Open Flights: Airport, airline and route data across the globe. Data is provided as CSV files which can be easily processed to produce GIS outputs. Data includes all known airports, and a large number of routes between airports.
Global Roads Open Access Data Set: A vector dataset of roads across the world, using a globally consistent data model, and suitable for mapping at the 1:250,000 level. Only roads between settlements are included, not residential streets, and the dataset is accurate to approximately 50m.
Earth Engine’s public data catalog includes a variety of standard Earth science raster datasets.
Capitaine European Train Stations: Metadata for all train stations in Europe including latitude and longitude.
GAR15: UN dataset for Global Assessment of Risk, showing the amount of capital invested in infrastructure at a 5km resolution. Useful for assessment of infrastructure risk and cost of natural disasters.
MODIS provides continuous global coverage every one to two days, and collects data from 36 spectral bands. Resolution: 250-1000m. 1999 Wide range of different datasets.
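As a rough illustration of how CSV point data (such as the Open Flights airport list mentioned above) might be turned into a GIS output, the sketch below converts rows with latitude and longitude columns into a GeoJSON FeatureCollection. The column names, the sample rows and the `csv_to_geojson` helper are assumptions made for this example; they do not reflect the actual Open Flights file layout, which should be checked before use.

```python
import csv
import io
import json

# Hypothetical sample rows. Column names and values are made up for
# illustration and are NOT the real Open Flights schema or data.
SAMPLE_CSV = """name,country,latitude,longitude
Example Airport A,Norway,60.2,11.1
Example Airport B,Norway,60.3,5.2
Example Airport C,Norway,58.2,8.1
"""


def csv_to_geojson(csv_text):
    """Convert CSV rows with latitude/longitude columns to a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row.pop("latitude"))
        lon = float(row.pop("longitude"))
        features.append({
            "type": "Feature",
            # GeoJSON orders point coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # The remaining columns become feature attributes.
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


collection = csv_to_geojson(SAMPLE_CSV)
print(json.dumps(collection["features"][0], indent=2))
```

The resulting dictionary can be written to a `.geojson` file with `json.dump` and opened in most GIS packages.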
Datasets for the BDEM Course
The following datasets have been filtered and refined from a social media (Twitter) dataset and can be used for the BDEM course on Big Data and Emergency Management.
DTdata has a header row with four attributes (Topic, TWDate, RTNumber and Demand) and 27 rows of training data. Demand is the output variable (the predicted class), and the others are the input variables.
NBdata is the same as the dataset above (i.e. DTdata), except for the number of retweets. The RTNumber column, which contains numerical values, is transformed to categorical values to make the probabilities easier to calculate. In addition, the data set contains one record as test data.
KMdata contains 161 tweets with location data (i.e. GPS coordinates) to be grouped. Note that we obtained the latitude and longitude by geocoding the physical addresses extracted from the collected tweets, and the negative values of the west longitudes were changed into positive values for the k-means clustering.
SVMdata1 is generated by grouping KMdata into two or three clusters. It contains four columns: TWNumber, Latitude, Longitude and ClusterValue. The ClusterValue column gives the group number resulting from k-means clustering.
ANNdata is derived from an original data set and contains five columns: TWDate, RTNumber (as an integer), Latitude, Longitude and Demand. TWDate was converted to the day of generation (i.e. 27, 28 and 29), and Demand takes one of three values (i.e. 0, 0.5 and 1). The values denote the degree to which a tweet is relevant to demand: "0" and "1" represent "no relevance for demand" and "related to demand," respectively.
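To make the relationship between KMdata and SVMdata1 more concrete, here is a minimal sketch of k-means clustering on tweet coordinates. It is not the procedure actually used to build the course datasets: the sample points are made up, the `kmeans` helper is a plain standard-library implementation of Lloyd's algorithm written for this example, and the choice of `k=2` is an assumption.

```python
import math
import random

# Hypothetical geocoded tweets as (TWNumber, Latitude, Longitude).
# Values are made up for illustration and are not taken from KMdata.
TWEETS = [
    (1, 40.71, -74.00), (2, 40.73, -73.99), (3, 40.69, -74.02),
    (4, 34.05, -118.24), (5, 34.07, -118.26), (6, 34.03, -118.22),
]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


# Mirror the preprocessing described above: west longitudes become positive.
points = [(lat, abs(lon)) for _, lat, lon in TWEETS]
labels = kmeans(points, k=2)

# Rows shaped like SVMdata1: TWNumber, Latitude, Longitude, ClusterValue.
rows = [(tw, lat, lon, lab)
        for (tw, _, _), (lat, lon), lab in zip(TWEETS, points, labels)]
for row in rows:
    print(row)
```

Taking the absolute value of the longitude is only safe when all points lie in the same hemisphere; otherwise east and west locations would be conflated. The labelled rows could then serve as training data for a classifier such as an SVM, which is how the description of SVMdata1 reads.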
Datasets in Norway
Norwegian Mapping Agency Open Data: Open data from the Norwegian Mapping Agency, including topographical maps, road networks, elevation data, place names etc.
An API with ready-made datasets from SSB
Flood datasets in Norway
Norwegian Land Cover: Various datasets concerning land resources in Norway provided by the Norwegian Landscape and Forest Institute, including land type, forest, tree species and site index.
Open and free geospatial data from Norway
Geological Survey of Norway: Geological data for Norway
Norwegian Petroleum Directorate: Data on licensed extraction areas, wells, fields, pipelines and survey data
HSDPA-bandwidth logs for mobile HTTP streaming scenarios (source: UiO)
Soccer Video and Player Position Dataset
Other Video/Audio Datasets
Berkeley DeepDrive BDD100k: A dataset for self-driving AI. It has over 100,000 videos covering more than 1,100 hours of driving across different times of the day and weather conditions. The annotated images come from the New York and San Francisco areas.
Europeana Data contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana - the trusted and comprehensive resource for European cultural heritage content.
Pouring Dataset: Videos of people pouring a variety of liquids from and into a variety of receptacles, used for research on unsupervised imitation learning (This data is licensed by Google Inc. under a Creative Commons Attribution 4.0 International License.)
An autonomous driving dataset and benchmark for optical flow: HD1K Benchmark Suite.
Multi-view video datasets based on 360° cameras.
The Cityscapes Dataset focuses on semantic understanding of urban street scenes.
DriveU Traffic Light Dataset: a dataset aimed at researchers in the field of traffic light recognition/detection.
Sahana (Open and free system)
Ushahidi (Open and free system)
GeoNames (Geo-tagging software)
OpenStreetMap (Geographical information, important for gazetteers)
PyBossa (Crowdsourcing software)
Data visualization tools
GATE (Text processing)
WEKA (Open-source data mining software in Java)
ArkNLP (Twitter specific Natural Language Processing)
HDX (Humanitarian Data eXchange, datasets of humanitarian variables by UN OCHA)
TREC Temporal Summarization Track (Corpus for social media update summarization)
Twitter Events Corpus: 120 million tweets, with relevance judgments for over 500 events
Disaster Risk - Datasets
TREC Microblog Corpus (Corpus of social media messages)
TREC Temporal Summarization – crisis events from 2012 aligned with TREC KBA Corpus
CrisisLex (Corpora of disaster-related social media messages)
CredBank (Corpus for credibility research)
Japan Radiation Map (derived from the SPEEDI data set)
Scikit-learn provides simple and efficient tools for data mining and data analysis. It is accessible to anyone, reusable in several contexts, built on NumPy, SciPy, and matplotlib, and open source and commercially usable under the BSD license. Github URL: Scikit-learn
Keras, a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.
Fuel is a data pipeline framework which provides your machine learning models with the data they need. It is planned to be used by both the Blocks and Pylearn2 neural network libraries. Github URL: Fuel
PyTorch: tensors and dynamic neural networks in Python with strong GPU acceleration. Github URL: pytorch
Nilearn is a Python module for fast and easy statistical learning on NeuroImaging data. It leverages the scikit-learn Python toolbox for multivariate statistics with applications such as predictive modelling, classification, decoding, or connectivity analysis. Github URL: Nilearn