Genestack Python Client Library¶
The Genestack Python Client Library is a Python library that allows you to interact programmatically with an instance of the Genestack platform.
Supported Python versions:
- Python 2.7: 2.7.5 and newer
- Python 3: 3.5 and newer
At a low level, it only allows you to log in to Genestack as a specific user (as you would through a web browser) and call the public Java methods of any application that your user has access to.
Importantly, the Genestack Python Client Library includes genestack-application-manager, a command-line utility that allows you to upload your own applications to a Genestack instance. This tool is essential if you are a third-party developer, as it is currently the only way for you to upload your apps to a Genestack instance.
Several functions are also provided to perform typical Genestack file system operations, such as uploading files to Genestack, finding files, retrieving their metainfo, and so on. Additionally, several wrapper classes allow you to interact with command-line applications on the platform, to create files and edit their parameters.
Apart from uploading apps, typical use cases of the Python Client Library include:
- uploading many files to Genestack at once
- editing the metainfo of Genestack files based on some local data (e.g. an Excel spreadsheet)
- creating / updating files using command-line applications
Note
All communications with the Genestack server use HTTPS.
You can read the section Getting Started with the Genestack Python Client Library for a gentle introduction to the client library.
Getting Started with the Genestack Python Client Library¶
Installing the Library¶
You can install the Genestack Python Client Library using pip if it is available on your system (if not, have a look at the pip install instructions):
$ pip install genestack-client
Note
You can also install the library without pip:
You need to download the sources, either by cloning the repository from GitHub or by retrieving and unpacking the latest release. Then, open up a terminal, go to the folder containing the sources and run:
$ python setup.py install
In that case, make sure that the dependencies keyring and requests are installed.
Then, you can test your installation by executing:
$ python -c 'import genestack_client; print(genestack_client.__version__)'
If the command executes without returning an error, you have successfully installed the Genestack Python Client Library. Yay!
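The same check can be scripted, for instance at the start of a larger program. The following is a minimal sketch rather than part of the library: the helper name installed_version is made up for illustration, and the only assumption about genestack_client is that it exposes __version__, as in the command above.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__, 'unknown' if it has none, or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, '__version__', 'unknown')


if __name__ == '__main__':
    # the client library plus the two dependencies named in the note above
    for name in ('genestack_client', 'keyring', 'requests'):
        version = installed_version(name)
        print('%-18s %s' % (name, version if version else 'NOT INSTALLED'))
```

A module reported as NOT INSTALLED needs to be installed before the library can be used.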
Note
If you see a warning such as InsecurePlatformWarning: A true SSLContext object is not available in the console, you can either update your Python to the latest 2.7.* version (or switch to an up-to-date Python 3 version), or install the security package extras using pip:
$ sudo pip install 'requests[security]'
Configuring Credentials¶
The Genestack Python Client Library works by logging in to a Genestack instance (by default platform.genestack.org). Therefore, before doing anything you need to have an account on the Genestack instance to which you want to connect.
To avoid typing in your credentials every time you connect to a Genestack instance programmatically, the library comes with a utility, genestack-user-setup, which allows you to store locally, in a secure manner, a list of user identities for logging in to Genestack. To configure your first user identity, type the following command in a terminal:
$ genestack-user-setup init -H https://<odm-host>
You will be prompted for your token or email and password to connect to Genestack. If they are valid and the connection to the Genestack server is successful, you’re all set!
To check the result, you can run:
$ genestack-user-setup list
user@email.com (default):
email user@email.com
host platform.genestack.org
Warning
By default, your passwords and/or tokens will be stored using a secure storage system provided by your OS (see https://pypi.python.org/pypi/keyring for more information). If the secure storage system is not accessible, you will be asked for permission to store your password in plain text in a configuration file. However, this option is strongly discouraged. You have been warned!
Note
The information you supply to genestack-user-setup is only stored locally on your computer. Therefore, if you change your password or token in the Platform UI, you will need to update your local configuration as well.
Setting up additional users¶
If you have multiple accounts on Genestack (or you are using multiple instances of Genestack), you can define multiple identities with genestack-user-setup.
Each user has an alias (unique identifier), an email address, a host address and a password. The host name will be platform.genestack.org by default. There is no limit to the number of identities you can store locally, and you can even use different aliases for the same account. To add a new identity, type in:
$ genestack-user-setup add
Note
To know more about user management, have a look at: genestack-user-setup
Connecting to a Genestack instance¶
To communicate with a Genestack instance using the library, the first thing you need is to open a connection to the server.
Passing Connection Parameters via Command-line Arguments¶
The easiest way to open a connection is through the helper function get_connection().
It uses command-line arguments parsed by an argparse.ArgumentParser to find your credentials in the local config file. If no arguments are supplied to your script, the connection will attempt to log in with the default user specified by genestack-user-setup.
You can specify another user by appending -u <user_alias> to your command line call. For example, let’s consider the following script, saved in my_genestack_script.py, that simply creates a connection to the Genestack server and prints the e-mail address of the current user:
from genestack_client import get_connection
connection = get_connection()
print(connection.whoami())
Using the connection parameters, you can run this script from a terminal using different Genestack identities:
# login with default user
$ python my_genestack_script.py
user@email.com
# login as bob@email.com, present in the config file under the alias "bob"
$ python my_genestack_script.py -u bob
bob@email.com
If your script accepts custom command-line arguments, you can add them to the arguments parser returned by make_connection_parser().
The arguments -u, -p, --host (-H), --token, --show-logs and --debug are reserved for the connection parameters.
Have a look at the following example:
from genestack_client import get_connection, make_connection_parser
# create an instance of argparse.ArgumentParser with predefined arguments for connection
parser = make_connection_parser()
parser.add_argument('-c', '--unicorn', dest='unicorn', action='store_true', help='Set if you have a unicorn.')
args = parser.parse_args()
connection = get_connection(args)
email = connection.whoami()
if args.unicorn:
    print('%s has a UNICORN!!' % email)
else:
    print('%s does not have a unicorn :(' % email)
$ python my_script.py --unicorn
user@email.com has a UNICORN!!
$ python my_script.py -u bob
bob@email.com does not have a unicorn :(
Warning
If you use custom arguments, make sure to follow the syntax of the previous script: first, retrieve the parser with make_connection_parser(), then add the new argument to it, parse the command-line arguments and finally send them to get_connection.
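To see why the connection options are reserved, it helps to look at how argparse reacts when a script tries to reuse one of them. The sketch below does not call the client library: make_stand_in_parser is a hypothetical stand-in that only defines -u and -H, and try_add is a helper made up for this illustration.

```python
import argparse


def make_stand_in_parser():
    """Hypothetical stand-in for make_connection_parser(): it only defines -u and -H."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', dest='user')
    parser.add_argument('-H', '--host', dest='host')
    return parser


def try_add(parser, *flags, **kwargs):
    """Add an option to the parser and report whether argparse accepted it."""
    try:
        parser.add_argument(*flags, **kwargs)
        return True
    except argparse.ArgumentError:
        # argparse refuses option strings that are already registered
        return False


parser = make_stand_in_parser()
unicorn_added = try_add(parser, '-c', '--unicorn', action='store_true')
owner_added = try_add(parser, '-u', '--unicorn-owner')  # reuses the reserved -u

args = parser.parse_args(['-u', 'bob', '--unicorn'])
print('%s %s' % (unicorn_added, owner_added))
print('%s %s' % (args.user, args.unicorn))
```

Here the new -c/--unicorn option is accepted, while the attempt to reuse -u is rejected by argparse, so a custom argument has to pick option strings that the connection parser does not already use.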
Arguments Accepted by the Connection Parser¶
If no connection parameter is passed to your script, get_connection will attempt a connection using the default identity from your local configuration file (you can change it via the command genestack-user-setup default).
If only the parameter -u <alias> is supplied, the parser will look for the corresponding identity in the local configuration file. If no match is found, the script will switch to interactive login.
You can also supply the parameters -u <email> -H <host> -p <password>.
By default, the host is platform.genestack.org and if no password is provided, you will be prompted for one.
Alternatively, you can supply -H <host> --token <token>.
$ python my_script.py -u user@email.com -H platform.genestack.org -p password
$ python my_script.py -H platform.genestack.org --token token
Using Hard-coded Connection Parameters¶
You can also supply hard-coded parameters for the connection directly inside your script.
Warning
This approach is only provided for reference, but it is strongly discouraged, as it requires you (among other things) to store your e-mail and password in plain text inside your code.
from genestack_client import Connection
# create connection object for server
connection = Connection('https://platform.genestack.org/endpoint')
# login as user: 'user@email.com' with password 'password'
connection.login('user@email.com', 'password')
print(connection.whoami())
$ python my_script.py
user@email.com
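If you do need to build a Connection by hand, one way to keep the credentials out of the source file is to read them from environment variables. This is only a sketch under stated assumptions: the variable names GENESTACK_HOST, GENESTACK_EMAIL and GENESTACK_PASSWORD are invented for this example (the library does not define them), and the endpoint URL layout simply copies the example above.

```python
import os

# hypothetical variable names, chosen for this example only
VARIABLES = ('GENESTACK_HOST', 'GENESTACK_EMAIL', 'GENESTACK_PASSWORD')


def read_credentials(environ=None):
    """Return (host, email, password) read from the environment, or raise KeyError."""
    environ = os.environ if environ is None else environ
    missing = [name for name in VARIABLES if not environ.get(name)]
    if missing:
        raise KeyError('Missing environment variables: %s' % ', '.join(missing))
    return tuple(environ[name] for name in VARIABLES)


def connect_from_environment():
    """Open a connection with the credentials found in the environment."""
    # imported here so that read_credentials() can be used without the library
    from genestack_client import Connection
    host, email, password = read_credentials()
    # URL layout copied from the hard-coded example; adjust it to your instance
    connection = Connection('https://%s/endpoint' % host)
    connection.login(email, password)
    return connection
```

The password still ends up in the process environment, so the identities managed by genestack-user-setup remain the preferred option.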
Calling an Application’s Methods¶
You can use the client library to call the public Java methods of any application that is available to the current user. You just need to supply the application ID and the method name:
from genestack_client import get_connection
connection = get_connection()
print(connection.application('genestack/signin').invoke('whoami'))
And here is how to call a Java method with arguments:
from genestack_client import get_connection, Metainfo, PRIVATE
connection = get_connection()
metainfo = Metainfo()
metainfo.add_string(Metainfo.NAME, "New folder")
print(connection.application('genestack/filesUtil').invoke('createFolder', PRIVATE, metainfo))
The number, order and type of the arguments should match between your Java methods and the Python call to invoke. Type conversion between Python and Java generally behaves in the way you would expect (a Python numeric variable will be either an int or double, a Python list will become a List, a dictionary will become a Map, etc.).
The client library comes with a lot of wrapper classes around common Genestack applications, which allow you to use a more convenient syntax to invoke the methods of a specific application (see section below).
If you need to make extensive use of an application that does not already have a wrapper class in the client library, you can easily create your own wrapper class in a similar way. Your class simply needs to inherit from Application and declare an APPLICATION_ID:
from genestack_client import Application, get_connection

class SignIn(Application):
    APPLICATION_ID = 'genestack/signin'

    def whoami(self):
        return self.invoke('whoami')

connection = get_connection()
signin = SignIn(connection)
print(signin.whoami())
Pre-defined Application Wrappers¶
This section illustrates briefly some of the things you can do using the pre-defined application wrappers from the client library. For a more detailed description of these wrappers, have a look at Application Wrappers.
FilesUtil¶
FilesUtil is a Genestack application used for typical file system operations: finding, linking, removing and sharing files.
First, let’s open a connection:
>>> from genestack_client import get_connection
>>> connection = get_connection()
Then we create a new instance of the class:
>>> from genestack_client import FilesUtil
>>> files_util = FilesUtil(connection)
Then we can create a new empty folder:
>>> folder_accession = files_util.create_folder("My new folder")
>>> print(folder_accession)
GSF000001
By default, this one was created in the “Created Files” folder of the current user, but we can define any folder as parent:
>>> inner_folder_accession = files_util.create_folder("My inner folder", parent=folder_accession)
>>> print(inner_folder_accession)
GSF000002
Finding a folder by its name:
>>> folder_accession = files_util.find_file_by_name("My inner folder", file_class=FilesUtil.IFolder)
>>> print(folder_accession)
GSF000002
See FilesUtil for more methods.
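The two calls shown above (finding a folder by name and creating one) combine naturally into a "get or create" helper. The sketch below uses only the find_file_by_name and create_folder calls from this section; the helper name get_or_create_folder is made up, and it assumes that find_file_by_name returns None when nothing matches, which you should verify against the FilesUtil reference.

```python
def get_or_create_folder(files_util, name, parent=None):
    """Return the accession of the folder called `name`, creating it if the lookup finds nothing."""
    # assumption: find_file_by_name returns None when no folder has this name
    accession = files_util.find_file_by_name(name, file_class=files_util.IFolder)
    if accession is not None:
        return accession
    if parent is None:
        # same default location as in the example above
        return files_util.create_folder(name)
    return files_util.create_folder(name, parent=parent)
```

With a real connection this would be called as get_or_create_folder(FilesUtil(connection), "My new folder"). Note that the lookup as written is by name only and is not restricted to the given parent.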
Importers¶
As always, we start by creating a connection:
>>> from genestack_client import get_connection
>>> connection = get_connection()
Then we create a new instance of the app:
>>> from genestack_client import DataImporter
>>> importer = DataImporter(connection)
Then let’s create an experiment in Imported files:
>>> experiment = importer.create_experiment(name='Sample of paired-end reads from A. fumigatus WGS experiment',
... description='A segment of a paired-end whole genome sequencing experiment of A. fumigatus')
We can add a sequencing assay to the experiment, using local files as sources:
>>> assay = importer.create_sequencing_assay(experiment,
... name='Test paired-end sequencing of A. fumigatus',
... links=['ds1.gz', 'ds2.gz'],
... organism='Aspergillus fumigatus',
... method='genome variation profiling by high throughput sequencing')
Uploading ds1.gz - 100.00%
Uploading ds2.gz - 100.00%
Let’s print the results to know the accession of our files:
>>> print('Successfully loaded assay with accession %s to experiment %s' % (assay, experiment))
Successfully loaded assay with accession GSF000002 to experiment GSF000001
And finally we can start the initialization of the file:
>>> from genestack_client import FileInitializer
>>> initializer = FileInitializer(connection)
>>> initializer.initialize([assay])
>>> print('Start initialization of %s' % assay)
Start initialization of GSF000002
As a result you should have:
- an Experiment folder in Imported files;
- a Sequencing assay file inside the experiment;
- two Raw Upload files in the Uploaded files folder (these are just plain copies of your raw uploaded files; they can be removed once the sequencing assays have been initialized).
See DataImporter for more info.
TaskLogViewer¶
The Task Log Viewer allows you to access the contents of initialization logs programmatically.
Again, we start by opening a connection and instantiating the class:
>>> from genestack_client import get_connection
>>> connection = get_connection()
>>> from genestack_client import TaskLogViewer
>>> log_viewer = TaskLogViewer(connection)
Then we can check the error log of a file:
>>> log_viewer.print_log('GSF000001', log_type=TaskLogViewer.STDERR, follow=False)
This log is empty (perhaps there was no log produced)
See TaskLogViewer for more info.
Sample Scripts¶
This section provides reusable examples of scripts that allow you to perform actions on Genestack that would be tedious to accomplish through the web interface, such as multiple file uploads with custom metadata.
Retrieving task logs¶
This is a simple script to retrieve and print the logs of an initialization task on Genestack.
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from future import standard_library
standard_library.install_aliases()
from builtins import *
from genestack_client import TaskLogViewer, get_connection, make_connection_parser

# add extra arguments to the Genestack arguments parser for this script
parser = make_connection_parser()
parser.add_argument('-f', '--follow', action='store_true', help="Follow the logs' output if the task is not done")
parser.add_argument('-t', '--type', metavar='<log_type>',
                    choices=[TaskLogViewer.STDERR, TaskLogViewer.STDOUT],
                    default=TaskLogViewer.STDOUT,
                    help="Type of logs to display ('{0}' or '{1}' ; default is '{0}')".format(
                        TaskLogViewer.STDOUT, TaskLogViewer.STDERR))
parser.add_argument('accession', metavar='<accession>', help='Accession of the file for which to display the logs')
arguments = parser.parse_args()

# connect to Genestack
connection = get_connection(arguments)
log_viewer = TaskLogViewer(connection)

# print task logs
log_viewer.print_log(arguments.accession, log_type=arguments.type, follow=arguments.follow)
The script connects to Genestack and uses the TaskLogViewer class defined in the client library to retrieve the logs for a file whose accession should be passed as a command-line parameter to the script.
TaskLogViewer is a child class of Application, and it provides an interface to the Genestack application genestack/task-log-viewer, which exposes a public Java method (getFileInitializationLog) to access the initialization logs of a file.
Uploading multiple files with custom metainfo¶
A typical situation when you want to upload data is that you have some raw sequencing files somewhere (on an FTP site, on your local computer, etc.) and a spreadsheet with information about these files, that you would want to record in Genestack.
So let’s imagine that we have a comma-delimited CSV file with the following format:
name,organism,disease,link
HPX12,Homo sapiens,lung cancer,ftp://my.ftp/raw_data/HPX12_001.fq.gz
HPZ24,Homo sapiens,healthy,ftp://my.ftp/raw_data/HPZ24_001.fq.gz
.........
Now let’s write a Python script that uses the Genestack Client Library to upload these files to a Genestack instance, with the right metainfo. The script will take as input the CSV file, and create a Genestack Experiment with a Sequencing Assay for each row of the CSV file.
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
from __future__ import unicode_literals
from future import standard_library
standard_library.install_aliases()
from builtins import *
import csv

from genestack_client import (BioMetaKeys, DataImporter, GenestackException, Metainfo,
                              get_connection, make_connection_parser, unaligned_reads)

# keys that must be supplied in the CSV file
MANDATORY_KEYS = ['name', 'link']

# keys that have existing dedicated "Genestack" metainfo key names
SPECIAL_KEYS = {'name': Metainfo.NAME, 'organism': BioMetaKeys.ORGANISM,
                'method': BioMetaKeys.METHOD, 'sex': BioMetaKeys.SEX,
                'cell line': BioMetaKeys.CELL_LINE}

# parse script arguments
parser = make_connection_parser()
parser.add_argument('csv_file', help='Path to the local comma-delimited CSV file containing the data')
parser.add_argument('--name', help='Name of the experiment to create in Genestack')
parser.add_argument('--description', help='Description of the experiment to display in Genestack')

args = parser.parse_args()
csv_input = args.csv_file

print('Connecting to Genestack...')

# get connection and application handlers
connection = get_connection(args)
importer = DataImporter(connection)

# file format of the reads to import
file_format = unaligned_reads.compose_format_map(unaligned_reads.Space.BASESPACE,
                                                 unaligned_reads.Format.PHRED33,
                                                 unaligned_reads.Type.SINGLE)

# create the experiment where we will store the data in Genestack
experiment = importer.create_experiment(name=args.name or "Imported experiment",
                                        description=args.description or "No description provided")

print('Created a new experiment with accession %s...' % experiment)

# parse the CSV file
with open(csv_input, 'r') as the_file:
    reader = csv.DictReader(the_file, delimiter=",")
    field_names = reader.fieldnames

    # check if mandatory keys are in the CSV file
    for mandatory_key in MANDATORY_KEYS:
        if mandatory_key not in field_names:
            raise GenestackException("The key '%s' must be supplied in the CSV file" % mandatory_key)

    for file_data in reader:

        # for each entry, prepare a Metainfo object
        metainfo = Metainfo()
        for key in field_names:
            # 'link' and 'organism' are treated separately, as they are added to the metainfo using specific methods
            if key == "link":
                url = file_data[key]
                metainfo.add_external_link(key=BioMetaKeys.READS_LINK, text="link", url=url, fmt=file_format)
            elif key == "organism":
                metainfo.add_string(BioMetaKeys.ORGANISM, file_data[key])
            # all the other keys are added as strings
            else:
                metainfo_key = SPECIAL_KEYS.get(key.lower(), key)
                metainfo.add_string(metainfo_key, file_data[key])

        # create the sequencing assay on Genestack
        created_file = importer.create_sequencing_assay(experiment, metainfo=metainfo)

        print('Created file "%s" (%s)' % (file_data['name'], created_file))

print('All done! Bye now...')
This script uses many features from the client library:
- we start by adding arguments to the argument parser to process our metadata file
- then we establish a connection to Genestack, and instantiate a DataImporter to be able to create our files on Genestack
- we create the experiment where we will store our data
- we parse the CSV file using a Python csv.DictReader, create a new metainfo object for each row and a corresponding Sequencing Assay with that metainfo
We can then run the script like this:
python make_experiment_from_csv.py --name "My experiment" --description "My description" my_csv_metadata_file.csv
The metainfo of each Sequencing Assay specified inside the CSV file needs to contain at least a name and a valid link (either to a local or a remote file). By default, the experiment will be created inside the user’s Imported Files folder on Genestack, since we haven’t specified a folder.
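Because nothing is created on Genestack until the CSV file has been read, it can be worth validating the file locally first. The following sketch, written for Python 3, checks the mandatory name and link columns without contacting the server. The function name read_valid_rows and the use of ValueError are choices made for this example; the script above raises GenestackException instead.

```python
import csv
import io

MANDATORY_KEYS = ['name', 'link']


def read_valid_rows(text):
    """Parse CSV text and return its rows, checking the mandatory columns and values."""
    reader = csv.DictReader(io.StringIO(text), delimiter=',')
    field_names = reader.fieldnames or []
    for key in MANDATORY_KEYS:
        if key not in field_names:
            raise ValueError("The key '%s' must be supplied in the CSV file" % key)
    rows = []
    for row in reader:
        # a missing trailing field is read as None, an empty one as ''
        for key in MANDATORY_KEYS:
            if not row[key]:
                raise ValueError("Empty '%s' value near line %d" % (key, reader.line_num))
        rows.append(row)
    return rows


sample = (u"name,organism,disease,link\n"
          u"HPX12,Homo sapiens,lung cancer,ftp://my.ftp/raw_data/HPX12_001.fq.gz\n"
          u"HPZ24,Homo sapiens,healthy,ftp://my.ftp/raw_data/HPZ24_001.fq.gz\n")
for row in read_valid_rows(sample):
    print('%s -> %s' % (row['name'], row['link']))
```

Running the check before calling create_experiment avoids leaving a half-filled experiment behind when a row turns out to be incomplete.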
Note
One could easily extend this script to support two files per sample (in the case of paired-end reads).
Importing ENCODE RNA-seq data¶
We can extend the previous script to download data from the ENCODE project. If we select a list of experiments from the ENCODE experiment matrix, we can obtain a link to a TSV file which contains all the metadata and data for the matching assays. For instance, this is the link for all FASTQ files for human RNA-seq experiments.
By browsing this TSV file, we see that it contains the following useful fields:
- File accession
- Experiment accession
- Biosample sex
- Biosample organism
- Biosample term name: cell line or tissue
- Biosample Age
- Paired with: if the sample was paired-end, this field points to the file accession of the other mate
We also notice that file download URLs always follow the template:
https://www.encodeproject.org/files/<FILE_ACCESSION>/@@download/<FILE_ACCESSION>.fastq.gz
We can use this observation to generate the reads URLs from the fields File accession and possibly Paired with.
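As a small illustration of the URL template, the helper below builds the download URL(s) for one metadata row. The function name reads_urls is made up for this sketch, and the accessions in the usage lines are placeholders rather than real ENCODE files.

```python
ENCODE_URL_PATTERN = 'https://www.encodeproject.org/files/{0}/@@download/{0}.fastq.gz'


def reads_urls(row):
    """Return the download URLs for a metadata row: one for single-end, two for paired-end."""
    accessions = [row['File accession']]
    if row.get('Paired with'):
        accessions.append(row['Paired with'])
    return [ENCODE_URL_PATTERN.format(accession) for accession in accessions]


# placeholder accessions, for illustration only
single_end = {'File accession': 'ACC0001', 'Paired with': ''}
paired_end = {'File accession': 'ACC0002', 'Paired with': 'ACC0003'}
for url in reads_urls(single_end) + reads_urls(paired_end):
    print(url)
```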
We use the following logic: we read through the metadata file, while keeping a set of all the accessions of the paired FASTQ files handled so far.
If the current line corresponds to a file that has already been created (second mate of a paired-end file), then we skip it. Otherwise we prepare a metainfo object for the file and create the Genestack file.
If the row contains a Paired with accession, we also add the corresponding URL to the current metadata, and add the accession to the set of FASTQ files seen so far.
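The skipping logic described above can be tried on in-memory rows before involving Genestack at all. This is a sketch of the bookkeeping only: group_mates is a hypothetical helper, and the accessions are placeholders.

```python
def group_mates(rows):
    """Yield one tuple of file accessions per assay, skipping second mates already handled."""
    seen_mates = set()
    for row in rows:
        accession = row['File accession']
        if accession in seen_mates:
            # second mate of a pair that was already emitted
            continue
        mate = row.get('Paired with')
        if mate:
            seen_mates.add(mate)
            yield (accession, mate)
        else:
            yield (accession,)


# placeholder rows: A and B are mates of each other, C is single-end
rows = [
    {'File accession': 'A', 'Paired with': 'B'},
    {'File accession': 'B', 'Paired with': 'A'},
    {'File accession': 'C', 'Paired with': ''},
]
assays = list(group_mates(rows))
print(assays)
```

Each paired-end sample appears twice in the metadata file (once per mate), which is why the set of already handled mates is needed to avoid creating two assays for the same sample.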
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# This script parses ENCODE metadata files such as this one:
# https://www.encodeproject.org/metadata/type=Experiment&replicates.library.biosample.donor.organism.scientific_name=Homo+sapiens&files.file_type=fastq&assay_title=RNA-seq/metadata.tsv

from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
from __future__ import unicode_literals
from future import standard_library
standard_library.install_aliases()
from builtins import *
import csv

from genestack_client import (BioMetaKeys, DataImporter, Metainfo, get_connection,
                              make_connection_parser)

# ENCODE data
FILE_ACCESSION = "File accession"
PAIRED_ACCESSION = "Paired with"

# dictionary: ENCODE file column name -> Genestack metainfo key (None when identical)
VALID_FIELDS = {
    FILE_ACCESSION: Metainfo.NAME,
    "Experiment accession": None,
    "Biosample sex": BioMetaKeys.SEX,
    "Biosample organism": BioMetaKeys.ORGANISM,
    "Biosample Age": None,
    "Biosample term name": BioMetaKeys.CELL_TYPE,
    "Platform": BioMetaKeys.PLATFORM
}

ENCODE_URL_PATTERN = "https://www.encodeproject.org/files/{0}/@@download/{0}.fastq.gz"

# parse script arguments
parser = make_connection_parser()
parser.add_argument('tsv_file', metavar='<tsv_file>', help='Path to the local tab-delimited file containing the data')

args = parser.parse_args()
tsv_input = args.tsv_file

print('Connecting to Genestack...')

# get connection and application handlers
connection = get_connection(args)
importer = DataImporter(connection)

# create the experiment where we will store the data in Genestack
experiment = importer.create_experiment(name="ENCODE Human RNA-seq",
                                        description="Human RNA-seq assays from ENCODE")

print('Created a new experiment with accession %s...' % experiment)

created_pairs = set()

# parse the TSV file (the tab-delimited dialect of the csv module is named 'excel-tab')
with open(tsv_input, 'r') as the_file:
    reader = csv.DictReader(the_file, dialect='excel-tab')
    field_names = reader.fieldnames

    for file_data in reader:

        # skip the entry if the file was already included in a previously created paired-end assay
        if file_data[FILE_ACCESSION] in created_pairs:
            continue

        # for each entry, prepare a Metainfo object
        metainfo = Metainfo()
        for key in VALID_FIELDS.keys():
            metainfo.add_string(VALID_FIELDS.get(key) or key, file_data[key])
        metainfo.add_external_link(BioMetaKeys.READS_LINK, ENCODE_URL_PATTERN.format(file_data[FILE_ACCESSION]))

        if file_data.get(PAIRED_ACCESSION):
            # record the accession of the second mate and add its URL if the reads are paired-end
            metainfo.add_string(FILE_ACCESSION, file_data[PAIRED_ACCESSION])
            metainfo.add_external_link(BioMetaKeys.READS_LINK, ENCODE_URL_PATTERN.format(file_data[PAIRED_ACCESSION]))
            created_pairs.add(file_data[PAIRED_ACCESSION])

        # create the sequencing assay on Genestack
        created_file = importer.create_sequencing_assay(experiment, metainfo=metainfo)
        print('Created file "%s" (%s)' % (file_data[FILE_ACCESSION], created_file))

print('All done!')
Editing the metainfo of existing files¶
In the real world, data and metadata live in different places and you may not have access to both of them at the same time.
Sometimes, you may be in a situation where you have uploaded data on Genestack and you are only provided with metadata later on.
The following script takes as input a comma-delimited CSV file containing metadata and adds that metadata to existing files on Genestack. The files should be located in a specific folder, and the correspondence between records in the CSV file and the remote files is determined by the name of the remote files. The names of the files should be stored in a specific column of the CSV file, whose name must be supplied to the script as the local_key parameter.
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
from __future__ import unicode_literals
from future import standard_library
standard_library.install_aliases()
from builtins import *
import csv

from genestack_client import (BioMetaKeys, FilesUtil, GenestackException, Metainfo,
                              get_connection, make_connection_parser)

# keys that have existing dedicated "Genestack" metainfo key names
SPECIAL_KEYS = {'name': Metainfo.NAME, 'organism': BioMetaKeys.ORGANISM,
                'method': BioMetaKeys.METHOD, 'sex': BioMetaKeys.SEX,
                'gender': BioMetaKeys.SEX, 'age': BioMetaKeys.AGE,
                'cell line': BioMetaKeys.CELL_LINE, 'accession': Metainfo.ACCESSION}

# Logic to parse 'boolean-like' values as metainfo booleans
TRUE_VALUES = {'true', 'yes', 'y'}
FALSE_VALUES = {'false', 'no', 'n'}


def parse_as_boolean(s):
    if s.lower().strip() in TRUE_VALUES:
        return True
    elif s.lower().strip() in FALSE_VALUES:
        return False
    return None


if __name__ == "__main__":
    # parse script arguments
    parser = make_connection_parser()
    parser.add_argument('csv_file', help='Path to the local comma-delimited CSV file containing the data')
    parser.add_argument('local_key', help='Name of the local key to match CSV records and Genestack files names')
    parser.add_argument('folder', help='Accession of the Genestack folder containing the files')

    args = parser.parse_args()
    csv_input = args.csv_file
    local_key = args.local_key

    print('Connecting to Genestack...')

    # get connection and application handlers
    connection = get_connection(args)
    files_util = FilesUtil(connection)

    print('Collecting files...')
    files = files_util.get_file_children(args.folder)
    print('Found %d files. Collecting metadata...' % len(files))
    infos = files_util.get_infos(files)

    identifier_map = {info['name']: info['accession'] for info in infos}

    # parse the CSV file
    with open(csv_input, 'r') as the_file:
        reader = csv.DictReader(the_file, delimiter=",")
        field_names = reader.fieldnames

        if args.local_key not in field_names:
            raise GenestackException("Error: the local key %s is not present in the supplied CSV file"
                                     % args.local_key)

        for file_data in reader:
            # find the corresponding file
            local_identifier = file_data[local_key]
            remote_file = identifier_map.get(local_identifier)
            if not remote_file:
                print('Warning: no match found for file name "%s"' % local_identifier)
                continue

            # prepare a Metainfo object
            metainfo = Metainfo()
            for key in field_names:
                # key parsing logic
                value = file_data[key]
                if value == "" or value is None:
                    continue
                if key == args.local_key:
                    continue
                if key == "organism":
                    metainfo.add_string(BioMetaKeys.ORGANISM, value)
                else:
                    metainfo_key = SPECIAL_KEYS.get(key.lower(), key)
                    if parse_as_boolean(value) is not None:
                        metainfo.add_boolean(metainfo_key, parse_as_boolean(value))
                    else:
                        metainfo.add_string(metainfo_key, value)

            # edit the metadata on Genestack
            files_util.add_metainfo_values(remote_file, metainfo)
            print("Edited metainfo for '%s' (%s)" % (local_identifier, remote_file))

    print('All done!')
For instance, imagine we have the following CSV file:
file_name,organism,disease
Patient 1,Homo sapiens,Asthma
Patient 2,Homo sapiens,Healthy
....
The script is then called with the following syntax:
python add_metainfo_from_table.py my_csv_file.csv file_name GSF12345
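The name-based matching at the heart of this script can be exercised without a server. In the sketch below, match_records is a hypothetical helper, and the infos list imitates only the two keys that the script reads from get_infos ('name' and 'accession'); the accessions are placeholders.

```python
def match_records(records, infos, local_key):
    """Split CSV records into matched (accession, record) pairs and unmatched identifiers."""
    identifier_map = {info['name']: info['accession'] for info in infos}
    matched, unmatched = [], []
    for record in records:
        accession = identifier_map.get(record[local_key])
        if accession:
            matched.append((accession, record))
        else:
            unmatched.append(record[local_key])
    return matched, unmatched


# placeholder data shaped like the CSV example above
records = [
    {'file_name': 'Patient 1', 'organism': 'Homo sapiens', 'disease': 'Asthma'},
    {'file_name': 'Patient 3', 'organism': 'Homo sapiens', 'disease': 'Healthy'},
]
infos = [
    {'name': 'Patient 1', 'accession': 'GSF000010'},
    {'name': 'Patient 2', 'accession': 'GSF000011'},
]
matched, unmatched = match_records(records, infos, 'file_name')
print(matched)
print(unmatched)
```

Because the map is keyed by file name, two remote files with the same name collapse into a single entry (the last one wins), so unique names in the target folder are a precondition for this approach.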
Organising files into folders based on their metainfo¶
Keeping your files organised is difficult. A common approach when you have many files belonging to the same project is to group them into folders based on their application. The following script takes as input a folder of files and organises these files into subfolders, such that all files created with the same application go into the same subfolder. We will also provide an option to unlink the files from their folder of origin. The script illustrates the use of the FilesUtil class to perform typical file manipulation operations.
The script can be called with the following syntax:
python group_files_into_folders.py [--move-files] <source_folder_accession>
You can easily adapt the script to group files based on some other criterion from their metainfo, like their organism, their creation date, or in fact any metainfo value.
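The grouping step itself is independent of the library and can be sketched on plain dictionaries. Here group_accessions is a hypothetical helper, and the 'application' and 'organism' keys in the sample data are assumptions made for illustration; check the actual structure returned by FilesUtil.get_infos before relying on them.

```python
from collections import defaultdict


def group_accessions(infos, criterion):
    """Map each value returned by criterion(info) to the list of matching file accessions."""
    groups = defaultdict(list)
    for info in infos:
        groups[criterion(info)].append(info['accession'])
    return dict(groups)


# placeholder file infos; only 'accession' is known from the scripts above
infos = [
    {'accession': 'GSF000001', 'application': 'app-a', 'organism': 'Homo sapiens'},
    {'accession': 'GSF000002', 'application': 'app-b', 'organism': 'Homo sapiens'},
    {'accession': 'GSF000003', 'application': 'app-a', 'organism': 'Mus musculus'},
]
by_application = group_accessions(infos, lambda info: info['application'])
by_organism = group_accessions(infos, lambda info: info['organism'])
print(by_application)
print(by_organism)
```

Switching the grouping criterion only requires passing a different function, which is what makes the script easy to adapt.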
Running a data analysis pipeline¶
Generally, if you want to run multiple files through the same analysis pipeline, the easiest way to do it is using the Data Flow Editor through the web interface. This tool is powerful enough to cover most of the use cases you could think of. However, some complex pipelines are not supported by the Data Flow Editor. In that case, you can write your own script to generate all the files on Genestack programmatically.
Our script will take as input the Genestack accession of a folder containing Unaligned Reads files. It will then produce for each file three downstream files: a Mapped Reads file produced with Bowtie (possibly using a custom reference genome), a Mapped Reads QC Report, and a Variant Calling file.
To do this, we define a BatchFilesCreator class with some simple methods to create multiple files from a CLA. We also create a BowtieBatchFilesCreator inheriting from this class, which has additional logic to change the reference genome. You can easily adapt this logic to your own pipeline.
Moreover, we will create Variant Calling files with non-default parameters (specifically, we only want to call SNPs and not indels). To do this, we use the method CLApplication.change_command_line_arguments, which takes as input the accession of the CLA file, and the list of command-line strings to use.
Since the syntax of the command-line strings can vary from one CLA to another, the easiest way to know what command-line strings you should specify is to first create a file with the corresponding CLA on Genestack and change the parameters to the ones you want using the user interface. Then, look at the metainfo of the file (for example by clicking the name of the file inside the CLA page and choosing “View metainfo”). The metainfo field “Parameters” will store the list of command-line strings that you want (strings are separated by commas in the Metainfo Viewer dialog).
Note
File references (like reference genomes) are not specified in the parameters strings. They are stored in separate metainfo fields. The code for BowtieBatchFilesCreator illustrates how to use a custom reference genome for the Bowtie-based Unspliced Mapping CLA.
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
from __future__ import unicode_literals
from future import standard_library
standard_library.install_aliases()
from builtins import *
from builtins import object
from genestack_client import (AlignedReadsQC, BioMetaKeys, BowtieApplication, FilesUtil,
                              SpecialFolders, VariationCaller2Application, get_connection,
                              make_connection_parser)


# base class to create multiple files with a CLA
class BatchFilesCreator(object):
    def __init__(self, cla, base_folder, friendly_name, custom_args=None):
        """
        Constructor of the general batch files creator, to create multiple files from a CLA.

        :param cla: a ``CLApplication`` object, wrapper for the corresponding CLA
        :param base_folder: accession of the base folder where the pipeline files will be organised into subfolders
        :param friendly_name: user-friendly name of the files produced by the app; used in the on-screen statements
        and in the name of the project subfolders
        :param custom_args: list of custom command-line argument strings for the files. Default is ``None``
        """
        self._cla = cla
        self._files_util = FilesUtil(cla.connection)
        self._base_folder = base_folder
        self._friendly_name = friendly_name
        self._custom_args = custom_args

    def create_files(self, sources):
        print('Creating %s files...' % self._friendly_name)
        output_folder = self._files_util.create_folder(self._friendly_name, parent=self._base_folder)
        output_files = []
        for i, source in enumerate(sources, 1):
            output = self._create_output_file(source)
            self._files_util.link_file(output, output_folder)
            # progress is reported against the number of source files
            print('Created %s file %s (%d/%d)' % (self._friendly_name, output, i, len(sources)))
            output_files.append(output)
        return output_files

    # this method can be overridden in child classes to allow for more complex file creation logic
    def _create_output_file(self, source):
        output = self._cla.create_file(source)
        if self._custom_args:
            self._cla.change_command_line_arguments(output, self._custom_args)
        return output


# special class for Bowtie to replace the default reference genome
class BowtieBatchFilesCreator(BatchFilesCreator):
    def __init__(self, cla, base_folder, friendly_name, custom_args=None, ref_genome=None):
        BatchFilesCreator.__init__(self, cla, base_folder, friendly_name, custom_args)
        self._ref_genome = ref_genome

    def _create_output_file(self, source):
        output = BatchFilesCreator._create_output_file(self, source)
        # replace reference genome
        if self._ref_genome:
            self._files_util.remove_metainfo_value([output], BioMetaKeys.REFERENCE_GENOME)
            self._cla.replace_file_reference(output, BioMetaKeys.REFERENCE_GENOME, None, self._ref_genome)
        return output


# These CLA arguments correspond to all default options except the type of variants to look for (SNPs only).
# The easiest way to know the syntax of the command-line arguments for a specific app is to look at the "Parameters"
# metainfo field of a CLA file on Genestack that has the parameters you want.
VC_ARGUMENTS_NO_INDELS = ["--skip-indels -d 250 -m 1 -E --BCF --output-tags DP,DV,DP4,SP",
                          "",
                          "--skip-variants indels --multiallelic-caller --variants-only"]

if __name__ == "__main__":
    # parse script arguments
    parser = make_connection_parser()
    parser.add_argument('raw_reads_folder',
                        help='Genestack accession of the folder containing the raw reads files to process')
    parser.add_argument('--name', default="New Project",
                        help='Name of the Genestack folder where to put the output files')
    parser.add_argument('--ref-genome', help='Accession of the reference genome to use for the mapping step')

    args = parser.parse_args()
    project_name = args.name

    print('Connecting to Genestack...')

    # get connection and create output folder
    connection = get_connection(args)
    files_util = FilesUtil(connection)
    created_files_folder = files_util.get_special_folder(SpecialFolders.CREATED)
    project_folder = files_util.create_folder(project_name, parent=created_files_folder)

    # create application wrappers and batch files creators
    bowtie_app = BowtieApplication(connection)
    mapped_qc_app = AlignedReadsQC(connection)
    variant_calling_app = VariationCaller2Application(connection)

    bowtie_creator = BowtieBatchFilesCreator(bowtie_app, project_folder, "Mapped Reads",
                                             ref_genome=args.ref_genome)
    mapped_qc_creator = BatchFilesCreator(mapped_qc_app, project_folder, "Mapped Reads QC")
    vc_creator = BatchFilesCreator(variant_calling_app, project_folder, "Variants",
                                   custom_args=VC_ARGUMENTS_NO_INDELS)

    # collect files
    print('Collecting raw reads...')
    raw_reads = files_util.get_file_children(args.raw_reads_folder)
    files_count = len(raw_reads)
    print('Found %d files to process' % files_count)

    # create pipeline files
    mapped_reads = bowtie_creator.create_files(raw_reads)
    mapped_reads_qcs = mapped_qc_creator.create_files(mapped_reads)
    vc_creator.create_files(mapped_reads)

    print('All done! Your files are in the folder %s' % project_folder)
The script can then be called from a terminal with the following syntax:
python run_vc_pipeline.py -u <user_alias> --ref-genome <custom_ref_genome_accession> --name "My project name" <raw_reads_folder>
Note that the folder supplied as input to the script should only contain Unaligned Reads files.
API Reference¶
This is the complete API reference of the Genestack Client Library. For a more gentle introduction, you can read the Getting Started with the Genestack Python Client Library section.
Application Wrappers¶
Application¶
-
class
genestack_client.
Application
(connection, application_id=None)¶ Bases:
object
Create a new application instance for the given connection. The connection must be logged in to call the application’s methods. The application ID can be specified either as an argument to the class constructor or by overriding the APPLICATION_ID attribute in a child class.
-
get_response
(method, params=None, trace=True)¶ Invoke one of the application’s public Java methods and return a Response object, which gives access to logs and traces in code. If you only need the result, use invoke().
Parameters: Returns: Response object
Return type: Response
-
DataImporter¶
-
class
genestack_client.
DataImporter
(connection)¶ Bases:
object
A class used to import files to a Genestack instance. If no parent is specified, the files are created in the special folder Imported files.
Required and recommended values can be set by arguments directly or passed inside a Metainfo:
create_bed(name="Bed", url="some/url")
# is equivalent to:
metainfo = Metainfo()
metainfo.add_string(Metainfo.NAME, "Bed")
metainfo.add_external_link(Metainfo.DATA_LINK, "some/url", text="link name")
create_bed(metainfo=metainfo)
However, do not pass the same value both through the arguments and inside a metainfo object.
Genestack accepts both compressed and uncompressed files. If the protocol is not specified, file:// will be used. Special characters should be escaped, except for s3:// links. Links to Amazon S3 storage should be formatted as in s3cmd.
Supported protocols, with example links:
- file://: test.txt.gz, file://test.txt, file%20name.gz
- ftp://: ftp://server.com/file.txt
- http://, https://: http://server.com/file.txt
- ascp://: ascp://<user>@<server>:file.txt
- s3://: s3://bucket/file.gz, s3://bucket/file name.gz
If you are uploading a local file, a Raw Upload intermediary file will be created on the platform.
-
AFFYMETRIX_ANNOTATION
= 'affymetrixMicroarrayAnnotation'¶ Affymetrix microarray annotation type
-
AGILENT_ANNOTATION
= 'agilentMicroarrayAnnotation'¶ Agilent microarray annotation type
-
INFINIUM_ANNOTATION
= 'methylationArrayAnnotation'¶ Infinium microarray annotation type
-
MICROARRAY_ANNOTATION_TYPES
= ('agilentMicroarrayAnnotation', 'affymetrixMicroarrayAnnotation', 'TSVMicroarrayAnnotation', 'methylationArrayAnnotation')¶ Supported microarray annotation types
-
TSV_ANNOTATION
= 'TSVMicroarrayAnnotation'¶ TSV (GenePix etc) microarray annotation type
-
create_bam
(parent=None, name=None, url=None, organism=None, strain=None, reference_genome=None, metainfo=None)¶ Create a Genestack Aligned Reads file from a local or remote BAM file.
name, url and organism are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters:
- parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
- name (str) – name of the file
- url – URL of a BAM file; the index will be created at initialization
- organism (str) – organism
- strain – strain
- reference_genome (str) – reference genome accession
- metainfo (Metainfo) – metainfo object
Returns: file accession
Return type:
-
create_bed
(parent=None, name=None, reference_genome=None, url=None, metainfo=None)¶ Create a Genestack BED Track from a local or remote BED file.
name and url are mandatory fields. They can be specified through the arguments or via a Metainfo instance.
Parameters: Returns: file accession
Return type:
-
create_dbnsfp
(parent=None, url=None, name=None, organism=None, metainfo=None)¶ Create a Genestack Variation Database file.
name and url are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters: Returns: file accession
Return type:
-
create_dictionary
(parent=None, name=None, url=None, term_type=None, metainfo=None, parent_dictionary=None)¶ Create a Dictionary file from a local or remote file. owl, obo, and csv formats are supported.
name and url are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters:
- parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
- name (str) – name of the file
- url (str) – URL of a file
- term_type (str) – dictionary term type
- metainfo (Metainfo) – metainfo object
- parent_dictionary (str) – accession of parent dictionary
Returns: file accession
Return type:
-
create_expression_levels
(parent=None, unit=None, name=None, url=None, metainfo=None)¶ Create an Expression Levels file from a local or remote expression levels file.
name, url and unit are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters: Returns: file accession
Return type:
-
create_gene_expression_signature
(parent=None, name=None, url=None, organism=None, metainfo=None)¶ Create a Gene Expression Signature file from a local or remote gene expression signature file.
name, url and organism are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters: Returns: file accession
Return type:
-
create_gene_list
(parent=None, name=None, url=None, organism=None, metainfo=None)¶ Create a Gene List file from a local or remote gene list file.
name, url and organism are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters: Returns: file accession
Return type:
-
create_genome_annotation
(parent=None, url=None, name=None, organism=None, reference_genome=None, strain=None, metainfo=None)¶ Create a Genestack Genome Annotation file from a local or remote file.
name and url are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters:
- parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
- url (str) – URL or local path
- name (str) – name of the file
- organism (str) – organism
- reference_genome (str) – reference genome accession
- strain (str) – strain
- metainfo (Metainfo) – metainfo object
Returns: file accession
Return type:
-
create_infinium_microarray_data
(parent, name=None, urls=None, method=None, metainfo=None)¶ Create a Genestack Infinium Microarrays Data file inside a folder. The create_microarray_data method cannot be used here because the ‘microarrayData’ importer accepts only one source file, while an Infinium assay has two. This method therefore invokes the ‘infinium MicroarrayData’ importer with two links for the BioMetaKeys.DATA_LINK key in the metainfo.
Infinium microarrays are available only for humans, so there is no ‘organism’ argument.
Parameters: Returns: file accession
Return type:
-
create_mapped_reads_count
(parent=None, name=None, url=None, reference_genome=None, metainfo=None)¶ Create a Mapped Reads Count file from a local or remote mapped reads count file.
name and url are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters: Returns: file accession
Return type:
-
create_microarray_annotation
(annotation_type, parent=None, name=None, url=None, metainfo=None)¶ Create a Dictionary file from a local or remote microarray annotation file.
name and url are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters:
- annotation_type (str) – type of annotation being loaded, an element of MICROARRAY_ANNOTATION_TYPES
- parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
- name (str) – name of the file
- url – URL of a file
- metainfo (Metainfo) – metainfo object
Returns: file accession
Return type:
-
create_microarray_data
(parent, name=None, urls=None, method=None, organism=None, metainfo=None)¶ Create a Genestack Microarray Data file inside a folder.
name and urls are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters: Returns: file accession
Return type:
-
create_reference_genome
(parent=None, name=None, description='', sequence_urls=None, annotation_url=None, organism=None, assembly=None, release=None, strain=None, metainfo=None)¶ Create a Genestack Reference Genome from a collection of local or remote FASTA sequence files, and a GTF or GFF annotation file.
name, sequence_urls, organism and annotation_url are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters:
- parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
- name (str) – name of the file
- description (str) – experiment description
- sequence_urls (list) – list of URLs or local paths to the sequence files
- annotation_url (str) – URL of the annotation file
- organism (str) – organism
- assembly (str) – assembly
- release (str) – release
- strain (str) – strain
- metainfo (Metainfo) – metainfo object
Returns:
-
create_report_file
(parent=None, name=None, urls=None, metainfo=None)¶ Create a Genestack Report File from a local or remote data file.
name and urls are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters: Returns: file accession
Return type:
-
create_unaligned_read
(parent=None, name=None, urls=None, method=None, organism=None, metainfo=None)¶ Create a Genestack Unaligned Reads file from one or several local or remote files. Most common file formats encoding sequencing reads with quality scores are accepted (FASTQ 33/64, SRA, FASTA+QUAL, SFF, FAST5).
name and urls are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters: Returns: file accession
Return type:
-
create_vcf
(parent=None, name=None, reference_genome=None, url=None, metainfo=None)¶ Create a Genestack Variants file from a local or remote VCF file.
name and url are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters: Returns: file accession
Return type:
-
create_wig
(parent=None, name=None, reference_genome=None, url=None, metainfo=None)¶ Create a Genestack Wiggle Track from a local or remote WIG file.
name and url are required fields. They can be specified through the arguments or via a Metainfo instance.
Parameters: Returns: file accession
Return type:
FilesUtil¶
-
class
genestack_client.
FilesUtil
(connection, application_id=None)¶ Bases:
genestack_client.Application
An application to perform file management operations on Genestack.
-
add_checksums
(app_file, expected_checksums)¶ Add expected MD5 checksums to the metainfo of a CLA file. Expected checksums are calculated in the following way:
- The number of checksums equals the number of entries in storage. For instance, a Reference Genome file has 2 entries (annotation and sequence files).
- If there are multiple files in one entry, they will be concatenated in the same order as they were PUT to storage by the initialization script.
- If a file is marked for testing, then after initialization its metainfo will contain both expected and actual checksum values.
Parameters:
- app_file – accession of application file
- expected_checksums – collection of MD5 checksums
Returns: None
-
add_metainfo_string_value
(accession_list, key, value)¶ Add a string value to the metainfo of specified files.
Parameters: Return type:
-
add_metainfo_values
(accession, metainfo, skip_existing_keys=True, replace_existing_keys=False)¶ Add metainfo to a specified file. By default, metainfo keys that are already present in the file will be skipped.
Parameters:
- accession – accession of the file to update
- metainfo (Metainfo) – metainfo object containing the metainfo to add
- skip_existing_keys (bool) – ignore metainfo keys that are already present in the file’s metainfo (default: True)
- replace_existing_keys (bool) – replace the existing metainfo value for the metainfo keys that are already present in the file’s metainfo (default: False)
Return type:
-
collect_initializable_files_in_container
(accession)¶ Recursively search for all initialisable files in a container.
Parameters: accession (str) – accession of container Returns: list of accessions Return type: list
-
collect_metainfos
(accessions)¶ Get complete metainfo of a list of files.
Parameters: accessions (list[str]) – list of accessions Returns: list of metainfo objects Return type: list[Metainfo]
-
count_file_children
(container_accession)¶ Count children of a container (not recursive).
Parameters: container_accession (str) – accession of container Returns: number of children Return type: int
-
create_folder
(name, parent=None, description=None, metainfo=None)¶ Create a folder.
Parameters:
- name (str) – name of the folder
- parent (str) – if not specified, create folder in the user’s private folder
- description (str) – description of the folder (goes into the metainfo)
- metainfo (Metainfo) – additional Metainfo. Description and accession should be specified either via arguments or in a metainfo object (but not in both).
Returns: accession of created folder
-
find_file_by_name
(name, parent=None, file_class='com.genestack.api.files.IFile')¶ Find a file with the specified name (case-insensitive) and type. If no file is found, None is returned. If more than one file is found, the first one is returned. If the parent container is not found, the corresponding exceptions are thrown.
Parameters: Returns: file accession
Return type:
-
find_files
(file_filter, sort_order='DEFAULT', ascending=False, offset=0, limit=2000)¶ Search for files with file_filter and return a dictionary with two key/value pairs:
- 'total': total number (int) of files matching the query
- 'result': list of file info dictionaries for a subset of matching files (from offset to offset+limit). See the documentation of get_infos() for the structure of these objects.
Parameters:
- file_filter (FileFilter) – file filter
- sort_order (str) – sorting order for the results, see SortOrder
- ascending (bool) – should the results be in ascending order? (default: False)
- offset (int) – search offset (default: 0, cannot be negative)
- limit (int) – maximum number of results to return (default: 2000)
Returns: a dictionary with search response
Return type:
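The offset and limit parameters lend themselves to paging through large result sets. The sketch below shows one way to do this, using only the 'total' and 'result' keys described above. The names `iter_all_results` and `fake_fetch_page` are hypothetical, and the fake page function stands in for a real search so that the example runs without a server.

```python
def iter_all_results(fetch_page, page_size=100):
    """Yield every item from a paged search.

    ``fetch_page(offset, limit)`` must return a dictionary with the keys
    'total' and 'result', like the one described for ``find_files``.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        results = page['result']
        for item in results:
            yield item
        offset += len(results)
        # Stop once everything was seen or the server returned an empty page.
        if not results or offset >= page['total']:
            break


# Stand-in data and page function, used so the sketch runs offline.
_FAKE_FILES = [{'accession': 'GSF%04d' % i} for i in range(1, 8)]


def fake_fetch_page(offset, limit):
    return {'total': len(_FAKE_FILES),
            'result': _FAKE_FILES[offset:offset + limit]}


accessions = [info['accession']
              for info in iter_all_results(fake_fetch_page, page_size=3)]
print(accessions)
```

With a real connection, the page function could be something like `lambda offset, limit: files_util.find_files(file_filter, offset=offset, limit=limit)`, where `files_util` and `file_filter` are objects you have created beforehand.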
-
find_or_create_folder
(name, parent=None)¶ Return the folder accession if it already exists, and create it otherwise. If more than one folder is found the first one is returned.
Parameters: Returns: accession of folder
Return type:
-
find_reference_genome
(organism, assembly, release)¶ Return the accession of the reference genome with the specified parameters: organism, assembly, release. If more than one or no genome is found, the corresponding exceptions are thrown.
Parameters: Returns: accession
Return type:
Raises: GenestackServerException if more than one genome, or no genome, is found
-
get_file_children
(container_accession)¶ Return accessions of files linked to current container.
Parameters: container_accession (str) – accession of container Returns: list of accessions Return type: list
-
get_folder
(parent, *names, **kwargs)¶ Find a subfolder (by name) in a folder passed as an accession, returning the accession of that subfolder. If several names are provided, they are treated as path components leading down the folder hierarchy, and the accession of the deepest folder is returned:
- fu.get_folder('GS777', 'RNASeq') looks for a subfolder with name “RNASeq” in the folder with accession “GS777”, and returns the accession of that “RNASeq” subfolder;
- fu.get_folder('GS777', 'Experiments', 'RNASeq') looks for a subfolder with name “Experiments” in the folder with accession “GS777”, then looks for “RNASeq” in “Experiments”, and returns the accession of “RNASeq”.
If create=True is passed as a kwarg, all the folders in the names hierarchy will be created (otherwise GenestackException is raised).
Parameters: Returns: accession of found (or created) subfolder
Return type:
Raises: GenestackException – if no name is passed, or folder with required name is not found (and shouldn’t be created)
-
get_home_folder
()¶ Return the accession of the current user’s home folder.
Returns: accession of home folder Return type: str
-
get_infos
(accession_list)¶ Returns a list of dictionaries with information about each of the specified files. This will return an error if any of the accessions is not valid. The order of the returned list is the same as the one of the accessions list.
The information dictionaries have the following structure:
- accession
- owner
- name
- isDataset
- application
  - id
- initializationStatus
  - isError
  - id
- permissionsByGroup (the value for each key is a dictionary with group accessions as keys)
  - groupNames
  - ids
- time
  - fileCreation
  - initializationQueued
  - initializationStart
  - initializationEnd
  - lastMetainfoModification
Parameters: accession_list (list) – list of valid accessions. Returns: list of file info dictionaries. Return type: list[dict[str, object]]
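To make the structure above more concrete, here is a small sketch that picks out the files whose initialization ended in an error, using only the accession and initializationStatus.isError keys listed above. The helper name `failed_accessions` and the sample dictionaries are made up for illustration. Real info dictionaries contain the other keys as well.

```python
def failed_accessions(infos):
    """Return the accessions of files whose initialization status is an error.

    Hypothetical helper operating on dictionaries shaped like the ones
    described for ``get_infos``.
    """
    return [info['accession'] for info in infos
            if info.get('initializationStatus', {}).get('isError')]


# Made-up info dictionaries holding only the keys used by the helper.
sample_infos = [
    {'accession': 'GSF0001', 'name': 'Reads 1',
     'initializationStatus': {'isError': False}},
    {'accession': 'GSF0002', 'name': 'Reads 2',
     'initializationStatus': {'isError': True}},
]
failed = failed_accessions(sample_infos)
print(failed)
```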
-
get_metainfo_values_as_string_list
(accessions_list, keys_list=None)¶ Retrieve metainfo values as lists of strings for specific files and metainfo keys. The function returns a dictionary.
Parameters:
- accessions_list (list[str]) – accessions of the files to retrieve
- keys_list (list[str]|None) – metainfo keys to retrieve (if None, all non-technical keys are retrieved for each file)
Returns: a two-level dictionary with the following structure: accession -> key -> value list
Return type:
-
get_metainfo_values_as_strings
(accessions_list, keys_list=None)¶ Retrieve metainfo values as strings for specific files and metainfo keys. Metainfo value lists are concatenated into a single string using ‘, ‘ as the delimiter. The function returns a dictionary.
Parameters:
- accessions_list (list[str]) – accessions of the files to retrieve
- keys_list (list[str]|None) – metainfo keys to retrieve (if None, all non-technical keys are retrieved for each file)
Returns: a two-level dictionary with the following structure: accession -> key -> value
Return type:
-
get_public_folder
()¶ Return the accession of the Public folder on the current Genestack instance.
Returns: accession of Public folder Return type: str
-
get_special_folder
(name)¶ Return the accession of a special folder.
Available special folders are described in SpecialFolders.
Parameters: name (str) – special folder name Returns: accession Return type: str Raises: GenestackException: if folder name is unknown
-
link_file
(accession, parent)¶ Link a file to a folder.
Parameters: Return type:
-
link_files
(children_to_parents_dict)¶ Link files to containers.
Parameters: children_to_parents_dict – dictionary where keys are accessions of the files to link, and values are lists of accessions of the containers to link into Type: dict Return type: None
-
mark_for_tests
(app_file)¶ Mark a Genestack file as a test file by adding the corresponding key to its metainfo. A test file calculates the MD5 checksums of its encapsulated physical files during initialization.
Parameters: app_file – accession of file Returns: None
-
mark_obsolete
(accession)¶ Mark a Genestack file as obsolete by adding the corresponding key to its metainfo.
Parameters: accession – accession of file Returns: None
-
remove_metainfo_value
(accession_list, key)¶ Delete a key from the metainfo of specified files.
Parameters: Return type:
-
rename_file
(accession, name)¶ Rename a file.
Parameters: Return type:
-
replace_metainfo_string_value
(accession_list, key, value)¶ Replace a string value in the metainfo of specified files.
Parameters: Return type:
-
replace_metainfo_value
(accession_list, key, value)¶ Replace a value in the metainfo of specified files.
Parameters:
- accession_list (list[str]) – list of files to be updated
- key (str) – metainfo key
- value (MetainfoScalarValue) – metainfo value
Return type:
-
unlink_file
(accession, parent)¶ Unlink a file from a folder.
Parameters: Return type:
-
GroupsUtil¶
-
class
genestack_client.
GroupsUtil
(connection, application_id=None)¶ Bases:
genestack_client.Application
FileInitializer¶
-
class
genestack_client.
FileInitializer
(connection, application_id=None)¶ Bases:
genestack_client.Application
Wrapper class around the File Initializer application.
-
initialize
(accessions)¶ Start initialization for the specified accessions. Missing accessions and initialization failures are silently ignored.
Parameters: accessions (list[str]) – list of accessions Return type: None
-
load_info
(accessions)¶ Takes as input a list of file accessions and returns a list of dictionaries (one for each accession) with the following structure:
- accession: (str) file accession
- name: (str) file name if the file exists
- status: (str) initialization status
The possible values for status are:
- NoSuchFile
- NotApplicable
- NotStarted
- InProgress
- Complete
- Failed
Parameters: accessions (list[str]) – list of accessions Returns: list of dictionaries Return type: list
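A script that waits for a batch of files typically needs to summarise these statuses. The sketch below counts files per status and lists those that are still pending. It assumes input shaped like the dictionaries described above. The name `summarize_statuses` is hypothetical, and treating NotStarted and InProgress as "pending" is an interpretation made for this example rather than something the library defines.

```python
from collections import Counter

# Assumption for this sketch: these two statuses mean "not finished yet".
PENDING_STATUSES = {'NotStarted', 'InProgress'}


def summarize_statuses(infos):
    """Return (count per status, accessions that are still pending)."""
    counts = Counter(info['status'] for info in infos)
    pending = [info['accession'] for info in infos
               if info['status'] in PENDING_STATUSES]
    return counts, pending


# Made-up dictionaries with the accession / name / status structure.
sample_infos = [
    {'accession': 'GSF0001', 'name': 'Mapped reads 1', 'status': 'Complete'},
    {'accession': 'GSF0002', 'name': 'Mapped reads 2', 'status': 'InProgress'},
    {'accession': 'GSF0003', 'name': 'Mapped reads 3', 'status': 'Failed'},
    {'accession': 'GSF0004', 'name': 'Mapped reads 4', 'status': 'NotStarted'},
]
counts, pending = summarize_statuses(sample_infos)
print(dict(counts), pending)
```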
-
TaskLogViewer¶
-
class
genestack_client.
TaskLogViewer
(connection, application_id=None)¶ Bases:
genestack_client.Application
A wrapper class for the Task Logs Viewer application. This application allows you to access the initialization logs of a file.
-
print_log
(accession, log_type=None, follow=True, offset=0)¶ Print a file’s latest task initialization logs to stdout. Raises an exception if the file is not found or has no associated initialization task. By default the stdout log is shown. You can also view the stderr log.
follow=True will wait until initialization is finished. Incoming logs will be printed to the console.
Parameters:
- accession – file accession
- log_type – stdout or stderr
- follow – if enabled, wait and display new lines as they appear (similar to tail --follow)
- offset – offset from which to start retrieving the logs. Set to -1 if you want to start retrieving logs from the latest chunk.
-
DatasetsUtil¶
-
class
genestack_client.
DatasetsUtil
(connection, application_id=None)¶ -
APPLICATION_ID
= 'genestack/datasetsUtil'¶
-
BATCH_SIZE
= 100¶
-
add_dataset_children
(accession, children)¶ Add new files to a dataset.
Parameters:
-
add_file_to_datasets
(file_accession, dataset_accessions)¶ Add given file to several datasets.
Parameters:
-
create_dataset
(name, dataset_type, children, parent=None, dataset_metainfo=None)¶ Create a dataset.
Parameters:
- name (str) – name of the dataset
- dataset_type (str) – type of the dataset (children files interface name, must extend IDataFile)
- children (list[str]) – list of children accessions
- parent (str) – folder for the new dataset, ‘My datasets’ if not specified
- dataset_metainfo (Metainfo) – metainfo of the created dataset
Returns: dataset accession
Return type:
-
create_empty_dataset
(name, dataset_type, parent=None, dataset_metainfo=None)¶ Create an empty dataset.
Parameters: Returns: dataset accession
Return type:
-
create_subset
(accession, children, parent=None)¶ Create a subset from dataset’s children.
Parameters: Returns: accession of the created subset
Return type:
-
get_dataset_children
(accession)¶ Return generator over children accessions of the provided dataset.
Parameters: accession (str) – dataset accession Returns: generator over dataset’s children accessions
-
get_dataset_size
(accession)¶ Get number of files in dataset.
Parameters: accession (str) – dataset accession Returns: number of files in dataset Return type: int
-
merge_datasets
(datasets, parent=None)¶ Create a new dataset from the given datasets.
Parameters: Returns: accession of the created dataset
Return type:
-
SampleLinker (Beta)¶
-
class
genestack_client.samples.
SampleLinker
(connection, application_id=None)¶ Application for linking data files to samples.
It operates with the following concepts:
- A study is a dataset (collection) of samples.
- A sample is a file that contains common metainfo that can be attached to files with data.
- When linking data files and samples, data files must be uploaded and put into an upload dataset. This dataset simplifies operations on these files in Genestack and provides data versioning. Upload dataset is linked to the study.
- When uploading files to the upload dataset, they are put inside this dataset and initialized. Each file’s metainfo will contain a link to the corresponding sample.
A typical workflow might look like this:
- A study with samples is created via the Study Design application inside Genestack.
- Study number is generated and exported via the Study Design API.
- An upload dataset is created and linked to the provided study.
- Files with data are uploaded, linked to samples and initialized via the ‘import_data’ method.
- If some data files are considered corrupted or invalid, they can be removed using the ‘unlink_data’ method.
- When all required data files are uploaded, data can be made visible to others by releasing the upload dataset using the ‘release’ method.
NOTE: This API is currently in Beta and is subject to change, so backwards compatibility is not guaranteed at this point.
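The workflow above can be sketched as a single function. This is an illustration under stated assumptions, not the library's own code: `upload_and_release` is a hypothetical helper, `sample_linker` is assumed to be a SampleLinker-like object, and the study number, sample mapping and group name are supplied by the caller.

```python
def upload_and_release(sample_linker, study_number, samples, group_name,
                       file_type='ExpressionLevels'):
    """Create an upload dataset, import data files into it, then release it.

    `sample_linker` is an assumption: any object exposing the
    create_upload_dataset(), import_data() and release() methods
    documented in this section.
    """
    # Create the dataset that will hold the uploaded data files.
    dataset_accession = sample_linker.create_upload_dataset(
        study_number=study_number, file_type=file_type
    )
    # Create the data files and link them to their samples.
    created = sample_linker.import_data(
        samples=samples, upload_dataset_accession=dataset_accession
    )
    # Release the dataset once all required files are in place.
    sample_linker.release(
        group_name=group_name, upload_dataset_accession=dataset_accession
    )
    return dataset_accession, created
```

Since the API is in Beta, treat the argument names used here as those listed in this reference at the time of writing.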
-
APPLICATION_ID
= 'genestack/sample-linker'¶
-
create_upload_dataset
(study_number, file_type, **kwargs)¶ Create a dataset that will later be used to hold uploaded data files.
This method accepts additional parameters required for creating files inside Genestack. These parameters depend on the file type:
- “ExpressionLevels”: no additional parameters.
Example:
sample_linker.create_upload_dataset(study_number=1, file_type='ExpressionLevels')
Supported file types:
- “ExpressionLevels”: expression data
- “MappedReadCounts”: deprecated, use “ExpressionLevels” instead
Parameters: Returns: accession of the created dataset
Return type:
-
import_data
(samples, upload_dataset_accession)¶ Create data files inside the upload dataset and link them to the specified samples.
Created files are initialized upon creation.
NOTE: This method can handle at most 100 files per call. To upload more files than that, split them into batches of at most 100 files and call the method once per batch.
Example:
sample_linker.import_data(
    samples={
        'sampleId1': ['http://data_url1', 'http://data_url2'],
        'sampleId2': ['http://more.data']
    },
    upload_dataset_accession='GSF000123'
)
This call will return the following dictionary:
{
    'sampleId1': ['GSF0001', 'GSF0002'],
    'sampleId2': ['GSF0003']
}
Parameters: Returns: mapping from sample id to a list of accessions of the created data files.
Return type:
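Given the 100-file limit, larger uploads need to be split on the client side. The following is a minimal sketch of one way to do that. `batch_samples` and `import_in_batches` are hypothetical helpers, not library functions, and `sample_linker` is assumed to be a SampleLinker-like object. The sketch also assumes the limit counts data file URLs rather than samples.

```python
def batch_samples(samples, batch_size=100):
    """Split {sample_id: [urls]} into mappings with at most batch_size URLs each."""
    batch, count = {}, 0
    # Sort by sample id so that the batches are produced in a stable order.
    for sample_id, urls in sorted(samples.items()):
        for url in urls:
            if count == batch_size:
                yield batch
                batch, count = {}, 0
            batch.setdefault(sample_id, []).append(url)
            count += 1
    if batch:
        yield batch


def import_in_batches(sample_linker, samples, upload_dataset_accession,
                      batch_size=100):
    """Call import_data() once per batch and merge the returned mappings."""
    created = {}
    for batch in batch_samples(samples, batch_size):
        result = sample_linker.import_data(
            samples=batch, upload_dataset_accession=upload_dataset_accession
        )
        # A sample may be split across batches, so extend rather than overwrite.
        for sample_id, accessions in result.items():
            created.setdefault(sample_id, []).extend(accessions)
    return created
```

A sample with more URLs than fit in the current batch is split across consecutive calls, which is why the results are merged per sample id.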
-
release
(group_name, upload_dataset_accession)¶ Release the provided dataset. Releasing a dataset means that all data files are ready and can be shared with the outer world.
This method is idempotent and can be run multiple times in case of errors.
Parameters:
-
unlink_data
(file_accessions, upload_dataset_accession)¶ Remove uploaded data files from the given dataset and unlink them from their samples. Links to samples are always removed but actual files may not be removed from the system.
Removing a file that isn’t present in the dataset is a no-op and will not throw an exception.
Parameters:
Command-Line Applications¶
CLApplication¶
-
class
genestack_client.
CLApplication
(connection, application_id=None)¶ Bases:
genestack_client.Application
Base class to interact with Genestack command-line applications. The
APPLICATION_ID
is mandatory. You can either pass it as an argument to the class constructor or override it in a child class. Source files and parameters are application-specific.
-
change_command_line_arguments
(accession, params)¶ Change the command-line arguments strings in a file’s metainfo.
params
is a list of command-line strings. Note that the syntax of command-line argument strings is application-specific. The only way to know which command-line strings to provide is to look at the
Parameters
metainfo field of a CLA file that has the correct parameters specified through the graphical user interface of the application.
If the file is not found, does not have the right file type or is already initialized, an exception will be thrown.
Parameters: Returns: None
-
create_file
(source_files, name=None, params=None, calculate_checksums=False, expected_checksums=None, initialize=False)¶ Create a native Genestack file with the application and return its accession. If a source file is not found or is not of the expected type, an exception will be thrown.
Parameters: - source_files (list) – list of source file accessions
- name (str) – if a name is provided, the created file will be renamed
- params (list) – custom command-line arguments strings; if None, the application defaults will be used
- calculate_checksums (bool) – a flag used in the initialization script to compute checksums for the created files
- expected_checksums (dict) – dict of expected checksums (
{metainfo_key: expected_checksum}
)
- initialize (bool) – should initialization be started immediately after the file is created?
Returns: accession of created file
Return type:
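Since create_file returns the accession of the new file, the output of one application can be passed as the source of the next. The following is a sketch under assumptions: `chain_applications` is a hypothetical helper, each element of `apps` is assumed to be a CLApplication-like object, and each application is assumed to accept a single source file.

```python
def chain_applications(apps, source_accession, initialize_last=False):
    """Feed the file created by each application into the next one.

    `apps` is an assumption: a list of objects exposing the create_file()
    method documented above. Returns the accessions of all created files,
    in creation order.
    """
    accessions = []
    current = source_accession
    for index, app in enumerate(apps):
        is_last = index == len(apps) - 1
        # Only the final create_file() call may request initialization.
        current = app.create_file(
            [current], initialize=initialize_last and is_last
        )
        accessions.append(current)
    return accessions
```

Whether requesting initialization only for the last file is appropriate depends on the applications involved. This reference does not describe how initialization treats upstream files.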
-
replace_file_reference
(accession, key, accession_to_remove, accession_to_add)¶ Replace a file reference on the file.
If the file is not found or is not of the right file type, the corresponding exceptions are thrown. If
accession_to_remove
or
accession_to_add
is not found, an exception will be thrown.
Parameters: - accession – file accession or accession list
- key – key for source files
- accession_to_remove – accession to remove
- accession_to_add – accession to add
Returns: None
-
AffymetrixMicroarraysNormalizationApplication¶
-
class
genestack_client.
AffymetrixMicroarraysNormalizationApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/affymetrix-normalization'¶
-
AgilentMicroarraysNormalizationApplication¶
-
class
genestack_client.
AgilentMicroarraysNormalizationApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/agilent-normalization'¶
-
AlignedReadsQC¶
-
class
genestack_client.
AlignedReadsQC
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/alignedreads-qc'¶
-
AlignedReadsSubsamplingApplication¶
-
class
genestack_client.
AlignedReadsSubsamplingApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/aligned-subsampling'¶
-
ArrayQualityMetricsApplication¶
-
class
genestack_client.
ArrayQualityMetricsApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/arrayqualitymetrics'¶
-
BWAApplication¶
-
class
genestack_client.
BWAApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/bwaMapper'¶
-
BowtieApplication¶
-
class
genestack_client.
BowtieApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/bowtie'¶
-
BsmapApplication¶
-
class
genestack_client.
BsmapApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/bsmap'¶
-
BsmapApplicationWG¶
-
class
genestack_client.
BsmapApplicationWG
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/bsmapWG'¶
-
ConcatenateVariantsApplication¶
-
class
genestack_client.
ConcatenateVariantsApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/concatenateVariants'¶
-
CuffquantApplication¶
-
class
genestack_client.
CuffquantApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/cuffquant'¶
-
DoseResponseApplication¶
-
class
genestack_client.
DoseResponseApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/dose-response'¶
-
EffectPredictionApplication¶
-
class
genestack_client.
EffectPredictionApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/snpeff'¶
-
FastQCApplicaton¶
-
class
genestack_client.
FastQCApplicaton
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/fastqc-report'¶
-
FilterByQuality¶
-
class
genestack_client.
FilterByQuality
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/qualityFilter'¶
-
FilterDuplicatedReads¶
-
class
genestack_client.
FilterDuplicatedReads
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/filter-duplicated-reads'¶
-
GOEnrichmentAnalysis¶
-
class
genestack_client.
GOEnrichmentAnalysis
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/functionalEnrichmentAnalysis'¶
-
GenePixMicroarraysNormalizationApplication¶
-
class
genestack_client.
GenePixMicroarraysNormalizationApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/genepix-normalization'¶
-
HTSeqCountsApplication¶
-
class
genestack_client.
HTSeqCountsApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/htseqCount'¶
-
InfiniumMicroarraysNormalizationApplication¶
-
class
genestack_client.
InfiniumMicroarraysNormalizationApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/infinium-methylation-normalization'¶
-
IntersectApplication¶
-
class
genestack_client.
IntersectApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
Parent class for all intersect applications.
-
APPLICATION_ID
= None¶
-
create_file
(source_files, name=None, params=None, calculate_checksums=False, expected_checksums=None, initialize=False)¶ Same as the parent method except that intersect applications also need a separate source file to intersect with, so it treats the last element of the
source_files
array as that file.
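Because the last element of source_files has a special meaning for intersect applications, it can help to build the list explicitly. The helpers below are hypothetical and only illustrate the ordering convention described above; they do not call the library.

```python
def intersect_sources(inputs, intersect_with):
    """Return a source_files list with the file to intersect with placed last."""
    return list(inputs) + [intersect_with]


def split_intersect_sources(source_files):
    """Split a source_files list the way the description above reads it.

    Returns (regular_sources, file_to_intersect_with).
    """
    if len(source_files) < 2:
        raise ValueError('need at least one source file plus the file to intersect with')
    return list(source_files[:-1]), source_files[-1]
```

For example, `app.create_file(intersect_sources(['GSF1', 'GSF2'], 'GSF9'))` would pass `'GSF9'` as the file to intersect with, where `app` is an instance of one of the intersect application classes and the accessions are placeholders.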
-
IntersectGenomicFeaturesMapped¶
IntersectGenomicFeaturesVariants¶
L1000MicroarraysNormalizationApplication¶
-
class
genestack_client.
L1000MicroarraysNormalizationApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/l1000-normalization'¶
-
MarkDuplicated¶
-
class
genestack_client.
MarkDuplicated
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/markDuplicates'¶
-
MergeMappedReadsApplication¶
-
class
genestack_client.
MergeMappedReadsApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/merge-mapped-reads'¶
-
MethratioApplication¶
-
class
genestack_client.
MethratioApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/methratio'¶
-
NormalizationApplication¶
-
class
genestack_client.
NormalizationApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/normalization'¶
-
QiimeMicrobiomeAnalysis¶
-
class
genestack_client.
QiimeMicrobiomeAnalysis
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/qiime-report'¶
-
RemoveDuplicated¶
-
class
genestack_client.
RemoveDuplicated
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/removeDuplicates'¶
-
SingleCellRNASeqAnalysisApplication¶
-
class
genestack_client.
SingleCellRNASeqAnalysisApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/rnaseq'¶
-
SingleCellRNASeqVisualiserApplication¶
-
class
genestack_client.
SingleCellRNASeqVisualiserApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/scrvis'¶
-
SubsampleReads¶
-
class
genestack_client.
SubsampleReads
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/subsampling'¶
-
TargetedSequencingQC¶
-
class
genestack_client.
TargetedSequencingQC
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/alignedreads-qc-enrichment'¶
-
TestCLApplication¶
-
class
genestack_client.
TestCLApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/testcla'¶
-
TophatApplication¶
-
class
genestack_client.
TophatApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/tophat'¶
-
TrimAdaptorsAndContaminants¶
-
class
genestack_client.
TrimAdaptorsAndContaminants
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/fastq-mcf'¶
-
TrimLowQualityBases¶
-
class
genestack_client.
TrimLowQualityBases
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/trim-low-quality-bases'¶
-
TrimToFixedLength¶
-
class
genestack_client.
TrimToFixedLength
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/fastx-trimmer'¶
-
UnalignedReadsQC¶
-
class
genestack_client.
UnalignedReadsQC
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/unalignedreads-qc'¶
-
VariantsAssociationAnalysisApplication¶
-
class
genestack_client.
VariantsAssociationAnalysisApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/variantsAssociationAnalysis'¶
-
VariationCaller2Application¶
-
class
genestack_client.
VariationCaller2Application
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/variationCaller-v2'¶
-
VariationCallerApplication¶
-
class
genestack_client.
VariationCallerApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/variationCaller'¶
-
VariationMergerApplication¶
-
class
genestack_client.
VariationMergerApplication
(connection, application_id=None)¶ Bases:
genestack_client.CLApplication
-
APPLICATION_ID
= 'genestack/variationMerger'¶
-
Genestack Objects¶
Metainfo¶
-
class
genestack_client.
Metainfo
¶ Bases:
dict
A Python representation of metainfo objects.
-
add_boolean
(key, value)¶ Add a boolean value.
Parameters: Return type:
-
add_date_time
(key, time)¶ Add a date. The time parameter can be passed in one of the following formats:
- datetime.datetime
- datetime.date
- str in format '%Y-%m-%d %H:%M:%S' or '%Y-%m-%d'
- number of seconds since the epoch as a floating point number
Parameters: - key (str) – key
- time – time value
Return type:
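To make the accepted input shapes concrete, the sketch below normalizes each of them to a `datetime.datetime` using only the standard library. It illustrates the formats listed above and is not the library's implementation. `to_datetime` is a hypothetical helper, and interpreting epoch seconds in the local time zone is an assumption.

```python
import datetime

# The two string formats listed for add_date_time.
DATE_TIME_FORMATS = ('%Y-%m-%d %H:%M:%S', '%Y-%m-%d')


def to_datetime(value):
    """Convert any of the documented time inputs to a datetime.datetime."""
    # datetime.datetime is a subclass of datetime.date, so test it first.
    if isinstance(value, datetime.datetime):
        return value
    if isinstance(value, datetime.date):
        return datetime.datetime(value.year, value.month, value.day)
    if isinstance(value, (int, float)):
        # Assumption: epoch seconds are interpreted in the local time zone.
        return datetime.datetime.fromtimestamp(value)
    for fmt in DATE_TIME_FORMATS:
        try:
            return datetime.datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError('unsupported time value: %r' % (value,))
```

A string that matches neither format raises `ValueError`, which is a convenient way to validate local data before passing it to add_date_time.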
-
add_decimal
(key, value)¶ Add a decimal value.
Parameters: - key (str) – key
- value (float | str) – decimal value
Return type:
-
add_external_link
(key, url, text=None, fmt=None)¶ Add an external link. The URL should point to a valid source file. The source should be either a publicly available file on the web, or a local file. Local files will be uploaded if imported with
DataImporter
Parameters: Return type:
-
add_file_reference
(key, accession)¶ Add a reference to another Genestack file.
Parameters: Return type:
-
add_integer
(key, value)¶ Add an integer value.
Parameters: Return type:
-
add_memory_size
(key, value)¶ Add a memory size in bytes.
Parameters: Return type:
-
add_organization
(key, name, department=None, country=None, city=None, street=None, postal_code=None, state=None, phone=None, email=None, url=None)¶ Add an organization. The name is required, and all other fields are optional. All fields will be visible to anyone who has access to this metainfo object.
Parameters: Return type: Deprecated since 0.32.0, use compound metainfo keys instead
-
add_person
(key, name, phone=None, email=None)¶ Add a person. The name is required, and all other fields are optional. All fields will be visible to anyone who has access to this metainfo object.
Parameters: Return type: Deprecated since 0.32.0, use compound metainfo keys instead
-
add_publication
(key, title, authors, journal_name, issue_date, identifiers=None, issue_number=None, pages=None)¶ Add a publication. All fields will be visible to anyone who has access to this metainfo object.
Parameters: - key (str) –
- title (str) – publication title
- identifiers (dict) – publication identifiers
- authors (str) – publication authors
- journal_name (str) – name of the journal containing this publication
- issue_date (str) – journal issue date
- issue_number (str) – journal issue number
- pages (str) – pages in the journal issue
Return type: Deprecated since 0.32.0, use compound metainfo keys instead
-
add_string
(key, value)¶ Add a string value.
Parameters: Return type:
-
add_temperature
(key, value, unit)¶ Add a temperature value. The value can be any number, supplied with a unit from a controlled vocabulary.
- The temperature unit should be one of the following: CELSIUS, KELVIN, FAHRENHEIT
Parameters: Return type: Deprecated since 0.32.0, use compound metainfo keys instead
-
add_time
(key, value, unit)¶ Add a time value (like an age, or the duration of an experiment for example).
The value can be any number, supplied with a unit from a controlled vocabulary.
- The time unit should be one of the following: YEAR, MONTH, WEEK, DAY, HOUR, MINUTE, SECOND, MILLISECOND
Parameters: - value (float) – number of units
Return type: Deprecated since 0.32.0, use compound metainfo keys instead
-
add_value
(key, value)¶ Add a scalar value to a metainfo key. If adding to an existing key, the value will be appended to the list of existing values.
Parameters: - key (str) – key
- value (MetainfoScalarValue) – value
Return type: None
-
classmethod
parse_metainfo_from_dict
(source_dict)¶ Parse a Java map representing a metainfo object and create a Python Client Metainfo.
Parameters: source_dict (dict) – Java map
Return type: Metainfo
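When metainfo comes from local data such as a spreadsheet row, the typed add_* methods can be selected from the Python type of each value. The following is a minimal sketch: `fill_metainfo` is a hypothetical helper, and `metainfo` is assumed to be a Metainfo-like object exposing add_boolean, add_integer, add_decimal and add_string as documented above.

```python
def fill_metainfo(metainfo, values):
    """Add every key/value pair of `values` using the matching add_* method."""
    # Sort keys so that values are added in a predictable order.
    for key, value in sorted(values.items()):
        # bool is a subclass of int in Python, so it must be tested first.
        if isinstance(value, bool):
            metainfo.add_boolean(key, value)
        elif isinstance(value, int):
            metainfo.add_integer(key, value)
        elif isinstance(value, float):
            metainfo.add_decimal(key, value)
        else:
            # Anything else is stored as text.
            metainfo.add_string(key, str(value))
    return metainfo
```

With the real class this would be called as `fill_metainfo(Metainfo(), row)`, where `row` is a plain dictionary built from the local data.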
-
Metainfo scalar values¶
-
class
genestack_client.metainfo_scalar_values.
ExternalLink
(url, text=None, fmt=None)¶ -
get_format
()¶
-
get_text
()¶
-
get_url
()¶
-
-
class
genestack_client.metainfo_scalar_values.
MetainfoScalarValue
(value)¶
-
class
genestack_client.metainfo_scalar_values.
Organization
(name, department=None, country=None, city=None, street=None, postal_code=None, state=None, phone=None, email=None, url=None)¶ -
get_organization
()¶
-
File filters¶
-
class
genestack_client.file_filters.
ActualOwnerFileFilter
¶ Filter to select files that are owned by the current user.
-
class
genestack_client.file_filters.
ActualPermissionFileFilter
(permission)¶ Filter to select files for which the current user has a specific permission. See File Permissions.
-
class
genestack_client.file_filters.
AndFileFilter
(first, second)¶ “AND” combination of two file filters.
-
class
genestack_client.file_filters.
BelongsToDatasetFileFilter
(file_accession)¶ Same as
ChildrenFileFilter
but searches for files that belong to the specified dataset.
-
class
genestack_client.file_filters.
ChildrenFileFilter
(container, recursive=False)¶ Filter to select files that are the children or descendants of a given container.
-
class
genestack_client.file_filters.
ContainsFileFilter
(file_accession)¶ Filter to select containers that contain a given file.
-
class
genestack_client.file_filters.
FileFilter
¶ Base file filter class.
-
AND
(other)¶ Return a new filter combining this one with another one in an AND clause.
Parameters: other (FileFilter) – other filter Return type: FileFilter
-
OR
(other)¶ Return a new filter combining this one with another one in an OR clause.
Parameters: other (FileFilter) – other filter Return type: FileFilter
-
-
class
genestack_client.file_filters.
FixedValueFileFilter
(value)¶ Fixed value filter (either
True
or
False
).
-
class
genestack_client.file_filters.
HasInProvenanceFileFilter
(file_accession)¶ Filter to select files that have a given file in their provenance graph.
-
class
genestack_client.file_filters.
KeyValueFileFilter
(key, value)¶ Filter to select files with a given metainfo key-value pair.
-
class
genestack_client.file_filters.
MetainfoValuePatternFileFilter
(key, value)¶ Filter to select files matching a specific substring value for a metainfo key.
-
class
genestack_client.file_filters.
NotFileFilter
(other_filter)¶ Negation of another
FileFilter
-
class
genestack_client.file_filters.
OrFileFilter
(first, second)¶ “OR” combination of two file filters.
-
class
genestack_client.file_filters.
OwnerFileFilter
(email)¶ Filter to select files owned by a specific user.
-
class
genestack_client.file_filters.
PermissionFileFilter
(group, permission)¶ Filter to select files for which a specific group has a specific permission. See File Permissions.
-
class
genestack_client.file_filters.
TypeFileFilter
(file_type)¶ Filter to select files with a given file type. See File Types for a list of possible file types.
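The AND and OR methods of FileFilter combine two filters at a time, so a longer list of conditions can be folded pairwise. The helpers below are hypothetical and rely only on those two documented methods.

```python
from functools import reduce


def all_of(filters):
    """Combine a non-empty sequence of file filters with AND."""
    return reduce(lambda left, right: left.AND(right), filters)


def any_of(filters):
    """Combine a non-empty sequence of file filters with OR."""
    return reduce(lambda left, right: left.OR(right), filters)
```

For example, `all_of([TypeFileFilter(FileTypes.UNALIGNED_READS), ActualOwnerFileFilter()])` would select unaligned reads files owned by the current user. Passing an empty sequence raises `TypeError`, because there is no neutral filter to start from in this sketch.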
Genome Queries¶
-
class
genestack_client.genome_query.
GenomeQuery
¶ Class describing a genome query.
Create a new genome query. The default parameters for a query are:
- offset = 0
- limit = 5000
- no filters
- search across all contrasts
- sorting by increasing FDR
Return type: GenomeQuery
-
class
Filter
¶ -
MAX_FDR
= 'maximumFDR'¶
-
MIN_LOG_COUNTS
= 'minimumLogCountsPerMillion'¶
-
MIN_LOG_FOLD_CHANGE
= 'minimumLogFoldChange'¶
-
REGULATION
= 'regulation'¶
-
-
class
SortingOrder
¶ -
BY_FDR
= 'ByPValue'¶
-
BY_LOG_COUNTS
= 'ByLogCountsPerMillion'¶
-
BY_LOG_FOLD_CHANGE
= 'ByLogFoldChange'¶
-
-
add_filter
(key, value)¶
-
get_map
()¶
-
set_contrasts
(contrasts)¶
-
set_feature_ids
(features)¶
-
set_limit
(limit)¶ Set maximum number of entries to retrieve per contrast.
Parameters: limit – Returns:
-
set_offset
(offset)¶
-
set_order_ascending
(ascending)¶
-
set_sorting_order
(order)¶
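Most GenomeQuery setters are listed above without descriptions, so the sketch below only shows how the documented constants and methods might fit together. `configure_query` is a hypothetical helper, `query` is assumed to be a GenomeQuery-like object, and the value type expected for the FDR filter is an assumption.

```python
def configure_query(query, max_fdr, limit=100):
    """Restrict a genome query by FDR and sort it by log fold change.

    `query` is an assumption: an object exposing the Filter and
    SortingOrder constants and the setter methods documented above.
    """
    # Keep only entries whose FDR does not exceed max_fdr (value type assumed).
    query.add_filter(query.Filter.MAX_FDR, max_fdr)
    # Sort by log fold change instead of the default FDR ordering.
    query.set_sorting_order(query.SortingOrder.BY_LOG_FOLD_CHANGE)
    query.set_order_ascending(False)
    # Limit the number of entries retrieved per contrast.
    query.set_limit(limit)
    return query
```

The helper returns the same query object it was given, so it does not rely on whether the individual setters return anything.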
File Types¶
-
class
genestack_client.file_types.
FileTypes
¶ -
ALIGNED_READS
= 'com.genestack.bio.files.IAlignedReads'¶
-
APPLICATION_PAGE_FILE
= 'com.genestack.api.files.IApplicationPageFile'¶
-
AUXILIARY_FILE
= 'com.genestack.api.files.IAuxiliaryFile'¶
-
BTB_DOCUMENT
= 'com.genestack.api.files.btb.IBTBDocumentFile'¶
-
CODON_TABLE
= 'com.genestack.bio.files.ICodonTable'¶
-
CONTAINER
= 'com.genestack.api.files.IContainerFile'¶
-
DATASET
= 'com.genestack.api.files.IDataset'¶
-
DICTIONARY_FILE
= 'com.genestack.api.files.IDictionaryFile'¶
-
DIFFERENTIAL_EXPRESSION_FILE
= 'com.genestack.bio.files.differentialExpression.IDifferentialExpressionFile'¶
-
EXPRESSION_LEVELS
= 'com.genestack.bio.files.IExpressionLevels'¶
-
EXTERNAL_DATABASE
= 'com.genestack.bio.files.IExternalDataBase'¶
-
FEATURE_LIST
= 'com.genestack.bio.files.IFeatureList'¶
-
FILE
= 'com.genestack.api.files.IFile'¶
-
FOLDER
= 'com.genestack.api.files.IFolder'¶
-
GENE_EXPRESSION_SIGNATURE
= 'com.genestack.bio.files.IGeneExpressionSignature'¶
-
GENOME_ANNOTATIONS
= 'com.genestack.bio.files.IGenomeAnnotations'¶
-
GENOME_BED_DATA
= 'com.genestack.bio.files.IGenomeBEDData'¶
-
GENOME_WIGGLE_DATA
= 'com.genestack.bio.files.IGenomeWiggleData'¶
-
HT_SEQ_COUNTS
= 'com.genestack.bio.files.IHTSeqCounts'¶
-
INDEX_FILE
= 'com.genestack.api.files.IIndexFile'¶
-
MICROARRAY_DATA
= 'com.genestack.bio.files.IMicroarrayData'¶
-
PREFERENCES_FILE
= 'com.genestack.api.files.IPreferencesFile'¶
-
RAW_FILE
= 'com.genestack.api.files.IRawFile'¶
-
REFERENCE_GENOME
= 'com.genestack.bio.files.IReferenceGenome'¶
-
REPORT_FILE
= 'com.genestack.api.files.IReportFile'¶
-
SAMPLE
= 'com.genestack.api.files.ISample'¶
-
SEARCH_FOLDER
= 'com.genestack.api.files.ISearchFolder'¶
-
UNALIGNED_READS
= 'com.genestack.bio.files.IUnalignedReads'¶
-
VARIATION_FILE
= 'com.genestack.bio.files.IVariationFile'¶
-
Users and Connections¶
Connection¶
-
class
genestack_client.
Connection
(server_url, debug=False, show_logs=False)¶ Bases:
object
A class to handle a connection to a specified Genestack server. Instantiating the class does not mean you are logged in to the server. To log in, you need to call the
login()
method.
Parameters:
-
application
(application_id)¶ Returns an application handler for the application with the specified ID.
Parameters: application_id (str) – Application ID. Returns: application class Return type: Application
-
check_version
()¶ Check the version of the client library required by the server. The server will return a message specifying the compatible version. If the current version is not supported, an exception is raised.
Returns: None
-
login
(email, password)¶ Attempt a login on the connection with the specified credentials. Raises an exception if the login fails.
Parameters: Return type: Raises: - GenestackServerException – if module version is outdated
- GenestackAuthenticationException – if login failed
-
login_by_token
(token)¶ Attempt a login on the connection with the specified token. Raises an exception if the login fails.
Parameters: token – token Return type: None Raises: - GenestackServerException – if module version is outdated
- GenestackAuthenticationException – if login failed
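Since a connection supports both token and email/password logins, calling code often has to choose between them. The following is a minimal sketch: `log_in` is a hypothetical helper, and `connection` is assumed to be a Connection-like object exposing the login() and login_by_token() methods documented above.

```python
def log_in(connection, token=None, email=None, password=None):
    """Log in with a token when one is given, else with email and password."""
    if token is not None:
        connection.login_by_token(token)
    elif email is not None and password is not None:
        connection.login(email, password)
    else:
        # Fail early on the client side rather than sending an incomplete login.
        raise ValueError('provide either a token or an email/password pair')
    return connection
```

Exceptions raised by the login methods themselves are left to propagate, so callers can still handle GenestackAuthenticationException where it matters.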
-
perform_request
(path, data='', follow=True, headers=None)¶ Perform an HTTP request to Genestack server.
Connects to remote server and sends data to an endpoint path with additional headers.
Parameters: - path (str) – URL path (endpoint) to be used (concatenated with
self.server_url
)
- data (dict|file|str) – dictionary, bytes, or file-like object to send in the body
- follow (bool) – should we follow a redirection (if any)
- headers (dict[str, str]) – dictionary of additional headers; list of pairs is supported too until v1.0 (for backward compatibility)
Returns: response from server
Return type: Response
-
settings.User¶
-
class
genestack_client.settings.
User
(email, alias=None, host=None, password=None, token=None)¶ Bases:
object
Class encapsulating all user info required for authentication.
- That includes:
- user alias
- server host
- token or email/password pair
All fields are optional. If alias is None, it will be the same as email.
If you log in interactively, no email or password is required. The alias is used to find the matching user in get_user().
Parameters:
-
get_connection
(interactive=True, debug=False, show_logs=False)¶ Return a logged-in connection for the current user. If interactive is True and the password or email are unknown, they will be requested in interactive mode.
Parameters: Returns: logged-in connection
Return type:
Helper methods¶
get_connection¶
-
genestack_client.
get_connection
(args=None)¶ This is the same as get_user().get_connection(). Generally the fastest way to get an active connection.
Parameters: args (argparse.Namespace) – argument from argparse.parse_args
Returns: connection Return type: genestack_client.Connection
make_connection_parser¶
-
genestack_client.
make_connection_parser
(user=None, password=None, host=None, token=None)¶ Creates an argument parser with the provided connection parameters. If one of email, password or user is specified, they are used. Otherwise, the default identity from the local config file will be used.
Parameters: Returns: parser
Return type:
get_user¶
-
genestack_client.
get_user
(args=None)¶ Returns the user corresponding to the provided arguments. If args is None, uses make_connection_parser() to get arguments.
Parameters: args (argparse.Namespace) – result of commandline parse Returns: user Return type: settings.User
Exceptions¶
GenestackBaseException¶
GenestackException¶
-
class
genestack_client.
GenestackException
¶ Bases:
genestack_client.genestack_exceptions.GenestackBaseException
Client-side exception class.
Raise its instances (instead of
Exception
) if anything is wrong on client side.
GenestackServerException¶
-
class
genestack_client.
GenestackServerException
(message, path, post_data, debug=False, stack_trace=None)¶ Bases:
genestack_client.GenestackException
Server-side exception class.
Raised when Genestack server returns an error response (error message generated by Genestack Java code, not an HTTP error).
Parameters:
GenestackAuthenticationException¶
-
class
genestack_client.
GenestackAuthenticationException
¶ Bases:
genestack_client.GenestackException
Exception thrown on an authentication error response from server.
GenestackResponseError¶
-
class
genestack_client.
GenestackResponseError
(reason)¶ Bases:
genestack_client.genestack_exceptions.GenestackBaseException
, urllib.error.URLError
Wrapper for HTTP response errors.
Extends
urllib2.URLError
for backward compatibility.
GenestackConnectionFailure¶
-
class
genestack_client.
GenestackConnectionFailure
(message)¶ Bases:
genestack_client.genestack_exceptions.GenestackBaseException
, urllib.error.URLError
Wrapper for server connection failures.
Extends
urllib2.URLError
for backward compatibility.
Others¶
GenestackShell¶
-
class
genestack_client.genestack_shell.
GenestackShell
(*args, **kwargs)¶ Bases:
cmd.Cmd
Arguments to be overridden in children:
- INTRO: greeting at start of shell mode
- COMMAND_LIST: list of available commands
- DESCRIPTION: description for help
Run as script:
script.py [connection_args] command [command_args]
Run as shell:
script.py [connection_args]
- Default shell commands:
- help: show help about shell or command
- quit: quits shell
- ctrl+D: quits shell
-
cmdloop
(intro=None)¶ Repeatedly issue a prompt, accept input, parse an initial prefix off the received input, and dispatch to action methods, passing them the remainder of the line as argument.
-
default
(line)¶ Called on an input line when the command prefix is not recognized.
If this method is not overridden, it prints an error message and returns.
-
do_help
(line)¶ List available commands with “help” or detailed help with “help cmd”.
-
emptyline
()¶ Called when an empty line is entered in response to the prompt.
If this method is not overridden, it repeats the last nonempty command entered.
-
get_commands_for_help
()¶ Return a list of command-description pairs to show in the shell help command.
Returns: command - description pairs Return type: list[(str, str)]
-
get_shell_parser
(offline=False)¶ Returns the parser for shell arguments.
Returns: parser for shell commands Return type: argparse.ArgumentParser
-
postloop
()¶ Hook method executed once when the cmdloop() method is about to return.
-
preloop
()¶ Hook method executed once when the cmdloop() method is called.
-
process_command
(command, argument_list, shell=False)¶ Runs the given command with the provided arguments and returns the exit code
Parameters: Returns: 0 if the command was executed successfully, 1 otherwise
Return type:
-
set_shell_user
(args)¶ Set the connection for shell mode.
Parameters: args (argparse.Namespace) – script arguments
Command¶
-
class
genestack_client.genestack_shell.
Command
¶ Bases:
object
Command class to be inherited.
- COMMAND: name of the command
- DESCRIPTION: description as shown in the help message
- OFFLINE: set to True if the command does not require a connection to the Genestack server
-
get_command_parser
(parser=None)¶ Returns a command parser. This function is called each time before a command is executed. To add new arguments to the command, you should override the update_parser() method.
Parameters: parser (argparse.ArgumentParser) – base argument parser. For offline commands and commands inside shell, it will be None. For the other cases, it will be the result of make_connection_parser()
Returns: parser Return type: argparse.ArgumentParser
-
get_short_description
()¶ Returns a short description for the command. Used in the “help” message.
Returns: short description Return type: str
-
run
()¶ Override this method to implement the command action.
Return value of this method is always ignored. If this method raises an exception, the command will be treated as failed.
If this command is executed in the shell mode, the failed state is ignored, otherwise exit code 1 is returned.
Raise GenestackException to indicate command failure without showing the stacktrace.
Return type: None
-
set_arguments
(args)¶ Set parsed arguments for the command.
Parameters: args (argparse.Namespace) – parsed arguments
-
set_connection
(conn)¶ Set a connection for the command.
Parameters: conn (genestack_client.Connection) – connection
SpecialFolders¶
-
class
genestack_client.
SpecialFolders
¶ Bases:
object
- IMPORTED: folder with files created by Data Importers
- CREATED: folder with files created by Preprocess and Analyse applications
- TEMPORARY: folder with temporary files
- UPLOADED: folder with uploaded raw files
- MY_DATASETS: folder with created datasets
Command-Line Utilities¶
Command-line utilities installed with the Python Client Library.
genestack-user-setup¶
genestack-user-setup
is installed with the Python Client Library and can be accessed from a terminal by typing genestack-user-setup
.
Usage¶
This script can be used both in interactive shell mode and in static command-line mode:
usage: __main__.py [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] [-h] [-v] [<command>]

Genestack user management application.

positional arguments:
  <command>  "init", "list", "add", "default", "change-password", "change-token", "path", "remove", "rename" or empty to use shell

optional arguments:
  -h, --help  show this help message and exit
  -v, --version  show version

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
You can get a description for every command
by typing:
$ genestack-user-setup command -h
In shell mode, type help
to get a list of available commands.
Use help command
to get help for a specific command.
See Connecting to a Genestack instance for more information about connection arguments.
genestack-user-setup exits with return code 0 on success, 1 for unspecified errors, and 13 if the server requires a newer Python Client version.
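When the utility is driven from a script, the three documented return codes can be translated into messages. The following is a minimal sketch using only the standard library; `EXIT_CODES`, `describe_exit_code` and `run_and_describe` are hypothetical names introduced here, not part of the client library.

```python
import subprocess

# Return codes documented for the command-line utility.
EXIT_CODES = {
    0: 'success',
    1: 'unspecified error',
    13: 'server requires a newer Python Client version',
}


def describe_exit_code(code):
    """Map a return code to its documented meaning."""
    return EXIT_CODES.get(code, 'undocumented exit code %d' % code)


def run_and_describe(argv):
    """Run a command line and return (return code, description)."""
    code = subprocess.call(argv)
    return code, describe_exit_code(code)
```

For example, `run_and_describe(['genestack-user-setup', 'list'])` would run the `list` command and report how it ended; a code of 13 is a signal to upgrade the client rather than to retry.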
Commands¶
add:
usage: __main__.py add [-h]

Add new user.

optional arguments:
  -h, --help  show this help message and exit

change-password:
usage: __main__.py change-password [-h] [<alias>]

Change password for user.

command arguments:
  <alias>  Alias for user to change password

optional arguments:
  -h, --help  show this help message and exit

change-token:
usage: __main__.py change-token [-h] [<alias>]

Change token for user.

command arguments:
  <alias>  Alias for user to change token for

optional arguments:
  -h, --help  show this help message and exit

default:
usage: __main__.py default [-h] [<alias>]

Set default user.

command arguments:
  <alias>  Alias for user to change password

optional arguments:
  -h, --help  show this help message and exit

init:
usage: __main__.py [-h] [-H <host>]

Create default settings.

command arguments:
  -H <host>, --host <host>  Genestack host address

optional arguments:
  -h, --help  show this help message and exit

list:
usage: __main__.py list [-h]

List all users.

optional arguments:
  -h, --help  show this help message and exit

path:
usage: __main__.py path [-h]

Show path to configuration file.

optional arguments:
  -h, --help  show this help message and exit

remove:
usage: __main__.py remove [-h] [<alias>]

Remove user.

command arguments:
  <alias>  Alias for user to change password

optional arguments:
  -h, --help  show this help message and exit

rename:
usage: __main__.py rename [-h] [<alias>] [<new_alias>]

Rename user.

command arguments:
  <alias>  Alias to be renamed
  <new_alias>  New alias

optional arguments:
  -h, --help  show this help message and exit
genestack-application-manager¶
genestack-application-manager is installed with the Python Client Library and can be accessed from a terminal by typing genestack-application-manager.
Usage¶
This script can be used both in interactive shell mode and in static command-line mode:
usage: __main__.py [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] [-h] [-v] [<command>]

The Genestack Application Manager is a command-line utility that allows you to upload and manage your applications on a specific Genestack instance

positional arguments:
  <command>  "info", "install", "versions", "applications", "stable", "remove", "reload", "invoke", "visibility", "release", "status" or empty to use shell

optional arguments:
  -h, --help  show this help message and exit
  -v, --version  show version

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
You can get a description for every command
by typing:
$ genestack-application-manager command -h
In shell mode, type help
to get a list of available commands.
Use help command
to get help for a specific command.
See Connecting to a Genestack instance for more information about connection arguments.
genestack-application-manager exits with return code 0 on success, 1 for unspecified errors, and 13 if the server requires a newer Python Client version.
Commands¶
applications:
usage: __main__.py applications [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs]

Show information about available applications.

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
info:
usage: __main__.py info [-h] [-f] [-F] [--vendor] <jar_file_or_folder> [<jar_file_or_folder> ...]

Display information about an application's JAR file.

command arguments:
  -f, --with-filename  show file names for each JAR
  -F, --no-filename  do not show file names
  --vendor  show only vendor for each JAR file
  <jar_file_or_folder>  file to upload or folder with single JAR file inside (recursively)

optional arguments:
  -h, --help  show this help message and exit
install:
usage: __main__.py install [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] [-f] [-o] [-s] [-S <scope>] [-i <visibility>] [-n] <version> <jar_file_or_folder> [<jar_file_or_folder> ...]

Upload and install an application's JAR file to a Genestack instance.

command arguments:
  -f, --force  Run installation without any prompts (use with caution)
  -o, --override  overwrite old version of the applications with the new one
  -s, --stable  mark installed applications as stable
  -S <scope>, --scope <scope>  scope in which application will be stable (default is 'user'): system | user | session
  -i <visibility>, --visibility <visibility>  set initial visibility (use `-i organization` for setting organization visibility or `-i <group_accession>` for group visibility)
  -n, --no-wait  Don't wait until all installed applications will be completely loaded
  <version>  version of applications to upload
  <jar_file_or_folder>  file to upload or folder with single JAR file inside (recursively)

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
invoke:
usage: __main__.py invoke [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] <appId> <method> [<args> [<args> ...]]

Invoke method of a stable application.

command arguments:
  <appId>  application identifier
  <method>  application method to call
  <args>  application method to call

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
release:
usage: __main__.py release [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] <appId> <version> <newVersion>

Create released application from testing one

command arguments:
  <appId>  application identifier
  <version>  application version
  <newVersion>  version of released application (must differ from other version of this application)

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
reload:
usage: __main__.py reload [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] <version> <appId> [<appId> ...]

Reload a specific version of an application.

command arguments:
  <version>  application version
  <appId>  ID of the application to be marked as stable

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
remove:
usage: __main__.py remove [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] [-f] <version> <appId> [<appId> ...]

Remove a specific version of an application.

command arguments:
  -f, --force  Remove without any prompts (use with caution)
  <version>  application version
  <appId>  identifier of the application to remove (or `ALL` for removing all _your_ applications with specified version)

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
stable:
usage: __main__.py stable [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] [-S <scope>] <version> <appId> [<appId> ...]

Mark applications with the specified version as stable.

command arguments:
  <version>  applications version or '-' (hyphen) to remove stable version
  <appId>  ID of the application to be marked as stable
  -S <scope>, --scope <scope>  scope in which the application will be stable (default is 'user'): system | user | session

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
status:
usage: __main__.py status [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] [-s] <version> <appId> [<appId> ...]

Shows loading status of application and additional loading info

command arguments:
  <version>  application version
  <appId>  identifier of the application
  -s, --state-only  show only id and state, without error descriptions

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
versions:
usage: __main__.py versions [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] [-s] [-i] [-l] [-r] [-o] <appId>

Show information about available applications.

command arguments:
  -s  display stable scopes in output (S: System, U: User, E: sEssion)
  -i  display visibility of each version
  -l  display loading state of application with specific version
  -r  display release state of version
  -o  show only versions owned by current user
  <appId>  application identifier to show versions

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
visibility:
usage: __main__.py visibility [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] [-r] <appId> <version> <level> [<groups_accessions> [<groups_accessions> ...]]

Set or remove visibility for application

command arguments:
  -r, --remove  Specifies if visibility must be removed (by default specific visibility will be added)
  <appId>  application identifier
  <version>  application version
  <level>  Visibility level which will be set to application: group | organization | all
  <groups_accessions>  Accessions of groups for 'group' visibility rule

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
Usage examples¶
If -u
is not specified, the default user is used.
Installing applications¶
If you want to install a new JAR file containing applications, simply type:
genestack-application-manager install my-version path/to/file.jar
If your JAR file is located in a specific folder, and this folder and its subfolders do not contain any other JAR file, you can specify the path to the folder instead of the full path to the JAR file. In that case, the folder and its subfolders will be searched for JAR files. If no JAR file or more than one JAR file is found, an error is returned.
genestack-application-manager install my-version path/to/folder
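The folder rule described above (search the folder and its subfolders, and fail unless exactly one JAR file is found) can be sketched in Python as follows. This is an illustration only, not the tool's actual implementation: `find_single_jar` is a hypothetical helper, and the case-insensitive `.jar` suffix match is an assumption.

```python
import os


def find_single_jar(path):
    """Return the only JAR file under ``path``; raise ValueError otherwise."""
    if os.path.isfile(path):
        return path  # a direct path to a file is used as given
    jars = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            # Assumption: a JAR is any file whose name ends with ".jar".
            if name.lower().endswith('.jar'):
                jars.append(os.path.join(root, name))
    if len(jars) != 1:
        raise ValueError('expected exactly one JAR file in %s, found %d'
                         % (path, len(jars)))
    return jars[0]
```

Running a check like this before calling `install` gives an early, local error when a build folder accidentally contains several JAR files.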
If you want to upload a JAR file and also mark all the applications inside it as stable for your current user, you can use the -s option of the install command (the default scope for marking applications as stable is user):
genestack-application-manager install -s my-version path/to/file.jar
If you want to make an application stable only for your session, you should specify -S session:
genestack-application-manager install -s -S session my-version path/to/file.jar
Otherwise, you can use the stable command after installing the JAR file:
JAR=path/to/file.jar
VERSION=my-version
genestack-application-manager install $VERSION $JAR
for A in $(genestack-application-manager info $JAR | tail -n+3); do
    genestack-application-manager stable -S system $VERSION $A
done
If you want to reinstall your applications later with the same version (whether or not that version was marked as stable), you can simply use the -o option of the install command. This option works exactly like removing the old version before uploading the new one, so there are two things to keep in mind:
- -o can be used to overwrite only your own versions, because you cannot overwrite or remove versions uploaded by other users;
- -o removes the global stable mark, so if you overwrite a globally stable version, no globally stable version will be available on the system afterwards.
genestack-application-manager install -o my-version path/to/file.jar
Sometimes you may want to upload a JAR file with many applications and mark only one of them as stable. In this case you should use the install and stable commands:
genestack-application-manager install my-version path/to/file.jar
genestack-application-manager stable my-version vendor/appIdFromJarFile
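If these installs are scripted from Python rather than typed by hand, it can help to build the argument lists in one place. The sketch below only assembles command lines from the flags shown in the help output (-o, -s, -S); `install_command` and `stable_command` are hypothetical helper names, not functions of the client library.

```python
MANAGER = 'genestack-application-manager'


def install_command(version, jar_path, stable=False, scope=None, override=False):
    """Build the argument list for an ``install`` call."""
    argv = [MANAGER, 'install']
    if override:
        argv.append('-o')  # overwrite an existing version of your own
    if stable:
        argv.append('-s')  # mark the installed applications as stable
    if scope is not None:
        argv.extend(['-S', scope])  # system | user | session
    argv.extend([version, jar_path])
    return argv


def stable_command(version, app_ids, scope=None):
    """Build the argument list for a ``stable`` call on one or more apps."""
    argv = [MANAGER, 'stable']
    if scope is not None:
        argv.extend(['-S', scope])
    argv.append(version)
    argv.extend(app_ids)
    return argv
```

The resulting lists can be passed to `subprocess.check_call`, which raises an error when the utility exits with a non-zero return code.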
Removing all of your applications¶
If you want to remove all your applications, you can use the following bash script:
for A in $(genestack-application-manager applications); do
    for V in $(genestack-application-manager versions -o $A); do
        genestack-application-manager remove $V $A
    done
done
If you want to remove only those of your applications that were loaded from a specific JAR file, use:
JAR=path/to/file.jar
for A in $(genestack-application-manager info $JAR | tail -n+3); do
    for V in $(genestack-application-manager versions -o $A); do
        genestack-application-manager remove $V $A
    done
done
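The first removal loop can also be written in Python. This is a minimal sketch that mirrors the bash script by splitting the utility's output on whitespace; `run_manager` and `remove_all_own_applications` are hypothetical names, and the `run` parameter exists only so the loop can be exercised without a server.

```python
import subprocess

MANAGER = 'genestack-application-manager'


def run_manager(args):
    """Run the application manager and return its standard output as text."""
    return subprocess.check_output([MANAGER] + list(args)).decode('utf-8')


def remove_all_own_applications(run=run_manager):
    """Remove every version you own of every listed application.

    Returns the (version, application id) pairs that were removed.
    """
    removed = []
    for app_id in run(['applications']).split():
        # -o restricts the listing to versions owned by the current user.
        for version in run(['versions', '-o', app_id]).split():
            run(['remove', version, app_id])
            removed.append((version, app_id))
    return removed
```

Because `run_manager` captures standard output, a confirmation prompt from `remove` would not be visible; the documented `-f` flag may be needed when running unattended, so use it with caution.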
genestack-shell¶
genestack-shell is installed with the Python Client Library and can be accessed from a terminal by typing genestack-shell.
Usage¶
This script can be used both in interactive shell mode and in static command-line mode:
usage: __main__.py [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] [-h] [-v] [<command>]

Shell and commandline application

positional arguments:
  <command>  "time", "call", "groups" or empty to use shell

optional arguments:
  -h, --help  show this help message and exit
  -v, --version  show version

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
You can get a description for every command
by typing:
$ genestack-shell command -h
In shell mode, type help
to get a list of available commands.
Use help command
to get help for a specific command.
See Connecting to a Genestack instance for more information about connection arguments.
genestack-shell exits with return code 0 on success, 1 for unspecified errors, and 13 if the server requires a newer Python Client version.
Commands¶
call:
usage: __main__.py call [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] applicationId method ...

call another application's method

command arguments:
  applicationId  full application id
  method  application method
  params  params

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
groups:
usage: __main__.py groups [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs]

print information about user groups

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
time:
usage: __main__.py time [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] applicationId method ...

invoke with timer

command arguments:
  applicationId  full application id
  method  application method
  params  params

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout
genestack-uploader¶
genestack-uploader is installed with the Python Client Library and can be accessed from a terminal by typing genestack-uploader.
Usage¶
usage: __main__.py [-h] [-H <host>] [-u <user>] [-p <password>] [--token <api-token>] [--debug] [--show-logs] [-n] [-F <name> | --upload-to <accession>] <paths> [<paths> ...]

Upload raw files to server and try to auto recognize them as genestack files.

- Collecting files:
  Application can handle files and folder (will recursively collect all files). All paths must be valid. There is not limit to number of files.

- Uploading:
  Files are stored in subfolder of 'Raw uploads'; subfolder name corresponds to user local time. Files are uploaded one by one, each in multiple threads. In case of network errors application attempts to retry until number of retries exceeded (5 by default), in which case application exits with error code. Uploaded data is not lost though and you can continue uploading this file from the point you stop.
  ATTENTION: When you upload multiple files from the command line, be sure to remove successfully uploaded files from the arguments when before re-running uploader, because otherwise all of them will be uploaded to the server again.

- Recognition:
  Recognition done only if all files were uploaded successfully. It works over all files. Files that were not recognized are linked to subfolder 'Unrecognized files'.
  ATTENTION: Recognition of big number of files may cause server timeouts. Split uploading with recognition into relatively small iterations to prevent timeout failures.

optional arguments:
  -h, --help  show this help message and exit

connection:
  -H <host>, --host <host>  server host
  -u <user>, --user <user>  user alias from settings or email
  -p <password>, --password <password>  user password
  --token <api-token>  API token to be used instead of the login and password
  --debug  include server stacktrace into error messages (implies --show-logs)
  --show-logs  print application logs received from server to stdout

command arguments:
  <paths>  path to files or folders
  -n, --no-recognition  don't try to recognize files
  -F <name>, --folder_name <name>  name of the upload folder, if name is not specified it will be generated
  --upload-to <accession>  accession of the upload folder
genestack-uploader exits with return code 0 on success, 1 if recognition failed, and 13 if the server requires a newer Python Client version.
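The help text advises splitting uploads with recognition into relatively small iterations to avoid server timeouts. One way to do that from Python is to build one uploader command line per batch of paths. This is a sketch under stated assumptions: `uploader_batches` is a hypothetical helper, and a suitable batch size is not documented, so it is left to the caller.

```python
def uploader_batches(paths, batch_size, no_recognition=False):
    """Split ``paths`` into batches and build one command line per batch."""
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    commands = []
    for start in range(0, len(paths), batch_size):
        argv = ['genestack-uploader']
        if no_recognition:
            argv.append('-n')  # documented flag: don't try to recognize files
        argv.extend(paths[start:start + batch_size])
        commands.append(argv)
    return commands
```

Running the batches one at a time, for instance with `subprocess.call`, also makes it easy to see which batch failed, so that files already uploaded successfully are not passed to the uploader again.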