API Reference

This is the complete API reference of the Genestack Client Library. For a gentler introduction, read the Getting Started with the Genestack Python Client Library section.

Application Wrappers

Application

class genestack_client.Application(connection, application_id=None)

Bases: object

Create a new application instance for the given connection. The connection must be logged in to call the application’s methods. The application ID can be specified either as an argument to the class constructor or by overriding the APPLICATION_ID attribute in a child class.

get_response(method, params=None, trace=True)

Invoke one of the application’s public Java methods and return a Response object. This gives access to the logs and traces in code; if you only need the result, use invoke().

Parameters:
  • method (str) – name of the public Java method
  • params (tuple) – arguments that will be passed to the Java method. Arguments must be JSON-serializable.
  • trace (bool) – request trace from server
Returns:

Response object

Return type:

Response

invoke(method, *params)

Invoke one of the application’s public Java methods.

Parameters:
  • method (str) – name of the public Java method
  • params – arguments that will be passed to the Java method. Arguments must be JSON-serializable.
Returns:

JSON-deserialized response
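Because every argument must be JSON-serializable, it can be useful to check arguments on the client side before calling invoke(). The following is a minimal sketch that uses only the standard json module; check_json_serializable is a hypothetical helper and not part of the Genestack client library.

```python
import json


def check_json_serializable(*params):
    """Return params unchanged, or raise TypeError for the first bad argument.

    Hypothetical helper: it only mirrors the requirement that arguments
    passed to a Java method must be JSON-serializable.
    """
    for index, value in enumerate(params):
        try:
            json.dumps(value)
        except (TypeError, ValueError) as error:
            raise TypeError(
                'argument %d is not JSON-serializable: %s' % (index, error)
            )
    return params
```

A caller could then write application.invoke('someMethod', *check_json_serializable(a, b)), where someMethod stands for whatever public Java method the application exposes.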

upload_file(file_path, token)

Upload a file to the current Genestack instance. This action requires a special token that can be generated by the application.

Parameters:
  • file_path (str) – path to existing local file.
  • token – upload token
Return type:

None

DataImporter

class genestack_client.DataImporter(connection)

Bases: object

A class used to import files to a Genestack instance. If no parent is specified, the files are created in the special folder Imported files.

Required and recommended values can be set by arguments directly or passed inside a Metainfo:

create_bed(name="Bed", url="some/url")

# is equivalent to:
metainfo = Metainfo()
metainfo.add_string(Metainfo.NAME, "Bed")
metainfo.add_external_link(Metainfo.DATA_LINK, "some/url", text="link name")
create_bed(metainfo=metainfo)

However, do not pass the same value both through the arguments and inside a metainfo object.

Genestack accepts both compressed and uncompressed files. If the protocol is not specified, file:// will be used. Special characters should be escaped, except in s3:// links. Links to Amazon S3 storage should be formatted as in s3cmd.

Supported protocols:

  • file://:
    • test.txt.gz
    • file://test.txt
    • file%20name.gz
  • ftp://
    • ftp://server.com/file.txt
  • http:// https://
    • http://server.com/file.txt
  • ascp://
    • ascp://<user>@<server>:file.txt
  • s3://
    • s3://bucket/file.gz
    • s3://bucket/file name.gz

If you are uploading a local file, a Raw Upload intermediary file will be created on the platform.
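The escaping rule above can be sketched with the standard library. This is an illustration under assumptions, not the library's own logic: escape_import_url is a hypothetical helper, it targets Python 3, and it treats any existing percent sign as part of an escape that is already in place.

```python
from urllib.parse import quote


def escape_import_url(url):
    """Percent-escape special characters, leaving s3:// links untouched.

    Hypothetical helper. '/', ':' and '@' are kept because they act as
    URL delimiters; '%' is kept so that input which is already escaped
    is not escaped twice (a literal '%' is therefore NOT escaped).
    """
    if url.startswith('s3://'):
        # S3 links are written as in s3cmd, without escaping.
        return url
    return quote(url, safe='/:@%')
```

For example, escape_import_url('file name.gz') gives 'file%20name.gz', while 's3://bucket/file name.gz' is returned unchanged.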

AFFYMETRIX_ANNOTATION = 'affymetrixMicroarrayAnnotation'

Affymetrix microarray annotation type

AGILENT_ANNOTATION = 'agilentMicroarrayAnnotation'

Agilent microarray annotation type

INFINIUM_ANNOTATION = 'methylationArrayAnnotation'

Infinium microarray annotation type

MICROARRAY_ANNOTATION_TYPES = ('agilentMicroarrayAnnotation', 'affymetrixMicroarrayAnnotation', 'TSVMicroarrayAnnotation', 'methylationArrayAnnotation')

Supported microarray annotation types

TSV_ANNOTATION = 'TSVMicroarrayAnnotation'

TSV (GenePix etc) microarray annotation type

create_bam(parent=None, name=None, url=None, organism=None, strain=None, reference_genome=None, metainfo=None)

Create a Genestack Aligned Reads file from a local or remote BAM file. name, url and organism are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • url – URL of a BAM file; the index will be created at initialization
  • organism (str) – organism
  • strain – strain
  • reference_genome (str) – reference genome accession
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_bed(parent=None, name=None, reference_genome=None, url=None, metainfo=None)

Create a Genestack BED Track from a local or remote BED file. name and url are mandatory fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • reference_genome (str) – accession of reference genome
  • url (str) – URL or local path to file
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_dbnsfp(parent=None, url=None, name=None, organism=None, metainfo=None)

Create a Genestack Variation Database file. name and url are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • url (str) – URL or local path
  • name (str) – name of the file
  • organism (str) – organism
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_dictionary(parent=None, name=None, url=None, term_type=None, metainfo=None, parent_dictionary=None)

Create a Dictionary file from a local or remote file. owl, obo, and csv formats are supported. name and url are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • url (str) – URL of a file
  • term_type (str) – dictionary term type
  • metainfo (Metainfo) – metainfo object
  • parent_dictionary (str) – accession of parent dictionary
Returns:

file accession

Return type:

str

create_expression_levels(parent=None, unit=None, name=None, url=None, metainfo=None)

Create an Expression Levels file from a local or remote expression levels file. name, url and unit are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • url (str) – URL of the file
  • unit (str) – unit of expression, e.g. TPM, FPKM
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_gene_expression_signature(parent=None, name=None, url=None, organism=None, metainfo=None)

Create a Gene Expression Signature file from a local or remote gene expression signature file. name, url and organism are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • url – URL of a file
  • organism (str) – organism name
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_gene_list(parent=None, name=None, url=None, organism=None, metainfo=None)

Create a Gene List file from a local or remote gene list file. name, url and organism are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • url – URL of a file
  • organism (str) – organism name
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_genome_annotation(parent=None, url=None, name=None, organism=None, reference_genome=None, strain=None, metainfo=None)

Create a Genestack Genome Annotation file from a local or remote file. name and url are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • url (str) – URL or local path
  • name (str) – name of the file
  • organism (str) – organism
  • reference_genome (str) – reference genome accession
  • strain (str) – strain
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_infinium_microarray_data(parent, name=None, urls=None, method=None, metainfo=None)

Create a Genestack Infinium Microarrays Data file inside a folder. The create_microarray_data method cannot be used here because the ‘microarrayData’ importer accepts only one source file, while an Infinium assay has two. This method therefore invokes the ‘infinium MicroarrayData’ importer with two links under the BioMetaKeys.DATA_LINK key in the metainfo.

Infinium microarrays are available only for humans, so there is no ‘organism’ argument.

Parameters:
  • parent (str) – accession of parent folder
  • name (str) – name of the file
  • urls (list) – list of urls
  • method (str) – method
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_mapped_reads_count(parent=None, name=None, url=None, reference_genome=None, metainfo=None)

Create a Mapped Reads Count file from a local or remote mapped reads count file. name and url are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • url – URL of a file
  • reference_genome (str) – reference genome accession
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_microarray_annotation(annotation_type, parent=None, name=None, url=None, metainfo=None)

Create a Dictionary file from a local or remote microarray annotation file. name and url are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • annotation_type (str) – type of annotation being loaded, an element of MICROARRAY_ANNOTATION_TYPES
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • url – URL of a file
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str
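Since annotation_type must be an element of MICROARRAY_ANNOTATION_TYPES, a script may want to reject an unknown type before contacting the server. The sketch below copies the tuple values listed earlier in this reference; validate_annotation_type is a hypothetical helper, and in real code the constant would be read from DataImporter instead of being redefined.

```python
# Values copied from MICROARRAY_ANNOTATION_TYPES as listed in this reference.
MICROARRAY_ANNOTATION_TYPES = (
    'agilentMicroarrayAnnotation',
    'affymetrixMicroarrayAnnotation',
    'TSVMicroarrayAnnotation',
    'methylationArrayAnnotation',
)


def validate_annotation_type(annotation_type):
    """Return annotation_type if it is supported, else raise ValueError."""
    if annotation_type not in MICROARRAY_ANNOTATION_TYPES:
        raise ValueError(
            'unsupported annotation type %r, expected one of: %s'
            % (annotation_type, ', '.join(MICROARRAY_ANNOTATION_TYPES))
        )
    return annotation_type
```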

create_microarray_data(parent, name=None, urls=None, method=None, organism=None, metainfo=None)

Create a Genestack Microarray Data file inside a folder. name and urls are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder
  • name (str) – name of the file
  • urls (list) – list of urls
  • method (str) – method
  • organism (str) – organism
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_reference_genome(parent=None, name=None, description='', sequence_urls=None, annotation_url=None, organism=None, assembly=None, release=None, strain=None, metainfo=None)

Create a Genestack Reference Genome from a collection of local or remote FASTA sequence files, and a GTF or GFF annotation file. name, sequence_urls, organism and annotation_url are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • description (str) – experiment description
  • sequence_urls (list) – list of URLs or local paths to sequence files
  • annotation_url (str) – url to annotation file
  • organism (str) – organism
  • assembly (str) – assembly
  • release (str) – release
  • strain (str) – strain
  • metainfo (Metainfo) – metainfo object
Returns:

create_report_file(parent=None, name=None, urls=None, metainfo=None)

Create a Genestack Report File from a local or remote data file. name and urls are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • urls (list or str) – URL or local file path, or a list of them
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_unaligned_read(parent=None, name=None, urls=None, method=None, organism=None, metainfo=None)

Create a Genestack Unaligned Reads file from one or several local or remote files. Most common file formats encoding sequencing reads with quality scores are accepted (FASTQ 33/64, SRA, FASTA+QUAL, SFF, FAST5). name and urls are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • urls (list) – list of urls
  • method (str) – method
  • organism (str) – organism
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_vcf(parent=None, name=None, reference_genome=None, url=None, metainfo=None)

Create a Genestack Variants file from a local or remote VCF file. name and url are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • reference_genome (str) – accession of reference genome
  • url (str) – URL or local path to file
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

create_wig(parent=None, name=None, reference_genome=None, url=None, metainfo=None)

Create a Genestack Wiggle Track from a local or remote WIG file. name and url are required fields. They can be specified through the arguments or via a Metainfo instance.

Parameters:
  • parent (str) – accession of parent folder (if not provided, files will be created in the Imported files folder)
  • name (str) – name of the file
  • reference_genome (str) – accession of reference genome
  • url (str) – URL or local path to file
  • metainfo (Metainfo) – metainfo object
Returns:

file accession

Return type:

str

load_raw(file_path)

Create a Genestack Raw Upload file from a local file, and return the accession of the created file.

Parameters:file_path (str) – existing file path
Returns:accession
Return type:str

FilesUtil

class genestack_client.FilesUtil(connection, application_id=None)

Bases: genestack_client.Application

An application to perform file management operations on Genestack.

add_checksums(app_file, expected_checksums)

Add expected MD5 checksums to the metainfo of a CLA file. Expected checksums are calculated in the following way:

  • The number of checksums equals the number of entries in storage. For instance, a Reference Genome file has 2 entries (annotation and sequence files).
  • If there are multiple files in one entry, they will be concatenated in the same order as they were PUT to storage by the initialization script.
  • If a file is marked for testing, then after initialization its metainfo will contain both expected and actual checksum values.
Parameters:
  • app_file – accession of application file
  • expected_checksums – collection of MD5 checksums
Returns:

None
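The rule about multiple files in one entry can be illustrated with hashlib: the checksum is taken over the files' bytes concatenated in order. This is a sketch under assumptions: md5_of_concatenated is a hypothetical helper, and the source does not say in which textual form the checksums are expected, so the hexadecimal digest used here is a guess.

```python
import hashlib


def md5_of_concatenated(paths, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of several files read one after another.

    Feeding the files to a single digest object in order is equivalent
    to hashing their concatenation, without building it in memory.
    """
    digest = hashlib.md5()
    for path in paths:
        with open(path, 'rb') as source:
            # Read in chunks so that large files do not need to fit in memory.
            for chunk in iter(lambda: source.read(chunk_size), b''):
                digest.update(chunk)
    return digest.hexdigest()
```

The order of paths matters: it has to match the order in which the files were PUT to storage.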

add_metainfo_string_value(accession_list, key, value)

Add a string value to the metainfo of specified files.

Parameters:
  • accession_list (list[str]) – list of files to be updated
  • key (str) – metainfo key
  • value (str) – string
Return type:

None

add_metainfo_values(accession, metainfo, skip_existing_keys=True, replace_existing_keys=False)

Add metainfo to a specified file. By default, metainfo keys that are already present in the file will be skipped.

Parameters:
  • accession – accession of the file to update
  • metainfo (Metainfo) – metainfo object containing the metainfo to add
  • skip_existing_keys (bool) – ignore metainfo keys that are already present in the file’s metainfo (default: True)
  • replace_existing_keys (bool) – replace the existing metainfo value for the metainfo keys that are already present in the file’s metainfo (default: False)
Return type:

None

collect_initializable_files_in_container(accession)

Recursively search for all initializable files in a container.

Parameters:accession (str) – accession of container
Returns:list of accessions
Return type:list
collect_metainfos(accessions)

Get complete metainfo of a list of files.

Parameters:accessions (list[str]) – list of accessions
Returns:list of metainfo objects
Return type:list[Metainfo]
count_file_children(container_accession)

Count children of a container (not recursive).

Parameters:container_accession (str) – accession of container
Returns:number of children
Return type:int

create_folder(name, parent=None, description=None, metainfo=None)

Create a folder.

Parameters:
  • name (str) – name of the folder
  • parent (str) – if not specified, create folder in the user’s private folder
  • description (str) – description of the folder (goes into the metainfo)
  • metainfo (Metainfo) – additional Metainfo. Description and accession should be specified either via arguments or in a metainfo object (but not in both).
Returns:

accession of created folder

find_file_by_name(name, parent=None, file_class='com.genestack.api.files.IFile')

Find a file with the specified name (case-insensitive) and type. If no file is found, None is returned. If more than one file is found, the first one is returned. If the parent container is not found, the corresponding exception is thrown.

Parameters:
  • name (str) – file name
  • parent (str) – parent accession, private folder is default
  • file_class (str) – File class to be returned, default IFile
Returns:

file accession

Return type:

str

find_files(file_filter, sort_order='DEFAULT', ascending=False, offset=0, limit=2000)

Search for files with file_filter and return dictionary with two key/value pairs:

  • 'total': total number (int) of files matching the query
  • 'result': list of file info dictionaries for subset of matching files
    (from offset to offset+limit). See the documentation of get_infos() for the structure of these objects.
Parameters:
  • file_filter (FileFilter) – file filter
  • sort_order (str) – sorting order for the results, see SortOrder
  • ascending (bool) – should the results be in ascending order? (default: False)
  • offset (int) – search offset (default: 0, cannot be negative)
  • limit (int) – maximum number of results to return (max and default: 100)
Returns:

a dictionary with search response

Return type:

dict[str, int|list[dict[str, str|dict]]]
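Because each call returns only the files from offset to offset+limit, collecting every match requires paging. The following sketch walks through all pages using the 'total' and 'result' keys described above. iter_found_files is a hypothetical helper; files_util is expected to be a FilesUtil instance (or any object with a compatible find_files method). The page size of 100 follows the parameter description, which is an assumption given that the signature shows a different default.

```python
def iter_found_files(files_util, file_filter, page_size=100):
    """Yield every file info dictionary matching file_filter, page by page."""
    offset = 0
    while True:
        response = files_util.find_files(
            file_filter, offset=offset, limit=page_size
        )
        page = response['result']
        for info in page:
            yield info
        offset += len(page)
        # Stop when the server returns nothing more or everything was seen.
        if not page or offset >= response['total']:
            break
```

The generator keeps at most one page in memory, so it can be used as for info in iter_found_files(files_util, file_filter): ... even for large result sets.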

find_or_create_folder(name, parent=None)

Return the folder accession if it already exists, and create it otherwise. If more than one folder is found the first one is returned.

Parameters:
  • name (str) – display name
  • parent (str) – parent accession, use home folder if None
Returns:

accession of folder

Return type:

str

find_reference_genome(organism, assembly, release)

Returns the accession of the reference genome with the specified parameters: organism, assembly, release. If more than one or no genome is found, the corresponding exceptions are thrown.

Parameters:
  • organism (str) – organism
  • assembly (str) – assembly
  • release (str) – release
Returns:

accession

Return type:

str

Raises:

GenestackServerException if more than one genome, or no genome is found

get_file_children(container_accession)

Return accessions of files linked to current container.

Parameters:container_accession (str) – accession of container
Returns:list of accessions
Return type:list
get_folder(parent, *names, **kwargs)

Find a subfolder (by name) in a folder passed as an accession, returning the accession of that subfolder. If several names are provided, they are treated as path components leading down the folder hierarchy, and the accession of the deepest folder is returned:

  • fu.get_folder('GS777', 'RNASeq') looks for subfolder with name “RNASeq” in folder with accession “GS777”, and returns accession of that “RNASeq” subfolder;
  • fu.get_folder('GS777', 'Experiments', 'RNASeq') looks for subfolder with name “Experiments” in a folder with accession “GS777”, then looks for “RNASeq” in “Experiments”, and returns the accession of “RNASeq”.

If create=True is passed as a keyword argument, any folders in the names hierarchy that do not exist will be created (otherwise GenestackException is raised).

Parameters:
  • parent (str) – accession of folder to search in
  • *names – tuple of “path components”, a hierarchy of folders to find
  • create (bool) – whether to create folders from names if they don’t exist or not; default is False (raise GenestackException if any folder doesn’t exist)
Returns:

accession of found (or created) subfolder

Return type:

str

Raises:

GenestackException – if no name is passed, or folder with required name is not found (and shouldn’t be created)

get_home_folder()

Return the accession of the current user’s home folder.

Returns:accession of home folder
Return type:str
get_infos(accession_list)

Returns a list of dictionaries with information about each of the specified files. The call fails with an error if any of the accessions is not valid. The returned list is in the same order as the accessions list.

The information dictionaries have the following structure:

  • accession

  • owner

  • name

  • isDataset

  • application

    • id
  • initializationStatus

    • isError
    • id
  • permissionsByGroup (the value for each key is a dictionary with group accessions as keys)

    • groupNames
    • ids
  • time

    • fileCreation
    • initializationQueued
    • initializationStart
    • initializationEnd
    • lastMetainfoModification
Parameters:accession_list (list) – list of valid accessions.
Returns:list of file info dictionaries.
Return type:list[dict[str, object]]
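As an illustration of the structure above, the sketch below picks out the files whose initialization ended in an error. accessions_with_errors is a hypothetical helper, and it assumes that initializationStatus.isError holds a boolean, which the source does not state explicitly.

```python
def accessions_with_errors(infos):
    """Return accessions of files whose initialization status is an error.

    infos is a list of file info dictionaries with the structure
    described above (nested 'initializationStatus' -> 'isError').
    """
    return [
        info['accession']
        for info in infos
        if info['initializationStatus']['isError']
    ]
```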
get_metainfo_values_as_string_list(accessions_list, keys_list=None)

Retrieve metainfo values as lists of strings for specific files and metainfo keys. The function returns a dictionary.

Parameters:
  • accessions_list (list[str]) – accessions of the files to retrieve
  • keys_list (list[str]|None) – metainfo keys to retrieve (if None, all non-technical keys are retrieved for each file)
Returns:

a two-level dictionary with the following structure: accession -> key -> value list

Return type:

dict[str, dict[str, list[str]]]

get_metainfo_values_as_strings(accessions_list, keys_list=None)

Retrieve metainfo values as strings for specific files and metainfo keys. Metainfo value lists are concatenated to string using ‘, ‘ as delimiter. The function returns a dictionary.

Parameters:
  • accessions_list (list[str]) – accessions of the files to retrieve
  • keys_list (list[str]|None) – metainfo keys to retrieve (if None, all non-technical keys are retrieved for each file)
Returns:

a two-level dictionary with the following structure: accession -> key -> value

Return type:

dict[str, dict[str, str]]
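The relation between the two return shapes (accession -> key -> value list, and accession -> key -> value) can be shown in plain Python. This sketch only reproduces the ', ' concatenation described above on the client side; join_value_lists is a hypothetical helper, not a library function.

```python
def join_value_lists(values_by_accession, delimiter=', '):
    """Turn accession -> key -> list[str] into accession -> key -> str."""
    return {
        accession: {
            key: delimiter.join(values) for key, values in by_key.items()
        }
        for accession, by_key in values_by_accession.items()
    }
```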

get_public_folder()

Return the accession of the Public folder on the current Genestack instance.

Returns:accession of Public folder
Return type:str
get_special_folder(name)

Return the accession of a special folder.

Available special folders are described in SpecialFolders

Parameters:name (str) – special folder name
Returns:accession
Return type:str
Raises:GenestackException: if folder name is unknown

link_file(accession, parent)

Link a file to a folder.

Parameters:
  • accession (str) – file accession
  • parent (str) – parent folder accession
Return type:

None

link_files(children_to_parents_dict)

Link files to containers.

Parameters:children_to_parents_dict (dict) – dictionary where keys are accessions of the files to link, and values are lists of accessions of the containers to link into
Return type:None
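The children_to_parents_dict argument described above maps each file accession to a list of container accessions. When the links are available as (child, parent) pairs, the dictionary can be built as in this sketch; build_children_to_parents is a hypothetical helper and the accessions in the example are made up.

```python
def build_children_to_parents(pairs):
    """Group (child, parent) accession pairs into child -> list of parents.

    Duplicate pairs are collapsed, and the order of first appearance
    of each parent is preserved.
    """
    mapping = {}
    for child, parent in pairs:
        parents = mapping.setdefault(child, [])
        if parent not in parents:
            parents.append(parent)
    return mapping
```

For example, the pairs ('GS1', 'GS10') and ('GS1', 'GS11') produce {'GS1': ['GS10', 'GS11']}.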
mark_for_tests(app_file)

Mark a Genestack file as a test file by adding the corresponding key to its metainfo. A test file calculates MD5 checksums of its encapsulated physical files during initialization.

Parameters:app_file – accession of file
Returns:None
mark_obsolete(accession)

Mark a Genestack file as obsolete by adding the corresponding key to its metainfo.

Parameters:accession – accession of file
Returns:None
remove_metainfo_value(accession_list, key)

Delete a key from the metainfo of specified files.

Parameters:
  • accession_list (list[str]) – list of files to be updated
  • key (str) – metainfo key
Return type:

None

rename_file(accession, name)

Rename a file.

Parameters:
  • accession (str) – file accession
  • name (str) – name
Return type:

None

replace_metainfo_string_value(accession_list, key, value)

Replace a string value in the metainfo of specified files.

Parameters:
  • accession_list (list[str]) – list of files to be updated
  • key (str) – metainfo key
  • value (str) – string
Return type:

None

replace_metainfo_value(accession_list, key, value)

Replace a value in the metainfo of specified files.

Parameters:
Return type:

None

unlink_file(accession, parent)

Unlink a file from a folder.

Parameters:
  • accession (str) – file accession
  • parent (str) – folder accession
Return type:

None

unlink_files(children_to_parents_dict)

Unlink files from containers.

Parameters:children_to_parents_dict (dict[str, list[str]]) – dictionary where keys are accessions of the files to unlink, and values are lists of accessions of the containers to unlink from
Return type:None

GroupsUtil

class genestack_client.GroupsUtil(connection, application_id=None)

Bases: genestack_client.Application

find_group_by_name(name)

Finds group with specified name. If there are no groups or more than one group with this name, an exception is thrown.

Parameters:name (str) – group name
Returns:group accession
Return type:str

ShareUtil

class genestack_client.ShareUtil(connection, application_id=None)

Bases: genestack_client.Application

Application that acts as a facade for sharing-related operations.

class Permissions

Bases: object

Supported permission values that can be used in ShareUtil.share_files() and ShareUtil.share_folder() methods.

VIEW

Allows finding files via search and reading files’ content

EDIT

Allows finding files via search, reading files’ content and modifying files’ metainfo

SHARE

Allows finding files via search, reading files’ content and sharing them with other groups. This permission type only allows sharing by group members from the same organization as the file owner. When sharing, non-owners are only allowed to set permissions that are the same as or narrower than those they currently have for the given file.

get_available_sharing_groups()

Find groups that the current user can share files with, which means that the user is either a sharing user or an administrator of these groups.

Returns:dictionary in format ‘group accession’ -> ‘group name’
Return type:dict
safe_share_files(file_accessions, group_accession, permissions, destination_folder=None)

Same as share_files() but does not throw an exception in case some of the given files cannot be shared (i.e. the current user doesn’t own them or doesn’t have the ShareUtil.Permissions.SHARE permission).

Parameters:
  • file_accessions (str | collections.Iterable[str]) – accession or an iterable of accessions of files to be shared
  • group_accession (str) – accession of the group to share the files with
  • permissions (str | collections.Iterable[str]) – permissions that should be assigned to the provided files. Must consist of ShareUtil.Permissions values
  • destination_folder (str) – accession of the folder to link shared files into. Typically this parameter should be used for linking files into group folders, which is currently impossible to do using the FilesUtil.link_file() method. No links will be created if this parameter is equal to None
share_files(file_accessions, group_accession, permissions, destination_folder=None)

Share files with the given group, using the given permissions. Available permission values are listed in the ShareUtil.Permissions class.

Parameters:
  • file_accessions (str | collections.Iterable[str]) – accession or an iterable of accessions of files to be shared
  • group_accession (str) – accession of the group to share the files with
  • permissions (str | collections.Iterable[str]) – permissions that should be assigned to the provided files. Must consist of ShareUtil.Permissions values
  • destination_folder (str) – accession of the folder to link shared files into. Typically this parameter should be used for linking files into group folders, which is currently impossible to do using the FilesUtil.link_file() method. No links will be created if this parameter is equal to None
Raises:

GenestackServerException – if some of the given files cannot be shared by the current user (i.e. the user doesn’t own them or doesn’t have the ShareUtil.Permissions.SHARE permission).

share_files_for_edit(file_accessions, group_accession, destination_folder=None)

Share files with editing permissions. Editing permissions include viewing permissions and also allow modifying metainfo and linking/unlinking files (only applicable to containers and datasets).

This method is equivalent to calling safe_share_files() method with ShareUtil.Permissions.EDIT permission.

Parameters:
  • file_accessions (str | collections.Iterable[str]) – accession or an iterable of accessions of files to be shared
  • group_accession (str) – accession of the group to share the files with
  • destination_folder (str) – accession of the folder to link shared files into. Typically this parameter should be used for linking files into group folders, which is currently impossible to do using the FilesUtil.link_file() method. No links will be created if this parameter is equal to None.
share_files_for_view(file_accessions, group_accession, destination_folder=None)

Share files with viewing permissions. Viewing permissions include finding the shared files and running tasks that access their content.

This method is equivalent to calling safe_share_files() method with ShareUtil.Permissions.VIEW permission.

Parameters:
  • file_accessions (str | collections.Iterable[str]) – accession or an iterable of accessions of files to be shared
  • group_accession (str) – accession of the group to share the files with
  • destination_folder (str) – accession of the folder to link shared files into. Typically this parameter should be used for linking files into group folders, which is currently impossible to do using the FilesUtil.link_file() method. No links will be created if this parameter is equal to None.
share_folder(folder_accession, group_accession, permissions, destination_folder=None)

Recursively share the given folder, its subfolders and files inside them. Files that cannot be shared by the current user will be skipped.

This method is useful for sharing folders with a lot of files because calling share_files() may result in a timeout. This method shares files in chunks and may take significant time to complete.

Parameters:
  • folder_accession (str) – accession of the folder
  • group_accession (str) – accession of the group to share the files with
  • permissions (str | collections.Iterable[str]) – permissions that should be assigned to the provided files. Must consist of ShareUtil.Permissions values
  • destination_folder (str) – accession of the folder to link shared files into. Typically this parameter should be used for linking files into group folders, which is currently impossible to do using the FilesUtil.link_file() method. No links will be created if this parameter is equal to None
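When the files to share come from a flat list and not from a folder, the same chunking idea can be applied on the client side to keep each request small. The sketch below is not how share_folder is implemented; share_in_chunks and its chunk_size default are hypothetical, and share_util is expected to be a ShareUtil instance (or any object with a compatible safe_share_files method).

```python
def share_in_chunks(share_util, file_accessions, group_accession,
                    permissions, destination_folder=None, chunk_size=100):
    """Share files in fixed-size chunks and return the number of chunks.

    permissions should consist of ShareUtil.Permissions values.
    safe_share_files is used so that files the current user cannot
    share are skipped and do not abort the whole run.
    """
    accessions = list(file_accessions)
    chunk_count = 0
    for start in range(0, len(accessions), chunk_size):
        chunk = accessions[start:start + chunk_size]
        share_util.safe_share_files(
            chunk, group_accession, permissions,
            destination_folder=destination_folder,
        )
        chunk_count += 1
    return chunk_count
```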

FileInitializer

class genestack_client.FileInitializer(connection, application_id=None)

Bases: genestack_client.Application

Wrapper class around the File Initializer application.

initialize(accessions)

Start initialization for the specified accessions. Missing accessions and initialization failures are silently ignored.

Parameters:accessions (list[str]) – list of accessions
Return type:None
load_info(accessions)

Takes as input a list of file accessions and returns a list of dictionaries (one for each accession) with the following structure:

  • accession: (str) file accession
  • name: (str) file name if the file exists
  • status: (str) initialization status

The possible values for status are:

  • NoSuchFile
  • NotApplicable
  • NotStarted
  • InProgress
  • Complete
  • Failed
Parameters:accessions (list[str]) – list of accessions
Returns:list of dictionaries
Return type:list
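
The list returned by load_info() lends itself to simple post-processing. The following is a minimal sketch that groups accessions by initialization status. It works on plain dictionaries of the shape described above; the helper name group_by_status and the sample data are illustrative assumptions, not part of the library.

```python
from collections import defaultdict


def group_by_status(infos):
    """Group file accessions by their initialization status.

    `infos` is a list of dictionaries shaped like the ones described for
    load_info(): each has an 'accession' key and a 'status' key.
    """
    grouped = defaultdict(list)
    for info in infos:
        grouped[info['status']].append(info['accession'])
    return dict(grouped)


# Hand-written sample data in the documented shape (not real server output).
sample_infos = [
    {'accession': 'GSF0001', 'name': 'reads_1', 'status': 'Complete'},
    {'accession': 'GSF0002', 'name': 'reads_2', 'status': 'Failed'},
    {'accession': 'GSF0003', 'status': 'NoSuchFile'},
    {'accession': 'GSF0004', 'name': 'reads_4', 'status': 'Complete'},
]

by_status = group_by_status(sample_infos)
print(by_status.get('Failed', []))
```

With a real connection, the input would presumably be the return value of load_info() rather than a hand-written list.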

TaskLogViewer

class genestack_client.TaskLogViewer(connection, application_id=None)

Bases: genestack_client.Application

A wrapper class for the Task Logs Viewer application. This application allows you to access the initialization logs of a file.

print_log(accession, log_type=None, follow=True, offset=0)

Print a file’s latest task initialization logs to stdout. Raises an exception if the file is not found or has no associated initialization task. By default the stdout log is shown; pass log_type to view the stderr log instead. With follow=True, the method waits until initialization has finished and prints incoming log lines to the console as they arrive.

Parameters:
  • accession – file accession
  • log_type – stdout or stderr
  • follow – if enabled, wait and display new lines as they appear (similar to tail --follow)
  • offset – offset from which to start retrieving the logs. Set to -1 if you want to start retrieving logs from the latest chunk.

Expression Navigator

class genestack_client.expression_navigator.ExpressionNavigatorforGenes(connection, application_id=None)
APPLICATION_ID = 'genestack/expressionNavigator'
PKG_DESEQ = 'DESeq2'
PKG_EDGER = 'edgeR'
create_file(groups, r_package='DESeq2', organism=None)

Create an expression navigator file from RNA-seq gene counts files. Each group is described by a dictionary with the following keys:

  • accessions: list of accessions of the raw gene counts files for this group
  • name (optional): group name
  • description (optional): group description
Parameters:
  • groups – list of dictionaries describing the groups for differential expression. See above for the dictionary structure.
  • r_package – name of R package to use for differential expression (either edgeR or DESeq2)
  • organism – organism
Returns:

accession of the created Expression Navigator file
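
The group dictionaries described above can be assembled with a small helper. This is only a sketch: make_group is a hypothetical name, the accessions are placeholders, and the check for an empty accession list is an added sanity check rather than documented library behaviour.

```python
def make_group(accessions, name=None, description=None):
    """Build one group dictionary in the documented shape.

    Optional keys are left out entirely when no value is given.
    """
    if not accessions:
        # Added sanity check (an assumption, not a documented requirement).
        raise ValueError('a group needs at least one accession')
    group = {'accessions': list(accessions)}
    if name is not None:
        group['name'] = name
    if description is not None:
        group['description'] = description
    return group


# Placeholder accessions; real ones come from your Genestack instance.
groups = [
    make_group(['GSF0001', 'GSF0002'], name='Control'),
    make_group(['GSF0003', 'GSF0004'], name='Treated',
               description='24h after treatment'),
]
print(groups)
```

The resulting list has the structure expected by the groups argument of create_file().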

class genestack_client.expression_navigator.ExpressionNavigatorforIsoforms(connection, application_id=None)
APPLICATION_ID = 'genestack/expressionNavigator-isoforms'
create_file(groups, fragment_bias_corr=True, multi_mapping_corr=True, organism=None)

Create an expression navigator file from RNA-seq isoform FPKM counts files. Each group is described by a dictionary with the following keys:

  • accessions: list of accessions of the isoform counts files for this group
  • name (optional): group name
  • description (optional): group description
Parameters:
  • groups – list of dictionaries describing the groups for differential expression. See above for the dictionary structure.
  • fragment_bias_corr (bool) – apply correction for fragment bias
  • multi_mapping_corr (bool) – apply correction for reads with multiple mappings
  • organism – organism
Returns:

accession of the created Expression Navigator file

class genestack_client.expression_navigator.ExpressionNavigatorforMicroarrays(connection, application_id=None)
APPLICATION_ID = 'genestack/expressionNavigator-microarrays'
create_file(groups, normalized_microarray_file, microarray_annotation, organism=None)

Create an Expression Navigator file from a normalized microarray file. Each group is described by a dictionary with the following keys:

  • accessions: list of accessions of the source microarray files for this group
  • name (optional): group name
  • description (optional): group description
  • is_control (optional): boolean value indicating whether the group is a control group
Parameters:
  • groups – list of dictionaries describing the groups for differential expression. See above for the dictionary structure.
  • normalized_microarray_file – accession of normalized microarray file
  • microarray_annotation – accession of the microarray annotation file
  • organism – organism
Returns:

accession of the created Expression Navigator file

DatasetsUtil

class genestack_client.DatasetsUtil(connection, application_id=None)
APPLICATION_ID = 'genestack/datasetsUtil'
BATCH_SIZE = 100
add_dataset_children(accession, children)

Add new files to a dataset.

Parameters:
  • accession (str) – dataset accession
  • children (list[str]) – list of children accessions to add to the dataset
add_file_to_datasets(file_accession, dataset_accessions)

Add given file to several datasets.

Parameters:
  • file_accession (str) – file accession
  • dataset_accessions (list[str]) – accessions of the datasets
create_dataset(name, dataset_type, children, parent=None, dataset_metainfo=None)

Create a dataset.

Parameters:
  • name (str) – name of the dataset
  • dataset_type (str) – type of the dataset (children files interface name, must extend IDataFile)
  • children (list[str]) – list of children accessions
  • parent (str) – folder for the new dataset, ‘My datasets’ if not specified
  • dataset_metainfo (Metainfo) – metainfo of the created dataset
Returns:

dataset accession

Return type:

str

create_empty_dataset(name, dataset_type, parent=None, dataset_metainfo=None)

Create an empty dataset.

Parameters:
  • name (str) – name of the dataset
  • dataset_type (str) – type of the dataset (children files interface name, must extend IDataFile)
  • parent (str) – folder for the new dataset, ‘My datasets’ if not specified
  • dataset_metainfo (Metainfo) – metainfo of the created dataset
Returns:

dataset accession

Return type:

str

create_subset(accession, children, parent=None)

Create a subset from dataset’s children.

Parameters:
  • accession (str) – dataset accession
  • children (list[str]) – list of children accessions to create a subset
  • parent (str) – folder for the new dataset, ‘My datasets’ if not specified
Returns:

accession of the created subset

Return type:

str

get_dataset_children(accession)

Return a generator over the children accessions of the provided dataset.

Parameters:accession (str) – dataset accession
Returns:generator over dataset’s children accessions
get_dataset_size(accession)

Get number of files in dataset.

Parameters:accession (str) – dataset accession
Returns:number of files in dataset
Return type:int
merge_datasets(datasets, parent=None)

Create a new dataset from the given datasets.

Parameters:
  • datasets (list[str]) – list of source datasets accessions
  • parent (str) – folder for the new dataset, ‘My datasets’ if not specified
Returns:

accession of the created dataset

Return type:

str

remove_dataset_children(accession, children)

Remove children from dataset.

Parameters:
  • accession (str) – dataset accession
  • children (list[str]) – list of children accessions to remove from the dataset
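
Since get_dataset_children() returns a generator, its output can be combined with create_subset() without building the full list of children first. The sketch below takes the first few children of a dataset and creates a subset from them. The helper subset_of_first and the FakeDatasetsUtil stand-in are hypothetical; the stand-in exists only so that the sketch runs without a Genestack server, and whether the real generator fetches children lazily is an assumption.

```python
from itertools import islice


def subset_of_first(datasets_util, accession, count, parent=None):
    """Create a subset from the first `count` children of a dataset.

    `datasets_util` is expected to behave like a DatasetsUtil instance.
    """
    children = list(islice(datasets_util.get_dataset_children(accession), count))
    return datasets_util.create_subset(accession, children, parent=parent)


class FakeDatasetsUtil(object):
    """Stand-in for DatasetsUtil; not part of the library."""

    def __init__(self, children):
        self.children = children
        self.subsets = []

    def get_dataset_children(self, accession):
        for child in self.children:
            yield child

    def create_subset(self, accession, children, parent=None):
        self.subsets.append((accession, list(children), parent))
        return 'GSF-SUBSET'  # made-up accession


util = FakeDatasetsUtil(['GSF0001', 'GSF0002', 'GSF0003'])
subset = subset_of_first(util, 'GSF0100', 2)
print(subset, util.subsets)
```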

SampleLinker (Beta)

class genestack_client.samples.SampleLinker(connection, application_id=None)

Application for linking data files to samples.

It operates with the following concepts:

  1. A study is a dataset (collection) of samples.
  2. A sample is a file that contains common metainfo that can be attached to files with data.
  3. When linking data files and samples, data files must be uploaded and put into an upload dataset. This dataset simplifies operations on these files in Genestack and provides data versioning. The upload dataset is linked to the study.
  4. When uploading files to the upload dataset, they are put inside this dataset and initialized. Each file’s metainfo will contain a link to the corresponding sample.

A typical workflow might look like this:

  1. A study with samples is created via the Study Design application inside Genestack.
  2. A study number is generated and exported via the Study Design API.
  3. An upload dataset is created and linked to the provided study.
  4. Files with data are uploaded, linked to samples and initialized via the ‘import_data’ method.
  5. If some data files are considered corrupted or invalid, they can be removed using the ‘unlink_data’ method.
  6. When all required data files are uploaded, data can be made visible to others by releasing the upload dataset using the ‘release’ method.

NOTE: This API is currently in Beta stage and is subject to change, so no backwards compatibility is guaranteed at this point.
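
Steps 3, 4 and 6 of the workflow above can be sketched as a single function. The function upload_and_release and the RecordingLinker stand-in are hypothetical; the stand-in only records calls so that the sketch runs without a server, and its return values are made up. With a real connection, a SampleLinker instance would be passed instead.

```python
def upload_and_release(sample_linker, study_number, samples, group_name):
    """Create an upload dataset, import data into it and release it.

    `sample_linker` is expected to behave like a SampleLinker instance.
    Returns the upload dataset accession and the sample-to-files mapping.
    """
    dataset = sample_linker.create_upload_dataset(
        study_number=study_number,
        file_type='ExpressionLevels',
    )
    created = sample_linker.import_data(
        samples=samples,
        upload_dataset_accession=dataset,
    )
    sample_linker.release(
        group_name=group_name,
        upload_dataset_accession=dataset,
    )
    return dataset, created


class RecordingLinker(object):
    """Stand-in used only to run the sketch; not part of the library."""

    def __init__(self):
        self.calls = []

    def create_upload_dataset(self, study_number, file_type, **kwargs):
        self.calls.append('create_upload_dataset')
        return 'GSF000123'  # made-up accession

    def import_data(self, samples, upload_dataset_accession):
        self.calls.append('import_data')
        return {sample: [] for sample in samples}

    def release(self, group_name, upload_dataset_accession):
        self.calls.append('release')


linker = RecordingLinker()
dataset, created = upload_and_release(
    linker, 1, {'sampleId1': ['http://data_url1']}, 'My group')
print(linker.calls)
```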

APPLICATION_ID = 'genestack/sample-linker'
create_upload_dataset(study_number, file_type, **kwargs)

Create a dataset that will later be used to hold uploaded data files.

This method accepts additional parameters required for creating files inside Genestack. These parameters depend on the file type:

  • “ExpressionLevels”: no additional parameters.

Example:

sample_linker.create_upload_dataset(
    study_number=1,
    file_type='ExpressionLevels'
)

Supported file types:

  • “ExpressionLevels”: expression data
  • “MappedReadCounts”: deprecated, use “ExpressionLevels” instead
Parameters:
  • study_number (int) – number of the study that contains samples for uploaded files.
  • file_type (str) – type of files that will be uploaded
  • kwargs – additional options that are needed when creating a file. Options content depends on the type of the created file.
Returns:

accession of the created dataset

Return type:

str

import_data(samples, upload_dataset_accession)

Create data files inside the upload dataset and link them to the specified samples.

Created files are initialized upon creation.

NOTE: This method can handle at most 100 files per call; larger uploads must be split into batches of no more than 100 files.

Example:

sample_linker.import_data(
    samples={
        'sampleId1': ['http://data_url1', 'http://data_url2'],
        'sampleId2': ['http://more.data']
    },
    upload_dataset_accession='GSF000123'
)

This call will return the following dictionary:

{
    'sampleId1': ['GSF0001', 'GSF0002'],
    'sampleId2': ['GSF0003']
}
Parameters:
  • samples (dict[str, list[str]]) – mapping from sample id to a list of URLs that point to data.
  • upload_dataset_accession (str) – accession of the upload dataset that will hold the created data files.
Returns:

mapping from sample id to a list of accessions of the created data files.

Return type:

dict[str, list[str]]
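
One way to respect the 100-file limit is to split the samples mapping before calling import_data(). The generator below is a minimal sketch of such a split; batch_samples is a hypothetical helper and the URLs are synthetic.

```python
def batch_samples(samples, max_files=100):
    """Split a sample-to-URLs mapping into batches of at most `max_files` URLs.

    A sample with many URLs may be spread over several batches.
    """
    if max_files < 1:
        raise ValueError('max_files must be at least 1')
    batch, count = {}, 0
    for sample_id, urls in samples.items():
        for url in urls:
            if count == max_files:
                # Current batch is full: hand it out and start a new one.
                yield batch
                batch, count = {}, 0
            batch.setdefault(sample_id, []).append(url)
            count += 1
    if batch:
        yield batch


# Synthetic input: 3 samples with 90 URLs each, i.e. 270 files in total.
samples = {
    'sample%d' % i: ['http://example.com/%d/%d' % (i, j) for j in range(90)]
    for i in range(3)
}
batches = list(batch_samples(samples))
sizes = [sum(len(urls) for urls in batch.values()) for batch in batches]
print(sizes)
```

Each batch could then be passed to import_data() in turn. Because a sample can be split across batches, the dictionaries returned by successive calls may share keys and would need to be merged by the caller. Whether the server accepts several calls for the same sample in this way is an assumption of the sketch.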

release(group_name, upload_dataset_accession)

Release the provided dataset. Releasing a dataset means that all data files are ready and can be shared with the outer world.

This method is idempotent and can be run multiple times in case of errors.

Parameters:
  • group_name (str) – name of the group that the provided dataset will be shared with.
  • upload_dataset_accession (str) – accession of the dataset that holds the uploaded data files.
unlink_data(file_accessions, upload_dataset_accession)

Remove uploaded data files from the given dataset and unlink them from their samples. Links to samples are always removed but actual files may not be removed from the system.

Removing a file that isn’t present in the dataset is a no-op and will not throw an exception.

Parameters:
  • file_accessions (list[str]) – accessions of data files that should be unlinked.
  • upload_dataset_accession (str) – accession of the upload dataset that holds the provided data files.

Command-Line Applications

CLApplication

class genestack_client.CLApplication(connection, application_id=None)

Bases: genestack_client.Application

Base class to interact with Genestack command-line applications. The APPLICATION_ID is mandatory. You can either pass it as an argument to the class constructor or override it in a child class. Source files and parameters are application-specific.

change_command_line_arguments(accession, params)

Change the command-line argument strings in a file’s metainfo. params is a list of command-line strings. Note that the syntax of command-line argument strings is application-specific. The only way to know which command-line strings to provide is to look at the Parameters metainfo field of a CLA file whose parameters were set correctly through the application’s graphical user interface.

If the file is not found, does not have the right file type or is already initialized, an exception will be thrown.

Parameters:
  • accession (str) – file accession or accession list
  • params (list) – list of command-line strings to be set
Returns:

None

create_file(source_files, name=None, params=None, calculate_checksums=False, expected_checksums=None, initialize=False)

Create a native Genestack file with the application and return its accession. If a source file is not found or is not of the expected type, an exception will be thrown.

Parameters:
  • source_files (list) – list of source files accessions
  • name (str) – if a name is provided, the created file will be renamed
  • params (list) – custom command-line argument strings; if None, the application defaults will be used.
  • calculate_checksums (bool) – a flag used in the initialization script to compute checksums for the created files
  • expected_checksums (dict) – Dict of expected checksums ({metainfo_key: expected_checksum})
  • initialize – should initialization be started immediately after the file is created?
Returns:

accession of created file

Return type:

str
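
A common pattern is to create one output file per source file. The sketch below does this with create_file(), passing initialize=True so that initialization starts right away. The helper create_for_each and the FakeApp stand-in are hypothetical; the stand-in mimics only the create_file() signature so that the sketch runs without a server, and the accessions it returns are made up.

```python
def create_for_each(app, source_accessions, params=None):
    """Create one output file per source file and start initialization.

    `app` is expected to behave like a CLApplication subclass instance.
    Returns a dict mapping each source accession to the created accession.
    """
    created = {}
    for accession in source_accessions:
        created[accession] = app.create_file(
            [accession],      # source_files is a list, even for one file
            params=params,    # None keeps the application defaults
            initialize=True,  # start initialization right away
        )
    return created


class FakeApp(object):
    """Stand-in that only mimics create_file(); not part of the library."""

    def __init__(self):
        self.counter = 0

    def create_file(self, source_files, name=None, params=None,
                    calculate_checksums=False, expected_checksums=None,
                    initialize=False):
        self.counter += 1
        return 'GSF9%05d' % self.counter  # made-up accession


result = create_for_each(FakeApp(), ['GSF0001', 'GSF0002'])
print(result)
```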

replace_file_reference(accession, key, accession_to_remove, accession_to_add)

Replace a file reference on the file.

If the file is not found or is not of the right file type, the corresponding exceptions are thrown. If accession_to_remove or accession_to_add is not found, an exception will be thrown.

Parameters:
  • accession – file accession or accession list
  • key – key for source files
  • accession_to_remove – accession to remove
  • accession_to_add – accession to add
Returns:

None

start(accession)

Start file initialization. If the file is not found or is not of the right file type, an exception will be thrown.

Parameters:accession (str) – file accession or accession list
Returns:None

AffymetrixMicroarraysNormalizationApplication

class genestack_client.AffymetrixMicroarraysNormalizationApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/affymetrix-normalization'

AgilentMicroarraysNormalizationApplication

class genestack_client.AgilentMicroarraysNormalizationApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/agilent-normalization'

AlignedReadsQC

class genestack_client.AlignedReadsQC(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/alignedreads-qc'

AlignedReadsSubsamplingApplication

class genestack_client.AlignedReadsSubsamplingApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/aligned-subsampling'

ArrayQualityMetricsApplication

class genestack_client.ArrayQualityMetricsApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/arrayqualitymetrics'

BWAApplication

class genestack_client.BWAApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/bwaMapper'

BowtieApplication

class genestack_client.BowtieApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/bowtie'

BsmapApplication

class genestack_client.BsmapApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/bsmap'

BsmapApplicationWG

class genestack_client.BsmapApplicationWG(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/bsmapWG'

ConcatenateVariantsApplication

class genestack_client.ConcatenateVariantsApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/concatenateVariants'

CuffquantApplication

class genestack_client.CuffquantApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/cuffquant'

DoseResponseApplication

class genestack_client.DoseResponseApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/dose-response'

EffectPredictionApplication

class genestack_client.EffectPredictionApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/snpeff'

FastQCApplicaton

class genestack_client.FastQCApplicaton(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/fastqc-report'

FilterByQuality

class genestack_client.FilterByQuality(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/qualityFilter'

FilterDuplicatedReads

class genestack_client.FilterDuplicatedReads(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/filter-duplicated-reads'

GOEnrichmentAnalysis

class genestack_client.GOEnrichmentAnalysis(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/functionalEnrichmentAnalysis'

GenePixMicroarraysNormalizationApplication

class genestack_client.GenePixMicroarraysNormalizationApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/genepix-normalization'

HTSeqCountsApplication

class genestack_client.HTSeqCountsApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/htseqCount'

InfiniumMicroarraysNormalizationApplication

class genestack_client.InfiniumMicroarraysNormalizationApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/infinium-methylation-normalization'

IntersectApplication

class genestack_client.IntersectApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

Parent class for all intersect applications.

APPLICATION_ID = None
create_file(source_files, name=None, params=None, calculate_checksums=False, expected_checksums=None, initialize=False)

Same as the parent method except that intersect applications also need a separate source file to intersect with, so it treats the last element of the source_files array as that file.
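
Because the last element of source_files is treated specially, it can help to build the list explicitly. The helper below is a hypothetical sketch that keeps the file to intersect with in the last position; the accessions are placeholders.

```python
def build_intersect_sources(files_to_intersect, other_file):
    """Return a source_files list with `other_file` in the last position."""
    return list(files_to_intersect) + [other_file]


source_files = build_intersect_sources(['GSF0001', 'GSF0002'], 'GSF0500')
print(source_files)
```

The resulting list could be passed as the source_files argument of create_file() on one of the intersect applications.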

IntersectGenomicFeaturesMapped

class genestack_client.IntersectGenomicFeaturesMapped(connection, application_id=None)

Bases: genestack_client.cla.IntersectApplication

APPLICATION_ID = 'genestack/intersect-bam'

IntersectGenomicFeaturesVariants

class genestack_client.IntersectGenomicFeaturesVariants(connection, application_id=None)

Bases: genestack_client.cla.IntersectApplication

APPLICATION_ID = 'genestack/intersect-vcf'

L1000MicroarraysNormalizationApplication

class genestack_client.L1000MicroarraysNormalizationApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/l1000-normalization'

MarkDuplicated

class genestack_client.MarkDuplicated(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/markDuplicates'

MergeMappedReadsApplication

class genestack_client.MergeMappedReadsApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/merge-mapped-reads'

MethratioApplication

class genestack_client.MethratioApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/methratio'

NormalizationApplication

class genestack_client.NormalizationApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/normalization'

QiimeMicrobiomeAnalysis

class genestack_client.QiimeMicrobiomeAnalysis(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/qiime-report'

RemoveDuplicated

class genestack_client.RemoveDuplicated(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/removeDuplicates'

SingleCellRNASeqAnalysisApplication

class genestack_client.SingleCellRNASeqAnalysisApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/rnaseq'

SingleCellRNASeqVisualiserApplication

class genestack_client.SingleCellRNASeqVisualiserApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/scrvis'

SubsampleReads

class genestack_client.SubsampleReads(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/subsampling'

TargetedSequencingQC

class genestack_client.TargetedSequencingQC(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/alignedreads-qc-enrichment'

TestCLApplication

class genestack_client.TestCLApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/testcla'

TophatApplication

class genestack_client.TophatApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/tophat'

TrimAdaptorsAndContaminants

class genestack_client.TrimAdaptorsAndContaminants(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/fastq-mcf'

TrimLowQualityBases

class genestack_client.TrimLowQualityBases(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/trim-low-quality-bases'

TrimToFixedLength

class genestack_client.TrimToFixedLength(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/fastx-trimmer'

UnalignedReadsQC

class genestack_client.UnalignedReadsQC(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/unalignedreads-qc'

VariantsAssociationAnalysisApplication

class genestack_client.VariantsAssociationAnalysisApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/variantsAssociationAnalysis'

VariationCaller2Application

class genestack_client.VariationCaller2Application(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/variationCaller-v2'

VariationCallerApplication

class genestack_client.VariationCallerApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/variationCaller'

VariationMergerApplication

class genestack_client.VariationMergerApplication(connection, application_id=None)

Bases: genestack_client.CLApplication

APPLICATION_ID = 'genestack/variationMerger'

Genestack Objects

Metainfo

class genestack_client.Metainfo

Bases: dict

A Python representation of metainfo objects.

add_boolean(key, value)

Add a boolean value.

Parameters:
  • key (str) – key
  • value (bool) – boolean value
Return type:

None

add_date_time(key, time)

Add a date. The time parameter can be passed in one of the following formats:

Parameters:
  • key (str) – key
  • time – time value
Return type:

None

add_decimal(key, value)

Add a decimal value.

Parameters:
  • key (str) – key
  • value (float | str) – decimal value
Return type:

None

Add an external link. The URL should point to a valid source file. The source should be either a publicly available file on the web, or a local file. Local files will be uploaded if imported with DataImporter.

Parameters:
  • key (str) – key
  • text (str) – URL text for display purposes
  • fmt (dict) – format for an unaligned reads link
Return type:

None

add_file_reference(key, accession)

Add a reference to another Genestack file.

Parameters:
  • key (str) – key
  • accession (str) – accession of the file to reference
Return type:

None

add_integer(key, value)

Add an integer value.

Parameters:
  • key (str) – key
  • value (int) – integer value
Return type:

None

add_memory_size(key, value)

Add a memory size in bytes.

Parameters:
  • key (str) – key
  • value (int) – integer value
Return type:

None

add_organization(key, name, department=None, country=None, city=None, street=None, postal_code=None, state=None, phone=None, email=None, url=None)

Add an organization. The name is required, and all other fields are optional. All fields will be visible to anyone who has access to this metainfo object.

Parameters:
  • key (str) – key
  • name (str) – name
  • department (str) – department
  • country (str) – country
  • city (str) – city
  • street (str) – street
  • postal_code (str) – postal/zip code
  • state (str) – state
  • phone (str) – phone
  • email (str) – email
  • url (str) – organisation web page
Return type:

None

Deprecated since 0.32.0, use compound metainfo keys instead

add_person(key, name, phone=None, email=None)

Add a person. The name is required, and all other fields are optional. All fields will be visible to anyone who has access to this metainfo object.

Parameters:
  • key (str) – key
  • name (str) – full name
  • phone (str) – phone number
  • email (str) – contact email
Return type:

None

Deprecated since 0.32.0, use compound metainfo keys instead

add_publication(key, title, authors, journal_name, issue_date, identifiers=None, issue_number=None, pages=None)

Add a publication. All fields will be visible to anyone who has access to this metainfo object.

Parameters:
  • key (str) – key
  • title (str) – publication title
  • identifiers (dict) – publication identifiers
  • authors (str) – publication authors
  • journal_name (str) – name of the journal containing this publication
  • issue_date (str) – journal issue date
  • issue_number (str) – journal issue number
  • pages (str) – pages in the journal issue
Return type:

None

Deprecated since 0.32.0, use compound metainfo keys instead

add_string(key, value)

Add a string value.

Parameters:
  • key (str) – key
  • value (str) – string value
Return type:

None

add_temperature(key, value, unit)

Add a temperature value. The value can be any number, supplied with a unit from a controlled vocabulary.

The temperature unit should be one of the following:
CELSIUS, KELVIN, FAHRENHEIT
Parameters:
  • key (str) – key
  • value (float | str) – number of units as float
  • unit (str) – unit
Return type:

None

Deprecated since 0.32.0, use compound metainfo keys instead

add_time(key, value, unit)

Add a time value (for example, an age or the duration of an experiment).

The value can be any number, supplied with a unit from a controlled vocabulary.

The time unit should be one of the following:
YEAR, MONTH, WEEK, DAY, HOUR, MINUTE, SECOND, MILLISECOND
Parameters:
  • key (str) – key
  • value – number of units as float
  • unit (str) – unit
Return type:

None

Deprecated since 0.32.0, use compound metainfo keys instead

add_value(key, value)

Add a scalar value to a metainfo key. If adding to an existing key, the value will be appended to the list of existing values.

Parameters:
  • key (str) – key
  • value (MetainfoScalarValue) – value
Return type:

None

classmethod parse_metainfo_from_dict(source_dict)

Parse a Java map representing a metainfo object and create a Python Client Metainfo.

Parameters:source_dict (dict) – Java map
Return type:Metainfo
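
The append behaviour described for add_value() can be pictured with a toy model. The class below is NOT the library's Metainfo: MetainfoModel is a hypothetical stand-in that stores plain strings instead of MetainfoScalarValue objects and only illustrates how several values accumulate under one key in a dict-based container.

```python
class MetainfoModel(dict):
    """Toy model of a dict-based metainfo; not the real Metainfo class."""

    def add_value(self, key, value):
        # Create the list on first use, then append to it.
        self.setdefault(key, []).append(value)


model = MetainfoModel()
model.add_value('organism', 'Homo sapiens')
model.add_value('organism', 'Mus musculus')
model.add_value('tissue', 'liver')
print(model)
```

In this model, adding a second value under an existing key extends the list rather than overwriting the first value, which is the behaviour the description above states for add_value().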

Metainfo scalar values

class genestack_client.metainfo_scalar_values.BooleanValue(value)
get_boolean()
class genestack_client.metainfo_scalar_values.DateTimeValue(time)
get_date()
get_milliseconds()
class genestack_client.metainfo_scalar_values.DecimalValue(value)
get_decimal()
get_format()
get_text()
get_url()
class genestack_client.metainfo_scalar_values.FileReference(accession)
get_accession()
class genestack_client.metainfo_scalar_values.IntegerValue(value)
get_int()
class genestack_client.metainfo_scalar_values.MemorySizeValue(value)
get_int()
class genestack_client.metainfo_scalar_values.MetainfoScalarValue(value)
class genestack_client.metainfo_scalar_values.Organization(name, department=None, country=None, city=None, street=None, postal_code=None, state=None, phone=None, email=None, url=None)
get_organization()
class genestack_client.metainfo_scalar_values.Person(name, phone=None, email=None)
get_person()
class genestack_client.metainfo_scalar_values.Publication(title, authors, journal_name, issue_date, identifiers=None, issue_number=None, pages=None)
get_publication()
class genestack_client.metainfo_scalar_values.StringValue(value)
get_string()

File filters

class genestack_client.file_filters.ActualOwnerFileFilter

Filter to select files that are owned by the current user.

class genestack_client.file_filters.ActualPermissionFileFilter(permission)

Filter to select files for which the current user has a specific permission. See File Permissions.

class genestack_client.file_filters.AndFileFilter(first, second)

“AND” combination of two file filters.

class genestack_client.file_filters.BelongsToDatasetFileFilter(file_accession)

Same as ChildrenFileFilter but searches for files that belong to the specified dataset.

class genestack_client.file_filters.ChildrenFileFilter(container, recursive=False)

Filter to select files that are the children or descendants of a given container.

class genestack_client.file_filters.ContainsFileFilter(file_accession)

Filter to select containers that contain a given file.

class genestack_client.file_filters.FileFilter

Base file filter class.

AND(other)

Return a new filter combining this one with another one in an AND clause.

Parameters:other (FileFilter) – other filter
Return type:FileFilter
OR(other)

Return a new filter combining this one with another one in an OR clause.

Parameters:other (FileFilter) – other filter
Return type:FileFilter
class genestack_client.file_filters.FixedValueFileFilter(value)

Fixed value filter (either True or False).

class genestack_client.file_filters.HasInProvenanceFileFilter(file_accession)

Filter to select files that have a given file in their provenance graph.

class genestack_client.file_filters.KeyValueFileFilter(key, value)

Filter to select files with a given metainfo key-value pair.

class genestack_client.file_filters.MetainfoValuePatternFileFilter(key, value)

Filter to select files matching a specific substring value for a metainfo key.

class genestack_client.file_filters.NotFileFilter(other_filter)

Negation of another FileFilter.

class genestack_client.file_filters.OrFileFilter(first, second)

“OR” combination of two file filters.

class genestack_client.file_filters.OwnerFileFilter(email)

Filter to select files owned by a specific user.

class genestack_client.file_filters.PermissionFileFilter(group, permission)

Filter to select files for which a specific group has a specific permission. See File Permissions.

class genestack_client.file_filters.TypeFileFilter(file_type)

Filter to select files with a given file type. See File Types for a list of possible file types.
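
Filters can be chained through the AND() method of the base class. The sketch below folds any number of filters into one; all_of is a hypothetical helper, and NamedFilter is a stand-in that only mimics AND() so that the sketch runs without the library. With the real classes, instances such as TypeFileFilter or OwnerFileFilter would be passed instead.

```python
from functools import reduce


def all_of(filters):
    """Combine filters pairwise, left to right, with their AND() method."""
    filters = list(filters)
    if not filters:
        raise ValueError('at least one filter is required')
    return reduce(lambda left, right: left.AND(right), filters)


class NamedFilter(object):
    """Stand-in exposing only AND(); not part of the library."""

    def __init__(self, text):
        self.text = text

    def AND(self, other):
        return NamedFilter('(%s AND %s)' % (self.text, other.text))


combined = all_of([NamedFilter('type'), NamedFilter('owner'),
                   NamedFilter('key=value')])
print(combined.text)
```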

Genome Queries

class genestack_client.genome_query.GenomeQuery

Class describing a genome query.

Create a new genome query. The default parameters for a query are:

  • offset = 0
  • limit = 5000
  • no filters
  • search across all contrasts
  • sorting by increasing FDR
Return type:GenomeQuery
class Filter
MAX_FDR = 'maximumFDR'
MIN_LOG_COUNTS = 'minimumLogCountsPerMillion'
MIN_LOG_FOLD_CHANGE = 'minimumLogFoldChange'
REGULATION = 'regulation'
class Regulation
DOWN = 'down'
UP = 'up'
class SortingOrder
BY_FDR = 'ByPValue'
BY_LOG_COUNTS = 'ByLogCountsPerMillion'
BY_LOG_FOLD_CHANGE = 'ByLogFoldChange'
add_filter(key, value)
get_map()
set_contrasts(contrasts)
set_feature_ids(features)
set_limit(limit)

Set maximum number of entries to retrieve per contrast.

Parameters:limit
Returns:
set_offset(offset)
set_order_ascending(ascending)
set_sorting_order(order)
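
The setters above can be combined roughly as follows. This is a minimal sketch, assuming that Filter, Regulation and SortingOrder are nested in GenomeQuery as the listing suggests, and that add_filter() accepts a plain number for MAX_FDR. The helper name upregulated_query and its defaults are hypothetical, and the query class is passed in so the snippet runs without the library.

```python
def upregulated_query(query_cls, max_fdr=0.05, limit=100):
    """Configure a query for up-regulated features, largest fold change first.

    ``query_cls`` is expected to be genestack_client.genome_query.GenomeQuery.
    The value types passed to add_filter() are assumptions of this sketch.
    """
    query = query_cls()
    query.add_filter(query_cls.Filter.MAX_FDR, max_fdr)
    query.add_filter(query_cls.Filter.REGULATION, query_cls.Regulation.UP)
    query.set_sorting_order(query_cls.SortingOrder.BY_LOG_FOLD_CHANGE)
    query.set_order_ascending(False)  # descending order
    query.set_limit(limit)  # maximum number of entries per contrast
    return query
```

The setters are called one per line rather than chained, because this reference does not state what they return.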

File Types

class genestack_client.file_types.FileTypes
ALIGNED_READS = 'com.genestack.bio.files.IAlignedReads'
APPLICATION_PAGE_FILE = 'com.genestack.api.files.IApplicationPageFile'
AUXILIARY_FILE = 'com.genestack.api.files.IAuxiliaryFile'
BTB_DOCUMENT = 'com.genestack.api.files.btb.IBTBDocumentFile'
CODON_TABLE = 'com.genestack.bio.files.ICodonTable'
CONTAINER = 'com.genestack.api.files.IContainerFile'
DATASET = 'com.genestack.api.files.IDataset'
DICTIONARY_FILE = 'com.genestack.api.files.IDictionaryFile'
DIFFERENTIAL_EXPRESSION_FILE = 'com.genestack.bio.files.differentialExpression.IDifferentialExpressionFile'
EXPRESSION_LEVELS = 'com.genestack.bio.files.IExpressionLevels'
EXTERNAL_DATABASE = 'com.genestack.bio.files.IExternalDataBase'
FEATURE_LIST = 'com.genestack.bio.files.IFeatureList'
FILE = 'com.genestack.api.files.IFile'
FOLDER = 'com.genestack.api.files.IFolder'
GENE_EXPRESSION_SIGNATURE = 'com.genestack.bio.files.IGeneExpressionSignature'
GENOME_ANNOTATIONS = 'com.genestack.bio.files.IGenomeAnnotations'
GENOME_BED_DATA = 'com.genestack.bio.files.IGenomeBEDData'
GENOME_WIGGLE_DATA = 'com.genestack.bio.files.IGenomeWiggleData'
HT_SEQ_COUNTS = 'com.genestack.bio.files.IHTSeqCounts'
INDEX_FILE = 'com.genestack.api.files.IIndexFile'
MICROARRAY_DATA = 'com.genestack.bio.files.IMicroarrayData'
PREFERENCES_FILE = 'com.genestack.api.files.IPreferencesFile'
RAW_FILE = 'com.genestack.api.files.IRawFile'
REFERENCE_GENOME = 'com.genestack.bio.files.IReferenceGenome'
REPORT_FILE = 'com.genestack.api.files.IReportFile'
SAMPLE = 'com.genestack.api.files.ISample'
SEARCH_FOLDER = 'com.genestack.api.files.ISearchFolder'
UNALIGNED_READS = 'com.genestack.bio.files.IUnalignedReads'
VARIATION_FILE = 'com.genestack.bio.files.IVariationFile'

File Permissions

class genestack_client.file_permissions.Permissions
FILE_ACCESS = 'com.genestack.file.access'
FILE_CLONE_DATA = 'com.genestack.file.cloneData'
FILE_READ_CONTENT = 'com.genestack.file.readContent'
FILE_WRITE = 'com.genestack.file.write'

Users and Connections

Connection

class genestack_client.Connection(server_url, debug=False, show_logs=False)

Bases: object

A class to handle a connection to a specified Genestack server. Instantiating the class does not mean you are logged in to the server. To log in, you need to call the login() method.

Parameters:
  • server_url (str) – server url
  • debug (bool) – will print additional traceback from application
  • show_logs (bool) – will print application logs (received from server)
application(application_id)

Returns an application handler for the application with the specified ID.

Parameters:application_id (str) – Application ID.
Returns:application class
Return type:Application
check_version()

Check the version of the client library required by the server. The server will return a message specifying the compatible version. If the current version is not supported, an exception is raised.

Returns:None
login(email, password)

Attempt a login on the connection with the specified credentials. Raises an exception if the login fails.

Parameters:
  • email (str) – email
  • password (str) – password
Return type:

None

Raises:
  • GenestackServerException – if the module version is outdated
  • GenestackAuthenticationException – if the login failed

login_by_token(token)

Attempt a login on the connection with the specified token. Raises an exception if the login fails.

Parameters:token – token
Return type:None
Raises:
  • GenestackServerException – if the module version is outdated
  • GenestackAuthenticationException – if the login failed
logout()

Logout from server.

Return type:None
perform_request(path, data='', follow=True, headers=None)

Perform an HTTP request to Genestack server.

Connects to remote server and sends data to an endpoint path with additional headers.

Parameters:
  • path (str) – URL path (endpoint) to be used (concatenated with self.server_url).
  • data (dict|file|str) – dictionary, bytes, or file-like object to send in the body
  • follow (bool) – should we follow a redirection (if any)
  • headers (dict[str, str]) – dictionary of additional headers; a list of pairs is also supported until v1.0 (for backward compatibility)
Returns:

response from server

Return type:

Response

whoami()

Return user email.

Returns:email
Return type:str
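
Since creating a Connection and logging in are separate steps, it can be convenient to pair login() with logout(). The context manager below is a minimal sketch of that idea. The name logged_in is hypothetical and is not part of the library; it only calls the login() and logout() methods documented above.

```python
from contextlib import contextmanager


@contextmanager
def logged_in(connection, email, password):
    """Log in on entry and always log out on exit.

    ``connection`` is expected to be a genestack_client.Connection. Creating
    one does not log in, so login() is called explicitly here.
    """
    connection.login(email, password)  # raises an exception if the login fails
    try:
        yield connection
    finally:
        # Runs even if the body of the with-block raised.
        connection.logout()
```

A script would presumably use it as "with logged_in(Connection(server_url), email, password) as connection:" and then call, for example, connection.whoami() inside the block.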

settings.User

class genestack_client.settings.User(email, alias=None, host=None, password=None, token=None)

Bases: object

Class encapsulating all user info required for authentication.

That includes:
  • user alias
  • server host (URL or hostname)
  • token or email/password pair

All fields are optional. If alias is None it will be the same as email.

If you log in interactively, no email or password is required. The alias is used to find the matching user in get_user().

Parameters:
  • email (str) – email
  • alias (str) – alias
  • host (str) – host
  • password (str) – password
get_connection(interactive=True, debug=False, show_logs=False)

Return a logged-in connection for the current user. If interactive is True and the email or password is unknown, the user is prompted for them.

Parameters:
  • interactive (bool) – ask email and/or password interactively.
  • debug (bool) – print stack trace in case of exception
  • show_logs (bool) – print application logs (received from server)
Returns:

logged-in connection

Return type:

genestack_client.Connection

Helper methods

get_connection

genestack_client.get_connection(args=None)

This is the same as get_user().get_connection(). It is generally the fastest way to get an active connection.

Parameters:args (argparse.Namespace) – argument from argparse.parse_args
Returns:connection
Return type:genestack_client.Connection

make_connection_parser

genestack_client.make_connection_parser(user=None, password=None, host=None, token=None)

Creates an argument parser with the provided connection parameters. If one of email, password or user is specified, they are used. Otherwise, the default identity from the local config file will be used.

Parameters:
  • user (str) – user alias or email
  • password (str) – user password
  • host (str) – host
  • token (str) – API token string
Returns:

parser

Return type:

argparse.ArgumentParser

get_user

genestack_client.get_user(args=None)

Returns the user corresponding to the provided arguments. If args is None, uses make_connection_parser() to get arguments.

Parameters:args (argparse.Namespace) – result of commandline parse
Returns:user
Return type:settings.User
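
Because make_connection_parser() returns a regular argparse.ArgumentParser, a script can add its own options before parsing. The sketch below shows that step in isolation. The function name add_script_arguments and the accession and --output options are hypothetical, and the example only relies on the standard argparse API.

```python
import argparse


def add_script_arguments(parser):
    """Add this script's own options to a connection parser.

    ``parser`` is expected to be the argparse.ArgumentParser returned by
    make_connection_parser(). The option names below are hypothetical.
    """
    parser.add_argument('accession', help='accession of the file to inspect')
    parser.add_argument('--output', default=None, help='where to write the result')
    return parser
```

In a script this would presumably be used as args = add_script_arguments(make_connection_parser()).parse_args(), followed by connection = get_connection(args).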

Exceptions

GenestackBaseException

class genestack_client.GenestackBaseException

Bases: Exception

Base class for Genestack exceptions.

Use it to catch all exceptions raised explicitly by Genestack Python Client.

GenestackException

class genestack_client.GenestackException

Bases: genestack_client.genestack_exceptions.GenestackBaseException

Client-side exception class.

Raise its instances (instead of Exception) if anything is wrong on client side.

GenestackServerException

class genestack_client.GenestackServerException(message, path, post_data, debug=False, stack_trace=None)

Bases: genestack_client.GenestackException

Server-side exception class.

Raised when Genestack server returns an error response (error message generated by Genestack Java code, not an HTTP error).

Parameters:
  • message (str) – exception message
  • path (str) – path after server URL of connection.
  • post_data – POST data (file or dict)
  • debug (bool) – flag if stack trace should be printed
  • stack_trace (str) – server stack trace

GenestackAuthenticationException

class genestack_client.GenestackAuthenticationException

Bases: genestack_client.GenestackException

Exception thrown on an authentication error response from server.

GenestackResponseError

class genestack_client.GenestackResponseError(reason)

Bases: genestack_client.genestack_exceptions.GenestackBaseException, urllib.error.URLError

Wrapper for HTTP response errors.

Extends urllib.error.URLError (urllib2.URLError on Python 2) for backward compatibility.

GenestackConnectionFailure

class genestack_client.GenestackConnectionFailure(message)

Bases: genestack_client.genestack_exceptions.GenestackBaseException, urllib.error.URLError

Wrapper for server connection failures.

Extends urllib.error.URLError (urllib2.URLError on Python 2) for backward compatibility.
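
The inheritance listed above has a practical consequence: catching GenestackException covers server and authentication errors but not GenestackResponseError or GenestackConnectionFailure, which derive directly from GenestackBaseException. The sketch below illustrates this with stand-in classes that only mirror the documented "Bases" lines, so it runs without the library. The describe_failure helper and its category names are hypothetical.

```python
from urllib.error import URLError


# Stand-in classes: they only mirror the inheritance documented above.
# Real code would import these names from genestack_client instead.
class GenestackBaseException(Exception):
    pass

class GenestackException(GenestackBaseException):
    pass

class GenestackServerException(GenestackException):
    pass

class GenestackAuthenticationException(GenestackException):
    pass

class GenestackResponseError(GenestackBaseException, URLError):
    pass

class GenestackConnectionFailure(GenestackBaseException, URLError):
    pass


def describe_failure(error):
    """Map an exception to a short category, most specific class first."""
    if isinstance(error, GenestackAuthenticationException):
        return 'authentication'
    if isinstance(error, GenestackServerException):
        return 'server'
    if isinstance(error, (GenestackResponseError, GenestackConnectionFailure)):
        return 'network'
    if isinstance(error, GenestackBaseException):
        return 'client'
    return 'unrelated'
```

The same ordering applies to except clauses: list the more specific classes first and GenestackBaseException last.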

Others

GenestackShell

class genestack_client.genestack_shell.GenestackShell(*args, **kwargs)

Bases: cmd.Cmd

Arguments to be overridden in children:

  • INTRO: greeting at start of shell mode
  • COMMAND_LIST: list of available commands
  • DESCRIPTION: description for help.

Run as script:

script.py [connection_args] command [command_args]

Run as shell:

script.py [connection_args]
Default shell commands:
  • help: show help about shell or command
  • quit: quits shell
  • ctrl+D: quits shell
cmdloop(intro=None)

Repeatedly issue a prompt, accept input, parse an initial prefix off the received input, and dispatch to action methods, passing them the remainder of the line as argument.

default(line)

Called on an input line when the command prefix is not recognized.

If this method is not overridden, it prints an error message and returns.

do_help(line)

List available commands with “help” or detailed help with “help cmd”.

emptyline()

Called when an empty line is entered in response to the prompt.

If this method is not overridden, it repeats the last nonempty command entered.

get_commands_for_help()

Return the list of (command, description) pairs to be shown by the shell help command.

Returns:command - description pairs
Return type:list[(str, str)]
get_history_file_path()

Get path to history file.

Returns:path to history file
Return type:str
get_shell_parser(offline=False)

Returns the parser for shell arguments.

Returns:parser for shell commands
Return type:argparse.ArgumentParser
postloop()

Hook method executed once when the cmdloop() method is about to return.

preloop()

Hook method executed once when the cmdloop() method is called.

process_command(command, argument_list, shell=False)

Runs the given command with the provided arguments and returns the exit code.

Parameters:
  • command (Command) – command
  • argument_list (list) – the list of arguments for the command
  • shell (bool) – should we use shell mode?
Returns:

0 if the command was executed successfully, 1 otherwise

Return type:

int

set_shell_user(args)

Set the connection for shell mode.

Parameters:args (argparse.Namespace) – script arguments

Command

class genestack_client.genestack_shell.Command

Bases: object

Command class to be inherited.

  • COMMAND: name of the command
  • DESCRIPTION: description as shown in the help message
  • OFFLINE: set to True if the command does not require a connection to the Genestack server.
get_command_parser(parser=None)

Returns a command parser. This function is called each time before a command is executed. To add new arguments to the command, you should override the update_parser() method.

Parameters:parser (argparse.ArgumentParser) – base argument parser. For offline commands and commands inside shell, it will be None. For the other cases, it will be the result of make_connection_parser()
Returns:parser
Return type:argparse.ArgumentParser
get_short_description()

Returns a short description for the command. Used in the “help” message.

Returns:short description
Return type:str

run()

Override this method to implement the command action.

Return value of this method is always ignored. If this method raises an exception, the command will be treated as failed.

If this command is executed in the shell mode, the failed state is ignored, otherwise exit code 1 is returned.

Raise GenestackException to indicate command failure without showing the stacktrace.

Return type:None
set_arguments(args)

Set parsed arguments for the command.

Parameters:args (argparse.Namespace) – parsed arguments
set_connection(conn)

Set a connection for the command.

Parameters:conn (genestack_client.Connection) – connection
update_parser(parent)

Add arguments for the command. Should be overridden in child classes.

Parameters:parent (argparse._ArgumentGroup) – argument group
Return type:None
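
A child command typically sets the three class attributes and overrides update_parser() and run(). The sketch below shows that shape with a small stand-in for the Command base class, so it runs without the library. The stand-in, the Greet command, the greeting() helper and the args attribute name are all assumptions of this example; this reference does not say where set_arguments() stores the parsed arguments.

```python
import argparse


class Command(object):
    """Stand-in for genestack_client.genestack_shell.Command (assumption).

    Only the two hooks used below are mimicked. The ``args`` attribute name
    is an assumption of this sketch, not taken from the reference.
    """

    def set_arguments(self, args):
        self.args = args

    def update_parser(self, parent):
        pass


class Greet(Command):
    COMMAND = 'greet'
    DESCRIPTION = 'Print a greeting.'
    OFFLINE = True  # no connection to the Genestack server is needed

    def update_parser(self, parent):
        # ``parent`` is an argparse argument group; add command arguments here.
        parent.add_argument('name', help='who to greet')

    def greeting(self):
        return 'Hello, %s!' % self.args.name

    def run(self):
        # The return value is ignored; raise an exception to signal failure.
        print(self.greeting())
```

With the real base class, such a command would presumably be listed in the COMMAND_LIST of a GenestackShell subclass.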

SpecialFolders

class genestack_client.SpecialFolders

Bases: object

  • IMPORTED: folder with files created by Data Importers
  • CREATED: folder with files created by Preprocess and Analyse applications
  • TEMPORARY: folder with temporary files
  • UPLOADED: folder with uploaded raw files
  • MY_DATASETS: folder with created datasets